Friday, September 3. 2010
$2.07 billion for 3PAR, a company working only in the storage area ... now I'm sure that $7.4 billion for Sun was a steal.
Thursday, September 2. 2010
I sat on my fingers for the whole weekend, but now I can't take all these comments in the blogs and on the mailing lists any longer. Ten years ago, I would have gone ballistic long ago ... but the fuse got longer over the past few years. I'm still able to detonate; however, unlike the undirected explosions of 10 years ago, it's more directed today.
And this article is such a directed explosion. I worked on this article for a while, and this was one of the reasons why it was relatively silent on this blog over the last few days. At first I didn't want to publish it, but some events of today led me to think otherwise. I don't know if this is a wise move, because the torch of this rant will burn down some beards. On the other hand, I think the article is worth publishing, but you have to be the judge of that.
However, I tried to keep myself from exploding, so I used this weekend to play with my new toy (I purchased an iPad last week and it's really fscking cool; I just thought "Like Enterprise ... I just wouldn't give them away like they do in TNG") and worked a lot on planning the modifications to my newly purchased building.
However, the mail by Garrett d'Amore asking about the removal of SVM was the last drop. It was one of those drops that lead to a severe spill-over. So this blog article will be a long rant. If you don't like rants, just skip this.
And keep in mind, it's a rant ... it's not meant to be fair or objective ... think of it as a way to write down my frustration about a lot of events in the last few weeks and especially in the last few days.
Continue reading "A really long rant ..."
Wednesday, September 1. 2010
Just in case you wonder about the relative silence in this blog ... I wrote a longer article over the last few days circling around Illumos, OpenSolaris, the Java lawsuit, some people in the Solaris community, life, the universe and all the rest ... it was one really big rant. I've decided not to put it on the blog ... a lot of evening work for the bin ... but believe me ... it's better that way.
Tuesday, August 31. 2010
Last week I reported that the X-Force numbers on unpatched disclosures could be sorted in a different way to yield a completely different view of the data. More interesting is a recent development: after reassessing the data, many of the vulnerabilities had to be sorted into different categories. So the numbers were fundamentally incorrect as well.
The list changed a lot due to these changes: Sun went from 9% high+critical to 0%. IBM leads the pack with 29% of its high+critical vulnerabilities unpatched. However, 22% for Oracle doesn't look that good either. You will find the updated list in the blog entry "Mid-Year 2010 X-Force Trend and Risk Report - Update to Unpatched Vulnerabilities Chart".
Friday, August 27. 2010
At the moment you read a lot about this X-Force report, and Sun is said to leave more vulnerabilities unpatched. But more interesting than the raw number of unpatched vulnerabilities is the "Percentage of Critical and High 2010 H1 Disclosures with no patch" on page 20 of this report.
1. Google: 33%
2. IBM: 29% (the owner of X-Force)
3. Oracle: 22%
4. Linux: 20%
5. Microsoft: 11%
6. Novell: 10%
7. Sun: 8%
Friday, August 27. 2010
Chris Wong writes in "Why Java needs Oracle": Java itself succeeded because of Sun's corporate backing. Today, Java still needs a sponsor, and that appears to be Oracle. It was either Oracle or IBM: two Old Ones who are very much invested in Java's success. Both vied to acquire Sun, and the decision was made for us. For better or for worse, Oracle is now Java's champion and protector. The legal landscape is too dangerous out there for a major platform to be without one.
Friday, August 27. 2010
Some code is so old that it predates the "invention" of free software by the FSF. One such example is the RPC code that Sun provided to the world under a permissive license in 1984; it was used in many implementations, such as the one in Linux that provides NFS services. This was possible due to the license Sun chose at that time:
Sun RPC is a product of Sun Microsystems, Inc. and is provided for
unrestricted use provided that this legend is included on all tape
media and as a part of the software program in whole or part. Users
may copy or modify Sun RPC without charge, but are not authorized
to license or distribute it to anyone else except as part of a product or
program developed by the user.
However, it isn't technically free software, as it was freed before free software was formally defined ... and for some strange reason, people found the license not free enough. In some Linux distributions this situation was considered a serious bug (I would file a bug against something different in this regard, but that's a different story). However, for whatever reason, this particular issue wasn't resolved for years ...
As Tom Callaway wrote in a recent blog entry, this situation has been resolved now: So, we restarted the effort with Oracle, and on August 18, 2010, Wim Coekaerts, on behalf of Oracle America, gave permission for the remaining files that we knew about under the Sun RPC license (netkit-rusers, krb5, and glibc) to be relicensed under the 3 clause BSD license.
Wednesday, August 25. 2010
The video recording of my talk at Froscon is now online: "Was treibt eigentlich mein Unix?" ("What is my Unix actually doing?"). Please excuse the "uhs" and the fat guy up front. Unfortunately, the audio track is only available in German.
Update: The link has now been fixed ...
Monday, August 23. 2010
The OGB pulled the trigger today. The members resigned, as reported in the last meeting minutes ever of the OGB. After the developments of the last few weeks this was just a formal, but inevitable step. Will it change something? Don't think so ...
Monday, August 23. 2010
Sunday, August 22. 2010
Sometimes charts look too perfect to be measured. I had this feeling when I saw the rperf numbers of the 795 and put them into a chart. I'm a very visual person, so I put everything in a chart just to get a feeling for the numbers.
At first I thought I was paranoid, but then my colleague Jan Brosowski mailed me that he had thought the same, although he approached the problem from the mathematical point of view. Okay ... that left me with a lot of questions, and so I did some quick bullshit-testing math on the data points.
Some math
After reading his mail I wanted to run some tests of my own. So I did a short test with the numbers. First I put the data of the 3.7 GHz P7 into my favorite statistical program, R.
> procs <- c(24,48,72,96,120,144,168,192)
> rperf <- c(273.51,547.02,820.53,1094.04,1367.55,1641.06,1914.57,2188.08)
> fm <- lm (rperf ~ procs)
> fitted.values(fm)
1 2 3 4 5 6 7 8
273.51 547.02 820.53 1094.04 1367.55 1641.06 1914.57 2188.08
> residuals(fm)
1 2 3 4 5
-7.886829e-14 -1.045378e-14 6.540129e-14 6.414338e-14 3.446377e-14
6 7 8
-2.363755e-14 -2.489546e-14 -2.615336e-14
> coefficients(fm)
(Intercept) procs
1.607775e-13 1.139625e+01
> summary(fm)
Call:
lm(formula = rperf ~ procs)
Residuals:
Min 1Q Median 3Q Max
-7.887e-14 -2.521e-14 -1.705e-14 4.188e-14 6.540e-14
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.608e-13 4.241e-14 3.791e+00 0.00906 **
procs 1.140e+01 3.499e-16 3.257e+16 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.442e-14 on 6 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.061e+33 on 1 and 6 DF, p-value: < 2.2e-16
Strange ... the linear model leads to coefficients able to predict the rperf value from the core count with minimal residuals. And I learned not to trust data with an R-squared of 1. Okay ... let's check the 4.0 GHz P7 rperf numbers:
> procs <- c(32,64,96,128,160,192,224,256)
> rperf <- c(372.27,744.54,1116.81,1489.08,1861.35,2233.62,2605.89,2978.16)
> fm <- lm (rperf ~ procs)
> residuals(fm)
1 2 3 4 5
1.627514e-13 -1.454981e-13 4.364426e-16 -4.677910e-15 -3.821397e-14
6 7 8
-1.490662e-14 -1.052861e-13 1.453949e-13
> fitted.values(fm)
1 2 3 4 5 6 7 8
372.27 744.54 1116.81 1489.08 1861.35 2233.62 2605.89 2978.16
> coefficients(fm)
(Intercept) procs
-3.215549e-13 1.163344e+01
Again ... minimal residuals. Okay ... last check ... for the 4.25 GHz procs.
> procs <- c(24,32,48,64,80,96,112,128)
> rperf <- c(347.36,463.14,694.71,926.28,1157.85,1389.42,1620.99,1852.56)
> fm <- lm (rperf ~ procs)
> residuals(fm)
1 2 3 4 5
3.163842e-03 -1.638418e-03 -1.242938e-03 -8.474576e-04 -4.519774e-04
6 7 8
-5.649718e-05 3.389831e-04 7.344633e-04
> fitted.values(fm)
1 2 3 4 5 6 7
347.3568 463.1416 694.7112 926.2808 1157.8505 1389.4201 1620.9897
8
1852.5593
> coefficients(fm)
(Intercept) procs
0.002429379 14.473100282
Sorry ... that is looking too perfect to me.
When you put the data of the 64-core LPARs through the same procedure, you will see for 4.0 GHz:
> procs <- c(128,256)
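The same suspicion can be checked without a regression at all. The following is a minimal sketch in Python (not the R used above), assuming only the data points from the session: if the values had been measured, rperf divided by the core count should wobble at least a little from point to point. The helper name `per_core_spread` is made up for this illustration.

```python
# Data copied from the R session above: (cores, rperf) per clock frequency.
series = {
    "3.7 GHz": ([24, 48, 72, 96, 120, 144, 168, 192],
                [273.51, 547.02, 820.53, 1094.04, 1367.55, 1641.06, 1914.57, 2188.08]),
    "4.0 GHz": ([32, 64, 96, 128, 160, 192, 224, 256],
                [372.27, 744.54, 1116.81, 1489.08, 1861.35, 2233.62, 2605.89, 2978.16]),
    "4.25 GHz": ([24, 32, 48, 64, 80, 96, 112, 128],
                 [347.36, 463.14, 694.71, 926.28, 1157.85, 1389.42, 1620.99, 1852.56]),
}


def per_core_spread(procs, rperf):
    """Return (min, max) of rperf divided by core count across one series."""
    ratios = [r / p for p, r in zip(procs, rperf)]
    return min(ratios), max(ratios)


spreads = {name: per_core_spread(p, r) for name, (p, r) in series.items()}

for name, (lo, hi) in spreads.items():
    # Relative spread of the per-core value within one series.
    print(f"{name}: rperf/core between {lo:.6f} and {hi:.6f} "
          f"(relative spread {(hi - lo) / lo:.2e})")
```

In this sketch the per-core value barely moves within a series, which is just another way of stating the R-squared of 1.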
> rperf <- c(1406.36,2812.72)
> fm <- lm (rperf ~ procs)
> coefficients(fm)
(Intercept) procs
-3.215549e-13 1.098719e+01
And now for the 4.25 GHz P7:
> procs <- c(64,128)
> rperf <- c(777.09,1554.18)
> fm <- lm (rperf ~ procs)
> coefficients(fm)
(Intercept) procs
-1.607775e-13 1.214203e+01
Both times the intercept is effectively 0 (I assume the tiny nonzero value is owed to rounding of the data points by IBM or to the challenges of floating-point arithmetic on computers).
That's totally unreasonable for measured data. Even when you assume 99% of the performance for the 256-core data point (which would still be a practically impossible scaling factor), you would get an intercept in the range of 28.13.
> procs <- c(128,256)
> rperf <- c(1406.36,2812.72*0.99)
> fm <- lm (rperf ~ procs)
> coefficients(fm)
(Intercept) procs
28.12720 10.76744
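With only two data points the fit is simply the line through both of them, so the intercept can be worked out by hand. The short derivation below is a sketch of that arithmetic, using nothing but the two 4.0 GHz data points from above; the symbols a (intercept) and b (slope) are notation introduced here.

```latex
% Line through two data points (x_1, y_1) and (x_2, y_2):
\begin{align*}
b &= \frac{y_2 - y_1}{x_2 - x_1}, & a &= y_1 - b\,x_1 .
\intertext{Published data: $x_2 = 2x_1$ and $y_2 = 2y_1$, so}
b &= \frac{y_1}{x_1} = \frac{1406.36}{128} \approx 10.98719, & a &= 0 .
\intertext{Assumed 99\% at 256 cores: $y_2 = 0.99 \cdot 2y_1 = 1.98\,y_1$, so}
b &= \frac{0.98\,y_1}{x_1} \approx 10.76744, & a &= 0.02\,y_1 = 0.02 \cdot 1406.36 = 28.1272 .
\end{align*}
```

So an intercept of zero means that doubling the cores doubled the rperf value exactly, and even a 1% loss at the larger configuration would already show up as an intercept of about 28.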
Conclusion
At the moment I don't believe that IBM has really measured all the data it provides in the rperf list. The data fits a linear model too perfectly. The interesting question is: "Which data points were really measured?" All the data provided for the configurations looks computed or guessed, not measured. Even if you want to assume that IBM found its way to the holy grail of linear scalability, an R-squared of 1 and residuals at 0 are just ridiculous. I would really like to know which data points were actually measured.
Wednesday, August 18. 2010
The new "IBM Power Systems Performance Report POWER7, POWER6 and POWER5 results" holds an interesting piece of information. A recurring question from colleagues and admin friends is the impact of LPARs on performance. It looks like IBM needs the LPARs to get some speed out of their larger systems.
Just a few examples: When using a 795 with 4.25 GHz and 64 cores, a configuration with 4 LPARs of 16 cores each yields a relative performance of 926.28. The same system with just 1 LPAR of 64 cores yields a relative performance of 777.09. So leaving the scaling to the OS instead of dividing the machine into 4 small systems gives you just 83.89% of the performance. When using a 795 with 4.25 GHz and 128 cores, a configuration with 8 LPARs of 16 cores each yields an rperf value of 1852.56. With 2 LPARs of 64 cores each you get 1554.18. Interestingly, that is 83.89% again.
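The percentages are easy to recheck. Here is a small sketch in Python, assuming only the four rperf values quoted above and taking the second configuration as 128 cores in total (8 x 16 or 2 x 64); the function name `retained` is made up for the illustration.

```python
# rperf values quoted above for the 795 at 4.25 GHz, keyed by total core count.
configs = {
    64: {"16-core LPARs": 926.28, "64-core LPARs": 777.09},
    128: {"16-core LPARs": 1852.56, "64-core LPARs": 1554.18},
}


def retained(small, large):
    """Share (in percent) of the small-LPAR rperf kept by the large-LPAR setup."""
    return 100.0 * large / small


shares = {cores: retained(v["16-core LPARs"], v["64-core LPARs"])
          for cores, v in configs.items()}

for cores, share in shares.items():
    print(f"{cores} cores: {share:.2f}% retained, {100 - share:.2f}% lost")
```

Both configurations come out at the same share, because the 128-core values are exactly twice the 64-core values.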
At first I thought: "16 cores easily fit on a processor book (with 4 processors each). A 64-core LPAR has to use two processor books. So when you use a configuration larger than a processor book, you lose 16.11% along the way." But doing the same calculation with some other data showed otherwise: the move from 32-core to 64-core LPARs reduces the performance by just 5.7% and 5.4%, respectively. 32 cores fit on a processor book, too, so if crossing the book boundary were the cause, the difference should be similar to the 16-to-64-core case.
So my interpretation is a little bit different: the scalability of AIX seems to have a sweet spot between 16 and 32 cores. I thought for a moment about an intra-book bottleneck, but the CPUs on a book are fully meshed (1 hop from each CPU to every other), so I don't think that's the problem.
When you look at this chart, you may find some interesting points. The light blue line is a hypothetical, perfectly scaling 4.0 GHz Power7 in a 795. The data is based on an rperf number of 103.41 for an 8-core system (source: page 20 of the document). Please look at the right side of the chart at 256 cores. You end up at 81.9% of the hypothetical performance when you use 64-core LPARs and at 86% of the hypothetical performance when using 32-core LPARs (interestingly, the difference is pretty much the same as the one computed before for the move to 64-core LPARs). With 64 cores, the load is distributed across 8 processors, thus just 2 processor books. Still, there is a serious impact of almost 20%. It will be interesting to dig further into this topic.
However, it's important to know that the LPAR configuration limits the operating system to a 64-core SMP, no matter how many cores are in the system. So these numbers don't factor in the scalability challenges of AIX above 64 cores, as the OS does not have to scale beyond this point while generating these rperf numbers. The numbers for the configurations with large core counts are not single-OS-image numbers. Further penalties for OS scaling come on top. Furthermore, this benchmark is a pure CPU/memory benchmark. As IBM explains in its own description of the benchmark, there is no I/O and no networking involved.
That said, a number of really interesting data points are missing in the PDF:
- The rperf number of a fully populated, unpartitioned system
- The rperf number of a fully populated system with just one LPAR the size of the complete system
Somehow I have the impression that IBM is hiding something. When you look into the mentioned PDF, there is a SPEC number for an unpartitioned system, but no rperf for it. The existence of the SPEC numbers hints that they did indeed have an OS that was able to scale to 256 cores. On the other hand, there are no SPEC numbers for the partitioned systems, only the rperf numbers. Just call it a hunch ...
Wednesday, August 18. 2010
Yet another one leaves Oracle: Adam Leventhal.
Wednesday, August 18. 2010
Congratulations to IBM on the new lead in TPC-C. IBM once again has the lead in a benchmark that became meaningless a few years ago. Just before you ask: I wrote the same when Oracle was first. But somehow it looks like IBM wasn't able to just call it a day. So they did a TPC-C benchmark run again, and in the end they were able to reach 10,366,254 tpmC.
The result is indeed impressive. However, there are some key differences. The response times are vastly longer in the IBM result.
| Response Times (in seconds) | 90th % | Average | Maximum |
| New Order | 2.1 | 1.137 | 24.041 |
| Payment | 2.1 | 1.138 | 21.293 |
| Order-Status | 2.06 | 1.095 | 20.169 |
| Delivery (interactive) | 1.64 | 0.749 | 17.953 |
| Delivery (deferred) | 0.95 | 0.42 | 2.48 |
| Stock-Level | 2.08 | 1.113 | 21.547 |
| Menu | 1.64 | 0.77 | 23.037 |
Now look at the response times in the Oracle result.
| Response Time in Seconds | 90th % | Avg. | Max. |
| New-Order | 0.170 | 0.168 | 5.885 |
| Payment | 0.160 | 0.156 | 5.758 |
| Order-Status | 0.150 | 0.150 | 5.433 |
| Delivery (Interactive) | 0.120 | 0.134 | 3.869 |
| Delivery (Deferred) | 0.040 | 0.021 | 2.839 |
| Stock | 0.210 | 0.182 | 4.796 |
| Menu | 0.120 | 0.136 | 4.474 |
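To put a number on "vastly longer", here is a minimal sketch in Python that divides the average response times from the two tables above. The transaction labels are normalized to one spelling for the comparison; nothing else is assumed.

```python
# Average response times in seconds, copied from the two tables above.
# Labels are normalized ("New Order"/"New-Order", "Stock-Level"/"Stock").
ibm = {"New-Order": 1.137, "Payment": 1.138, "Order-Status": 1.095,
       "Delivery (interactive)": 0.749, "Delivery (deferred)": 0.42,
       "Stock-Level": 1.113, "Menu": 0.77}
oracle = {"New-Order": 0.168, "Payment": 0.156, "Order-Status": 0.150,
          "Delivery (interactive)": 0.134, "Delivery (deferred)": 0.021,
          "Stock-Level": 0.182, "Menu": 0.136}

# How many times longer the IBM average is for each transaction type.
ratios = {tx: ibm[tx] / oracle[tx] for tx in ibm}

for tx, ratio in ratios.items():
    print(f"{tx:24s} IBM average is {ratio:4.1f}x the Oracle average")
```

This only compares the averages; the 90th percentile and maximum columns could be compared the same way.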
With similar response times, the number of transactions is more in the range of 8.9 million, according to the diagrams in the full disclosures.
This is the diagram from the IBM result:

Somewhere between 80% and 100% of the final result the response time explodes.
This is the matching diagram from Oracle's TPC-C result:

There is no equivalent "explosion" in this result ...
Furthermore, I want to point out certain aspects of the configuration as stated in the full disclosure.
- This configuration has 48 SAS cards with 380 MB of battery-backed write cache each, thus 17.8 GiB of battery-protected write cache in total.
- The configuration used 3*224 SSDs, summing up to 672 SSDs. I suspect that they use SSDs with the Sandforce 1500 controller (the same 177 GB drives as in the TPC-C run that is documented in several places as having been done with Sandforce-based SSDs). This controller has an interesting capability: it can compress data, and some benchmarks have suggested that the performance of these drives differs quite a bit between compressible and less compressible data. It would be interesting to research how compressible TPC-C data is.
- The SSDs use MLC. The Sandforce controllers have a special mode of operation that enables MLC to reach better durability, but this mode is based on compression, too. The interesting questions are: given the use of MLC, are the SSDs really capable of sustaining 3 years of TPC-C load, and what would be the impact of less compressible data on durability?
- The database is completely on the SSDs. Just the database log is on disk. That's similar to the Oracle config.
- The configuration of the database on the system is ... well ... interesting. They use a partitioned database, and all these partitions are bound to certain resource sets. So essentially they split each system into several smaller ones ... to be exact, into 32 partitions per system, each bound to a resource set. With this setup it's ensured that all requests are CPU-local, factoring out the interconnect.
- The configuration doesn't provide any availability protection, but that's okay, because TPC-C doesn't mandate it. Just keep this in mind when looking at the pricing and the configuration. There are no mechanisms to ensure availability, and you can't transform it into a highly available configuration, because the storage is directly attached to a single node (the storage is in the I/O drawers). The Oracle configuration is highly available by accident, as it uses shared storage and RAC.
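The two sizing figures in the list above can be rechecked with a few lines of Python. This is only a sketch under one assumption: each SAS card carries 380 MB of write cache (the reading under which the stated total of 17.8 GiB works out), with MB taken as binary megabytes.

```python
# Assumption: 380 MB (treated as MiB) of battery-backed write cache per SAS card.
cards = 48
cache_mib_per_card = 380
total_cache_gib = cards * cache_mib_per_card / 1024

# SSD count as stated in the list: 3 groups of 224 drives.
ssd_count = 3 * 224

print(f"write cache in total: {total_cache_gib:.1f} GiB")
print(f"number of SSDs: {ssd_count}")
```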
Conclusion
I think TPC-C is now really a corporate ego thing. We are in an arms race here. I'm really curious how Oracle will strike back.
Wednesday, August 18. 2010
Interesting entry in Adam Leventhal's blog about the rationale behind choosing a brand of SSD for the S7000 series.