By Michael Victor
Recently, Don MacAskill, the CEO of SmugMug, wrote a review of Sun’s Sun Fire ‘CoolThreads’ T1000, a Niagara-based server. As someone who likes to keep up with interesting new processor designs, I was eager to see the results. Unfortunately, after reading the article, I had to ask: was the review even worth doing?
In the review, MacAskill says that his company is primarily interested in minimizing the cost per unit of application performance per watt, or, as he puts it on his blog, $/CPU/Watt (a common criterion for web companies, including Google if I recall correctly). His primary question is whether, by this metric, the T1000 (CoolThreads) beats the quad-core Opteron machines (Olde Faithful, each with two dual-core CPUs) that SmugMug currently uses. To spoil the ending, the answer was an emphatic no. It wasn’t even close. In response, readers clamored for a better comparison, suggesting ways to tune performance, citing Linux’s poor thread scheduling compared to Solaris, and calling for Solaris results on the T1000. Since it was such a blowout in favor of Olde Faithful, the engineer in me has to ask: is there even a point in running a Solaris test? Was there any point in running the Linux test in the first place?
Is CoolThreads just too expensive?
The real reason I suspect CoolThreads is doomed to lose appears on the last page of the article, where MacAskill says:
Olde Faithful costs $3,595. At the “sweet spot”, she uses 157.2 watts of power, which costs us $26.20/month in power and cooling costs, or $314.40 per year. Total cost for 1 year? $3,909.40. … CoolThreads costs $8,395 for a similar model with 4GB … Total cost for 1 year? $8546.52.
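As a quick check on the quoted figures, the sketch below recomputes the one-year totals. It assumes, as the quote implies, that “total cost for 1 year” means purchase price plus twelve months of power and cooling. Because the quote elides CoolThreads’ power figure, the script backs it out of the quoted total rather than inventing one.

```python
# One-year cost of ownership, using only the figures quoted above.
# Assumption: total = purchase price + 12 months of power and cooling.

OLDE_PRICE = 3595.00        # Olde Faithful purchase price ($)
OLDE_POWER_MONTHLY = 26.20  # Olde Faithful power + cooling at the "sweet spot" ($/month)
COOL_PRICE = 8395.00        # CoolThreads (T1000, 4GB) purchase price ($)
COOL_TOTAL_1YR = 8546.52    # CoolThreads one-year total as quoted ($)

olde_total_1yr = OLDE_PRICE + 12 * OLDE_POWER_MONTHLY

# The quote elides CoolThreads' power figure, so derive it from the quoted total.
cool_power_1yr = COOL_TOTAL_1YR - COOL_PRICE

cost_ratio = COOL_TOTAL_1YR / olde_total_1yr

print(f"Olde Faithful, 1 year: ${olde_total_1yr:,.2f}")
print(f"CoolThreads power + cooling, 1 year: ${cool_power_1yr:,.2f}")
print(f"Cost ratio (CoolThreads / Olde Faithful): {cost_ratio:.2f}")
```

This should report $3,909.40 for Olde Faithful, about $151.52 of yearly power and cooling for CoolThreads, and a cost ratio of roughly 2.19. CoolThreads spends about half as much on power, but its purchase price dominates the total.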
In other words, CoolThreads costs more than twice as much as Olde Faithful. So what does this mean? Go back to the original evaluation metric, $/CPU/Watt. Say that Olde Faithful consumes w_o watts of power, delivers p_o performance units (however you want to measure them), and costs c_o dollars. (And you thought algebra was useless.) Then its evaluation metric, call it M_o, is M_o = c_o*w_o/p_o, where lower is better. Call CoolThreads’ metric M_c, and let it consume w_c watts and deliver p_c performance units. From the one-year totals above, CoolThreads costs about 2.2*c_o; rounding down to 2*c_o is generous to CoolThreads. Knowing nothing about its power consumption (w_c) or performance (p_c), we still know that for CoolThreads to win (i.e., for M_c < M_o), we need 2*c_o*w_c/p_c < c_o*w_o/p_o. Cancelling c_o from both sides gives 2*w_c/p_c < w_o/p_o, and taking reciprocals of both sides (which flips the inequality) gives p_c/w_c > 2*p_o/w_o. In other words, CoolThreads has to deliver more than twice Olde Faithful’s performance per watt in order to win.
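The break-even condition is easy to check numerically. This sketch assumes the metric M = c*w/p defined above (lower is better) and normalizes Olde Faithful to c_o = w_o = p_o = 1; the helper names metric and coolthreads_wins are hypothetical, chosen for illustration.

```python
def metric(cost, watts, perf):
    """$/CPU/Watt-style metric from the text: M = c * w / p (lower is better)."""
    return cost * watts / perf


def coolthreads_wins(cost_ratio, perf_per_watt_ratio):
    """Hypothetical helper: True if M_c < M_o.

    Olde Faithful is normalized to c_o = w_o = p_o = 1, so M_o = 1.
    CoolThreads costs cost_ratio * c_o and delivers perf_per_watt_ratio
    times Olde Faithful's performance per watt. Only p_c / w_c matters,
    so w_c is fixed at 1 and p_c is set to perf_per_watt_ratio.
    """
    m_o = metric(1.0, 1.0, 1.0)
    m_c = metric(cost_ratio, 1.0, perf_per_watt_ratio)
    return m_c < m_o


for ppw in (1.5, 2.0, 2.5):
    print(f"p_c/w_c = {ppw} x p_o/w_o -> CoolThreads wins: {coolthreads_wins(2.0, ppw)}")
```

With a cost ratio of 2, a 2x gain in performance per watt only ties (M_c = M_o), which is why the condition is a strict inequality. With the actual ratio of about 2.19, the bar is higher still.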
In my experience, this is very unlikely for real applications. Suppose that because Olde Faithful has only 4 cores and CoolThreads has 8, CoolThreads handily delivers twice the performance. This is a very generous assumption, by the way, since each CoolThreads core is not as fast as an Opteron core in Olde Faithful. If each Niagara core consumes as much power as an Opteron core, then power consumption doubles too, p_c/w_c equals p_o/w_o, and since CoolThreads costs twice as much, Olde Faithful comes out ahead by a factor of 2! For CoolThreads merely to tie in this scenario, each Niagara core would have to match an Opteron core’s performance at half the power. That seems very unlikely: if Sun knew how to do that, they could just release an x86 laptop chip and make a killing (I doubt the instruction set architecture makes much difference, as Intel proved in the RISC/CISC performance battle).
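The core-count scenario can be plugged into the same metric. This is a sketch under the paragraph’s own generous assumptions: each Niagara core matches an Opteron core in performance, CoolThreads costs exactly twice as much, and per-core power is expressed relative to an Opteron core. The normalized units are illustrative, not measurements.

```python
def metric(cost, watts, perf):
    # M = c * w / p, as defined in the text (lower is better)
    return cost * watts / perf


OLDE_CORES, COOL_CORES = 4, 8
COST_RATIO = 2.0     # assumption: CoolThreads costs exactly 2x Olde Faithful
PERF_PER_CORE = 1.0  # generous assumption: a Niagara core matches an Opteron core

# Olde Faithful, normalized: cost 1, each Opteron core draws 1 unit of power.
m_o = metric(1.0, OLDE_CORES * 1.0, OLDE_CORES * PERF_PER_CORE)

relative = {}
for power_per_core in (1.0, 0.5):  # Niagara core power relative to an Opteron core
    m_c = metric(COST_RATIO, COOL_CORES * power_per_core, COOL_CORES * PERF_PER_CORE)
    relative[power_per_core] = m_c / m_o
    print(f"Niagara core at {power_per_core:.1f}x Opteron power: "
          f"M_c / M_o = {relative[power_per_core]:.2f}")
```

A ratio above 1 means Olde Faithful wins. Equal per-core power reproduces the factor of 2, and halving per-core power only brings CoolThreads to a tie (ratio 1.00).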
Of course, it is possible that the application SmugMug runs is not CPU-intensive, in which case processor performance is irrelevant, and the simplicity of the Niagara cores could dramatically reduce power consumption and make CoolThreads the winner. However, with the power management in current x86 processors, which cuts power when the CPU is lightly loaded, it seems unlikely that this would give CoolThreads a 2x advantage. In short, I just don’t see Sun developing an architecture that delivers double the performance per watt of AMD’s. Sun systems have typically had much better I/O than x86 systems, but x86 boxen have been improving on this front, and I don’t think the gap is anywhere near the 100% needed for most applications. I vaguely recall that the better I/O might get you 30% if you are I/O bound, but I could be wrong on this. To reach 2x performance per watt with only a 1.3x improvement in I/O performance, you’d need power consumption to drop to 0.65x (a 35% reduction) at the same time, which also seems unlikely.
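The I/O arithmetic reduces to one division: if the target is a 2x gain in performance per watt and performance improves by a factor g, power must fall to g / 2 of its original level. A minimal sketch follows; the 1.3x input is the author’s hedged recollection, not a measured figure, and required_power_ratio is a hypothetical helper.

```python
def required_power_ratio(perf_gain, target_gain=2.0):
    """Hypothetical helper: power (relative to the baseline) at which
    perf_gain / power_ratio equals target_gain."""
    return perf_gain / target_gain


r = required_power_ratio(1.3)  # 1.3x: the author's recollected I/O advantage (unverified)
print(f"Power must fall to {r:.2f}x, a {1 - r:.0%} reduction")
```

This prints a required power level of 0.65x, i.e. a 35% reduction, matching the figure in the paragraph above.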
So what about Solaris? This is more promising, since software changes can deliver 2x or greater performance improvements with little extra power consumption. In processor hardware you are lucky to get 3% from a single technique (hence the nickname “the 3% club” for people doing processor architecture research). System architecture is a better bet, but software can give really big wins. However, getting 2x usually means the original implementation had some serious bottlenecks that went unfixed. Such bottlenecks do exist, and I’ve seen some research on OS design for web serving that highlights them, but I suspect that Linux and its scheduler aren’t even close to 2x worse than Solaris in performance per watt, performance alone, or power alone. That means Olde Faithful is going to win again.
In short, I can reach the same conclusion as MacAskill without running any benchmarks. If I had to budget my time, I’d forget about Solaris and forget about the CoolThreads box. Unless those machines get cheaper, they just won’t make sense if you care about $/CPU/Watt. On the other hand, if you are an AMD user, it is definitely worth investigating the new Intel Woodcrest- and Conroe-based machines. Those machines essentially combine the best of AMD’s architecture with Intel’s fabs and deliver a great product in terms of both price/performance and performance per watt. I was astounded by the initial performance numbers reported for Conroe, and am even more shocked to see that those numbers hold up under independent scrutiny. Now, if only Apple’s Xserve had the same compelling price point as the Mac Pro…
Postscript: added August 18, 2006
It looks like Sun will be helping with Solaris numbers and some performance tuning after MacAskill’s original post got Dugg a ton. I’ll be curious to see whether tuned Ubuntu or Solaris on CoolThreads can beat Olde Faithful. It will be very impressive if they succeed. If they do, I hope there is a detailed analysis of how they managed to double the T1000’s performance per watt.