IBM cheats on Cell with NVIDIA Tesla for servers

The Cell chip that powers the PlayStation 3 has been good to IBM's high-performance computing (HPC) efforts, with Big Blue's supercomputers riding the game console chip to fame and glory in the twice-yearly Top 500 Supercomputer List. Cell makes an ideal coprocessor for the kinds of scientific codes that supercomputers run. Indeed, it makes more sense as an HPC coprocessor than it does as a game console chip. But the writing may be on the wall for Cell, as IBM has just announced a server that can use two NVIDIA Tesla devices as coprocessors.

The IBM System x iDataPlex dx360 M3 features two full-sized x16 PCIe 2.0 slots that can accommodate two of NVIDIA's newest Tesla coprocessor modules. These coprocessors are based on the much-delayed Fermi architecture behind the GTX 470 and 480 cards, and they're essentially very expensive, modified GPUs. On a spectrum from "special-purpose and inflexible" to "general-purpose and fully programmable," Fermi sits somewhere between Cell and a traditional GPU. Generally speaking, this means that Fermi should be much better than Cell at a minority of HPC workloads, and worse than Cell at the rest.

By adopting Tesla, IBM isn't just giving a vote of confidence and a big boost to NVIDIA's HPC efforts, though the GPU maker is keen to spin the announcement this way. IBM is also diversifying away from Cell, because Cell probably has no future as an actively developed commodity product. Rumors that Sony would not use Cell in the PlayStation 4 have been swirling since December of last year, and speculation to this end goes back even further, to late 2007, when Sony sold its Cell fab to Toshiba. There were also hints at the end of last year that IBM had stopped working on further Cell development. Big Blue later denied that this was the case, claiming that as long as the Sony contract is in place, it will press ahead with Cell.

But the idea that Sony will abandon Cell for its next console seems eminently plausible, and if that happens, there's no chance that IBM will keep the architecture around for HPC. Because of the economics of semiconductors and of high-performance computing, high-volume, commodity chips that can be repurposed for a largely cost-sensitive niche like HPC will beat a boutique, low-volume chip every single time. If Cell no longer has a base in the commodity gaming market, it no longer has a future in HPC.

NVIDIA, in contrast, does seem to have a future in HPC, mainly due to a combination of foresight and luck. NVIDIA identified HPC as a high-margin business with a lot of growth potential and began work on CUDA, its proprietary OpenCL equivalent, at a time when the idea of a GPU-powered supercomputer wasn't really on anyone's radar outside of academic circles. Luck comes into the picture in that ATI didn't bother to put any resources into nongaming applications for discrete GPUs. The result is that ATI is dominating the high-end GPU market while leaving HPC coprocessing to NVIDIA alone. IBM ends up pairing AMD's Opteron with Cell as a coprocessor, while the new Tesla machines are based on Intel's Xeon. At some point, one wonders if we'll see an Opteron/Tesla combo in the Top 500 Supercomputers list. Since AMD owns ATI, this would be an embarrassing spectacle for ATI and a PR coup for NVIDIA.

The growing acceptance of GPUs as HPC coprocessors presents an opportunity for AMD to put out a superior single-vendor CPU/GPU combo platform aimed at the supercomputing market. Given the company's strengths in platform-level engineering and its position as the budget option, it's in a very good position to make this happen, if it would only commit the resources.

