A small AI company, Cerebras, has designed an AI processor that uses an entire wafer to make a single unit, the Wafer Scale Engine (WSE). It is 56.7 times larger than the largest Nvidia graphics processing unit, which measures 815 square millimeters and contains 21.1 billion transistors.

The WSE beside an Nvidia Tesla V100

A wafer generally contains defects, so the company had to build redundancy into the device to tolerate them.

The WSE also contains 3,000 times more high-speed, on-chip memory than the V100 and has 10,000 times more memory bandwidth. Chip size is profoundly important in AI: larger chips keep more of the computation and data on-chip, processing information more quickly and producing answers in less time.
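The memory multipliers can be sanity-checked against typical V100 figures. The V100's on-chip SRAM size (~6 MB of L2 cache) and HBM2 bandwidth (~900 GB/s) are assumptions here, not numbers from the article:

```python
# Rough check of the WSE-vs-V100 memory multipliers quoted above.
# Assumed V100 figures (not stated in the article): ~6 MB of on-chip
# SRAM (L2 cache) and ~900 GB/s of HBM2 memory bandwidth.

wse_sram_bytes = 18 * 10**9    # 18 GB of on-chip SRAM (WSE)
v100_sram_bytes = 6 * 10**6    # ~6 MB on-chip (assumption)

wse_bw = 9 * 10**15            # 9 PB/s (WSE)
v100_bw = 900 * 10**9          # ~900 GB/s HBM2 (assumption)

print(wse_sram_bytes / v100_sram_bytes)  # -> 3000.0
print(wse_bw / v100_bw)                  # -> 10000.0
```

Under those assumptions, the quoted 3,000x and 10,000x ratios come out exactly.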

The 46,225 square millimeters of silicon in the Cerebras WSE house 400,000 AI-optimized, no-cache, no-overhead compute cores and 18 gigabytes of local, distributed, superfast SRAM as the one and only level of the memory hierarchy. Memory bandwidth is 9 petabytes per second.
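These figures also let us verify the 56.7x size claim from the opening paragraph and work out how much SRAM each core gets, a quick back-of-envelope check:

```python
# Figures taken directly from the article.
wse_area_mm2 = 46_225    # WSE die area
v100_area_mm2 = 815      # largest Nvidia GPU die area

# Size ratio quoted in the opening paragraph.
print(round(wse_area_mm2 / v100_area_mm2, 1))  # -> 56.7

cores = 400_000
sram_bytes = 18 * 10**9
# Local SRAM available per core, on average.
print(sram_bytes // cores)  # -> 45000 bytes, i.e. ~45 KB per core
```

So each core owns roughly 45 KB of local SRAM, which is the only memory level it ever touches.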

The cores are linked together with a fine-grained, all-hardware, on-chip mesh-connected communication network that delivers an aggregate bandwidth of 100 petabits per second. More cores, more local memory, and a low-latency high-bandwidth fabric together create the optimal architecture for accelerating AI work.

The WSE is manufactured by TSMC on its advanced 16nm process technology. The company worked closely with TSMC to design and manufacture a wafer-sized device.

The WSE needs about 15,000 W of power, so a bank of HX1000i PSUs would be needed to run it. That is roughly 50 times the power drawn by a V100 GPU. Now, if the WSE could ever perk up gaming, Nvidia and AMD would have fits.