2^64 is the address span for a 64-bit CPU, which is 18EB, or exactly 18,446,744,073,709,551,616 bytes. Keep in mind the CPU has a 64-bit accumulator, so technically there are 2^72 bytes in the realm of x64.
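The exact span is easy to confirm in a few lines of Python, whose integers are arbitrary precision (the EB and EiB conversions are my own additions):

```python
# 2**64 distinct byte addresses in a 64-bit address space
span = 2 ** 64
print(span)              # 18446744073709551616

# the same span in decimal exabytes and binary exbibytes
print(span / 10 ** 18)   # ~18.45 EB
print(span // 2 ** 60)   # exactly 16 EiB
```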
NASA needs vast storage for telescope telemetry and other mission data, and it has many projects under way with many universities.
In 2019 it was reported that 240EB of hard disk capacity was sold, primarily to data centers. Tape was close behind at 118EB in 2019, thanks in large part to the new LTO-8 tape. LTO-9 drives are on preorder with Quantum, and LTO-10 is not far off.
Most university supercomputers run on roughly 5PB of storage; some with larger budgets may have 10PB. Most supercomputers use the Lustre file system, which can assemble swarms of servers into one large file system.
Backblaze chassis can be purchased individually by anyone who wants one for a storage appliance. The most recent chassis now fit a standard EIA 19″ rack properly. An assembled 60-disk chassis sells for about $10,000.
The real cost is the hard disk pool: top-capacity disks average about $500 each, or about $30,000 per server, so the net cost is about $40,000 per server with 1PB of raw storage using 18TB hard disks. With 8 servers per rack, there is 8PB of available storage at a cost of only $320,000. Fault tolerance will reduce the aggregate capacity somewhat.
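The cost arithmetic above can be sketched in a few lines of Python (the prices and counts are the article's figures, not vendor quotes):

```python
chassis_cost = 10_000        # assembled 60-bay chassis
disk_cost = 500              # one top-capacity 18 TB disk
disks_per_server = 60
servers_per_rack = 8

server_cost = chassis_cost + disk_cost * disks_per_server
raw_tb_per_server = 18 * disks_per_server       # 1,080 TB, ~1 PB raw

rack_cost = server_cost * servers_per_rack
rack_raw_tb = raw_tb_per_server * servers_per_rack

print(server_cost)    # 40000
print(rack_cost)      # 320000
print(rack_raw_tb)    # 8640 TB raw per rack, ~8.6 PB
```

Note the raw number comes out slightly above the article's round 8PB, since a 60-disk server of 18TB disks is really 1.08PB, not 1PB.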
Lustre is the system of choice for supercomputers, with its GPL 2.0 licensing. Lustre can also use ZFS as a backend, making it ideal. Lustre can easily handle upwards of 10,000 users on a large data repository. Since each 60-disk server has 10GBASE-T, the aggregate bandwidth can easily handle a large pool of workloads.
More recently Ceph has been developed as open source and it has attracted major support. Ceph is designed for data centers that are expanding. Its redundancy strategy has one especially useful feature: individual servers can be powered down for disk upgrades and powered back up transparently. Backblaze uses Ceph, which has been standard with the CentOS distribution of Linux. Ceph leverages server blocks, typically made of 15 disks in a software RAID 6 configuration; these blocks are assembled into a new layer of redundancy spanning multiple servers. Because Ceph tolerates servers being shut down for maintenance, it is easy to swap out old disks for bigger ones.
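As a rough sketch of the redundancy overhead, assuming the 15-disk RAID 6 blocks described above (two parity disks per block; this ignores any additional cross-server redundancy layered on top):

```python
disks_per_block = 15
parity_per_block = 2           # RAID 6 dedicates two disks' worth to parity
usable_fraction = (disks_per_block - parity_per_block) / disks_per_block

raw_tb = 60 * 18               # one 60-bay server of 18 TB disks
usable_tb = raw_tb * usable_fraction

print(round(usable_fraction, 3))   # 0.867
print(round(usable_tb))            # 936 of the 1,080 raw TB
```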
Cisco has stackable 48-port switches for 10GBASE-T. Servers may have dual 10GBASE-T ports, so 16-port top-of-rack switches are common, with much faster fiber links up to the main network switch. Cat 7 cabling can reach 100 meters for longer runs to a switch if need be. Single-mode fiber can achieve 400-gigabit class performance, but adapters are $5,000 each. 400GBASE-ZR optics can handle distances of 80 kilometers. 100-gigabit hardware is much less expensive.
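A back-of-the-envelope estimate of per-rack network bandwidth, assuming the dual-port servers and 8-server racks mentioned above:

```python
servers_per_rack = 8
ports_per_server = 2       # dual 10GBASE-T
gbit_per_port = 10

rack_gbit = servers_per_rack * ports_per_server * gbit_per_port
print(rack_gbit)           # 160 Gbit/s into the top-of-rack switch
print(rack_gbit / 8)       # 20 GB/s, which the fiber uplink must carry
```

This is why the uplink to the main switch needs to be much faster than the individual 10GBASE-T runs.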
Some data centers place 32 racks side by side in a block to allow easy access for maintenance; this makes cabling easier as well. The Quantum i6000 enterprise tape library has staggering capacity, and this class of storage is used for offline cold storage. Powering storage at this scale would need the combined output of several nuclear plants. Large-scale solar and wind can reduce the nuclear requirement slightly.
|DISK CAPACITY|DISKS NEEDED|SERVERS|RACKS|
|---|---|---|---|
|8 TB|2,305,844|38,431|4,804|
|18 TB|1,024,820|17,081|2,136|
|20 TB|922,338|15,373|1,922|
|22 TB|838,489|13,975|1,747|
|24 TB|768,615|12,811|1,602|
The table assumes disks are honest decimal capacities (an 8TB disk holding 8 × 10^12 bytes, and so on), while the 2^64-byte target comes from the power-of-2 math used by the CPU and RAM.
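The decimal-versus-binary gap is easy to see in Python (the unit constants are my own):

```python
TB = 10 ** 12     # vendor terabyte, as printed on the disk label
TiB = 2 ** 40     # binary tebibyte, as used by CPU/RAM addressing

disk = 18 * TB                    # an "18 TB" disk
print(round(disk / TiB, 2))       # 16.37 TiB of binary capacity
```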
In Excel, =POWER(2, 64) calculates the value in a cell, making it easy to work with RAM or CPU addressing, though Excel's 15 significant digits of precision mean the result is rounded. The graphing abilities of Excel are classic, and full support for web use is available. Excel is amazingly rich in calculating abilities, supporting many disciplines, with accounting, economics and statistics among the most popular.
Obviously the current 18TB hard disks have reduced the number of hard disks needed to hold the full 18EB to just over one million, or roughly 2,100 racks. The upcoming 20TB disks will materially reduce the number of racks needed, and the 22TB and 24TB disks will make a further dent.
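The rack counts can be recomputed for any disk size; a minimal sketch, assuming 60 disks per server and 8 servers per rack as above:

```python
import math

SPAN = 2 ** 64                 # the full ~18.4 EB target
DISKS_PER_SERVER = 60
SERVERS_PER_RACK = 8

for tb in (8, 18, 20, 22, 24):
    disks = math.ceil(SPAN / (tb * 10 ** 12))
    servers = math.ceil(disks / DISKS_PER_SERVER)
    racks = math.ceil(servers / SERVERS_PER_RACK)
    print(f"{tb} TB: {disks:,} disks, {servers:,} servers, {racks:,} racks")
```

For 18TB disks this gives 1,024,820 disks, 17,081 servers and 2,136 racks, before any allowance for fault tolerance.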