1. ABCI System Overview
1.1. System Architecture
The AI Bridging Cloud Infrastructure (ABCI) system consists of 1,088 compute nodes, a large-scale storage system with 22 PB of disk space, a high-performance interconnect, and a software stack that makes the most of the hardware.
The ABCI system provides a total FP16 (half-precision floating-point) theoretical peak performance of 550 PFLOPS and a total FP64 (double-precision floating-point) theoretical peak performance of 37 PFLOPS. The total memory capacity is 476 TiB, and the total peak memory bandwidth is 4.19 PB/s.
The compute nodes and the storage system are interconnected with InfiniBand EDR (100 Gbps), and the system is connected to the Internet at 100 Gbps via SINET5.
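The system-wide totals above follow from the per-node configuration. The sketch below reproduces them; the per-device peak figures used here (Tesla V100 NVLink Tensor Core FP16 of 125 TFLOPS and FP64 of 7.8 TFLOPS, Xeon Gold 6148 FP64 of 1.536 TFLOPS and FP32 of 3.072 TFLOPS, 16 GiB HBM2 per GPU) are assumptions not stated in this document.

```python
# Sanity-check of the quoted system totals from per-node figures.
# Assumed per-device peaks (not stated in the text):
#   Tesla V100 NVLink: Tensor Core FP16 = 125 TFLOPS, FP64 = 7.8 TFLOPS
#   Xeon Gold 6148: FP64 = 20 cores * 2.4 GHz * 32 FLOP/cycle = 1.536 TFLOPS,
#                   FP32 = 3.072 TFLOPS

NODES = 1088
CPUS = NODES * 2   # two Xeon Gold 6148 per node
GPUS = NODES * 4   # four Tesla V100 per node

# GPU Tensor Cores plus CPU single-precision units; quoted as 550 PFLOPS
fp16_pflops = (GPUS * 125 + CPUS * 3.072) / 1000
# GPU plus CPU double-precision units; quoted as 37 PFLOPS
fp64_pflops = (GPUS * 7.8 + CPUS * 1.536) / 1000
# 384 GiB host memory per node plus an assumed 16 GiB HBM2 per GPU;
# quoted as 476 TiB
memory_tib = (NODES * 384 + GPUS * 16) / 1024

print(fp16_pflops, fp64_pflops, memory_tib)
```

Under these assumptions the totals come out to roughly 550 PFLOPS, 37 PFLOPS, and exactly 476 TiB, matching the figures quoted above.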
1.2. Compute Node Configuration
The ABCI system comprises 1,088 FUJITSU Server PRIMERGY CX2570 M4 nodes. Each compute node has two Intel Xeon Gold 6148 Processors (2.4 GHz, 20 cores each), for a total of 43,520 cores. In addition, each compute node has four NVIDIA Tesla V100 GPUs, for a total of 4,352 GPUs.
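The core and GPU totals are simple products of the per-node counts, as this short check shows:

```python
# Totals derived from the per-node configuration quoted above.
NODES = 1088
CPUS_PER_NODE = 2
CORES_PER_CPU = 20
GPUS_PER_NODE = 4

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
total_gpus = NODES * GPUS_PER_NODE

print(total_cores, total_gpus)  # 43520 4352
```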
The specifications of the compute node are as follows.
|Item|Description|Quantity|
|---|---|---|
|CPU|Intel Xeon Gold 6148 Processor (2.4 GHz, 20 Cores / 40 Threads)|2|
|GPU|NVIDIA Tesla V100 for NVLink|4|
|Memory|384 GiB DDR4 2666 MHz RDIMM (ECC)| |
|SSD|Intel SSD DC P4600 1.6 TB U.2|1|
|Interconnect|InfiniBand EDR (12.5 GB/s)|2|
1.3. Software Configuration
The software available on the ABCI system is shown below.
|Category|Software|Version|
|---|---|---|
|Job Scheduler|Univa Grid Engine|8.6.6|
|Development Environment|Intel Parallel Studio XE Cluster Edition|2017.8|
|Development Environment|PGI Professional Edition|17.10|
|File System|DDN Lustre|2.10.5_ddn7-1|
1.4. Storage Configuration
The ABCI system has large-capacity storage for storing the results of artificial intelligence and big data analytics. In addition, a 1.6 TB SSD is installed in each compute node as local scratch space. The file systems available on the ABCI system are listed below.
|Usage|Mount point|Capacity|File system|Notes|
|---|---|---|---|---|
|Group area 1|/groups1|6.6 PB|GPFS| |
|Group area 2|/groups2|6.6 PB|GPFS| |
|Local scratch area for interactive nodes|/local|12 TB/node|XFS| |
|Local scratch area for compute nodes|/local|1.5 TB/node|XFS| |
1.5. System Use Overview
In the ABCI system, all compute nodes and interactive nodes share files through the parallel file system (DDN GRIDScaler). All users log in to an interactive node, which serves as the frontend, via SSH tunneling. After logging in to an interactive node, users can develop, compile, and link programs, submit jobs, display job status, and so on. Note that users develop programs for the compute nodes on the interactive nodes, which are not equipped with GPUs. To run a program on the compute nodes, users submit a batch job or an interactive job to the job management system. An interactive job is typically used for debugging a program or for running an interactive or visualization application.
Do not run high-load tasks on the interactive nodes, because their computing resources, such as CPU and memory, are shared by many users. To run high-load pre- and post-processing tasks, use the compute nodes. Note that high-load tasks run on an interactive node will be forcibly terminated.