Skip to content

Storage

ABCI has the following five types of storage.

Tips

Such as Home Area or Group Area, other than Local Storage, are resources shared by all users. Excessive I/O load or unnecessary access will not only cause inconvenience to other users but also slow down the execution speed of your own jobs. Please keep the following points in mind when using each storage space.

  • For data that does not require persistence, such as intermediate data, we recommend that you refrain from creating files and use memory.
  • Proactively utilize scratch areas that can be accessed at high speed. It is recommended that files that will be accessed many times during job execution be staged (temporarily copied) to a Local scratch.
  • Creating and accessing large numbers of small files on a shared file system is not recommended. It is recommended to use scratch space or combine multiple files into one larger file and then access them. For example, consider using HDF5, WebDataset, etc.
  • Refrain from opening/closing the same file unnecessarily and repeatedly within a single job.
  • Please consult us in advance if you intend to create more than a hundred million files in a short period of time.

Home Area

Home area is the disk area of the Lustre file system shared by interactive and compute nodes, and is available to all ABCI users by default. The disk quota is limited to 200 GiB.

Warning

In Home area, there is an upper limit on the number of files that can be created under one directory. The upper limit depends on the length of the file names located in the directory and is approximately 4M to 12M files. If an attempt is made to create a file that exceeds the limit, an error message "No spcae left on device" is output and the file creation fails.

[Advanced Option] File Striping

Home area is provided by the Lustre file system. The Lustre file system distributes and stores file data onto multiple disks. On home area, you can choose two distribution methods which are Round-Robin (default) and Striping.

Tips

See Configuring Lustre File Striping for an overview of file striping feature.

How to Set Up File Striping

$ lfs setstripe  [options] <dirname | filename>
Option Description
-S Sets a stripe size. -S #k, -S #m or -S #g option sets the size to KiB, MiB or GiB respectively.
-i Specifies the start OST index to which a file is written. If -1 is set, the start OST is randomly selected.
-c Sets a stripe count. If -1 is set, all available OSTs are written.

Tips

To display OST index, use the lfs df or lfs osts command

Example) Set a stripe pattern #1. (Creating a new file with a specific stripe pattern.)

[username@es1 work]$ lfs setstripe -S 1m -i 4 -c 4 stripe-file
[username@es1 work]$ ls
stripe-file

Example) Set a stripe pattern #2. (Setting up a stripe pattern to a directory.)

[username@es1 work]$ mkdir stripe-dir
[username@es1 work]$ lfs setstripe -S 1m -i 4 -c 4 stripe-dir

How to Display File Striping Settings

To display the stripe pattern of a specified file or directory, use the lfs getstripe command.

$ lfs getstripe <dirname | filename>

Example) Display stripe settings #1. (Displaying the stripe pattern of a file.)

[username@es1 work]$ lfs getstripe stripe-file
stripe-file
lmm_stripe_count:  4
lmm_stripe_size:   1048576
lmm_pattern:       1
lmm_layout_gen:    0
lmm_stripe_offset: 4
        obdidx           objid           objid           group
             4         3024641       0x2e2701                0
             5         3026034       0x2e2c72                0
             6         3021952       0x2e1c80                0
             7         3019616       0x2e1360                0

Example) Display stripe settings #2. (Displaying the stripe pattern of a directory.)

[username@es1 work]$ lfs getstripe stripe-dir
stripe-dir
stripe_count:  4 stripe_size:   1048576 stripe_offset: 4

Group Area

Group area is the disk area of the Lustre file system shared by interactive and compute nodes. To use Group area, "User Administrator" of the group needs to apply "Add group disk" via ABCI User Portal. Regarding how to add group disk, please refer to Disk Addition Request in the ABCI Portal Guide.

To find the path to your group area, use the show_quota command. For details, see Checking Disk Quota.

How to check inode usage

The MDT stores inode information for a file, but there is an upper limit on the number of inodes that can be stored per MDT. You can see how much inodes are currently used for each MDT with the lfs df -i. The IUse%entry in the/groups [MDT:?]line in the output of the command is the percentage of the inode used in each MDT.
In the following example, the inode utilization for MDT:0 is 30%.

[username@es1 ~]$ lfs df -i /groups
UUID                      Inodes       IUsed       IFree IUse% Mounted on
groups-MDT0000_UUID   3110850464   904313344  2206537120  30% /groups[MDT:0]
groups-MDT0001_UUID   3110850464  2778144306   332706158  90% /groups[MDT:1]
groups-MDT0002_UUID   3110850464   935143862  2175706602  31% /groups[MDT:2]
groups-MDT0003_UUID   3110850464  1356224703  1754625761  44% /groups[MDT:3]
groups-MDT0004_UUID   3110850464   402932004  2707918460  13% /groups[MDT:4]
groups-MDT0005_UUID   3110850464         433  3110850031   1% /groups[MDT:5]
(snip)

You can check MDT No. used by your ABCI group with the following command.

[username@es1 ~]$ ls -d /groups/?/(ABCI group name)
/groups/(MDT No.)/(ABCI group name)

In the following example, the ABCI group uses MDT:0.

[username@es1 ~]$ ls -d /groups/?/gaa00000
/groups/0/gaa00000

Global scratch area

Global scratch area is lustre file system and available for all ABCI users. This storage is shared by interactive nodes and all Compute Nodes V and A. The quota for every users is set in 5TiB.

The following directory is available for all users as a high speed data area.

/scratch/(ABCI account)

To see the quota value of the global scratch area, issue show_quota command. For a description of the command, see Checking Disk Quota.

Warning

The global scratch area has a cleanup function.
When the usage of the file area or inode area of /scratch exceeds 80%, delete candidates are selected based on the last access time and creation date of files and directories directly under /scratch/(ABCI account), and the files/directories of the delete candidates are automatically deleted. If a directory directly under /scratch/(ABCI account) becomes a candidate for deletion, all files/directories under that directory are deleted. Note that the last access time and creation date of the files/directories under that directory are not taken into account.
The first candidate to be deleted is the one whose last access time is older than 7 days. If, after deleting the candidate, the utilization of/scratch is still over 80%, the next candidate to be deleted is one whose creation date is older than 7 days.

Note

When storing a large number of files under the global scratch area, create a directory under /scratch/(ABCI account) and store the files in the directory.

Checking creation date of file/directory

Files and directories under the global scratch area are selected as candidates for deletion based on the last access time and creation date. However, you cannot check the creation date of files and directories with the ls command.

We have prepared the show_scratch command to display the creation date of files and directories under the global scratch area. To check whether the file created in the global scratch area is a candidate for deletion, use the show_scratch command.

Example) Display creation date.

[username@es1 ~]$ show_scratch
                                                                     Last Updated: 2022/01/01 00:05
Directory/File                                     created_date        valid_date    remained(days)
/bb/scratch/username/dir1                            2021/12/17        2022/01/26                25
/bb/scratch/username/dir2                            2021/12/18        2022/01/27                26
/bb/scratch/username/file1                           2021/12/19        2022/01/28                27
/bb/scratch/username/file2                           2021/11/20        2021/12/30                 0
Directories and files that have expired will be deleted soon.
If necessary, please backup.
Item Description
Directory/File files and directories name
created_date creation date of files and directories
valid_date valid date (The date of the 40th day from the creation date. After this date, it will be a candidate for deletion.)
remained(days) remaining days until it becomes a candidate for deletion

Note

Files and directories changes (create, delete, rename) will be reflected in the show_scratch command after midnight the day after the change. The information before the change is displayed until it is reflected.

[Advanced Option] Data on MDT(DoM)

The Data on MDT (DoM) function is available in the global scratch area. By enabling the DoM function, performance improvement can be expected for small-file access. Note that the DoM and Stripe features are disabled by default.

Tips

See Data on MDT for an overview of DoM.

How to configure DoM Features

Use the lfs setstripe command to configure DoM features.

$ lfs setstripe [options] <dirname | filename>
Option Description
-E Specify the end offset of each component. -E #k, -E #m, -E #g allows you to set the size in KiB, MiB and GiB. Also, -1 means eof.
-L Set Layout Type. Specifying mdt enables DoM.

Note

You cannot disable DoM for DoM-enabled files. Also, you cannot enable DoM for files with DoM disabled.

Example)Create a new file with DoM enabled
The first 64KiB of the file data is placed on the MDT and rest of file is placed on OST(s) with default striping.

[username@es1 work]$ lfs setstripe -E 64k -L mdt -E -1 dom-file
[username@es1 work]$ ls
dom-file

Example)Configure DoM features for the directory

[username@es1 work]$ mkdir dom-dir
[username@es1 work]$ lfs setstripe -E 64k -L mdt -E -1 dom-dir

Example) Checking if the file is DoM enabled

[username@es1 work]$ lfs getstripe -I1 -L dom-file
mdt

If you see mdt, the DoM feature is enabled. It is not valid for any other display.

Note

In the above example, the data stored in the MDT is limited to 64 KiB. Data exceeding 64 KiB is stored in OST(s).

You can also configure File Striping with the DoM feature.

Example) Create a new file with DoM layout and a specific striping pattern for the rest data placed on OST(s).

[username@es1 work]$ lfs setstripe -E 64k -L mdt -E -1 -S 1m -i -1 -c 4 dom-stripe-file
[username@es1 work]$ ls
dom-stripe-file

Example) Enable the DoM feature and set a striping pattern (for OST(s)) of the directory

[username@es1 work]$ mkdir dom-stripe-dir
[username@es1 work]$ lfs setstripe -E 64k -L mdt -E -1 -S 1m -i -1 -c 4 dom-stripe-dir

Local Storage

In ABCI System, a 1.6 TB NVMe SSD x1 is installed into each compute node (V) and a 2.0 TB NVMe SSD x2 are installed into each compute node (A). There are two ways to utilize these storages as follows:

  • Using as a local scratch of a node (Local scratch, Persistent local scratch (Reserved only)).
  • Using as a distributed shared file system, which consists of multiple NVMe storages in multiple compute nodes (BeeOND storage).

Note

The memory-intensive node service ended at 15:00 on October 27, 2023.

Local scratch

Local storage on compute nodes is available as a local scratch without specifying any special options at job submission. Note that the amount of the local storage you can use is determined by "Resource type". For more detail on "Resource type", please refer to Job Execution Resource. The local storage path is different for each job and you can access to local storage by using environment variables SGE_LOCALDIR.

Example) sample of job script (use_local_storage.sh)

#!/bin/bash

#$-l rt_F=1
#$-cwd

echo test1 > $SGE_LOCALDIR/foo.txt
echo test2 > $SGE_LOCALDIR/bar.txt
cp -rp $SGE_LOCALDIR/foo.txt $HOME/test/foo.txt

Example) Submitting a job

[username@es1 ~]$ qsub -g grpname use_local_storage.sh

Example) Status after execution of use_local_storage.sh

[username@es1 ~]$ ls $HOME/test/
foo.txt    <- The file remain only when it is copied explicitly in script.

Warning

The files stored under $SGE_LOCALDIR directory are removed when the job finished. The required files need to be moved to Home area or Group area in a job script using cp command.

Note

In rt_AF, not only $SGE_LOCALDIR but also/local2 can be used as a local scratch. The files stored under /local2 are removed as well when the job finished.

Persistent local scratch (Reserved only)

This function is for the Reserved service only. The Reserved service allows the local storage of the compute node to have persistent space that is not deleted on a per-job basis. This space is created when the Reserved service starts and is removed when the Reserved service ends. The persistent local storage can be accessed by using environment variables SGE_LOCALDIR.

Example) sample of job script (use_reserved_storage_write.sh)

#!/bin/bash

#$-l rt_F=1
#$-cwd

echo test1 > $SGE_ARDIR/foo.txt
echo test2 > $SGE_ARDIR/bar.txt

Example) Submitting a job

[username@es1 ~]$ qsub -g grpname -ar 12345 use_reserved_storage_write.sh

Example) sample of job script (use_reserved_storage_read.sh)

#!/bin/bash

#$-l rt_F=1
#$-cwd

cat $SGE_ARDIR/foo.txt
cat $SGE_ARDIR/bar.txt

Example) Submitting a job

[username@es1 ~]$ qsub -g grpname -ar 12345 use_reserved_storage_read.sh

Warning

The files created under $SGE_ARDIR will be deleted when the Reserved service is finished, so the necessary files should be copied to the home area or group area using cp command etc.

Warning

Compute node (A) has two NVMe SSDs and persistent local scratch uses /local2. Local scratch and persistent local scratch may be assigned to the same storage. Compute node (V) has only one NVMe SSD, so local scratch and persistent local scratch are always assigned to the same storage and share its capacity. In both cases, pay attention to usage when using persistent local scratch.

BeeOND storage

By using the BeeGFS On Demand (BeeOND), you can aggregate local storages attached to compute nodes on which your job is running to use as a temporal distributed shared file system. To use BeeOND, you need to submit job with -l USE_BEEOND=1 option. And you need to specify -l rt_F or -l rt_AF option in this case, because node must be exclusively allocated to job.

The created distributed shared file system area can be accessed using Environment Variables.

Example) sample of job script (use_beeond.sh)

#!/bin/bash

#$-l rt_F=2
#$-l USE_BEEOND=1
#$-cwd

echo test1 > $SGE_BEEONDDIR/foo.txt
echo test2 > $SGE_BEEONDDIR/bar.txt
cp -rp $SGE_BEEONDDIR/foo.txt $HOME/test/foo.txt

Example) Submitting a job

[username@es1 ~]$ qsub -g grpname use_beeond.sh

Example) Status after execution of use_beeond.sh

[username@es1 ~]$ ls $HOME/test/
foo.txt    <- The file remain only when it is copied explicitly in script.

Warning

The file stored under $SGE_BEEONDDIR directory is removed when job finished. The required files need to be moved to Home area or Group area in job script using cp command.

Warning

Compute node (A) has two NVMe SSDs and BeeOND storage uses /local2. Compute node (V) has only one NVMe SSD, so local scratch and BeeOND storage are always assigned to the same storage and share its capacity.

BeeGFS allows data to be staged in and out of the BeeOND storage in parallel using the beeond-cp command. To use beeond-cp, specify the USE_SSH=1 option to enable SSH login to the compute nodes, and then specify the ssh command and port number in the PARALLEL_SSH environment variable.

Example) sample of job script (use beeond-cp)

#!/bin/bash
#$-l rt_F=4
#$-l USE_BEEOND=1
#$-l USE_SSH=1
#$-v SSH_PORT=2222
#$-j y
#$-cwd

export PARALLEL_SSH="ssh -p 2222"
export src_dir=/path/to/data

beeond-cp stagein -n ${SGE_JOB_HOSTLIST} -g ${src_dir} -l ${SGE_BEEONDDIR}
(main process)
beeond-cp stageout -n ${SGE_JOB_HOSTLIST} -g ${src_dir} -l ${SGE_BEEONDDIR}d

[Advanced Option] Configure BeeOND Servers

A BeeOND filesystem partition consists of two kinds of services running on compute nodes: one is storage service which stores files, and the other is metadata service which stores file matadata. Each service runs on compute nodes. We refer to a compute node which runs storage service as a storage server and a compute node which runs metadata service as a metadata server. Users can specify number of storage servers and metadata servers.

The default values for counts of metadata server and storage server are as follows.

Parameter Default
Count of metadata servers 1
Count of storage servers Number of nodes requested by a job

To change the counts, define following environment variables. These environment variables have to be defined at job submission, and changing in job script takes no effect. When counts of servers are less than the number of requested nodes, servers are lexicographically selected by their names from assigned compute nodes.

Environment Variable Description
BEEOND_METADATA_SERVER Count of metadata servers in integer
BEEOND_STORAGE_SERVER Count of storage servers in integer

The following example create a BeeOND partition with two metadata servers and four storage servers. beegfs-df is used to see the configuration.

Example) sample of job script (use_beeond.sh)

#!/bin/bash

#$-l rt_F=8
#$-l USE_BEEOND=1
#$-v BEEOND_METADATA_SERVER=2
#$-v BEEOND_STORAGE_SERVER=4
#$-cwd

beegfs-df -p /beeond

Example output

METADATA SERVERS:
TargetID   Cap. Pool        Total         Free    %      ITotal       IFree    %
========   =========        =====         ====    =      ======       =====    =
       1      normal    1489.7GiB    1322.7GiB  89%      149.0M      149.0M 100%
       2      normal    1489.7GiB    1322.7GiB  89%      149.0M      149.0M 100%

STORAGE TARGETS:
TargetID   Cap. Pool        Total         Free    %      ITotal       IFree    %
========   =========        =====         ====    =      ======       =====    =
       1      normal    1489.7GiB    1322.7GiB  89%      149.0M      149.0M 100%
       2      normal    1489.7GiB    1322.7GiB  89%      149.0M      149.0M 100%
       3      normal    1489.7GiB    1322.7GiB  89%      149.0M      149.0M 100%
       4      normal    1489.7GiB    1322.7GiB  89%      149.0M      149.0M 100%

[Advanced Option] File Striping

Files are split into small chunks and stored in multiple storage servers in a BeeOND partition. Users can change file striping configuration of BeeOND.

The default configuration of the file striping is as follows.

Parameter Default Description
Stripe size 512 KB File chunk size
Stripe count 4 Number of storage servers that store chunks of a file

Users can configure file striping per-file or per-directory using beegfs-ctl command.

The following example sets file striping configuration of /beeond/data directory as 8 stripe count and 4MB stripe size.

Example) sample of job script (use_beeond.sh)

#!/bin/bash

#$-l rt_F=8
#$-l USE_BEEOND=1
#$-cwd

BEEOND_DIR=/beeond/data
mkdir ${BEEOND_DIR}
beegfs-ctl --setpattern --numtargets=8 --chunksize=4M --mount=/beeond ${BEEOND_DIR}
beegfs-ctl --mount=/beeond --getentryinfo ${BEEOND_DIR}

Output example

New chunksize: 4194304
New number of storage targets: 8

EntryID: 0-5D36F5EC-1
Metadata node: gXXXX.abci.local [ID: 1]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 4M
+ Number of storage targets: desired: 8
+ Storage Pool: 1 (Default)