Known Issues

date category content status
2021/07/06 Singularity The remote build function is unavailable due to a failure of the Remote Builder service. 2021/07/21
close
Resolved a communication problem in the Remote Builder service.
2021/05/25 GPU A known issue has been identified: when a GPU is used repeatedly, processes can remain in state D (uninterruptible sleep) or Z (zombie) and the GPU memory is not released. Subsequent processes that try to use that GPU will not run normally because its memory is still allocated. If you encounter this symptom, please contact us at qa@abci.ai. 2021/07/12
We are currently investigating the cause.
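Before contacting support, it may help to confirm the symptom. The sketch below assumes a Linux node with a procps-style `ps` and NVIDIA's `nvidia-smi` utility. It lists processes whose state begins with D or Z, then shows memory use per GPU and the compute processes still holding GPU memory. `filter_stuck` is a hypothetical helper name, not an ABCI tool.

```shell
#!/bin/sh
# filter_stuck (hypothetical helper): read "PID STAT COMMAND" lines on stdin
# and print only those whose state starts with D (uninterruptible sleep)
# or Z (zombie).
filter_stuck() {
  awk '$2 ~ /^[DZ]/'
}

# List stuck processes; the trailing "=" in each ps field suppresses the header.
ps -eo pid=,stat=,comm= 2>/dev/null | filter_stuck

# If nvidia-smi is available, show memory use per GPU and the
# compute processes that still hold GPU memory.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
  nvidia-smi --query-compute-apps=pid,used_memory --format=csv
fi
```

Including the output of these commands when you contact qa@abci.ai may make the report easier to act on.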
2021/05/17 MPI With Open MPI 4.0.5, MPI programs executed on 66 or more nodes fail. If you use 66 or more nodes, set the MCA parameters plm_rsh_no_tree_spawn to true and plm_rsh_num_concurrent to $NHOSTS when invoking the executable with mpirun.

$ mpirun -mca plm_rsh_no_tree_spawn true -mca plm_rsh_num_concurrent $NHOSTS ./a.out
2021/05/31
close
Modified the default values of these MCA parameters.
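Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<parameter>`, so the same workaround can be set once in a job script instead of on every mpirun command line. A minimal sketch: the function name `use_large_job_workaround` is hypothetical, and `NHOSTS` is assumed to be set by the job scheduler inside the job, as in the command above.

```shell
#!/bin/sh
# use_large_job_workaround (hypothetical name): export the two MCA parameters
# from the workaround. Open MPI treats OMPI_MCA_<name>=<value> in the
# environment like "-mca <name> <value>" on the mpirun command line.
use_large_job_workaround() {
  export OMPI_MCA_plm_rsh_no_tree_spawn=true
  # NHOSTS is expected to be set inside the job; abort with a message if not.
  export OMPI_MCA_plm_rsh_num_concurrent="${NHOSTS:?NHOSTS is not set}"
}
```

In a job script, call `use_large_job_workaround` and then run `mpirun ./a.out` as usual. Since the default values were modified on 2021/05/31, this should only matter for setups that still use the old defaults.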
2020/09/30 Singularity SingularityPRO on ABCI has the following security vulnerabilities. They affect the use of SingularityPRO on the interactive nodes and in jobs that use resource types other than Full (rt_F). Until SingularityPRO is updated, users are recommended to use it only with the Full resource type.

CVE-2020-25039
CVE-2020-25040
2020/10/09
close
Updated to the fixed version, 3.5-4
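To check whether the SingularityPRO you are using already includes the fix, its version can be compared with 3.5-4. This is a sketch under two assumptions: GNU `sort -V` is available for version ordering, and the last word printed by `singularity --version` is the version string. `version_ge` is a hypothetical helper.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeed if version A >= version B.
# sort -V orders both strings as versions; the smaller one comes first,
# so A >= B exactly when the first line equals B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the last word of "singularity --version" is the version, e.g. 3.5-4.
if command -v singularity >/dev/null 2>&1; then
  installed=$(singularity --version | awk '{ print $NF }')
  if version_ge "$installed" "3.5-4"; then
    echo "SingularityPRO $installed includes the fix"
  else
    echo "SingularityPRO $installed predates 3.5-4; use the Full resource type (rt_F)"
  fi
fi
```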
2020/01/14 Cloud Storage The recorded amount of object data becomes inconsistent when users in other groups put or delete objects in a bucket to which they have been granted write permission by an ACL. As a result, the ABCI points consumed are not calculated correctly. 2020/04/03
close
Updated to the fixed version
2019/11/14 Cloud Storage Due to a bug in the object storage, the following error messages are output when overwriting or deleting objects that were stored in multiple parts (multipart upload).
[Overwrite] upload failed: object to s3://mybucket/object An error occurred (None) when calling the CompleteMultipartUpload operation: undefined
[Delete] delete failed: s3://mybucket/object An error occurred (None) when calling the DeleteObject operation: undefined

When you use the s3 command of the AWS CLI, large files are stored in multiple parts. If you upload large files, please refer to this page and set multipart_threshold to a large value.
2019/12/17
close
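With the AWS CLI, multipart_threshold can be set with `aws configure set`. Files smaller than the threshold are then uploaded with a single PUT request instead of a multipart upload. A sketch for the default profile: 4GB is an illustrative value only. A single PUT on Amazon S3 is limited to 5 GB, and whether the same limit applies to ABCI Cloud Storage is an assumption.

```shell
#!/bin/sh
# Illustrative threshold: files smaller than this are sent in one PUT
# request instead of a multipart upload.
THRESHOLD=4GB

if command -v aws >/dev/null 2>&1; then
  # Records multipart_threshold under the s3 section of the default
  # profile in the AWS CLI config file (~/.aws/config by default).
  aws configure set default.s3.multipart_threshold "$THRESHOLD"
else
  echo "AWS CLI not found; nothing changed" >&2
fi
```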
2019/10/04 MPI MPI_Allreduce provided by MVAPICH2-GDR 2.3.2 raises floating-point exceptions for the following combinations of node count, GPUs per node, and message size when the reduction is performed on buffers in GPU memory.
Nodes: 28, GPU/Node: 4, Message size: 256KB
Nodes: 30, GPU/Node: 4, Message size: 256KB
Nodes: 33, GPU/Node: 4, Message size: 256KB
Nodes: 34, GPU/Node: 4, Message size: 256KB
2020/04/21
close
Updated to the fixed version
2019/04/10 Job Due to the job scheduler update (8.5.4 -> 8.6.3), the following qsub option now requires an argument.
resource type (-l rt_F, etc.)
$ qsub -g GROUP -l rt_F=1
$ qsub -g GROUP -l rt_G.small=1
close
2019/04/10 Job Due to the job scheduler update (8.5.4 -> 8.6.3), the following qsub option now requires an argument.
use BeeOND (-l USE_BEEOND)
$ qsub -g GROUP -l rt_F=2 -l USE_BEEOND=1
close
2019/04/05 Job Due to the job scheduler update (8.5.4 -> 8.6.3), a compute node can execute at most 2 jobs of resource type "rt_G.small" or "rt_C.small" (normally up to 4 jobs). This also occurs with the Reservation service, so please be careful when you submit jobs with "rt_G.small" or "rt_C.small".
$ qsub -ar ARID -l rt_G.small=1 -g GROUP run.sh (x 3 times)
$ qstat
job-ID prior name user state
--------
478583 0.25586 sample.sh username r
478584 0.25586 sample.sh username r
478586 0.25586 sample.sh username qw
2019/10/04
close
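To see at a glance how many submitted jobs are running and how many are still waiting, the state column of the qstat output can be counted. With the limit described above, three rt_G.small jobs in the same reservation show up as 2 running and 1 waiting. A sketch: it assumes the state is the fifth column and that the first two lines are the header and separator, as in the output above. `count_states` is a hypothetical helper.

```shell
#!/bin/sh
# count_states (hypothetical helper): read qstat output on stdin, skip the
# header and separator lines, and count jobs in state "r" (running)
# and "qw" (queued and waiting).
count_states() {
  awk 'NR > 2 && $5 == "r"  { r++ }
       NR > 2 && $5 == "qw" { q++ }
       END { print "running=" r + 0, "waiting=" q + 0 }'
}

if command -v qstat >/dev/null 2>&1; then
  qstat | count_states
fi
```

For the sample output above, `count_states` prints `running=2 waiting=1`.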