...
Helios is built with Slingshot interconnect and nodes of the following specification:
Partition | Number of nodes | CPU | RAM | Proportional RAM for one CPU | Proportional RAM for one GPU | Proportional CPU for one GPU | Accelerator |
---|---|---|---|---|---|---|---|
plgrid (includes plgrid-long) | 272 | 192 cores, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 384 GB | 2000 MB | n/a | n/a | |
plgrid-bigmem | 120 | 192 cores, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 768 GB | 4000 MB | n/a | n/a | |
plgrid-gpu-gh200 | 110 | 288 cores, 4x | 480 GB | n/a | 120 GB | 72 | 4x NVIDIA GH200 96GB |
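The proportional values in the table follow from dividing a node's resources evenly: 384 GB across 192 cores gives 2000 MB per core, and 480 GB across 4 GPUs gives 120 GB per GPU. As a minimal sketch, assuming the "Proportional RAM for one CPU" column is read as the per-core memory share, a job's proportional memory can be estimated from its core count. The helper name `proportional_mem_mb` is hypothetical and not part of any Helios tooling.

```shell
# Estimate the proportional memory (in MB) for a given number of CPU cores.
# Per-core values come from the node specification table; the function name
# is a hypothetical helper, not a Helios command.
# Usage: proportional_mem_mb <partition> <cores>
proportional_mem_mb() {
    case "$1" in
        plgrid|plgrid-long) echo $(( $2 * 2000 )) ;;  # 2000 MB per core
        plgrid-bigmem)      echo $(( $2 * 4000 )) ;;  # 4000 MB per core
        *) echo "unknown CPU partition: $1" >&2; return 1 ;;
    esac
}

# Example: 48 cores on plgrid
# proportional_mem_mb plgrid 48    # prints 96000
```

On plgrid-gpu-gh200 the same division applies per GPU: each GPU corresponds to 72 cores and 120 GB of RAM.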
Job submission
Helios uses the Slurm resource manager. Jobs should be submitted to the following partitions:
Name | Timelimit | Resource type (account suffix) | Access requirements | Description |
---|---|---|---|---|
plgrid | 72h | -cpu | Generally available. | Standard partition. |
plgrid-long | 168h | -cpu | Requires a grant with a maximum job runtime of 168h. | Used for jobs with extended runtime. |
plgrid-gpu-gh200 | 48h | -gpu-gh200 | Requires a grant with GPGPU resources. | GPU partition. |
If you are unsure how to configure your job on Helios, please consult this guide: Job configuration
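For orientation, a batch script for the standard partition might look like the sketch below. It is illustrative only: the grant name `plgmygrant` is hypothetical, the resource values are arbitrary choices that stay within the 72h limit of plgrid, and the linked guide remains the authoritative reference.

```shell
#!/bin/bash
# Minimal sketch of a CPU job script for the plgrid partition.
# "plgmygrant" is a hypothetical grant name; replace it with your own.
#SBATCH --job-name=example
#SBATCH --partition=plgrid
#SBATCH --account=plgmygrant-cpu
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2000M

# Slurm sets SLURM_CPUS_PER_TASK inside the job; fall back to 1 elsewhere.
cpus="${SLURM_CPUS_PER_TASK:-1}"
echo "Using $cpus CPU cores"
```

Saved as, for example, `job.sh`, the script would be submitted with `sbatch job.sh`.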
...
Helios uses a new naming scheme for CPU and GPU computing accounts, which are passed to the sbatch command with the -A parameter. Currently, accounts are named as follows:
Resource | account name |
---|---|
CPU | grantname-cpu |
GPU | grantname-gpu-gh200 |
Please note that sbatch -A grantname won't work on its own. You need to add the -cpu, -cpu-bigmem, or -gpu-gh200 suffix! Available computing grants, with their respective account names (allocations), can be viewed using the hpc-grants command.
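The naming rule can be sketched as a small shell helper that joins a grant name with one of the accepted suffixes. The function name `helios_account` and the grant name `plgmygrant` are hypothetical and serve only as illustration; the account names reported by hpc-grants are the ones to rely on.

```shell
# Build a Slurm account name from a grant name and a resource type.
# Accepted resource types follow the suffixes listed above.
# Usage: helios_account <grantname> <cpu|cpu-bigmem|gpu-gh200>
helios_account() {
    case "$2" in
        cpu|cpu-bigmem|gpu-gh200) echo "$1-$2" ;;
        *) echo "unknown resource type: $2" >&2; return 1 ;;
    esac
}

# Example with a hypothetical grant name and job script:
#   sbatch -A "$(helios_account plgmygrant cpu)" -p plgrid job.sh
```

Rejecting unknown resource types means that a bare grant name never reaches sbatch by accident.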
...
Available storage spaces are described in the following table:
Location | Location in the filesystem | Purpose |
---|---|---|
$HOME | /net/home/plgrid/<login> | Storing your own applications and configuration files. Limited to 10 GB. |
$SCRATCH | /net/scratch/hscra/plgrid/<login> | High-speed storage for short-lived data used in computations. Data older than 30 days can be deleted without notice. It is best to rely on the $SCRATCH environment variable. |
$PLG_GROUPS_STORAGE/<group name> | /net/storage/pr3/plgrid/<group name> | Long-term storage for data kept for the duration of the computing grant. Should be used for storing significant amounts of data. |
Current usage, capacity, and other storage attributes can be checked by issuing the hpc-fs command.
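Because data on $SCRATCH older than 30 days can be deleted without notice, it may help to check which files are approaching that age. The sketch below uses the standard `find` utility and assumes that file age is judged by modification time, which the description above does not specify. The function name `list_stale_files` is hypothetical.

```shell
# List regular files that have not been modified for more than a given
# number of days (default: 30) under a directory (default: $SCRATCH).
# Assumption: age is measured by modification time (mtime).
# Usage: list_stale_files [directory] [days]
list_stale_files() {
    dir="${1:-$SCRATCH}"
    days="${2:-30}"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    find "$dir" -type f -mtime +"$days"
}

# Example: show files on scratch older than 30 days
#   list_stale_files
```

Files that are still needed can then be copied to the group storage under $PLG_GROUPS_STORAGE/<group name> before they are removed.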
...