...
Helios is a hybrid cluster. The CPU nodes use x86_64 processors, while the GPU partition is based on GH200 superchips, each combining an NVIDIA Grace (ARM) CPU with an NVIDIA Hopper GPU. HPE Slingshot is used as the interconnect. The login01 node has an x86_64 CPU, so keep this in mind when compiling software: native binaries built for x86_64 on login01 will not run on the aarch64 GPU nodes. Knowing the target CPU architecture is also important when selecting modules and software. Node specifications are listed below:
Partition | Number of nodes | Operating system | CPU | RAM per node | Proportional RAM per CPU core | Proportional RAM per GPU | Proportional CPU cores per GPU | Accelerator |
---|---|---|---|---|---|---|---|---|
plgrid (includes plgrid-long) | 272 | RHEL 8 | 192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 384 GB | 2000 MB | n/a | n/a | none |
plgrid-bigmem | 120 | RHEL 8 | 192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 768 GB | 4000 MB | n/a | n/a | none |
plgrid-gpu-gh200 | 110 | CrayOS (SLES 15 SP5) | 288 cores, aarch64, 4x NVIDIA Grace CPU 72-Core @ 3.1 GHz | 480 GB | n/a | 120 GB | 72 | 4x NVIDIA GH200 96 GB |
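The proportional columns make it possible to estimate what a partial-node request corresponds to. Below is a small sketch of that arithmetic, assuming resources scale linearly with the number of cores or GPUs requested and using 1 GB = 1000 MB (which makes 192 × 2000 MB match the 384 GB node total). The names `cpu_share`, `gpu_share` and `Share` are illustrative only; the limits Slurm actually enforces come from the cluster configuration, so treat the results as estimates.

```python
from dataclasses import dataclass

# Per-unit figures copied from the node table.
# Assumption: "proportional" = the share of a node that corresponds to one CPU core
# (CPU partitions) or one GPU (GH200 partition).
MB_PER_CORE = {"plgrid": 2000, "plgrid-bigmem": 4000}
CPU_NODE_CORES = 192
GH200_GB_PER_GPU = 120
GH200_CORES_PER_GPU = 72
GH200_GPUS_PER_NODE = 4


@dataclass
class Share:
    """Estimated resources corresponding to a partial-node request."""
    cores: int
    ram_gb: float
    gpus: int = 0


def cpu_share(partition, cores):
    """Proportional RAM (in GB) for a core count on a CPU partition."""
    if not 1 <= cores <= CPU_NODE_CORES:
        raise ValueError(f"a CPU node has {CPU_NODE_CORES} cores")
    # MB -> GB with a decimal factor, matching the table's 2000 MB / 384 GB figures.
    return Share(cores=cores, ram_gb=cores * MB_PER_CORE[partition] / 1000)


def gpu_share(gpus):
    """Proportional CPU cores and RAM (in GB) for a GPU count on plgrid-gpu-gh200."""
    if not 1 <= gpus <= GH200_GPUS_PER_NODE:
        raise ValueError(f"a GH200 node has {GH200_GPUS_PER_NODE} GPUs")
    return Share(cores=gpus * GH200_CORES_PER_GPU,
                 ram_gb=gpus * GH200_GB_PER_GPU,
                 gpus=gpus)
```

For example, 48 cores on plgrid correspond to about 96 GB, and two GH200 GPUs to 144 cores and 240 GB. Requesting all 192 cores or all 4 GPUs reproduces the per-node totals in the table, which is a quick sanity check on the figures.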
Note that the machine will be upgraded to RHEL 9; the change will be applied to all CPU and GPU nodes.
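Because login01 and the CPU nodes are x86_64 while the plgrid-gpu-gh200 nodes are aarch64, one workable pattern is to keep a separate build per architecture and select it at run time. The following is a minimal sketch of that idea using Python's standard `platform.machine()`; the per-architecture directory layout and the names `helios_arch` and `install_prefix` are hypothetical illustrations, not a Helios convention.

```python
import platform

# Map kernel-reported machine names to the two architectures present on Helios.
# "amd64"/"arm64" are aliases some systems report; Linux normally reports
# "x86_64" on login01/CPU nodes and "aarch64" on Grace CPUs.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def helios_arch(machine=None):
    """Return 'x86_64' or 'aarch64' for the given machine string (default: this host)."""
    raw = (machine or platform.machine()).lower()
    try:
        return _ARCH_ALIASES[raw]
    except KeyError:
        raise ValueError(f"unsupported architecture: {raw!r}") from None


def install_prefix(base, machine=None):
    """Hypothetical layout: one install tree per architecture, i.e. <base>/<arch>."""
    return f"{base.rstrip('/')}/{helios_arch(machine)}"
```

For example, `install_prefix("/path/to/software")` would return `/path/to/software/x86_64` on login01 and `/path/to/software/aarch64` on a GH200 node, so the same job script can locate the build that matches the node it runs on.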
Job submission
Helios uses the Slurm resource manager; jobs should be submitted to the following partitions:
...