
...

For important information and announcements, please follow this page and the messages displayed at login.

Access to Helios

We strongly suggest using SSH keys to access the machine! SSH key management can be done through the PLGrid portal. Password access will be disabled in the near future.
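As a minimal sketch of key-based access (the login hostname below is an assumption for illustration; use the address provided in the PLGrid portal):

# Generate a key pair locally, then upload the public key via the PLGrid portal
ssh-keygen -t ed25519

# Log in with your PLGrid username (hostname assumed for illustration)
ssh <plgrid_username>@helios.cyfronet.pl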

Computing resources on Helios are assigned based on PLGrid computing grants. To perform computations on Helios, you must obtain a computing grant through the PLGrid Portal (https://portal.plgrid.pl/) and apply for Helios access.

...

Note that Helios uses PLGrid accounts and grants. Make sure to request the "Helios access" access service in the PLGrid portal.

Helios uses a node-exclusive job policy. This means that each node is allocated to a single, dedicated job that uses its resources. This also affects accounting, where the minimum amount of resources billed equals one node.

Helios is a hybrid cluster. CPU nodes use x86_64 CPUs, while the GPU partition is based on GH200 superchips, which combine an NVIDIA Grace (ARM) CPU with an NVIDIA Hopper GPU. HPE Slingshot is used as the interconnect. The login01 node uses an x86_64 CPU and runs RHEL 8. Please keep this in mind when compiling software: knowing the destination CPU architecture and operating system is important for selecting the proper modules and software. Each architecture has its own set of modules; to see the complete list of modules, you need to run module avail on a node of the chosen type. Node specifications can be found below:

Partition | Number of nodes | Operating system | CPU | RAM | Proportional RAM per CPU | Proportional RAM per GPU | Proportional CPUs per GPU | Accelerator
--- | --- | --- | --- | --- | --- | --- | --- | ---
plgrid (includes plgrid-long) | 272 | RHEL 8 | 192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 384 GB | 2000 MB | n/a | n/a | n/a
plgrid-bigmem | 120 | RHEL 8 | 192 cores, x86_64, 2x AMD EPYC 9654 96-Core Processor @ 2.4 GHz | 768 GB | 4000 MB | n/a | n/a | n/a
plgrid-gpu-gh200 | 110 | CrayOS (SLES 15sp5) | 288 cores, aarch64, 4x NVIDIA Grace CPU 72-Core @ 3.1 GHz | 480 GB | n/a | 120 GB | 72 | 4x NVIDIA GH200 96GB
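Since login01 and the GH200 nodes differ in both architecture and operating system, a quick way to confirm where you are before compiling or loading modules is to query the node directly (standard Linux tools, nothing beyond the table above is assumed):

uname -m              # prints x86_64 on login01/CPU nodes, aarch64 on GH200 nodes
cat /etc/os-release   # shows the operating system of the current node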

Note that Helios will soon be upgraded to RHEL 9. This change will be applied to all CPU and GPU nodes.

...

The same principle applies to GPU resources, where the GPU-hour is the billing unit, and proportional memory per GPU and proportional CPUs per GPU are defined (consult the table above).

Note that Helios uses a job-exclusive policy, and currently the minimum resources assigned to a job equal one node. A job will be billed for at least a whole node (192 CPUs on the CPU partitions and 4 GPUs on the GPU partition)! For example, a 2-hour job that uses only one GPU of a GH200 node is still billed for the full node, i.e. 4 GPUs × 2 hours = 8 GPU-hours.

The cost can be expressed as a simple algorithm:

...

Applications and libraries are available through the modules system. Modules for ARM and x86 CPUs are not interchangeable, and selecting the right module for the target architecture is critical for getting software to work! Please load the proper modules on the node, inside the job script! The list of available modules can be obtained by issuing the command:

module avail

This command should be run on a compute node to get the full list of modules available on the given architecture (node type)! The list is searchable using the '/' key. A specific module can be loaded with the add command:

...

and the environment can be purged by:

module purge

Module names on Helios are case-sensitive.
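As an illustration, a hedged sketch of inspecting and loading modules on a GPU-partition node; the grant account and module name are placeholders, and the exact srun options may differ for your grant:

# Start an interactive shell on a GH200 node (<grant_name> is a placeholder)
srun -p plgrid-gpu-gh200 -A <grant_name> -N 1 -t 00:30:00 --pty /bin/bash -l

# On the compute node: list the ARM module tree, then load a module
module avail
module purge
module load <Module/Version>   # hypothetical module name; match case exactly
module list                    # verify the loaded environment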

Sample job scripts

Please note that using the bash option -l is crucial for running jobs on Helios, especially on the plgrid-gpu-gh200 partition. Please use the following shebang:

#!/bin/bash -l

in your scripts. Example job scripts (without the -l option, for Ares compatibility) are available on this page: Sample scripts
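For orientation, a minimal sketch of a CPU-partition batch script with the required shebang; the grant account, module name, and executable are placeholders, and the resource values follow the node specification table above:

#!/bin/bash -l
#SBATCH --job-name=example_job
#SBATCH --partition=plgrid
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=192
#SBATCH --time=01:00:00
#SBATCH --account=<grant_name>

# Load software for the target architecture on the compute node itself
module purge
module load <Module/Version>   # hypothetical module name

srun ./my_app                  # my_app is a placeholder for your executable

Submit it with sbatch; because of the node-exclusive policy, the job will be billed for the whole node regardless of how many tasks it actually runs.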

...