Support
Please contact the PLGrid Helpdesk (https://helpdesk.plgrid.pl/) regarding any difficulties in using the cluster.
For important information and announcements, please follow this page and the message of the day displayed at login.
Access to Ares
Computing resources on Ares are assigned based on PLGrid computing grants (more information can be found here: Obliczenia w PLGrid). To perform computations on Ares you need to obtain a computing grant and apply for the Ares access service through the PLGrid portal.
If your grant is active and you have applied for access to the service, the request should be accepted within about half an hour. Please report any issues through the helpdesk.
Machine description
Available login nodes:
- ssh <login>@ares.cyfronet.pl
Note that Ares uses PLGrid accounts and grants. Make sure to request the "Ares access" access service in the PLGrid portal.
Ares is built with Infiniband EDR interconnect and nodes of the following specification:
Partition | Number of nodes | CPU | RAM | RAM available for job allocations | Proportional RAM for one CPU | Proportional RAM for one GPU | Proportional CPUs for one GPU | Accelerator |
---|---|---|---|---|---|---|---|---|
plgrid (includes plgrid-long) | 532 + 256 (if not used by plgrid-bigmem) | 48 cores, Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz | 192GB | 184800MB | 3850MB | n/a | n/a | |
plgrid-bigmem | 256 | 48 cores, Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz | 384GB | 369600MB | 7700MB | n/a | n/a | |
plgrid-gpu-v100 | 9 | 32 cores, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz | 384GB | 368000MB | n/a | 46000MB | 4 | 8x Tesla V100-SXM2 |
Job submission
Ares uses the Slurm resource manager. Jobs should be submitted to the following partitions:
Name | Timelimit | Resource type (account suffix) | Access requirements | Description |
---|---|---|---|---|
plgrid | 72h | -cpu | Generally available. | Standard partition. |
plgrid-testing | 1h | -cpu | Generally available. | Only for testing jobs, limited to 1 running or queued job, 2 nodes maximum, high priority. |
plgrid-now | 12h | -cpu | Generally available. | Intended for interactive jobs, limited to 1 running or queued job, 1 node maximum, the highest priority. |
plgrid-long | 168h | -cpu | Requires a grant with a maximum job runtime of 168h. | Used for jobs with extended runtime. |
plgrid-bigmem | 72h | -cpu-bigmem | Requires a grant with CPU-BIGMEM resources. | Resources used for jobs requiring an extended amount of memory. |
plgrid-gpu-v100 | 48h | -gpu | Requires a grant with GPGPU resources. | GPU partition. |
If you are unsure of how to properly configure your job on Ares please consult this guide: Job configuration
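As an illustration, a minimal batch script for the plgrid partition might look as follows. This is a sketch: the job name, grant name, resource amounts, and application binary are placeholders, not actual values.

```shell
#!/bin/bash
#SBATCH --job-name=example-job        # placeholder job name
#SBATCH --partition=plgrid            # standard partition, 72h time limit
#SBATCH --account=grantname-cpu       # placeholder grant name; note the -cpu suffix
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=3850MB          # proportional memory per CPU on plgrid
#SBATCH --time=01:00:00

module purge
module add openmpi/4.1.1-gcc-11.2.0   # example module listed on this page

cd $SCRATCH                           # run from high-speed scratch storage
srun ./my_application                 # placeholder application binary
```

Such a script would be submitted with sbatch script.sh.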
Accounts and computing grants
Ares uses a new naming scheme for CPU and GPU computing accounts, which are supplied via the -A parameter of the sbatch command. Currently, accounts are named in the following manner:
Resource | account name |
---|---|
CPU | grantname-cpu |
CPU bigmem nodes | grantname-cpu-bigmem |
GPU | grantname-gpu |
Please mind that sbatch -A grantname won't work on its own. You need to add the -cpu, -cpu-bigmem, or -gpu suffix! Available computing grants, with their respective account names (allocations), can be viewed using the hpc-grants command.
Resources allocated on Ares are not normalized, unlike on Prometheus and previous clusters. One hour of CPU time equals one hour spent on a computing core with the proportional amount of memory (consult the table above). The billing system accounts for jobs using more memory than the proportional amount: if a job uses more memory per allocated CPU than the proportional amount, it is billed as if it had used correspondingly more CPUs. The billed amount is calculated by dividing the used memory by the proportional memory per core and rounding the result up to the nearest integer. Jobs on CPU partitions are always billed in CPU-hours.
The same principle was applied to GPU resources, where the GPU-hour is a billing unit, and there are proportional memory per GPU and proportional CPUs per GPU defined (consult the table above).
The cost can be expressed as a simple algorithm:
cost_cpu = job_cpus_used * job_duration
cost_memory = ceil(job_memory_used / memory_per_cpu) * job_duration
final_cost = max(cost_cpu, cost_memory)
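As a worked example, consider a hypothetical 10-hour job on the plgrid partition using 4 CPUs and 30000MB of memory. The memory corresponds to ceil(30000/3850) = 8 proportional CPUs, so the job is billed for memory rather than CPUs. The calculation can be sketched in the shell:

```shell
# Hypothetical job: 4 CPUs, 30000 MB of memory, 10 hours, plgrid partition
job_cpus=4
job_mem_mb=30000
duration_h=10
mem_per_cpu_mb=3850    # proportional memory per CPU on plgrid (see table above)

cost_cpu=$(( job_cpus * duration_h ))
# ceil(job_mem_mb / mem_per_cpu_mb) via integer arithmetic
cost_mem=$(( (job_mem_mb + mem_per_cpu_mb - 1) / mem_per_cpu_mb * duration_h ))
final_cost=$(( cost_cpu > cost_mem ? cost_cpu : cost_mem ))
echo "$final_cost"     # 80 CPU-hours
```

Here cost_cpu is 40 but cost_mem is 80, so the job is billed 80 CPU-hours.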
and for GPUs, where the proportional memory per GPU and proportional CPUs per GPU apply (consult the table above):
cost_gpu = job_gpus_used * job_duration
cost_cpu = ceil(job_cpus_used / cpus_per_gpu) * job_duration
cost_memory = ceil(job_memory_used / memory_per_gpu) * job_duration
final_cost = max(cost_gpu, cost_cpu, cost_memory)
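A corresponding worked example for the plgrid-gpu-v100 partition, again with hypothetical job parameters: a 10-hour job with 1 GPU, 12 CPUs, and 100000MB of memory. With 4 proportional CPUs per GPU stated in the table (8 are used here for illustration only, as an assumed value) the dominant term is memory:

```shell
# Hypothetical job: 1 GPU, 12 CPUs, 100000 MB of memory, 10 hours
job_gpus=1
job_cpus=12
job_mem_mb=100000
duration_h=10
cpus_per_gpu=8          # assumed illustrative value; check the table for the real one
mem_per_gpu_mb=46000    # proportional memory per GPU on plgrid-gpu-v100

cost_gpu=$(( job_gpus * duration_h ))
cost_cpu=$(( (job_cpus + cpus_per_gpu - 1) / cpus_per_gpu * duration_h ))
cost_mem=$(( (job_mem_mb + mem_per_gpu_mb - 1) / mem_per_gpu_mb * duration_h ))
final_cost=$(( cost_gpu > cost_cpu ? cost_gpu : cost_cpu ))
final_cost=$(( final_cost > cost_mem ? final_cost : cost_mem ))
echo "$final_cost"      # 30 GPU-hours
```

The job would be billed max(10, 20, 30) = 30 GPU-hours, driven by memory usage.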
Storage
Available storage spaces are described in the following table:
Location | Location in the filesystem | Purpose |
---|---|---|
$HOME | /net/people/plgrid/<login> | Storing your own applications and configuration files. Limited to 10GB. |
$SCRATCH | /net/ascratch/people/<login> | High-speed storage for short-lived data used in computations. Data older than 30 days can be deleted without notice. It is best to rely on the $SCRATCH environment variable. Uses transparent compression. |
$PLG_GROUPS_STORAGE/<group name> | /net/pr2/projects/plgrid/<group name> | Long-term storage for data kept for the duration of the computing grant. Should be used for storing significant amounts of data. Uses transparent compression. |
Current usage, capacity, and other storage attributes can be checked by issuing the hpc-fs command.
Please note that if a storage space uses transparent compression, the data stored on physical disks is compressed on the fly. The compression might manifest as different sizes reported by utilities like ls and du, where ls by default reports apparent file size (the amount of data in the file), and du shows actual disk usage. In most cases, the apparent file size is larger than the disk usage. The actual disk usage (as reported by du) is used for quota calculation.
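The difference between apparent size and disk usage can be observed with standard tools. A small self-contained demonstration using a temporary file (not actual cluster data):

```shell
# Create a 1 MiB file of zeros in a temporary directory
tmpdir=$(mktemp -d)
head -c 1048576 /dev/zero > "$tmpdir/zeros.bin"

ls -l "$tmpdir/zeros.bin"    # ls reports the apparent size (1048576 bytes)
du -k "$tmpdir/zeros.bin"    # du reports actual disk usage, which may differ

# GNU du can also report the apparent size explicitly:
apparent_kb=$(du -k --apparent-size "$tmpdir/zeros.bin" | cut -f1)
echo "$apparent_kb"          # 1024

rm -r "$tmpdir"
```

On a filesystem with transparent compression, the plain du figure, which counts quota usage, would typically be smaller than the apparent size shown by ls.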
System Utilities
Please use the following commands to interact with the account and storage management system:
- hpc-grants - shows available grants, resource allocations, and consumed resources
- hpc-fs - shows available storage
- hpc-jobs - shows currently pending/running jobs
- hpc-jobs-history - shows information about past jobs
Software
Applications and libraries are available through the modules system. Please note that the module structure was flattened, and module paths have changed compared to Prometheus! The list of available modules can be obtained by issuing the command:
module avail
The list is searchable using the '/' key. A specific module can be loaded with the add command:
module add openmpi/4.1.1-gcc-11.2.0
and the environment can be purged by:
module purge
Sample job scripts
Example job scripts are available on this page: Sample scripts
More information
Ares follows Prometheus' configuration and usage patterns. Prometheus documentation can be found here: https://kdm.cyfronet.pl/portal/Prometheus:Basics