# nf-core/configs: UTD Juno Configuration

All nf-core pipelines have been successfully configured for use on the Juno HPC cluster at The University of Texas at Dallas.

To use, run the pipeline with `-profile utd_juno`. This will download and launch the `utd_juno.config` which has been pre-configured with a setup suitable for Juno.
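For example, launching an nf-core pipeline with this profile could look like the following (`nf-core/rnaseq` and the samplesheet path are placeholders, not part of this config):

```bash
# Run from a Juno login node; Nextflow submits each task to SLURM.
nextflow run nf-core/rnaseq \
    -profile utd_juno \
    --input samplesheet.csv \
    --outdir results
```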

All of the intermediate files required to run the pipeline will be stored in the `work/` directory. It is recommended to delete this directory after the pipeline has finished successfully because it can get quite large, and all of the main output files will be saved in the `results/` directory anyway.
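For example (plain shell, nothing specific to this profile):

```bash
# Outputs are already in results/; reclaim scratch space after a successful run.
rm -rf work/
```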

> [!NOTE]
> You will need an account to use the Juno HPC cluster in order to run the pipeline; see https://docs.circ.utdallas.edu/user-guide/accounts/index.html. Nextflow will submit the jobs to the cluster via SLURM, so the commands above have to be executed on a login node. If in doubt, contact CIRC.

## Heterogeneous/GPU jobs

Juno is a heterogeneous compute cluster, which means it can accommodate pipelines that require GPUs. The config file has a dispatch rule that automatically assigns a queue based on the `accelerator` directive. You can always override this by specifying a queue directly in the `queue` directive, as shown in the sketch below.
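For instance, a user-supplied config passed with `-c` could pin a single process to a queue (a minimal sketch; the process name is hypothetical):

```nextflow
// override.config -- force one process onto the h100 queue,
// bypassing the profile's automatic dispatch rule.
process {
    withName: 'BASECALLING' {
        queue = 'h100'
    }
}
```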

The accelerator types recognized by the profile are NVIDIA H100 and A30 GPUs; you can request them like this:

```nextflow
process NVIDIA_SMI {
    publishDir "results", mode: 'copy'
    label 'gpu'                                    // picks up containerOptions '--nv' from the profile (see config below)
    accelerator params.gpus, type: params.gpu_type

    input:
    val(gpu_id)

    output:
    path "nvidia_smi_${gpu_id}.out", emit: results

    script:
    """
    NVIDIA_SMI=\${NVIDIA_SMI:-nvidia-smi}
    \${NVIDIA_SMI} -i ${gpu_id} > nvidia_smi_${gpu_id}.out
    """
}
```
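The `params.gpus` and `params.gpu_type` values referenced above are ordinary pipeline parameters rather than part of this profile; assuming such parameters exist, their defaults might be declared like this (names are carried over from the example, values are illustrative):

```nextflow
params {
    gpus     = 1       // forwarded to the accelerator count
    gpu_type = 'h100'  // must match one of the type aliases the dispatch rule accepts
}
```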

## Config file

See the config file on GitHub: [`conf/utd_juno.config`](../conf/utd_juno.config)

```nextflow
params {
    config_profile_description = 'University of Texas at Dallas HPC cluster profile provided by nf-core/configs'
    config_profile_contact = 'Anne Fu'
    config_profile_contact_github = '@eternal-flame-ad'
    config_profile_contact_email = 'anne.fu@utdallas.edu'
    config_profile_url = 'https://hpc.utdallas.edu/systems-resources/juno/'
}
 
env {
    SINGULARITY_CACHEDIR="/home/$USER/scratch/singularity"
}
 
 
def select_queue_and_flags = { cpus, memory, time, accelerator ->
    var accelerator_count = 0;
    var accelerator_ty = null;
    if (accelerator != null) {
        if (accelerator instanceof Number) {
            accelerator_count = accelerator;
        } else if (accelerator instanceof Map) {
            accelerator_ty = accelerator.type;
            accelerator_count = accelerator.limit;
        } else {
            throw new IllegalArgumentException("Invalid `accelerator` directive value: $accelerator [${accelerator.getClass().getName()}]")
        }
    }
 
    if (accelerator_count > 0) {
        if (["h100", "nvidia_h100", "nvidia-h100",  "nvidia_h100_80gb_hbm3"].contains(accelerator_ty)) {
// a long task with a single accelerator probably needs the whole GPU
            if (accelerator_count == 1 && time >= 2.h) {
                return ['queue': 'h100', 'flags': "--gres=nvidia_h100_80gb_hbm3=${accelerator_count}"];
            } else {
                return ['queue': 'h100-2.40gb', 'flags': "--gres=nvidia_h100_80gb_hbm3=${accelerator_count}"];
            }
        }
 
        if (["a30", "nvidia_a30", "nvidia-a30", "nvidia_a30_2g", "nvidia_a30_4g"].contains(accelerator_ty)) {
            if (accelerator_count >= 4) {
                return ['queue': 'a30-4.6gb', 'flags': "--gres=nvidia_a30=${accelerator_count}"];
            } else if (accelerator_count >= 2) {
                return ['queue': 'a30-2.12gb', 'flags': "--gres=nvidia_a30_2g=${accelerator_count}"];
            } else if (accelerator_count >= 1) {
                return ['queue': 'a30', 'flags': "--gres=nvidia_a30_1g=${accelerator_count}"];
            }
        }
    }
 
    // No accelerator requested: dispatch purely on the memory request.
    if (memory >= 384.GB && memory <= 512.GB) {
        return ['queue': 'h100', 'flags': ''];
    }

    if (memory > 512.GB) {
        return ['queue': 'a30', 'flags': ''];
    }
 
    return ['queue': 'normal', 'flags': ''];
}
 
executor {
    queueSize = 72
    queueStatInterval = '2 min'
    submitRateLimit = '20 min'  // i.e. at most 20 job submissions per minute
    jobName = { "${task.process.split(':').last()}" }
}
 
singularity {
    enabled = true
    autoMounts = true
    cacheDir = "/home/$USER/scratch/singularity"
}
 
process {
    executor = 'slurm'
    queue = {
        select_queue_and_flags(task.cpus, task.memory, task.time, task.get('accelerator')).queue
    }
    clusterOptions = {
        select_queue_and_flags(task.cpus, task.memory, task.time, task.get('accelerator')).flags
    }
    withLabel: gpu {
        containerOptions = '--nv'
    }
}
```
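As an illustration of how the dispatch rule resolves, here are the queue assignments implied by `select_queue_and_flags` for a few requests, plus a hypothetical process definition (derived by reading the closure above; not part of the profile itself):

```nextflow
// Queue assignments implied by select_queue_and_flags:
//   accelerator 1, type: 'h100', time >= 2.h   -> 'h100'        + --gres=nvidia_h100_80gb_hbm3=1
//   accelerator 1, type: 'h100', time <  2.h   -> 'h100-2.40gb' + --gres=nvidia_h100_80gb_hbm3=1
//   accelerator 2, type: 'a30'                 -> 'a30-2.12gb'  + --gres=nvidia_a30_2g=2
//   no accelerator, 384.GB <= memory <= 512.GB -> 'h100'
//   no accelerator, memory > 512.GB            -> 'a30'
//   anything else                              -> 'normal'
process TRAIN_MODEL {
    label 'gpu'                  // gets containerOptions '--nv' from the profile
    accelerator 1, type: 'h100'
    time 4.h                     // >= 2.h, so this lands on the whole-GPU 'h100' queue

    script:
    """
    nvidia-smi
    """
}
```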