Execution

Deploying the pipeline

With config.yaml set to your run-mode of choice and pointing to the necessary manifest and input files, the workflow can be executed on any infrastructure using the snakemake command, supplemented with additional Snakemake command-line arguments as your environment requires (e.g. --profile to use a profile, or --cluster to submit jobs to an HPC).

Test your configuration by performing a dry-run:

snakemake --use-conda -n

Execute the workflow locally via:

snakemake --use-conda --cores $N

Execute the workflow on a cluster using something like:

snakemake --use-conda --cluster sbatch --jobs 100
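
As mentioned above, job submission can also be managed with a Snakemake profile rather than the --cluster flag; the profile path below is only a placeholder for wherever your profile lives:

snakemake --use-conda --profile /path/to/slurm-profile --jobs 100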

The pipeline will automatically create a logs/ subdirectory for logs and a temporary workspace at the path specified by tempDir in config.yaml.

Wrapper scripts

We have provided two convenience scripts in the 54gene-wgs-germline repository to execute the workflow in a cluster environment: run.sh and wrapper.sh. You may customize these scripts for your needs, or run using a profile (e.g. this profile for the Slurm job scheduler).

The wrapper.sh script embeds the snakemake command and other command-line flags to control submission of jobs to an HPC, using the cluster_mode string pulled from config.yaml. This script also directs all stdout from Snakemake to a log file in the parent directory named WGS_${DATE}.out, which will include the latest git tag and version of the pipeline if it was cloned from our repository. For additional logging information, see Logging.

This wrapper script can be edited to suit your needs and is run using bash run.sh.
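
For orientation, a minimal sketch of what such a wrapper might contain is shown below. This is not the exact content of wrapper.sh; the yq call, config path, and flag values are illustrative assumptions and should be adapted to your environment.

#!/usr/bin/env bash
# Illustrative sketch only; see wrapper.sh in the repository for the real script.
set -euo pipefail

DATE=$(date +%Y%m%d)
LOG="../WGS_${DATE}.out"

# Record the latest git tag/version of the pipeline, if run from a clone of the repository
git describe --tags > "${LOG}" 2>/dev/null || true

# Pull the cluster submission string from config.yaml (assumes yq is available)
CLUSTER_MODE=$(yq -r '.cluster_mode' config.yaml)

# Launch Snakemake, submitting jobs via the cluster_mode string and
# appending all Snakemake stdout/stderr to the dated log file
snakemake --use-conda \
    --cluster "${CLUSTER_MODE}" \
    --jobs 100 \
    --restart-times 2 \
    >> "${LOG}" 2>&1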

Automatic retries with scaling resources

Many rules in this pipeline are configured to automatically re-submit upon failure up to a user-specified number of times. This is controlled via Snakemake’s --restart-times command line parameter. The relevant rules will automatically scale resource requests with every retry as follows (example from rule align_reads):

resources:
   mem_mb=lambda wildcards, attempt: attempt * config["bwa"]["memory"],

In this example, if the memory for bwa used in align_reads is set to memory: 3000 but the job fails, it will be resubmitted on a second attempt with twice the memory. If it fails again, a third attempt with three times the memory will be submitted (depending on your setting for --restart-times). If your system or infrastructure does not have the necessary memory available, re-submitted jobs may still fail due to insufficient resources.
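
As a concrete illustration (assuming memory: 3000 for bwa and the scaling shown above), a run allowing two retries could be launched as follows; successive attempts would then request 3000, 6000, and 9000 MB of memory:

snakemake --use-conda --restart-times 2 --cores $N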

Logging

All job-specific logs will be directed to a logs/ subdirectory in the home analysis directory of the pipeline. This directory is automatically created for you upon execution of the pipeline. For example, if you run the pipeline on a Slurm cluster with default parameters, these log files will follow the naming structure of snakejob.<name_of_rule>.<job_number>.
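
For instance, a hypothetical entry in this directory (the rule name and job number below are illustrative) might look like:

logs/snakejob.align_reads.1024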

If you choose to use the provided wrapper.sh script, modified for your environment, a WGS_${DATE}.out log file containing all stdout from Snakemake will also be written to the parent directory of the pipeline.