Execution
Deploying the pipeline
With the config.yaml configured to your run-mode of choice, and with paths to the necessary manifest and input files, the workflow can be executed on any infrastructure using the snakemake command. Supply further Snakemake command-line arguments as your environment requires (e.g. --profile to use a profile, or --cluster to submit jobs to an HPC).
Test your configuration by performing a dry-run:
snakemake --use-conda -n
Execute the workflow locally via:
snakemake --use-conda --cores $N
Execute the workflow on a cluster using something like:
snakemake --use-conda --cluster sbatch --jobs 100
The pipeline will automatically create a subdirectory for logs in logs/ and a temporary workspace at the path specified for tempDir in the config.yaml.
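As a reference, the relevant entries might look like the following sketch of a config.yaml fragment. The tempDir key is named in the text above; the paths and the exact key layout are illustrative, so consult the repository's example config for the authoritative structure:

```yaml
# Illustrative fragment only -- see the full config.yaml in the repository
# for all required keys and their exact nesting.
tempDir: "/scratch/wgs_tmp"   # temporary workspace (path is a placeholder)
cluster_mode: "sbatch"        # submission string used by the wrapper script (placeholder)
```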
Wrapper scripts
We have provided two convenience scripts in the 54gene-wgs-germline repository to execute the workflow in a cluster environment: run.sh and wrapper.sh. You may customize these scripts for your needs, or run using a profile (e.g. this profile for a Slurm job scheduler).
The wrapper.sh script embeds the snakemake command and other command-line flags to control submission of jobs to an HPC, using the cluster_mode string pulled from the config.yaml. This script also directs all stdout from Snakemake to a log file in the parent directory named WGS_${DATE}.out, which will include the latest git tag and version of the pipeline if cloned from our repository. For additional logging information, see Logging.
This wrapper script can be edited to your needs and run using bash run.sh.
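As a rough illustration of what such a wrapper does, the sketch below names the log file by date, records the pipeline version, and builds the submission command. It is not the repository's actual wrapper.sh: the cluster_mode value is hard-coded as a placeholder rather than read from config.yaml, the log is written to the current directory instead of the parent directory, and the snakemake invocation is echoed so the sketch runs standalone (the real script would execute it):

```shell
#!/usr/bin/env bash
# Illustrative sketch only -- see run.sh/wrapper.sh in the repository.
set -euo pipefail

DATE=$(date +%Y%m%d)
LOGFILE="WGS_${DATE}.out"   # the real script writes this to the parent directory

# Record the pipeline version (latest git tag) at the top of the log, if available
git describe --tags > "$LOGFILE" 2>/dev/null || echo "pipeline version unknown" > "$LOGFILE"

# Placeholder: in the real script this string is pulled from config.yaml
CLUSTER_MODE="sbatch"

# Echoed here so the sketch is runnable; the real script executes this command
echo snakemake --use-conda --cluster "$CLUSTER_MODE" --jobs 100 >> "$LOGFILE"
```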
Automatic retries with scaling resources
Many rules in this pipeline are configured to automatically re-submit upon failure, up to a user-specified number of times. This is controlled via Snakemake's --restart-times command-line parameter. The relevant rules automatically scale their resource requests with every retry, as follows (example from rule align_reads):
resources:
mem_mb=lambda wildcards, attempt: attempt * config["bwa"]["memory"],
In this example, if the memory amount for bwa used in align_reads is set to memory: 3000 but the job fails, it will be resubmitted on a second attempt with twice the memory. If it fails again, a third attempt with three times the memory will be submitted (depending on your setting for --restart-times). If your system or infrastructure does not have the necessary memory available, re-submitted jobs may themselves fail due to insufficient resources.
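The scaling arithmetic can be sketched in plain Python, using the example value of memory: 3000 from the text. Snakemake passes attempt=1 on the first try, 2 on the first retry, and so on, so the lambda multiplies the configured memory by the attempt number:

```python
# Sketch of the attempt-based resource scaling used by rules like align_reads.
config = {"bwa": {"memory": 3000}}  # example value from the text

def mem_mb(wildcards, attempt):
    # Mirrors the rule's resources lambda: memory grows linearly per retry
    return attempt * config["bwa"]["memory"]

for attempt in (1, 2, 3):
    print(f"attempt {attempt}: {mem_mb(None, attempt)} MB")
# → attempt 1: 3000 MB
# → attempt 2: 6000 MB
# → attempt 3: 9000 MB
```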
Logging
All job-specific logs will be directed to a logs/ subdirectory in the home analysis directory of the pipeline. This directory is created automatically upon execution of the pipeline. For example, if you run the pipeline on a Slurm cluster with default parameters, these log files will follow the naming structure snakejob.<name_of_rule>.<job_number>.
If you choose to use the wrapper.sh script, provided and modified for your environment, a WGS_${DATE}.out log file containing all stdout from Snakemake will also be available in the parent directory of the pipeline.