nf-core/viralrecon
Assembly and intrahost/low-frequency variant calling for viral samples
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples you would like to analyse.
string
^\S+\.csv$
You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See usage docs.
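As a rough sketch of the layout described above, the snippet below writes a small samplesheet and checks that every row has three comma-separated fields. The column names (sample, fastq_1, fastq_2) and the FastQ file names are assumptions made for illustration; the usage docs define the actual header.

```shell
# Work in a scratch directory so nothing in the current folder is touched.
workdir=$(mktemp -d)
samplesheet="$workdir/samplesheet.csv"

# Header row plus one row per sample (column names are assumed, not taken from the schema).
cat > "$samplesheet" <<'EOF'
sample,fastq_1,fastq_2
SAMPLE_1,sample1_R1.fastq.gz,sample1_R2.fastq.gz
SAMPLE_2,sample2_R1.fastq.gz,sample2_R2.fastq.gz
EOF

# Count rows that do not have exactly three comma-separated fields.
bad_rows=$(awk -F',' 'NF != 3 { n++ } END { print n + 0 }' "$samplesheet")
echo "rows with a wrong column count: $bad_rows"
```

A non-zero count points at rows to fix before passing the file to the pipeline.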
NGS platform used to sequence the samples.
string
Specifies the type of protocol used for sequencing.
string
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Options for the reference genome indices used to align reads.
Name of viral reference genome.
string
You can find the keys to specify the genomes in the Genomes config file.
Path to FASTA genome file.
string
^\S+\.fn?a(sta)?(\.gz)?$
If you have no genome reference available, the pipeline can build one using a FASTA file. This requires additional time and resources, so it's better to use a pre-built index if possible.
Full path to GFF annotation file.
string
^\S+\.gff(\.gz)?$
Full path to additional annotation file in GTF or GFF format.
string
^\S+(\.gff|\.gtf)(\.gz)?$
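The file-name patterns listed for the FASTA and annotation paths can be tried out before launching a run. The sketch below is one way to do that with grep; it rewrites \S as [^[:space:]] so the expressions stay valid POSIX extended regular expressions. The matches helper and the example file names are hypothetical.

```shell
# POSIX ERE translations of the patterns listed above.
fasta_re='^[^[:space:]]+\.fn?a(sta)?(\.gz)?$'
gff_re='^[^[:space:]]+\.gff(\.gz)?$'
annot_re='^[^[:space:]]+(\.gff|\.gtf)(\.gz)?$'

# Return 0 when the value ($1) matches the pattern ($2), non-zero otherwise.
matches() {
    printf '%s\n' "$1" | grep -Eq "$2"
}

for f in genome.fa genome.fasta.gz genome.fastq; do
    if matches "$f" "$fasta_re"; then
        echo "$f: accepted as a FASTA path"
    else
        echo "$f: rejected as a FASTA path"
    fi
done
```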
Path to directory or tar.gz archive for pre-built Bowtie2 index.
string
If the '--protocol amplicon' parameter is provided then iVar is used to trim primer sequences after read alignment and before variant calling.
string
^\S+\.bed(\.gz)?$
iVar uses the primer positions relative to the viral genome supplied in this file to soft clip primer sequences from a coordinate sorted BAM file. The file must be in BED format as highlighted below:
MN908947.3 30 54 nCoV-2019_1_LEFT 60 -
MN908947.3 385 410 nCoV-2019_1_RIGHT 60 +
MN908947.3 320 342 nCoV-2019_2_LEFT 60 -
MN908947.3 704 726 nCoV-2019_2_RIGHT 60 +
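A quick sanity check of a primer BED file might look like the sketch below. It is not part of the pipeline: it writes the four example rows to a scratch file (tab-separated, which is an assumption about the original layout) and counts rows that lack six columns, have a start at or beyond the end, or whose name does not end in the default _LEFT or _RIGHT suffix.

```shell
workdir=$(mktemp -d)
primer_bed="$workdir/primers.bed"

# The example rows from above, written with tab separators.
printf 'MN908947.3\t30\t54\tnCoV-2019_1_LEFT\t60\t-\n'    >  "$primer_bed"
printf 'MN908947.3\t385\t410\tnCoV-2019_1_RIGHT\t60\t+\n' >> "$primer_bed"
printf 'MN908947.3\t320\t342\tnCoV-2019_2_LEFT\t60\t-\n'  >> "$primer_bed"
printf 'MN908947.3\t704\t726\tnCoV-2019_2_RIGHT\t60\t+\n' >> "$primer_bed"

# Count rows that fail any of the three checks.
bad_rows=$(awk '
    NF != 6 || ($2 + 0) >= ($3 + 0) || $4 !~ /_(LEFT|RIGHT)$/ { n++ }
    END { print n + 0 }
' "$primer_bed")
echo "malformed primer rows: $bad_rows"
```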
If the '--protocol amplicon' parameter is provided then Cutadapt is used to trim primer sequences from FastQ files before de novo assembly.
string
^\S+\.fn?a(sta)?(\.gz)?$
This file must contain amplicon primer sequences in Fasta format. An example is shown below:
>nCoV-2019_1_LEFT
ACCAACCAACTTTCGATCTCTTGT
>nCoV-2019_1_RIGHT
CATCTTTAAGATGTTGACGTGCCTC
>nCoV-2019_2_LEFT
CTGTTTTACAGGTTCGCGACGT
>nCoV-2019_2_RIGHT
TAAGGATCAGTGCCAAGCTCGT
The primer set to be used for the data analysis.
string
Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys. See https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config
Version of the primer set e.g. '--primer_set artic --primer_set_version 3'.
number
Where possible we are trying to collate links and settings for standard primer sets to make it easier to run the pipeline with standard keys. See https://github.com/nf-core/configs/blob/master/conf/pipeline/viralrecon/genomes.config
Suffix used in name field of '--primer_bed' to indicate left primer position.
string
_LEFT
Suffix used in name field of '--primer_bed' to indicate right primer position.
string
_RIGHT
If generated by the pipeline save reference genome related files to the results folder.
boolean
Options exclusive to running the pipeline on Nanopore data using the ARTIC fieldbioinformatics pipeline.
Path to a folder containing fastq files from the Nanopore run.
string
e.g. '--fastq_dir ./20191023_1522_MC-110615_0_FAO93606_12bf9b4f/fastq_pass/'.
Path to a folder containing fast5 files from the Nanopore run.
string
e.g. '--fast5_dir ./20191023_1522_MC-110615_0_FAO93606_12bf9b4f/fast5_pass/'. Not required when running the pipeline with the '--artic_minion_caller medaka' workflow.
Sequencing summary file generated after Nanopore run completion.
string
^\S+\.txt$
e.g. '--sequencing_summary ./20191023_1522_MC-110615_0_FAO93606_12bf9b4f/sequencing_summary.txt'. Not required when running the pipeline with the '--artic_minion_caller medaka' workflow.
Minimum number of raw reads required per sample/barcode in order to be considered for the downstream processing steps.
integer
100
Minimum number of reads required after the artic guppyplex process per sample/barcode in order to be considered for the downstream processing steps.
integer
10
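To get a feel for how a read-count threshold of this kind behaves, the sketch below counts the reads in a toy FastQ file and compares the total with the default of 100 raw reads. It assumes uncompressed four-line FastQ records and an inclusive threshold; the file name is hypothetical, and the pipeline's own counting may differ.

```shell
min_barcode_reads=100   # default listed above

workdir=$(mktemp -d)
fastq="$workdir/barcode01.fastq"   # hypothetical file name

# Write three toy reads; each record is assumed to span exactly four lines.
i=1
while [ "$i" -le 3 ]; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
    i=$((i + 1))
done > "$fastq"

# Reads = lines / 4 under the four-line assumption.
n_reads=$(( $(wc -l < "$fastq") / 4 ))

if [ "$n_reads" -ge "$min_barcode_reads" ]; then
    status=keep
else
    status=skip
fi
echo "barcode01: $n_reads reads -> $status"
```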
Variant caller used when running artic minion (default: 'nanopolish').
string
Aligner used when running artic minion (default: 'minimap2').
string
Primer scheme recognised by the artic minion command.
string
e.g. '--artic_scheme ncov-2019'. See https://artic.readthedocs.io/en/latest/primer-schemes/ and https://github.com/artic-network/primer-schemes/blob/master/schemes_manifest.json.
Parameter passed to artic minion and required when using the '--artic_minion_caller medaka' workflow.
string
See https://github.com/nanoporetech/medaka
Skip pycoQC.
boolean
Skip NanoPlot.
boolean
Options common to both the Nanopore and Illumina workflows in the pipeline.
Full path to Nextclade dataset required for 'nextclade run' command.
string
Name of Nextclade dataset to retrieve. A list of available datasets can be obtained using the 'nextclade dataset list' command.
string
Version tag of the dataset to download. A list of available datasets can be obtained using the 'nextclade dataset list' command.
string
Maximum read depth used to generate ASCIIGenome screenshots for variant loci.
integer
50
Maximum window size before and after variant loci used to generate ASCIIGenome screenshots.
integer
50
Skip Freyja deep SARS-CoV-2 variant analysis using a depth-weighted approach.
boolean
Skip the bootstrapping module of Freyja.
boolean
Specify the name under which the UShER database is stored (default: 'freyja_db').
string
freyja_db
Specify a minimum coverage depth; sites with coverage below this value are excluded.
number
Using the depthcutoff option may result in some distinct lineages now having identical barcodes, which are grouped into the format [lineage]-like(num) (based on their shared phylogeny) in the output.
Specify the number of bootstrap repeats to do.
integer
100
Lineage defining barcodes, default is most recent from UShER database.
string
Metadata of lineages that match barcode, default is most recent from UShER database.
string
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
If file generated by pipeline exceeds the threshold, it will not be attached.
Skip genome-wide and amplicon coverage plot generation from mosdepth output.
boolean
Skip Pangolin lineage analysis for genome consensus sequence.
boolean
Skip Nextclade clade assignment, mutation calling, and sequence quality checks for genome consensus sequence.
boolean
Skip variant screenshot generation with ASCIIGenome.
boolean
Skip generation of QUAST aggregated report for consensus sequences.
boolean
Skip long table generation for reporting variants.
boolean
Skip MultiQC.
boolean
Options to adjust QC, read trimming and host read filtering with Kraken2 for the Illumina workflow.
Full path to Kraken2 database built from host genome.
string
s3://ngi-igenomes/test-data/viralrecon/kraken2_human.tar.gz
Name for host genome as recognised by Kraken2 when using the 'kraken2 build' command.
string
human
Remove host reads identified by Kraken2 before running variant calling steps in the pipeline.
boolean
Remove host reads identified by Kraken2 before running assembly steps in the pipeline.
boolean
true
Save the trimmed FastQ files in the results directory.
boolean
By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete.
Skip FastQC.
boolean
Skip Kraken2 process for removing host classified reads.
boolean
Skip the initial read trimming step performed by fastp.
boolean
Skip the amplicon trimming step with Cutadapt when using --protocol amplicon.
boolean
Various options for the variant calling branch of the Illumina workflow.
Specify which variant calling algorithm you would like to use. Available options are 'ivar' (default for '--protocol amplicon') and 'bcftools' (default for '--protocol metagenomic').
string
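The protocol-dependent default described above can be written out as a small lookup. The function name is hypothetical and the sketch only mirrors the two defaults stated in the description; it does not show how the pipeline resolves them internally.

```shell
# Map a --protocol value to the default variant caller stated above.
default_variant_caller() {
    case "$1" in
        amplicon)    echo ivar ;;
        metagenomic) echo bcftools ;;
        *)           echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

default_variant_caller amplicon
default_variant_caller metagenomic
```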
Specify which consensus calling algorithm you would like to use. Available options are 'bcftools' and 'ivar' (default: 'bcftools').
string
Minimum number of mapped reads below which samples are removed from further processing. Some downstream steps in the pipeline will fail if this threshold is too low.
integer
1000
This option unsets the '-e' parameter in 'ivar trim' to discard reads without primers.
boolean
This option sets the '-x' parameter in 'ivar trim' so that reads that occur at the specified offset positions relative to primer positions will also be trimmed.
integer
This parameter will need to be set for some amplicon-based sequencing protocols (e.g. SWIFT) as described and implemented here
Filter duplicate reads detected by Picard MarkDuplicates from alignments.
boolean
Save unaligned reads in FastQ format from Bowtie 2 to the results directory.
boolean
Save mpileup files generated when calling variants with iVar variants or iVar consensus.
boolean
Skip iVar primer trimming step. Not recommended for --protocol amplicon.
boolean
Skip Picard MarkDuplicates step.
boolean
true
Skip Picard CollectMultipleMetrics steps.
boolean
Skip SnpEff and SnpSift annotation of variants.
boolean
Skip creation of consensus base density plots.
boolean
Skip genome consensus creation step and any downstream QC.
boolean
Specify this parameter to skip all of the variant calling and mapping steps in the pipeline.
boolean
Various options for the de novo assembly branch of the Illumina workflow.
Specify which assembly algorithms you would like to use. Available options are 'spades', 'unicycler' and 'minia'.
string
spades
Specify the SPAdes mode you would like to run (default: 'rnaviral').
string
Path to profile HMMs specific for gene/organism to enhance SPAdes assembly.
string
Path to directory or tar.gz archive for pre-built BLAST database.
string
Skip Bandage image creation for assembly visualisation.
boolean
Skip blastn of assemblies relative to reference genome.
boolean
Skip ABACAS process for assembly contiguation.
boolean
Skip assembly report generation by PlasmidID.
boolean
true
Skip generation of QUAST aggregated report for assemblies.
boolean
Specify this parameter to skip all of the de novo assembly steps in the pipeline.
boolean
Minimum contig length to filter from BLAST results.
integer
200
Minimum percentage of contig aligned to filter from BLAST results.
number
0.7
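The two BLAST filters above can be pictured as a row filter over a hit table. The sketch below is illustrative only: the three-column layout (contig name, contig length, aligned length) is hypothetical, and it assumes that 0.7 is a fraction of the contig length and that both thresholds are inclusive.

```shell
min_contig_length=200   # default listed above
min_perc_aligned=0.7    # default listed above

workdir=$(mktemp -d)
hits="$workdir/hits.tsv"

# Hypothetical columns: contig name, contig length, aligned length.
printf 'contig_1\t1500\t1400\n' >  "$hits"
printf 'contig_2\t150\t150\n'   >> "$hits"
printf 'contig_3\t900\t300\n'   >> "$hits"

# Keep contigs that are long enough and sufficiently covered by the alignment.
kept=$(awk -v min_len="$min_contig_length" -v min_frac="$min_perc_aligned" '
    ($2 + 0) >= (min_len + 0) && ($3 / $2) >= (min_frac + 0) { print $1 }
' "$hits")
echo "$kept"
```

Here contig_2 fails the length filter and contig_3 fails the aligned-fraction filter, so only contig_1 is printed.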
Set this parameter to false to add an X at the beginning or end of each primer's FASTA sequence, which tells Cutadapt that they are non-internal 5' or 3' adapters, respectively.
boolean
See viralrecon's usage and cutadapt documentation: https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
Set this parameter to true when the primers for Cutadapt are 3' adapters. The default value is false, as the default primers are 5' adapters.
boolean
See viralrecon's usage and cutadapt documentation: https://cutadapt.readthedocs.io/en/stable/guide.html#adapter-types
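The effect of marking primers as non-internal can be sketched with awk: an X is prepended to each sequence for 5' adapters, or appended for 3' adapters, following the Cutadapt adapter-type notation. This is an illustration of the transformation only, not the pipeline's own code; the mark_non_internal helper and its mode names are hypothetical.

```shell
workdir=$(mktemp -d)
primers="$workdir/primers.fasta"

# Two of the example primers shown earlier.
cat > "$primers" <<'EOF'
>nCoV-2019_1_LEFT
ACCAACCAACTTTCGATCTCTTGT
>nCoV-2019_1_RIGHT
CATCTTTAAGATGTTGACGTGCCTC
EOF

# mode=5prime prepends X (non-internal 5' adapter);
# mode=3prime appends X (non-internal 3' adapter);
# any other mode leaves the sequences unchanged.
mark_non_internal() {
    awk -v mode="$2" '
        /^>/             { print; next }
        mode == "5prime" { print "X" $0; next }
        mode == "3prime" { print $0 "X"; next }
                         { print }
    ' "$1"
}

mark_non_internal "$primers" 5prime
```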
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
This works exactly as with --email, except emails are only sent if the workflow is not successful.
Send plain-text email instead of HTML.
boolean
Set to receive plain-text e-mails instead of HTML formatted.
Do not use coloured log outputs.
boolean
Set to disable colourful command line output and live life in monochrome.
Incoming hook URL for messaging service
string
Incoming hook URL for messaging service. Currently, only MS Teams is supported.
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime.
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the custom_config_base option. For example:
## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip
## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string