Reference

This is an automatically generated list of supported rules, with their docstrings and commands. At the start of each workflow run, a list of the rules that will be run is printed, and while the workflow is running, each rule is reported as it starts and finishes. This page explains to users what each rule does, and lets developers see what is and isn't yet supported. Not all Metaphor rules are listed here, only those with a shell directive; rules with script or wrapper directives are omitted. To see all rules in Metaphor, refer to the workflow source code.

qc.smk

fastp_pipe

Pipe reads into fastp.

cat {input} > {output} 2> {log}

fastp_pe

Trim paired end reads with fastp.

merge_fastqs

Concatenate paired-end reads from different units.

cat {input} > {output} 2> {log}
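The rule concatenates files with plain `cat`. If the units are gzip-compressed FASTQ files (an assumption, since the page does not say), this still yields a valid file, because gzip treats a series of concatenated members as one stream. A minimal sketch with hypothetical file names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory, removed on exit; file names are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Two single-record FASTQ "units", each gzip-compressed on its own.
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$tmp/unit1_R1.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/unit2_R1.fastq.gz"

# Same operation as the rule: byte-level concatenation, no recompression.
cat "$tmp/unit1_R1.fastq.gz" "$tmp/unit2_R1.fastq.gz" > "$tmp/merged_R1.fastq.gz"

# gzip decompresses each member in turn, so both records are recovered.
gzip -dc "$tmp/merged_R1.fastq.gz" | grep -c '^@r'
```

Skipping recompression keeps the step fast; the merged file is simply slightly larger than a freshly compressed one would be.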

fastqc_raw

host_removal

{{ minimap2 -t {threads}           \
         -ax {params.preset}       \
         {input.reference}         \
         {input.fastqs} ; }}                                2>> {log}   |
{{ samtools view -buSh -f 4 ; }}                            2>> {log}   |
{{ samtools fastq - > {output.unpaired} ; }}                2>> {log}
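`samtools view -f 4` keeps only records whose SAM FLAG has bit 4 (read unmapped) set, so the reads passed on to `samtools fastq` are the ones that did not align to the host reference. A small sketch of that bit test; `passes_require_filter` is a hypothetical helper and the flag values are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds if FLAG has every bit in REQUIRED set,
# which is the rule `samtools view -f REQUIRED` applies to each record.
passes_require_filter() {
    local flag=$1 required=$2
    [ $(( flag & required )) -eq "$required" ]
}

# 77  = 1+4+8+64  : paired, read unmapped, mate unmapped, first in pair
# 99  = 1+2+32+64 : properly paired, read mapped
# 73  = 1+8+64    : first mate mapped, its mate unmapped
# 133 = 1+4+128   : second mate unmapped, its mate mapped
for flag in 77 99 73 133; do
    if passes_require_filter "$flag" 4; then
        echo "flag $flag: kept (read unmapped)"
    else
        echo "flag $flag: dropped (read mapped to host)"
    fi
done
```

Flags 73 and 133 are the two mates of one pair in which only the second mate failed to map, so the filter keeps one mate and drops the other; this is presumably why a re-pairing step (`fastq_pair`) follows.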

fastq_pair

fastq_pair {input.unpaired} 2>> {log}

compress_paired

{{ pigz -p {threads} -f -c {input} > {output} ; }} &> {log}

assembly.smk

concatenate_merged_reads

{{ cat {input.R1} > {output.R1_concat} ; }} 2> {log}
{{ cat {input.R2} > {output.R2_concat} ; }} 2>> {log}

megahit

# MegaHit has no --force flag, so we must remove the created directory prior to running
rm -rf {params.out_dir}/{wildcards.group}

megahit -1 {params.fastq1} -2 {params.fastq2}       \
        -o {params.out_dir}/{wildcards.group}       \
        --presets {params.preset}                   \
        --out-prefix {wildcards.group}              \
        --min-contig-len {params.min_contig_len}    \
        -t {threads}                                \
        --k-list {params.k_list} &> {log}

{params.remove_intermediate_contigs}

rename_contigs

Rename contigs for downstream analysis.

This is mainly used so contigs and mapping files are compatible with Anvi’o.

assembly_report

Get metrics for each assembly.

metaquast

metaquast.py -t {threads}               \
             -o {params.outdir}         \
             -m {params.mincontig}      \
             -r {input.reference}       \
             {params.extra_params}      \
             {input.contigs} &> {log}

annotation.smk

prodigal

prodigal {params.quiet}         \
         -p {params.mode}       \
         -i {input}             \
         -a {output.proteins}   \
         -o {output.genbank}    \
         {params.genes}         \
         {params.scores} &> {log}

prokka

# Get kingdom from bin eval file
bin_clean=$(echo {wildcards.bin} | sed 's/_sub$//g')  # Remove '_sub' from corrected bins
kingdom=$(grep $bin_clean {input.bin_evals} | cut -f 5)
kingdom=$(echo $kingdom | cut -f 1 -d ' ')
kingdom=$(echo $kingdom | head -c 1 | tr '[a-z]' '[A-Z]'; echo $kingdom | tail -c +2)

prokka --outdir {params.outdir}     \
       --kingdom $kingdom           \
       --cpus {threads}             \
       --prefix {wildcards.bin}     \
       {params.args}                \
       {input.genome_bin} &> {log}
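The shell lines before the `prokka` call turn the bin's entry in `{input.bin_evals}` into a value for `--kingdom`: strip a trailing `_sub`, take column 5 of the matching line, keep its first word, and upper-case its first letter. The sketch wraps the last two steps in a hypothetical function, `normalize_kingdom`; what column 5 actually contains is not documented on this page, so the inputs below are assumptions:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the rule's last two kingdom steps:
# keep the first space-separated word, then upper-case its first letter.
normalize_kingdom() {
    local kingdom
    kingdom=$(echo "$1" | cut -f 1 -d ' ')
    # First character upper-cased, followed by the rest of the word.
    kingdom=$(echo "$kingdom" | head -c 1 | tr '[a-z]' '[A-Z]'; echo "$kingdom" | tail -c +2)
    echo "$kingdom"
}

normalize_kingdom "bacteria"
normalize_kingdom "archaea bacteria"
```

The capitalization matters because prokka's `--kingdom` expects capitalized names such as `Bacteria` or `Archaea`.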

download_COG_database

for file in {output}; do
    wget https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/$(basename $file) -O $file 2>> {log};
done
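The loop relies on each output path's basename being identical to the file name on the NCBI FTP server, so every download URL is derived from the output list. A dry-run sketch that prints the URLs instead of calling `wget`; `cog_urls` and the local paths are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dry-run version of the rule's loop: print the URL each
# output file would be fetched from, without downloading anything.
cog_urls() {
    local file
    for file in "$@"; do
        echo "https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/$(basename "$file")"
    done
}

cog_urls resources/COG/cog-20.def.tab resources/COG/fun-20.tab
```

Renaming an output in the Snakefile therefore changes which remote file is requested.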

generate_COG_taxonmap

diamond

echo {params.output_format} | sed -e 's/ /\t/g' > {output.fname}
{{ diamond blastp -q {input.fname_fasta}                \
           -p {threads}                                 \
           -d {input.fname_db}                          \
           -f {params.output_type}                      \
           {params.output_format}                       \
           {params.extra}                               \
           >> {output.fname} ; }} &> {log}
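The first line writes a header row: the space-separated field list in `{params.output_format}` becomes tab-separated column names, matching the tab-separated rows diamond then appends, assuming `{params.output_type}` is 6 (BLAST tabular). A sketch with a hypothetical field list; the `\t` in the `sed` replacement relies on GNU sed:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical value for {params.output_format}: the BLAST tabular
# fields passed to diamond after the output type.
output_format="qseqid sseqid pident evalue bitscore"

# Same transformation as the rule (left unquoted on purpose, as in the
# rule): spaces become tabs, giving one header column per field.
header=$(echo $output_format | sed -e 's/ /\t/g')

printf '%s\n' "$header" | awk -F '\t' '{ print NF }'
```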

mapping.smk

concatenate_contigs

concatenate.py -m {params.sequence_length_cutoff} {output} {input} &> {log}

decompress_catalogue

pigz -d -f -p {threads} -k {input.catalogue_gz} &> {log}

concatenate_proteins

Used by DAS_Tool, which can then skip its own Prodigal run.

cat {input} > {output}

create_contigs_index

minimap2 -d {output} {input} &> {log}

create_genes_index

minimap2 -d {output} {input} &> {log}

map_reads

{{ minimap2 -t {threads}                        \
            -N {params.N}                       \
            -ax {params.preset}                 \
            --split-prefix {params.split_prefix} \
            {input.catalogue_idx}               \
            {input.fastq1}                      \
            {input.fastq2} ; }} 2>> {log}       |
{{ samtools view                                \
            -F {params.flags}                   \
            -b                                  \
            --threads {threads}                 \
            > {output.bam} ; }} 2>> {log}
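`-F {params.flags}` is the opposite of `-f`: it discards every record that has any of the given bits set. The actual mask is not shown on this page, and since the rule asks minimap2 for up to `-N` secondary alignments it presumably does not exclude those. The sketch decodes a purely hypothetical mask of 2052 (4 + 2048) using a hypothetical helper, `describe_excluded`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Name of each SAM FLAG bit, as defined in the SAM specification.
bit_name() {
    case $1 in
        1) echo "paired" ;;          2) echo "proper pair" ;;
        4) echo "unmapped" ;;        8) echo "mate unmapped" ;;
        16) echo "reverse strand" ;; 32) echo "mate reverse strand" ;;
        64) echo "first in pair" ;;  128) echo "second in pair" ;;
        256) echo "secondary" ;;     512) echo "QC fail" ;;
        1024) echo "duplicate" ;;    2048) echo "supplementary" ;;
    esac
}

# Print the bits set in an exclusion mask; `samtools view -F MASK` drops
# every record whose FLAG shares at least one of these bits.
describe_excluded() {
    local mask=$1 bit
    for bit in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
        if [ $(( mask & bit )) -ne 0 ]; then
            echo "excluded: $(bit_name "$bit") ($bit)"
        fi
    done
}

# Hypothetical value for {params.flags}.
describe_excluded 2052
```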

binning.smk

vamb

rm -rf {params.outdir}

vamb --outdir {params.outdir}           \
     --fasta {input.catalogue}          \
     --jgi {input.bam_contig_depths}    \
     -p {threads}                       \
     -o {params.binsplit_sep}           \
     -t {params.batchsize}              \
     --minfasta {params.minfasta} &> {log}

{{ awk -v OFS='\t' '{{ print $2, $1 }}' {output.clusters} |  \
sed "s/$(echo '\t')/$(echo '\t')vamb./g" >                   \
{params.scaffolds2bin_unfiltered} ; }} >> {log} 2>&1

{{ grep -E "$(ls {params.outdir}/bins/*.fna | cut -f 6 -d / |   \
cut -f 1 -d . | xargs | sed 's/ /$|/g')$"                       \
{params.scaffolds2bin_unfiltered} > {output.scaffolds2bin} ; }} >> {log} 2>&1

rm {params.scaffolds2bin_unfiltered}
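The post-processing after `vamb` builds a contig-to-bin table in two steps: swap the cluster file's columns and prefix bin names with `vamb.`, then keep only rows whose bin was written as a `.fna` file. A sketch of both steps on toy data; the cluster and contig names are hypothetical, the column layout is inferred from the rule's `awk`, and for brevity the `vamb.` prefix is added inside `awk` rather than by a separate `sed`:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical clusters file: cluster name <TAB> contig name.
printf 'S1C1\tS1_contig_1\nS1C1\tS1_contig_2\nS1C2\tS1_contig_3\n' > clusters.tsv

# Hypothetical bins written as .fna (the rule lists them from bins/).
kept_bins="S1C1"

# Step 1: contig <TAB> vamb.<cluster>.
awk -v OFS='\t' '{ print $2, "vamb." $1 }' clusters.tsv > scaffolds2bin_unfiltered.tsv

# Step 2: build "BIN1$|BIN2$|..." and keep rows ending in a kept bin.
pattern="$(echo $kept_bins | sed 's/ /$|/g')$"
grep -E "$pattern" scaffolds2bin_unfiltered.tsv > scaffolds2bin.tsv

cat scaffolds2bin.tsv
```

The pattern is anchored only at the end of the line, so it is a suffix match: a bin name that is a suffix of another bin name would match both.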

metabat2

rm -rf {output.outdir} && mkdir {output.outdir}

metabat2 -i {input.contigs}             \
         -a {input.depths}              \
         -m {params.minContig}          \
         -t {threads}                   \
         --seed {params.seed}           \
         --saveCls                      \
         -o {params.outfile} &> {log}

sed "s/$(echo '\t')/$(echo '\t')metabat2./g" {params.outfile} > {output.scaffolds2bin}
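The `$(echo '\t')` idiom in the final `sed` works in two ways. Under bash, which Snakemake uses for shell commands by default, `echo '\t'` prints a literal backslash followed by `t`, and GNU sed reads `\t` as a tab in both the pattern and the replacement. A shell whose `echo` expands escapes passes a real tab instead, which also works. A sketch on a toy contig-to-bin file (names hypothetical, two-column layout inferred from the rule):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical stand-in for the --saveCls output: contig <TAB> bin id.
printf 'contig_1\t1\ncontig_2\t2\ncontig_3\t1\n' > metabat2_cls.tsv

# Same command as the rule: each tab becomes "<TAB>metabat2.",
# which prefixes every bin id.
sed "s/$(echo '\t')/$(echo '\t')metabat2./g" metabat2_cls.tsv > scaffolds2bin.tsv

cut -f 2 scaffolds2bin.tsv
```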

concoct

rm -rf {output.outdir}
mkdir {output.outdir}

{{ cut_up_fasta.py {input.catalogue}                        \
                   -c {params.contig_size}                  \
                   -o 0                                     \
                   -b {params.bed}                          \
                   --merge_last                             \
                   > {params.contigs}  ; }} 2>> {log}

{{ concoct_coverage_table.py {params.bed}                   \
                             {input.bams}                   \
                             > {params.coverage_table} ; }} 2>> {log}

{{ concoct --composition_file {params.contigs}              \
           --coverage_file {params.coverage_table}          \
           -b {output.outdir}                               \
           -t {threads}  ; }} 2>> {log}

{{ merge_cutup_clustering.py {params.clustering_gt}         \
                             > {params.clustering_merged}  ; }} 2>> {log}

mkdir {params.fasta_bins}

{{ extract_fasta_bins.py {input.catalogue}                  \
                         {params.clustering_merged}         \
                         --output_path {params.fasta_bins} ; }} 2>> {log}

sed "s/,/$(echo '\t')concoct./g" {params.clustering_merged} | tail -n +2 > {output.scaffolds2bin}
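The last command converts the merged CONCOCT clustering from CSV into the tab-separated contig-to-bin format used by the other binners, prefixing bin ids with `concoct.` and dropping the header with `tail -n +2`. A sketch on a hypothetical merged clustering file:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Hypothetical merged clustering: CSV with a header row.
printf 'contig_id,cluster_id\ncontig_1,0\ncontig_2,3\n' > clustering_merged.csv

# Same command as the rule: "," becomes "<TAB>concoct." and
# `tail -n +2` drops the (now rewritten) header line.
sed "s/,/$(echo '\t')concoct./g" clustering_merged.csv | tail -n +2 > scaffolds2bin.tsv

cat scaffolds2bin.tsv
```

Because the substitution is global, a contig name containing a comma would be split into extra columns.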

DAS_tool

Refine bins assembled with one or more binners.

DAS_Tool -i {params.fmt_scaffolds2bin}                      \
         -c {input.contigs}                                 \
         -o {params.outpreffix}                             \
         -l {params.binners}                                \
         -p {input.proteins}                                \
         --score_threshold {params.score_threshold}         \
         --search_engine diamond                            \
         --write_bins                                       \
         --write_bin_evals                                  \
         {params.extra}                                     \
         --threads {threads} &> {log}
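DAS_Tool's `-i` takes a comma-separated list of contig-to-bin tables and `-l` a comma-separated list of labels in the same order, which is presumably what `{params.fmt_scaffolds2bin}` and `{params.binners}` hold. A sketch of building both strings from one loop; the binner set and paths are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical binner labels; paths below are hypothetical too.
binners="vamb metabat2 concoct"

files=""
labels=""
for b in $binners; do
    # ${var:+...} inserts the comma only once the list is non-empty.
    files="${files:+$files,}binning/$b/scaffolds2bin.tsv"
    labels="${labels:+$labels,}$b"
done

# DAS_Tool pairs -i and -l entries by position, so building both
# lists in the same loop keeps them aligned.
printf '%s %s\n' -i "$files"
printf '%s %s\n' -l "$labels"
```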

bins_report

Plot binning metrics generated by DAS_Tool.


Disclaimer

This page was generated with a script adapted from the seq2science repository.

MIT License

Copyright (c) 2019 Maarten-vd-Sande (vanheeringen-lab)

For the full license, please see the script source code.