[WIP] Functional Tests (Pierre)

July 6

Next 2 steps (binning will change: don't test; how to fix the CAT bug? test profile: “genotoul”):

06_func_annot
07_taxo_affi
Test megahit for the 3rd step + change the file naming method in main.nf
See if observed files are always the same (some surprises with kaiju)
Remove nb_bases files production from pipeline
sbatch script.sh (kaiju step bug?)
Directory may or may not already exist for each step: if it does not, don't crash, just skip the test
Change kaiju_db[_refseq].fmi method: channel.size() == 1 else error
Renamed contigs_len to prot_len in diamond step
Go through main.nf code to see if it needs refactoring
Versioning (git push problem)
Launch all steps (07) on MAG

Progress

--script = .sh launched with sh and not sbatch (other runners)
Human database for host_fasta is too big; use a smaller one with selected chromosomes (from MURATHGEN?)
Diamond is slow because the database is huge. Create or download a smaller database for diamond: nr_bacteria.dmnd or refseq_bacteria.dmnd. If a crash occurs further down the pipeline, it may come from wrong identifiers.
Then tweak CPU/RAM usage in the config profiles
Not tested: add a line in the output showing which directories were not tested
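
A minimal sketch of the "don't crash, just don't test" behaviour for missing step directories, with the "Not tested" line in the output. The function name split_steps and the expected/observed directory layout are assumptions made for illustration, not the actual test code.

```python
import tempfile
from pathlib import Path


def split_steps(expected_root, observed_root, steps):
    """Return (to_test, not_tested): a step is tested only if its
    directory exists under both the expected and the observed root."""
    to_test, not_tested = [], []
    for step in steps:
        has_expected = (Path(expected_root) / step).is_dir()
        has_observed = (Path(observed_root) / step).is_dir()
        if has_expected and has_observed:
            to_test.append(step)
        else:
            not_tested.append(step)
    return to_test, not_tested


# Demo on a throwaway layout: 07_taxo_affi was never produced by the run.
steps = ["01_clean_qc", "06_func_annot", "07_taxo_affi"]
with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp) / "expected"
    observed = Path(tmp) / "observed"
    for step in steps:
        (expected / step).mkdir(parents=True)
    for step in steps[:2]:
        (observed / step).mkdir(parents=True)
    to_test, not_tested = split_steps(expected, observed, steps)

for step in not_tested:
    print(f"Not tested: {step} (directory missing)")
```

Returning the skipped steps instead of raising keeps the run going and still leaves a trace of what was not checked.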

Refactoring:

process index_db_kaiju check (print): kaiju_totalPath = params.kaiju_db_dir.lastIndexOf(File.separator); kaiju_Path = params.kaiju_db_dir.substring(0, kaiju_totalPath); kaiju_Path_db = params.kaiju_db_dir.substring(kaiju_totalPath + 1)
Test 3 cases: already existing db, or wget db or skip_kaiju (before and after refactoring)
Rename all *_final in kaiju steps: retest
Replace replicateId with sampleId, then test
Replace kaiju_nodes|names by taxo_nodes|names
Replace ${taxons} by ${taxon_levels}
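
To check by hand the values that the index_db_kaiju path split should print, the same lastIndexOf/substring logic can be mirrored in Python. This is only a sketch: split_db_dir and the example paths are hypothetical, and the handling of a trailing or missing separator is an added assumption that the Groovy lines do not have.

```python
def split_db_dir(db_dir, sep="/"):
    """Mirror of the lastIndexOf/substring split: returns (parent, name)."""
    db_dir = db_dir.rstrip(sep)   # assumption: ignore a trailing separator
    idx = db_dir.rfind(sep)       # same role as lastIndexOf(File.separator)
    if idx == -1:                 # no separator: no parent part
        return "", db_dir
    return db_dir[:idx], db_dir[idx + 1:]


# Hypothetical path, for illustration only.
kaiju_path, kaiju_path_db = split_db_dir("/db/kaiju/refseq")
print(kaiju_path, kaiju_path_db)
```

Without the rstrip, a path ending in a separator would give an empty database name, which is worth covering in the "already existing db" case.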

Refactoring:

Replace best_hits_diamond.py in best_hits_diamond process with filter_diamond_hits.py coded by Jean
Test MAG dataset (evaluate calculation time)
Add if statement to check if taxdump & accession2taxid are specified by user
l.624: add rm for tmp bam
l.556-558: "scaffolds" renaming might break things; for metaspades & megahit: merge into one process with an "if" statement on "params.assembly"

Bugs in tests:

Check why bwa results change in the 01_clean_qc step: params.bwa_option is set in nextflow_config (nothing by default) or in the test profile (-K threads * 10M)
bwa mem ${params.bwa_option} -t ${task.cpus} ${fasta} ${trimmed_reads_R1} ${trimmed_reads_R2} > ${sampleId}.bam

Binning step:

Refactoring:

Kaiju output order depends on the (random) order in which samples are processed. For samples a and c, either of two states can occur in columns 3-4 and 5-6.
Accession + taxdump && kaiju_db: links in config for DL
Accession + taxdump: if a directory is given, don't download; check that the files exist; set the channel up as one channel with 2 values
Update Doc
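
One possible way to make the functional test robust to the Kaiju column swap is to compare tables by column name instead of by position. The sketch below assumes the file has a header row with unique column names. The header and values shown are made up, and same_table_ignoring_column_order is a hypothetical helper.

```python
import csv
import io


def read_rows_by_header(text, delimiter="\t"):
    """Parse a delimited table into one dict per row, keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def same_table_ignoring_column_order(expected_text, observed_text):
    """True if both tables hold the same rows, whatever the column order.
    Row order still matters; dict equality ignores key order."""
    return read_rows_by_header(expected_text) == read_rows_by_header(observed_text)


# Made-up example: the columns of samples a and c are swapped in the observed file.
expected = "taxid\tname\ta_count\ta_pct\tc_count\tc_pct\n1\troot\t10\t0.4\t15\t0.6\n"
observed = "taxid\tname\tc_count\tc_pct\ta_count\ta_pct\n1\troot\t15\t0.6\t10\t0.4\n"
print(same_table_ignoring_column_order(expected, observed))
```

If the real file has no header, the columns would need to be labelled first, which this sketch does not cover.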

Refactoring:

Update doc
Scaffolds renaming + assembly param output in publishdir
Problem with diamond_parser when giving --taxonomy_dir as input: the process was using taxonomy_ch only once, so only one sample was processed. Fixed with taxonomy_ch.collect()
merge_abundance_and_functional_annot.py: check refactoring of table queries (merge.drop 28; diamond 0,1,14). The diamond table is made of columns 0, 1 and 14, corresponding to qseqid, sseqid and stitle. qseqid is used in the groupby() function to concatenate eggnog and diamond results but needs to be dropped afterwards. With merge.drop[28] we meant to drop qseqid, but this was not clear: sseqid was dropped instead
-> added column names to the diamond files (even if we only keep 3 columns) to make the column selections and drops clearer
Check whether pandas is memory-heavy in other configurations of the merge table
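
The qseqid/sseqid mix-up above comes from dropping by position; with named columns the drop is explicit. A minimal pandas sketch on a toy 15-column table follows. All values, the annot table and its gene/eggnog column names are invented for illustration, and this is not the code of merge_abundance_and_functional_annot.py.

```python
import io

import pandas as pd

# Toy diamond-like table: 15 tab-separated columns, no header, made-up values.
rows = [
    ["gene_1", "prot_A"] + ["."] * 12 + ["hypothetical protein A"],
    ["gene_2", "prot_B"] + ["."] * 12 + ["hypothetical protein B"],
]
raw = "\n".join("\t".join(r) for r in rows) + "\n"

# Keep columns 0, 1 and 14 and name them, so later code never relies on positions.
diamond = pd.read_csv(io.StringIO(raw), sep="\t", header=None, usecols=[0, 1, 14])
diamond.columns = ["qseqid", "sseqid", "stitle"]

# Hypothetical annotation table joined on the query id.
annot = pd.DataFrame({"gene": ["gene_1", "gene_2"], "eggnog": ["COG1", "COG2"]})
merged = annot.merge(diamond, left_on="gene", right_on="qseqid", how="left")

# Dropping by name cannot remove sseqid by accident.
merged = merged.drop(columns=["qseqid"])
print(merged.columns.tolist())
```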
Other

Refactoring (before merge, urgent):

Functional tests (before merge, urgent):

Refac:

FT:

add to tag 2.1 bug fixes for last column + samples
create repository test datasets
look at issues for metagwgs (Wednesday 29)
ask Joanna for source file of pipeline png (svg?)
05_alignment step in usage.md: warning output for links being functional needs NCBI bank
mail to Maxime Manno: tag 2.1

Edited Sep 29, 2021 by MARTIN Pierre
