[WIP] Functional Tests (Pierre)
July 6
Next 2 steps (binning will change: don't test; how to fix the CAT bug? test profile: “genotoul”):
- 06_func_annot
- 07_taxo_affi
- Test megahit for 3rd step + change file naming method in main.nf
- See if observed files are always the same (some surprises with kaiju)
- Remove nb_bases files production from pipeline
- sbatch script.sh (kaiju step bug?)
- Check whether the directory exists for each step: if not, don't crash, just don't test
- Change kaiju_db[_refseq].fmi method: channel.size() == 1 else error
- Renamed contigs_len to prot_len in diamond step
- Go through main.nf code to see if it needs refactoring
- Versioning (git push problem)
- Launch all steps (07) on MAG
July 12
Progress
- --script = .sh launched with sh and not sbatch (other runners)
- Human bank for host_fasta is too big; use a smaller one with selected chromosomes (from MURATHGEN?)
- Diamond is slow because the bank is huge. Create or download a smaller bank for diamond: nr_bacteria.dmnd or refseq_bacteria.dmnd. If a crash occurs further down the pipeline, it may come from wrong identifiers.
- Then, tweak config profiles for CPU/RAM usage
- Not tested: add a line in the output showing each directory that was not tested
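The "don't crash, just don't test" behaviour and the "not tested" output line could look roughly like the sketch below. It is only an illustration: the function names `select_steps` and `report`, the message wording, and the `exp`/`obs` directory layout are assumptions, not the actual functional-test code.

```python
import os
import tempfile


def select_steps(exp_dir, obs_dir, steps):
    """Split steps into testable ones and ones whose directory is missing."""
    to_test, not_tested = [], []
    for step in steps:
        exp_step = os.path.join(exp_dir, step)
        obs_step = os.path.join(obs_dir, step)
        # A missing directory is not an error: the step is simply skipped.
        if os.path.isdir(exp_step) and os.path.isdir(obs_step):
            to_test.append(step)
        else:
            not_tested.append(step)
    return to_test, not_tested


def report(not_tested):
    """Build one output line per step that was skipped."""
    return [f"Not tested: {step} (directory not found)" for step in not_tested]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp, obs = os.path.join(tmp, "exp"), os.path.join(tmp, "obs")
        os.makedirs(os.path.join(exp, "06_func_annot"))
        os.makedirs(os.path.join(obs, "06_func_annot"))
        os.makedirs(os.path.join(exp, "07_taxo_affi"))  # no observed dir
        tested, skipped = select_steps(exp, obs, ["06_func_annot", "07_taxo_affi"])
        print(tested)
        print("\n".join(report(skipped)))
```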
Refactoring:
- process index_db_kaiju check (print):
  kaiju_totalPath = params.kaiju_db_dir.lastIndexOf(File.separator)
  kaiju_Path = params.kaiju_db_dir.substring(0, kaiju_totalPath)
  kaiju_Path_db = params.kaiju_db_dir.substring(kaiju_totalPath + 1)
- Test 3 cases: already existing db, wget db, or skip_kaiju (before and after refactoring)
- Rename all *_final in kaiju steps: retest
- Replace replicateId with sampleId, then test
- Replace kaiju_nodes|names with taxo_nodes|names
- Replace ${taxons} with ${taxon_levels}
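To check the index_db_kaiju path split outside Nextflow, the same lastIndexOf/substring logic can be mirrored in Python. This is a sketch only: `split_db_path` and the example paths are hypothetical, and "/" is assumed as the separator.

```python
def split_db_path(db_path, sep="/"):
    """Mirror lastIndexOf + substring: return (parent_dir, db_name)."""
    idx = db_path.rfind(sep)  # same role as lastIndexOf; -1 if sep is absent
    # In Java/Groovy, substring(0, -1) would throw; here we return "" instead.
    parent = db_path[:idx] if idx >= 0 else ""
    name = db_path[idx + 1:]
    return parent, name


if __name__ == "__main__":
    # Hypothetical paths, only to show the three shapes worth printing.
    for path in ("/work/bank/kaiju_db", "/work/bank/kaiju_db/", "kaiju_db"):
        print(path, "->", split_db_path(path))
```

Note that a trailing separator leaves the db name empty, which is one case the print check should cover.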
July 19
Refactoring:
- host_fasta OR bwa_index (mutually exclusive)
- refactor channel eggnog_db_dir (badly created, not a file)
- test db download + evaluate time
- remove "01" step to launch bwaindex
Functional tests:
- reduce diamond database size to ~100k lines
- FT: dict with step as key mapping to file + method ({'step': ('file', 'method')})
- os.path.join for paths
- use JSON to list files to test
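The three ideas above (step-keyed dict, os.path.join, JSON file list) fit together as in the sketch below. The JSON layout, the file names, and the method names `diff` and `not_empty` are assumptions made for illustration; only `taxo_diff` appears elsewhere in these notes.

```python
import json
import os

# Hypothetical JSON content: step -> list of [file name, comparison method].
EXPECTED_JSON = """
{
  "06_func_annot": [["sample.annotations.tsv", "diff"]],
  "07_taxo_affi": [["sample.taxonomy.tsv", "taxo_diff"],
                   ["report.html", "not_empty"]]
}
"""


def load_files_to_test(json_text, exp_dir, obs_dir):
    """Return {step: [(expected_path, observed_path, method), ...]}."""
    steps = json.loads(json_text)
    plan = {}
    for step, entries in steps.items():
        plan[step] = [
            (os.path.join(exp_dir, step, name),
             os.path.join(obs_dir, step, name),
             method)
            for name, method in entries
        ]
    return plan


if __name__ == "__main__":
    for step, files in load_files_to_test(EXPECTED_JSON, "exp_dir", "obs_dir").items():
        for exp_path, obs_path, method in files:
            print(step, method, exp_path, obs_path)
```

Keeping the list in JSON means adding or removing a tested file does not require touching the test script itself.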
August 16
Functional tests:
- Taxo affiliation test only taxa list?
- Retest some steps to check for errors
- Change json format for file names modularity
- Use other host fasta for human (chr 21)
Refactoring:
- Replace best_hits_diamond.py in best_hits_diamond process with filter_diamond_hits.py coded by Jean
- Test MAG dataset (evaluate calculation time)
- Add if statement to check if taxdump & accession2taxid are specified by user
- l.624: add rm for tmp bam
- l.556-558: "scaffolds" renaming might break things; for metaspades & megahit: change in one process with an "if" statement for each "params.assembly"
Bugs in tests:
- Check why results change in the 01_clean_qc step for bwa: params.bwa_option in nextflow_config (nothing by default) or in profile test (-K threads * 10M)
- bwa mem ${params.bwa_option} -t ${task.cpus} ${fasta} ${trimmed_reads_R1} ${trimmed_reads_R2} > ${sampleId}.bam
Binning step:
- Check SqueezeMeta (test on MAG data)
- Check metaWrap
August 30
Functional tests:
- Fetch exp log files with regexp to test automatically
- Find a way to use the right method for each file
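One possible way to do both items is an ordered list of regex patterns, where the first match decides the method. This is a sketch under assumptions: the patterns, the file names, and the method names `diff` and `not_empty` are hypothetical (`taxo_diff` is the only method named in these notes).

```python
import re

# Ordered (pattern, method) pairs: the first matching pattern wins.
METHOD_PATTERNS = [
    (re.compile(r"\.(bam|gz|png)$"), "not_empty"),  # binary: only check size
    (re.compile(r"taxo.*\.tsv$"), "taxo_diff"),     # taxonomy tables
    (re.compile(r"\.(tsv|txt|log)$"), "diff"),      # plain text: line diff
]


def select_files(filenames, pattern):
    """Keep only the file names matching the given regular expression."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.search(name)]


def pick_method(filename, default="not_empty"):
    """Return the comparison method for a file, or the default if none match."""
    for pattern, method in METHOD_PATTERNS:
        if pattern.search(filename):
            return method
    return default


if __name__ == "__main__":
    names = ["sample.bam", "taxo_affi.tsv", "quantif.tsv", "run.log"]
    print(select_files(names, r"\.log$"))
    for name in names:
        print(name, "->", pick_method(name))
```

Pattern order matters here: the more specific taxonomy pattern has to come before the generic .tsv one.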
Refactoring:
- Kaiju output order is random, depending on the order in which samples are processed. For samples a and c, two states can occur randomly in columns 3,4 and 5,6.
- Accession + taxdump && kaiju_db: links in config for DL
- Accession + taxdump: if dir is given: don't DL; check if files exist; channel set in one w/ 2 val
- Update doc
September 6
Functional tests:
- taxo_diff method check
- maybe change check of methods list
- Select files to use for final expected results
- create a readme for FT
Refactoring:
- Update doc
- Scaffolds renaming + assembly param output in publishdir
- Problem with diamond_parser when giving --taxonomy_dir as input: the process used taxonomy_ch only once, so only one sample was processed. Fixed with taxonomy_ch.collect()
- merge_abundance_and_functional_annot.py: check refactoring of table queries (merge.drop 28; diamond 0,1,14). The diamond table is made of columns 0, 1 and 14, corresponding to qseqid, sseqid and stitle. qseqid is used in the groupby() function to concatenate eggnog and diamond results but needs to be dropped afterwards. With merge.drop[28] we wanted to drop qseqid, but it wasn't clear: sseqid was dropped instead.
  -> added column names to the diamond files (even if we only keep 3 columns) to make the column selections and drops clearer
- check if Pandas is memory heavy in other configurations of the merge table
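The qseqid/sseqid mix-up came from dropping a column by position. The sketch below redoes the same groupby, merge and drop by column name on toy tables. All values and the eggnog column names `query_name` and `eggnog_annot` are hypothetical; this is not the code of merge_abundance_and_functional_annot.py.

```python
import pandas as pd

# Toy diamond table, already reduced to the three named columns.
diamond = pd.DataFrame(
    [
        ["c1_p1", "WP_0001", "hypothetical protein"],
        ["c1_p1", "WP_0002", "ABC transporter"],
        ["c2_p1", "WP_0003", "DNA polymerase"],
    ],
    columns=["qseqid", "sseqid", "stitle"],
)

# Toy eggnog table (column names are assumptions).
eggnog = pd.DataFrame(
    {"query_name": ["c1_p1", "c2_p1"], "eggnog_annot": ["COG1", "COG2"]}
)

# Concatenate all diamond hits of a query into one row.
hits = diamond.groupby("qseqid", as_index=False).agg(
    {"sseqid": ",".join, "stitle": ";".join}
)

# Merge on the query id, then drop the join key by NAME, not by position.
merged = eggnog.merge(hits, left_on="query_name", right_on="qseqid", how="left")
merged = merged.drop(columns=["qseqid"])

if __name__ == "__main__":
    print(merged)
```

Dropping by name fails loudly with a KeyError if the column is missing, whereas a positional drop silently removes whatever sits at that index.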
Other
September 13
Refactoring (before merge, urgent):
- send exp_dir files link to CH for check
- finish documentation and send to CH: "how the pipeline works today"
Functional tests (before merge, urgent):
- finish documentation and send to CH & CN
Refac:
- arrange meeting on steps calling method
- fixed quantification_by_contig_lineage.py the same way as merge_kaiju
FT:
- MAG dataset: use both samples (with both host samples or chr21?)
- Explain in documentation how to wget the data samples
- produce exp_dir for MAG + create repo with exp_dir & samples
- use exp_dir for Test + create repo with exp_dir & samples
September 27
- add to tag 2.1 bug fixes for last column + samples
- create repository for test datasets
- look at issues for metagwgs (Wednesday 29)
- ask Joanna for source file of pipeline png (svg?)
- 05_alignment step in usage.md: add a warning that output links need the NCBI bank to be functional
- mail to Maxime Manno: tag 2.1
Edited by MARTIN Pierre