pipelines-nextflow
The annotation_preprocessing new pipeline does not filter out the contigs less than 1000 nucleotides
The old pipeline filtered these out, so for now any contigs shorter than 1000 nucleotides have to be removed manually. To be fixed eventually.
For now, https://github.com/NBISweden/GAAS/blob/master/bin/gaas_fasta_purify.pl can be used (I think I need to test it).
Can you check the script written by Nextflow (.command.sh) to see if it has the --size 1000 in it?
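One way to run this check, sketched under the assumption that the run used a Nextflow work directory (`work` by default) holding one `.command.sh` per task. The function name `list_size_scripts` is hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# List every task script under a Nextflow work directory that mentions --size.
# Usage: list_size_scripts [workdir]   (workdir defaults to ./work)
list_size_scripts() {
    local workdir="${1:-work}"
    # -e keeps grep from parsing the pattern '--size' as an option.
    find "$workdir" -type f -name .command.sh -exec grep -l -e '--size' {} +
}
```

Calling `list_size_scripts work` prints the paths of the matching scripts, which can then be opened to read the full command line.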
Yes:
#!/bin/bash -ue
gaas_fasta_purify.pl \
--size 1000 \
--infile genome_uppercase.fa \
--output genome_uppercase_purified
cat <<-END_VERSIONS > versions.yml
"ANNOTATION_PREPROCESSING:ASSEMBLY_PURIFY":
gaas: 1.2.0
END_VERSIONS
And gaas_fasta_purify.pl does not remove the contigs, or no longer does. I ran it separately and the contigs were still there.
Then check whether the --size option was renamed in a version update.
Interesting. GAAS has had the same release since 2020 (v1.2), so the script should continue to work in the same way.
Is this still an issue? Can you provide me some data I can replicate the issue with?
The GAAS script works. The module works independently of the workflow. Testing the workflow with a sample file:
>seq1
ACGTACGTACGT
>seq2
ACGTACGT
>seq3
ACGTACGTACGT
custom.config:
process {
    withName: 'ASSEMBLY_PURIFY' {
        ext.args = '--size 10'
    }
}
command:
nextflow run main.nf -profile test,docker,gitpod --subworkflow 'annotation_preprocessing' -c custom.config --genome sample.fasta
also works successfully.
purified file:
>seq1
ACGTACGTACGT
>seq3
ACGTACGTACGT
I'm not able to replicate.
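To check a purified file independently of the pipeline, a small helper can count the records that are still below the threshold. This is only a sketch: the function name `count_short_contigs` is hypothetical, it relies on `awk` rather than on GAAS, and it assumes plain FASTA input where sequences may span several lines.

```shell
#!/usr/bin/env bash
# Count FASTA records whose sequence is shorter than a minimum length.
# Usage: count_short_contigs <fasta> <min_size>
count_short_contigs() {
    awk -v min="$2" '
        # Header line: score the previous record, then start a new one.
        /^>/ { if (seen && len < min) n++; seen = 1; len = 0; next }
        # Sequence line: drop whitespace and add to the current length.
        { gsub(/[ \t\r]/, ""); len += length($0) }
        # Score the last record and print the total (0 if none).
        END { if (seen && len < min) n++; print n + 0 }
    ' "$1"
}
```

Running `count_short_contigs sample.fasta 10` on the input above and again on the purified file shows whether any contig below the `--size` value survived the purification step.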