pipelines-nextflow

The new annotation_preprocessing pipeline does not filter out contigs shorter than 1000 nucleotides

Open LucileSol opened this issue 1 year ago • 7 comments

The new annotation_preprocessing pipeline does not filter out contigs shorter than 1000 nucleotides. The old pipeline did, so for now we have to filter manually whenever an assembly contains contigs shorter than 1000 nucleotides. To be fixed eventually.

For now we can use https://github.com/NBISweden/GAAS/blob/master/bin/gaas_fasta_purify.pl (I think I need to test it first).
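If a manual step is needed in the meantime, it could look roughly like the Python sketch below. This is not how GAAS implements it; `read_fasta` and `filter_by_length` are names made up for illustration, and the sketch assumes that a contig exactly at the threshold is kept.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=1000):
    """Keep records whose sequence has at least min_len characters (assumed inclusive)."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Toy data, not from a real assembly
sample = [">a", "ACGT", "ACGT", ">b", "ACG"]
print(filter_by_length(read_fasta(sample), min_len=5))  # [('a', 'ACGTACGT')]
```

For a real assembly, the lines would come from an open file handle instead of the in-memory list.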

LucileSol avatar Mar 29 '23 09:03 LucileSol

Can you check the script written by Nextflow (.command.sh) to see if it has the --size 1000 in it?

mahesh-panchal avatar Mar 29 '23 13:03 mahesh-panchal

Yes:

#!/bin/bash -ue
gaas_fasta_purify.pl \
    --size 1000 \
    --infile genome_uppercase.fa \
    --output genome_uppercase_purified

cat <<-END_VERSIONS > versions.yml
"ANNOTATION_PREPROCESSING:ASSEMBLY_PURIFY":
    gaas: 1.2.0
END_VERSIONS

LucileSol avatar Mar 29 '23 13:03 LucileSol

And gaas_fasta_purify.pl does not remove the contigs, or at least not anymore. I ran it separately and the contigs were still there.
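One way to double-check this outside the pipeline is to list the records in the purified FASTA that are still below the threshold. A rough Python sketch follows; `short_contigs` is a hypothetical helper, not part of GAAS, and the path passed to it is whatever file the purify step actually produced.

```python
def short_contigs(fasta_path, min_len=1000):
    """Return (name, length) for every FASTA record shorter than min_len."""
    short = []
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Close the previous record before starting a new one
                if name is not None and length < min_len:
                    short.append((name, length))
                parts = line[1:].split()
                name = parts[0] if parts else ""  # first word of the header
                length = 0
            elif name is not None:
                length += len(line)
    # Close the last record in the file
    if name is not None and length < min_len:
        short.append((name, length))
    return short
```

An empty list would mean the filter did its job; any entries are contigs that slipped through.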

LucileSol avatar Mar 29 '23 13:03 LucileSol

Then check whether the --size option was renamed in a version update.

mahesh-panchal avatar Mar 29 '23 14:03 mahesh-panchal

Interesting. GAAS has been on the same release since 2020 (v1.2), so the script should still work the same way.

Juke34 avatar Mar 29 '23 20:03 Juke34

Is this still an issue? Can you provide me some data I can replicate the issue with?

mahesh-panchal avatar Sep 26 '23 08:09 mahesh-panchal

The GAAS script works. The module works independently of the workflow. Testing the workflow with a sample file:

>seq1
ACGTACGTACGT
>seq2
ACGTACGT
>seq3
ACGTACGTACGT

custom.config:

process {
    withName: 'ASSEMBLY_PURIFY' {
        ext.args = '--size 10'
    }
}

command:

nextflow run main.nf -profile test,docker,gitpod --subworkflow 'annotation_preprocessing' -c custom.config --genome sample.fasta

also works successfully.

purified file:

>seq1
ACGTACGTACGT
>seq3
ACGTACGTACGT

I'm not able to replicate the issue.

mahesh-panchal avatar Sep 26 '23 13:09 mahesh-panchal