bwameth index creation fails
Description of the bug
Hi.
I'm encountering an error with the methylsig pipeline when using bwameth aligner. The pipeline runs fine with Bismark, but my mapping is low (~40%) and I want to compare against bwameth.
The error seems to be in preparing the reference genome index. I tried saving the reference genome locally and running offline, but encounter the same issue. Tried with both methylsig 2.4.0 and 2.5.0.
Command used and terminal output
Command input:
~/nextflow run /path/to/nf-core-methylseq_2.5.0/2_5_0/ \
--input samplesheet.csv \
--outdir nfcore-methylseq-2.5.0-bwameth \
--genome hg38 \
--save_reference \
--rrbs \
--pbat \
--aligner bwameth \
--methyl_kit \
-profile singularity \
--email [email protected]
Error:
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'
Caused by:
Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/ngi-igenomes/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa))
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Relevant files
System information
Nextflow version 23.04.3 HPC, LSF executor nf-core/methylseq 2.4.0 and 2.5.0 Singularity container
Hey,
I have successfully run the pipeline using --aligner bismark
Issue description:
I am having the same issue reported here when I am trying to repeat the analysis using bwameth instead of bismark:
Test
The following test works fine
nextflow run nf-core/methylseq --aligner bwameth --outdir $RefDir -profile test,conda
Summary of my experimental conditions:
- [x] Reference
.fastagenome - [x] fasta index for
--bwa_meth_indexgenerated withbwameth.py index Genome.fasta - [x] 4 samples (3 biological replicates each (Double paired-end reads for each replicate)). These are
*.fastq.gzfiles - [ ] I can not provide a single
--fasta_index, nor do I know how to feed multiple sample indexes into the pipeline. I assume this is something that the pipeline does by itself using the--input $SampleSheet.csv, but am I wrong?
Command used and error message:
Pipeline Info
N E X T F L O W ~ version 23.10.0 nf-core/methylseq v2.5.0-g66c6138 Conda
nextflow run nf-core/methylseq \
--input $SampleSheet.csv \
--fasta $Genome.fasta \
--outdir $RefDir \
--bwa_meth_index $DirContainingRefFastaGenome \
--email [email protected] \
--aligner bwameth \
--comprehensive \
--max_cpus 40 \
--max_memory 400.GB \
-profile conda
Error:
-[nf-core/methylseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'
Caused by:
Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
I have tried including and excluding the following flags, but no version has worked:
--save_align_intermeds \
--ignore_flags \
--save_reference \
I appreciate any help! :)
Maximiliano
Mmm, I guess I found the problem.
This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth
You need to provide the 2 indexed versions of the genome
--bwa_meth_index $DirContainingRefGenome versions \
Install bwameth and use
bwameth.py --reference $RefGenome.fastaThis will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the--bwa_meth_indexflag)
--fasta_index $RefGenome.fai \
Use SamTools
samtools faidx $RefGenome.fastaIt will produce a$RefGenome.faiThis is an index of the non-converted genome See a good explanation here
This is NOT a bug. It is the result of insufficient input to the pipeline with
bwameth
I disagree. You've found your way around the bug by providing more inputs, meaning that the pipeline did not need to create them. However, the pipeline should automatically generate the index files itself if only given a Fasta file.
This is what we need to fix:
Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))
Looks like there's something wrong with how the channels are being built / supplied.
Hi,
I also encountered the same issue when trying to run bwameth without supplying the indexes. But in my case, supplying --fasta_index was sufficient to work around the bug (my run succeeded), which narrowed down and suggested that the issue arises somewhere during the process of building index using samtools faidx.
Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner
Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner
@drothen15 Not sure if a fix has been included in a release yet, but I was able to fix this locally by modifying the SAMTOOLS_FAIDX process inputs.
In the prepare_genome.nf file, when calling the SAMTOOLS_FAIDX process if the params.fasta_index is not specified, there is an issue when trying to pass a value channel in as part of the tuple. I think it would work if you either 1) removed the tuple, or 2) switched the ch_fasta input to be just the file(params.fasta) instead of Channel.value(file(params.fasta)).
I resolved it by removing the tuple altogether, and then updating the main.nf containing the SAMTOOLS_FAIDX process to expect a val and path input, instead of a tuple with a val and path. If you just change the original process input as specified in prepare_genome.nf to be a file instead of the value channel, there should be no need to update the SAMTOOLS_FAIDX:main.nf.
One solution (that I used):
// inside prepare_genome.nf
SAMTOOLS_FAIDX([:], ch_fasta)
instead of
SAMTOOLS_FAIDX([[:], ch_fasta])
And then modifying SAMTOOLS_FAIDX:main.nf:
input:
val(meta)
path(fasta)
instead of
input:
tuple val(meta), path(fasta)
Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with
bwamethYou need to provide the 2 indexed versions of the genome
--bwa_meth_index $DirContainingRefGenome versions \Install bwameth and use
bwameth.py --reference $RefGenome.fastaThis will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the--bwa_meth_indexflag)
--fasta_index $RefGenome.fai \Use SamTools
samtools faidx $RefGenome.fastaIt will produce a$RefGenome.faiThis is an index of the non-converted genome See a good explanation here
Sorry, I have a doubt: is the input to --bwa_meth_index, i.e. $DirContainingRefGenome, the output of the bwameth.py --reference $RefGenome.fasta as mentioned in this reply or the output of bwameth.py index $RefGenome.fasta?
Thank you!
Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with
bwamethYou need to provide the 2 indexed versions of the genome
--bwa_meth_index $DirContainingRefGenome versions \Install bwameth and use
bwameth.py --reference $RefGenome.fastaThis will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the--bwa_meth_indexflag)
--fasta_index $RefGenome.fai \Use SamTools
samtools faidx $RefGenome.fastaIt will produce a$RefGenome.faiThis is an index of the non-converted genome See a good explanation hereSorry, I have a doubt: is the input to
--bwa_meth_index, i.e. $DirContainingRefGenome, the output of thebwameth.py --reference $RefGenome.fastaas mentioned in this reply or the output ofbwameth.py index $RefGenome.fasta?Thank you!
I think the later, for me that's how I got it to work
a fix for this issue has been added to the dev branch. could someone test it and let us know so we can mark this issue closed if successful
should be fixed in 2.7.0 release. Kindly let us know if issue persists.