methylseq icon indicating copy to clipboard operation
methylseq copied to clipboard

bwameth index creation fails

Open FutureFellwalker opened this issue 2 years ago • 7 comments

Description of the bug

Hi.

I'm encountering an error with the methylsig pipeline when using bwameth aligner. The pipeline runs fine with Bismark, but my mapping is low (~40%) and I want to compare against bwameth.

The error seems to be in preparing the reference genome index. I tried saving the reference genome locally and running offline, but encounter the same issue. Tried with both methylsig 2.4.0 and 2.5.0.

Command used and terminal output

Command input:

~/nextflow run /path/to/nf-core-methylseq_2.5.0/2_5_0/ \
--input samplesheet.csv \
--outdir nfcore-methylseq-2.5.0-bwameth \
--genome hg38 \
--save_reference \
--rrbs \
--pbat \
--aligner bwameth \
--methyl_kit \
-profile singularity \
--email [email protected]

Error:
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/ngi-igenomes/igenomes/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa))


Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

Relevant files

nextflow.log

System information

Nextflow version 23.04.3 HPC, LSF executor nf-core/methylseq 2.4.0 and 2.5.0 Singularity container

FutureFellwalker avatar Oct 23 '23 21:10 FutureFellwalker

Hey, I have successfully run the pipeline using --aligner bismark

Issue description:

I am having the same issue reported here when I am trying to repeat the analysis using bwameth instead of bismark:

Test

The following test works fine

nextflow run nf-core/methylseq --aligner bwameth --outdir $RefDir -profile test,conda

Summary of my experimental conditions:

  • [x] Reference .fasta genome
  • [x] fasta index for --bwa_meth_index generated with bwameth.py index Genome.fasta
  • [x] 4 samples (3 biological replicates each (Double paired-end reads for each replicate)). These are *.fastq.gz files
  • [ ] I can not provide a single --fasta_index, nor do I know how to feed multiple sample indexes into the pipeline. I assume this is something that the pipeline does by itself using the --input $SampleSheet.csv , but am I wrong?

Command used and error message:

Pipeline Info

N E X T F L O W ~ version 23.10.0 nf-core/methylseq v2.5.0-g66c6138 Conda

nextflow run nf-core/methylseq \
--input $SampleSheet.csv \
--fasta $Genome.fasta \
--outdir $RefDir \
--bwa_meth_index $DirContainingRefFastaGenome \
--email [email protected] \
--aligner bwameth \
--comprehensive \
--max_cpus 40 \
--max_memory 400.GB \
-profile conda

Error: 
-[nf-core/methylseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX'

Caused by:
  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))


Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

I have tried including and excluding the following flags, but no version has worked:

--save_align_intermeds \
--ignore_flags \
--save_reference \ 

I appreciate any help! :)

Maximiliano

mz448 avatar Nov 06 '23 20:11 mz448

Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

You need to provide the 2 indexed versions of the genome

  1. --bwa_meth_index $DirContainingRefGenome versions \

Install bwameth and use bwameth.py --reference $RefGenome.fasta This will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the --bwa_meth_index flag)

  1. --fasta_index $RefGenome.fai \

Use SamTools samtools faidx $RefGenome.fasta It will produce a $RefGenome.fai This is an index of the non-converted genome See a good explanation here

mz448 avatar Nov 09 '23 22:11 mz448

This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

I disagree. You've found your way around the bug by providing more inputs, meaning that the pipeline did not need to create them. However, the pipeline should automatically generate the index files itself if only given a Fasta file.

This is what we need to fix:

  Not a valid path value type: groovyx.gpars.dataflow.DataflowVariable (DataflowVariable(value=/local/storage/Projects/ppar_emseq/data/005_ppar_emseq_muscle/sub/P_parae_wMito+lambda+puc19.fasta))

Looks like there's something wrong with how the channels are being built / supplied.

ewels avatar Feb 03 '24 00:02 ewels

Hi,

I also encountered the same issue when trying to run bwameth without supplying the indexes. But in my case, supplying --fasta_index was sufficient to work around the bug (my run succeeded), which narrowed down and suggested that the issue arises somewhere during the process of building index using samtools faidx.

jkh00 avatar Feb 26 '24 12:02 jkh00

Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner

drothen15 avatar Apr 05 '24 14:04 drothen15

Has this been fixed? I'm still having the same error when trying to use bwameth as the aligner

@drothen15 Not sure if a fix has been included in a release yet, but I was able to fix this locally by modifying the SAMTOOLS_FAIDX process inputs.

In the prepare_genome.nf file, when calling the SAMTOOLS_FAIDX process if the params.fasta_index is not specified, there is an issue when trying to pass a value channel in as part of the tuple. I think it would work if you either 1) removed the tuple, or 2) switched the ch_fasta input to be just the file(params.fasta) instead of Channel.value(file(params.fasta)).

I resolved it by removing the tuple altogether, and then updating the main.nf containing the SAMTOOLS_FAIDX process to expect a val and path input, instead of a tuple with a val and path. If you just change the original process input as specified in prepare_genome.nf to be a file instead of the value channel, there should be no need to update the SAMTOOLS_FAIDX:main.nf.

One solution (that I used):

// inside prepare_genome.nf
SAMTOOLS_FAIDX([:], ch_fasta)

instead of

SAMTOOLS_FAIDX([[:], ch_fasta])

And then modifying SAMTOOLS_FAIDX:main.nf:

input: 
val(meta)
path(fasta)

instead of

input: 
tuple val(meta), path(fasta)

oliviapetrillo avatar May 08 '24 17:05 oliviapetrillo

Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth

You need to provide the 2 indexed versions of the genome

  1. --bwa_meth_index $DirContainingRefGenome versions \

Install bwameth and use bwameth.py --reference $RefGenome.fasta This will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the --bwa_meth_index flag)

  1. --fasta_index $RefGenome.fai \

Use SamTools samtools faidx $RefGenome.fasta It will produce a $RefGenome.fai This is an index of the non-converted genome See a good explanation here

Sorry, I have a doubt: is the input to --bwa_meth_index, i.e. $DirContainingRefGenome, the output of the bwameth.py --reference $RefGenome.fasta as mentioned in this reply or the output of bwameth.py index $RefGenome.fasta?

Thank you!

LucaZanella15 avatar May 24 '24 18:05 LucaZanella15

Mmm, I guess I found the problem. This is NOT a bug. It is the result of insufficient input to the pipeline with bwameth You need to provide the 2 indexed versions of the genome

  1. --bwa_meth_index $DirContainingRefGenome versions \

Install bwameth and use bwameth.py --reference $RefGenome.fasta This will produce a series of indexed converted genomes in a container folder (Later, you will use that container folder path as the argument in the --bwa_meth_index flag)

  1. --fasta_index $RefGenome.fai \

Use SamTools samtools faidx $RefGenome.fasta It will produce a $RefGenome.fai This is an index of the non-converted genome See a good explanation here

Sorry, I have a doubt: is the input to --bwa_meth_index, i.e. $DirContainingRefGenome, the output of the bwameth.py --reference $RefGenome.fasta as mentioned in this reply or the output of bwameth.py index $RefGenome.fasta?

Thank you!

I think the later, for me that's how I got it to work

JihedC avatar Aug 21 '24 11:08 JihedC

a fix for this issue has been added to the dev branch. could someone test it and let us know so we can mark this issue closed if successful

sateeshperi avatar Sep 23 '24 15:09 sateeshperi

should be fixed in 2.7.0 release. Kindly let us know if issue persists.

sateeshperi avatar Oct 25 '24 06:10 sateeshperi