
Bowtie2 index faulty or non-existent: bismark fails to run

Open guruprasadh1996 opened this issue 2 years ago • 2 comments

Dear Felix,

Greetings!

I have been trying to run Bismark (Bismark-0.23.1) on paired-end data against hg38. Genome preparation for the reference hg38.fa gives me 5 files as output in CT_conversion and 7 files in GA_conversion. When I run bismark, it fails, saying that the Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant for 'BS_CT.rev.1.bt2' and 'BS_CT.rev.2.bt2'.

genome_folder/Bisulfite_Genome/CT_conversion

BS_CT.1.bt2 24.6kb
BS_CT.2.bt2 0kb
BS_CT.3.bt2 17.6kb
BS_CT.4.bt2 784.2MB
genome_mfa.CT_conversion.fa 3.3GB

genome_folder/Bisulfite_Genome/GA_conversion

BS_GA.1.bt2 1GB
BS_GA.2.bt2 784.2MB
BS_GA.3.bt2 17.6kb
BS_GA.4.bt2 784.2MB
BS_GA.rev.1.bt2 1GB
BS_GA.rev.2.bt2 784.2MB
genome_mfa.GA_conversion.fa 3.3GB

Command:

/home/drogel/Bismark-0.23.1/bismark -n 1 --bowtie2 --genome /media/drogel/Expansion/01_User_GS_2022/005_A549ChIPSeq_GS/007_A549_WGBS_data/genome_hg38 -1 L1_A549WGBS_ENCFF327QCK_val_1.fq.gz -2 L2_A549WGBS_ENCFF986UWM_val_2.fq.gz --sam -p 2

Output:

Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.3.4])
Output format manually set as SAM
Reference genome folder provided is /media/drogel/Expansion/01_User_GS_2022/005_A549ChIPSeq_GS/007_A549_WGBS_data/genome_hg38/ (absolute path is '/media/drogel/Expansion/01_User_GS_2022/005_A549ChIPSeq_GS/007_A549_WGBS_data/genome_hg38/)'
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.rev.1.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.rev.2.bt2'). Please run the bismark_genome_preparation before running Bismark
Couldn't find a traditional small Bowtie 2 index for the genome specified (ending in .bt2). Now searching for a large index instead (64-bit index ending in .bt2l)...
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.1.bt2l'). Please run the bismark_genome_preparation before running Bismark

Command for TrimGalore: trim_galore --paired L1_A549WGBS_ENCFF327QCK.fastq.gz L2_A549WGBS_ENCFF986UWM.fastq.gz --quality 30 --phred33 --stringency 3 --length 20 --rrbs --trim1

How do I fix this? I have repeated the genome preparation step several times (with both the UCSC hg38.fa and the NCBI hg38.fa). Trimming was done with Trim Galore and generated 2 output files in .fq.gz format.

Ubuntu version:

(base) drogel@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic

(base) drogel@ubuntu:~$ bowtie2 --version
/usr/bin/bowtie2-align-s version 2.3.4.1 64-bit

Do I have to try with HISAT2? Any inputs or suggestions will be very useful!!!

Thanks very much in advance :)

guruprasadh1996, Sep 08 '22 04:09

Dear @guruprasadh1996

It appears that something has gone wrong during the genome preparation, as there should be 6 files generated by Bowtie2 for each genome conversion (the ones ending in .bt2). I think you should just re-do this step, and monitor closely if there are any error messages. In my experience, if something goes wrong at this step it is most likely due to insufficient memory; how many system resources do you have available?
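As a rough way to verify the point about six index files per conversion, the check could be sketched in Python as below. This is only an illustration, not part of Bismark: the function name `find_index_problems` and the example folder path are hypothetical, and it assumes a small (`.bt2`) index with the `BS_CT` / `BS_GA` prefixes shown in the listing above (a large index would end in `.bt2l` instead).

```python
from pathlib import Path

# Suffixes of a small Bowtie 2 index; a large (64-bit) index ends in ".bt2l" instead.
SMALL_INDEX_SUFFIXES = (
    ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2",
)


def find_index_problems(conversion_dir, prefix):
    """Return (file name, problem) pairs for index files that are missing or empty."""
    problems = []
    for suffix in SMALL_INDEX_SUFFIXES:
        path = Path(conversion_dir) / f"{prefix}{suffix}"
        if not path.is_file():
            problems.append((path.name, "missing"))
        elif path.stat().st_size == 0:
            problems.append((path.name, "empty"))
    return problems


if __name__ == "__main__":
    # Hypothetical location; point this at your own genome folder.
    genome = Path("genome_folder/Bisulfite_Genome")
    for subdir, prefix in (("CT_conversion", "BS_CT"), ("GA_conversion", "BS_GA")):
        problems = find_index_problems(genome / subdir, prefix)
        print(subdir, "looks complete" if not problems else problems)
```

Applied to the listing in the original post, a check like this would flag BS_CT.2.bt2 as empty and BS_CT.rev.1.bt2 and BS_CT.rev.2.bt2 as missing, while the GA_conversion folder would pass.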

Regarding your commands:

Trim Galore: I would remove some of the parameters (is your data RRBS?) so that you end up with a command like this:

trim_galore --paired L1_A549WGBS_ENCFF327QCK.fastq.gz L2_A549WGBS_ENCFF986UWM.fastq.gz --quality 30 --rrbs 

And from the Bismark command I would drop the -n 1 and --sam.
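For reference, the two revised invocations could be assembled as in the Python sketch below. It only builds and prints the command lines; nothing is executed. The helper names `trim_galore_cmd` and `bismark_cmd` are hypothetical, the file names and genome path are taken from the original post, and it assumes `trim_galore` and `bismark` are on the PATH (the original post called Bismark via its full install path).

```python
import shlex

# Genome folder as given in the original post.
GENOME_DIR = (
    "/media/drogel/Expansion/01_User_GS_2022/005_A549ChIPSeq_GS/"
    "007_A549_WGBS_data/genome_hg38"
)


def trim_galore_cmd(read1, read2):
    """Trim Galore call with only the options kept in the suggestion above."""
    return ["trim_galore", "--paired", read1, read2, "--quality", "30", "--rrbs"]


def bismark_cmd(genome_dir, read1, read2, threads=2):
    """Original Bismark call with '-n 1' and '--sam' dropped."""
    return [
        "bismark", "--bowtie2", "--genome", genome_dir,
        "-1", read1, "-2", read2, "-p", str(threads),
    ]


trim = trim_galore_cmd(
    "L1_A549WGBS_ENCFF327QCK.fastq.gz", "L2_A549WGBS_ENCFF986UWM.fastq.gz"
)
align = bismark_cmd(
    GENOME_DIR,
    "L1_A549WGBS_ENCFF327QCK_val_1.fq.gz",
    "L2_A549WGBS_ENCFF986UWM_val_2.fq.gz",
)

# Print shell-ready command lines for copy and paste.
print(shlex.join(trim))
print(shlex.join(align))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later without worrying about shell quoting of the long paths.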

FelixKrueger, Sep 08 '22 07:09

Dear Felix,

Thanks for the reply. I tried HISAT2, and it works: the genome preparation completes and the mapping is running. I think the problem is with Bowtie 2.

As per your advice, I will rerun the commands with your suggested changes. Yes, my data is RRBS. I will also drop the options -n 1 and --sam.

System resources: I have 32 GB of RAM. Disk space is adequate (around 4 TB), as I am running my commands from an external hard disk that is always connected to the computer.

Thanks very much, Felix!

guruprasadh1996, Sep 08 '22 11:09