informME icon indicating copy to clipboard operation
informME copied to clipboard

Bismark version for bam

Open loki-phd opened this issue 3 years ago • 11 comments

Hi, my bam files have been generated based on bismark v0.22.1. It doesn't seem to fit the script since it has been built on bismark v0.12.3 I am aligning one of the bam files using Bismark v0.12.3 to test the software incompatibility at the moment, but is there any plan to update upon the upgraded version of Bismark?

Many thanks,

loki-phd avatar Nov 23 '20 16:11 loki-phd

This is the first I am hearing about the incompatibility although I have not tested against more recent bismarks. Please keep me updated with your findings here. Also to ensure it is the bam files and not the installation, can you walk through the steps in our wiki to ensure informME is working properly using the small toy testing data including in our repository:

https://github.com/GarrettJenkinson/informME/wiki/Testing%20Your%20Installation

GarrettJenkinson avatar Nov 23 '20 16:11 GarrettJenkinson

Hi, Meanwhile I am actively figuring out the problem that I've raised above, I wanted to try to install the Julia version of its.

It gave me the following error messages:

When I tried Julia-v1.4:

Test Summary: | Pass Total Test Core Algs | 28 28 Test Larger Functions: Error During Test at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:125 Got exception outside of a @test UndefVarError: FASTA not defined Stacktrace: [1] getproperty at ./Base.jl:26 [inlined] [2] fastaToCpG(::String; outdir::String, wsize::Int64) at /home/xxx/.julia/packages/InformMe/5Rkri/src/fastaToCpG.jl:91 [3] top-level scope at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:193 [4] top-level scope at /tmp/robobuild/spack-stage-julia-1.4.0-sbt6j7fpud5mpmhrvswiv6igxhftvbda/spack-src/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [5] top-level scope at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:127 [6] include(::String) at ./client.jl:439 [7] top-level scope at none:6 [8] eval(::Module, ::Any) at ./boot.jl:331 [9] exec_options(::Base.JLOptions) at ./client.jl:264 [10] _start() at ./client.jl:484

Test Summary: | Pass Error Total Test Larger Functions | 8 1 9 ERROR: LoadError: Some tests did not pass: 8 passed, 0 failed, 1 errored, 0 broken. in expression starting at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:125 ERROR: Package InformMe errored during testing

Julia v1.0 [ Info: Compiling bit-parallel GC counter for LongSequence{<:NucleicAcidAlphabet} [ Info: Compiling bit-parallel mismatch counter for LongSequence{<:NucleicAcidAlphabet} [ Info: Compiling bit-parallel match counter for LongSequence{<:NucleicAcidAlphabet} [ Info: Compiling bit-parallel ambiguity counter... [ Info: For a single LongSequence{<:NucleicAcidAlphabet} [ Info: For a pair of LongSequence{<:NucleicAcidAlphabet}s [ Info: Compiling bit-parallel certainty counter for LongSequence{<:NucleicAcidAlphabet} [ Info: Compiling bit-parallel gap counter for LongSequence{<:NucleicAcidAlphabet} Test Summary: | Pass Total Test Core Algs | 28 28 Test Larger Functions: Error During Test at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:125 Got exception outside of a @test UndefVarError: FASTA not defined Stacktrace: [1] getproperty at ./sysimg.jl:13 [inlined] [2] #fastaToCpG#11(::String, ::Int64, ::Function, ::String) at /home/xxx/.julia/packages/InformMe/5Rkri/src/fastaToCpG.jl:91 [3] (::getfield(InformMe, Symbol("#kw##fastaToCpG")))(::NamedTuple{(:outdir,),Tuple{String}}, ::typeof(fastaToCpG), ::String) at ./none:0 [4] macro expansion at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:193 [inlined] [5] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Test/src/Test.jl:1083 [inlined] [6] top-level scope at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:127 [7] include at ./boot.jl:317 [inlined] [8] include_relative(::Module, ::String) at ./loading.jl:1038 [9] include(::Module, ::String) at ./sysimg.jl:29 [10] include(::String) at ./client.jl:388 [11] top-level scope at none:0 [12] eval(::Module, ::Any) at ./boot.jl:319 [13] macro expansion at ./logging.jl:317 [inlined] [14] exec_options(::Base.JLOptions) at ./client.jl:219 [15] _start() at ./client.jl:421 Test Summary: | Pass Error Total Test Larger Functions | 8 1 9 ERROR: LoadError: Some tests did not pass: 8 passed, 0 failed, 1 errored, 0 broken. in expression starting at /home/xxx/.julia/packages/InformMe/5Rkri/test/runtests.jl:125 ERROR: Package InformMe errored during testing

Tried with downloading the jil.git to the local file as well. Got me the same error messages. Have you had any questions on this part?

loki-phd avatar Nov 25 '20 10:11 loki-phd

It is strange that you are having trouble with both repos, which are totally independent and written in different programming languages. So this might mean it is an OS/dependency issue.

Can you give me more information about your system? What OS? Is samtools on the system path? For julia did Quad direct install smoothly? For Matlab do you have the correct gcc version for your version of Matlab and did the mex setup command go through smoothly?

GarrettJenkinson avatar Nov 25 '20 19:11 GarrettJenkinson

I ran on our HPC server, thus I assume maybe something went wrong. I am discussing with our IT team on this. I will let you know if anything turns into gold. many thanks.

loki-phd avatar Nov 25 '20 21:11 loki-phd

I am currently attempting to install InformMe on Ubuntu 18.04.3 LTS, using gcc 7.5.0 and I have installed samtools on /usr/local/bin/samtools and have loaded the path to samtools from my bash_profile. I will attempt installation further, but I have not been having much luck with it.

loki-phd avatar Nov 27 '20 19:11 loki-phd

A common issue can be the compatibility of your gcc and matlab versions, see here:

https://www.mathworks.com/support/requirements/previous-releases.html

Did mex -setup work OK as described here?

https://www.mathworks.com/help/matlab/ref/mex.html

And you may want to check out this issue which dove into detail on how to get informME installed in a linux cluster environment: https://github.com/GarrettJenkinson/informME/issues/16

In general, the matlab version tends to be trickier than the julia version. When I get a chance I will check if some dependency in the julia version has been broken (since the continuous integration test shows the build as succeeding that is the only possibility I can see for a problem there).

GarrettJenkinson avatar Nov 29 '20 17:11 GarrettJenkinson

Hi,

Many thanks to your feedback. I finally managed to compile files and successfully ran the test files that you addressed. However, I've not run with ./install.sh script rather I compiled individually by giving each of its paths.

g++ -c computeZtilde.cpp -I path/to/eigen -I /path/toeigen/unsupported/Eigen/mpreal -I /path/tomatlab/mex

I assume this is due to our HPC environment is different from yours. Many thanks for your help.

Now I am back to the bismark version problem.

I've completed the step1 and 2 (Reference Genome Analysis and Methylation Data Matrix Generation) and got *_matrice.mat files. However, when I ran step3 getMatrices.sh and setp4 to convert the files into bed files, it gave me a blank bed files.

Since I am using hg38 and did not have "chr" on our fasta file, I've re-downloaded fasta file with "chr", re-run reference genome analysis, and followed the downstream analysis. Bam header was also corrected it with samtools reheader. I've not yet tested the old bismark file, but bismark v22 gives a very different file format compare to bismark v12.3.

To spot the differences, I might need to run your test bam with bismark v22. To have a quick look, I converted *matrice.mat files into v6 and compared the data format. It seems like the test file has the list of 7 when I opened test_matrice.mat files, but ours had list of 5.

I may look into more details but these are the updates from our side.

loki-phd avatar Dec 02 '20 15:12 loki-phd

Hey that is great news that you got it installed and the test data passed through correctly!

Regarding your most recent issues: please keep your bam files and your reference genome consistent with respect to the fasta analysis (i.e., analyze the exact same versions). What you will need to do is use the optional arguments to specify that your reference genome does not have the "chr" string. See the -c argument here, for instance:

https://github.com/GarrettJenkinson/informME/blob/master/src/bash_src/parseBamFile/getMatrices/getMatrices_help.txt

Given that your test data went through, I am guessing this will start working for you, but do report back if you run into problems.

GarrettJenkinson avatar Dec 03 '20 01:12 GarrettJenkinson

Also while we are discussing the reference fasta, please ensure that it is sorted as discussed in the readme here:

https://github.com/GarrettJenkinson/informME/blob/master/src/bash_src/parseBamFile/fastaToCpg/fastaToCpg_help.txt

GarrettJenkinson avatar Dec 03 '20 02:12 GarrettJenkinson

Hi, I'm back with no hope. I've done with all your suggestions. I've curated reference file (hg38 decoy gatk version), kept alignment with the same reference file (trim galore, bismark v22), compiled again, checked with test files (ran smoothly), made reference mat files, ran smoothly until the getMatrices, ran informME_run and got no error messages but got blank bed files from the downstream analysis.

Initially, I tried with targeted bam files. I noticed someone tried and you've answered that this pipeline does not fit to the targeted sequencing files. Therefore, I tried with shallow whole-genome bisulfite sequencing file and got stuck at the same point. Does shallow WGBS not suitable for the tool due to the low sequencing depth?

I truly believe you have developed a marvellous tool to adapt to bisulfite sequencing data. Could you please suggest me any other possible things I can try to get the data out?

These are the files are the examplar file size that has been generated from step 1 and 2. the examplar matrices.mat is from chromosome 9.

`11M Dec 4 11:37 CpGlocationChr2.mat 7.9M Dec 4 11:43 CpGlocationChr3.mat 7.2M Dec 4 11:48 CpGlocationChr4.mat 7.3M Dec 4 11:52 CpGlocationChr5.mat 7.2M Dec 4 11:57 CpGlocationChr6.mat 7.5M Dec 4 12:02 CpGlocationChr7.mat 6.3M Dec 4 12:07 CpGlocationChr8.mat 5.9M Dec 4 12:11 CpGlocationChr9.mat 6.5M Dec 4 12:15 CpGlocationChr10.mat 6.2M Dec 4 12:19 CpGlocationChr11.mat 12M Dec 4 12:53 CpGlocationChr1.mat

166M Dec 10 17:17 hg38_sorted_matrices.mat `

Many thanks!

loki-phd avatar Dec 10 '20 22:12 loki-phd

You are correct that the tool is only expected to work for WGBS (so won't work in RRBS for example). What is the average depth of your "shallow" WGBS? In our publications, we typically had 20-40X coverage. Going below that is going to result in increasingly sparse models, but I would expect some modeling results even at ~15X coverage.

And just to be clear, you also re-ran your fastaToCpG on this hg38 genome that you aligned your data to? Furthermore, you should delete any ".mat" result files prior to rerunning our pipeline since it is set up to not delete any older results. So if you do not point to a new location or delete your old files, rerunning the code will not alter existing result ".mat" files.

GarrettJenkinson avatar Dec 10 '20 23:12 GarrettJenkinson