Whippet.jl icon indicating copy to clipboard operation
Whippet.jl copied to clipboard

ArgumentError with whippet-delta.jl

Open mmilevskiy opened this issue 5 years ago • 2 comments

Hi Tim @timbitz

I have used Whippet successfully on a few different files and cell types. In my current experiment I have several comparisons to make, each one with the P5 cells gives me this error and produces a truncated *Whippet.diff.gz file, which ends are different points depending on the comparison (i.e Wt P5 vs P7 or P5 Wt vs P5 KO).

I have reproduced the whippet-quant.jl files and still get the same error. The files from whippet-quant.jl seem fine, as I have taken a look through the gene.tpm.gz, isoform.tpm.gz, jnc.gz, map.gz and psi.gz files. I am using the default graph.jls (./julia-9d11f62bcb/bin/julia bin/whippet-index.jl --fasta mm10.fa --gtf anno/gencode_mm10.vm11.basic.gtf.gz) and the --circ options. The fastq files are working perfectly fine in another bioinformatic pipeline.

Any help or suggestions you might have are most welcome!

./julia-9d11f62bcb/bin/julia bin/whippet-delta.jl -a Whippet_Quant/Wt1-P5.psi.gz,Whippet_Quant/Wt2-P5.psi.gz,Whippet_Quant/Wt3-P5.psi.gz,Whippet_Quant/Wt4-P5.psi.gz -b Whippet_Quant/Wt1-P7.psi.gz,Whippet_Quant/Wt2-P7.psi.gz,Whippet_Quant/Wt3-P7.psi.gz,Whippet_Quant/Wt4-P7.psi.gz -o P5vsP7_Wt_Whippet -s 3 Whippet v0.11 loading and compiling... 13.883366 seconds Sample A: Wt1-P5.psi.gz,Wt2-P5.psi.gz,Wt3-P5.psi.gz,Wt4-P5.psi.gz Sample B: Wt1-P7.psi.gz,Wt2-P7.psi.gz,Wt3-P7.psi.gz,Wt4-P7.psi.gz Now processing files and calculating posterior distributions... ERROR: LoadError: ArgumentError: Beta: the condition α > zero(α) && β > zero(β) is not satisfied. Stacktrace: [1] macro expansion at /home/.../.julia/v0.6/Distributions/src/utils.jl:6 [inlined] [2] Distributions.Beta{Float64}(::Float64, ::Float64) at /home/.../.julia/v0.6/Distributions/src/univariate/continuous/beta.jl:34 [3] fit(::Type{Distributions.Beta}, ::Array{Float64,1}) at /home/.../.julia/v0.6/Distributions/src/univariate/continuous/beta.jl:133 [4] #PosteriorPsi#92(::Bool, ::Int64, ::Type{T} where T, ::Array{Whippet.PosteriorPsi,1}) at /home/.../.julia/v0.6/Whippet/src/diff.jl:26 [5] #process_psi_files#100(::Int64, ::Int64, ::Float64, ::Int64, ::Function, ::String, ::Array{BufferedStreams.BufferedInputStream,1}, ::Array{BufferedStreams.BufferedInputStream,1}) at /home/.../.julia/v0.6/Whippet/src/diff.jl:127 [6] (::Whippet.#kw##process_psi_files)(::Array{Any,1}, ::Whippet.#process_psi_files, ::String, ::Array{BufferedStreams.BufferedInputStream,1}, ::Array{BufferedStreams.BufferedInputStream,1}) at ./:0 [7] macro expansion at /home/.../.julia/v0.6/Whippet/src/timer.jl:5 [inlined] [8] main() at /stornext/.../bin/whippet-delta.jl:91 [9] include_from_node1(::String) at ./loading.jl:576 [10] include(::String) at ./sysimg.jl:14 [11] process_options(::Base.JLOptions) at ./client.jl:305 [12] _start() at ./client.jl:371 while loading /stornext/.../bin/whippet-delta.jl, in expression starting on line 5

mmilevskiy avatar May 07 '19 04:05 mmilevskiy

Hey @mmilevskiy, thanks for reporting this... it's an interesting error. I'm thinking this could be a result of having really high read-depth (but perhaps low complexity data?) coupled to extremely bimodal PSI values for a splicing event in one set of replicates (in this case I'd bet an event in your Wt P5 is to blame). So what I mean by this, is that the only way I can think of to produce that error with the default emperical sampling rate (-e 1000) is to have an event where one replicate (e.g. Wt1-P5) is 100% included with hundreds of reads supporting it (maybe PCR artifacts?), and then another replicate (e.g. Wt2-P5.psi.gz) with 0% inclusion and also probably a lot of reads supporting it-- which isn't normally to be expected from high complexity data with true replicates. As a distribution becomes more bimodal the alpha/beta parameters approach zero, and depending on the sampling rate can throw that error (though I had never encountered this myself)-- see below:

using Distributions
fit(Beta, vcat(rand(Beta(100,1), 10), rand(Beta(1,100), 10)))
ERROR: ArgumentError: Beta: the condition α > zero(α) && β > zero(β) is not satisfied.
# now increase sampling size and it fits an alpha/beta that approach zero without error
fit(Beta, vcat(rand(Beta(100,1), 100), rand(Beta(1,100), 100)))
Distributions.Beta{Float64}(α=0.01749163387578357, β=0.017638576761842543)

So-- long story short, (1) You could try to use a larger emperical sampling size (-e 10000 or -e 100000) and see if it still throws the error? (2) If strongly bimodal replicates really are what is causing that error, then I'd be more concerned about the data complexity... have you visualized the bam files, do the reads look well distributed?

timbitz avatar May 07 '19 16:05 timbitz

Hi @timbitz thanks so much for the swift reply. It took me a little while to test it out as it took a lot longer to run on the servers. It’s looks to have worked! I have to try one more comparison so will let you know if there is anything else wrong there. Thanks so much for the suggestion!

mmilevskiy avatar May 15 '19 21:05 mmilevskiy