
Error during quantification of FASTQ files

Open dBenedek opened this issue 3 years ago • 12 comments

Hello,

I generated the Whippet index file:

julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index

Then I ran the quantification step:

julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect

The quantification step reports the following error message:

Whippet v1.6.1 loading... 
  Activating environment at `~/tools/Whippet.jl/Project.toml`
 14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
  5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
  [3] encode_chunk
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
  [4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
  [5] copyto!
    @ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
  [6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
    @ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
  [7] BioSequence
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
  [8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
  [9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
    @ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
 [10] macro expansion
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
 [11] main()
    @ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
 [12] top-level scope
    @ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185
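The number in the error message is a raw byte value. A quick check, sketched here assuming a POSIX shell and awk, shows that 78 is the ASCII code for `N`. `BioSequences.DNAAlphabet{2}` is a 2-bit alphabet that can only store A, C, G and T, so an `N` reaching that encoder would plausibly trigger this error:

```shell
# Convert the byte value from the error message to its character.
awk 'BEGIN { printf "%c\n", 78 }'   # prints N

# And the reverse: the numeric code of 'N' (POSIX printf quote syntax).
printf '%d\n' "'N"                  # prints 78
```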

Could this error be caused by the format of my FASTQ files? They look like this:

@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD
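The two records above contain only A, C, G and T, but other reads in the same files may contain `N`. A minimal sketch for counting such reads, assuming standard four-line FASTQ records and gzip-compressed input; `count_non_acgt` is a hypothetical helper name, not part of Whippet:

```shell
#!/bin/sh
# Hypothetical helper: count reads whose sequence line contains any
# character other than A, C, G or T (for example N).
# Assumes 4-line FASTQ records: header, sequence, '+' line, quality.
count_non_acgt() {
  # gzip -dc decompresses .fastq.gz input to stdout;
  # awk inspects only line 2 of each record (the sequence).
  gzip -dc "$1" | awk 'NR % 4 == 2 && /[^ACGT]/ { n++ } END { print n + 0 }'
}
```

For example, `count_non_acgt /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz`. A non-zero count would be consistent with the `Cannot encode 78` error, though it would not by itself confirm the cause.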

Thanks, Benedek

dBenedek, May 06 '21 11:05