Whippet.jl
Error during quantification of FASTQ files
Hello,
I generated the Whippet index file:
julia bin/whippet-index.jl --fasta data/genomes/fasta/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --gtf data/transcriptomes/gencode/gencode.v34/gencode.v34.annotation.gtf.gz --index data/whippet_index
I then tried to run the quantification:
julia /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz -x /home/bd1/research_mds/data/whippet_index/whippet.jls -o test --biascorrect
The quantification step reports the following error message:
Whippet v1.6.1 loading...
Activating environment at `~/tools/Whippet.jl/Project.toml`
14.281455 seconds.
Loading splice graph index... /home/bd1/research_mds/data/whippet_index/whippet.jls
5.462022 seconds (6.04 M allocations: 1.040 GiB, 23.73% gc time)
Processing reads from file...
FASTQ_1: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_1.fastq.gz
FASTQ_2: /home/bd1/mds-datasets-no-backup/dataset2/fastq/SRR6781235_2.fastq.gz
ERROR: LoadError: Cannot encode 78 to BioSequences.DNAAlphabet{2}()
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] throw_encode_error(A::BioSequences.DNAAlphabet{2}, src::Vector{UInt8}, soff::Int64)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:216
[3] encode_chunk
@ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:228 [inlined]
[4] copyto!(dst::BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}, doff::Int64, src::Vector{UInt8}, soff::Int64, N::Int64, #unused#::BioSequences.AsciiAlphabet)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:368
[5] copyto!
@ ~/.julia/packages/BioSequences/k4j4J/src/longsequences/copying.jl:292 [inlined]
[6] BioSequences.LongSequence{BioSequences.DNAAlphabet{2}}(src::Vector{UInt8}, startpos::Int64, stoppos::Int64)
@ BioSequences ~/.julia/packages/BioSequences/k4j4J/src/longsequences/constructors.jl:49
[7] BioSequence
@ /disk/work/users/bd1/softwares/Whippet.jl/src/types.jl:74 [inlined]
[8] fill!(rec::Whippet.FASTQRecord, offset::Int64)
@ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/record.jl:14
[9] process_paired_reads!(fwd_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, rev_parser::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate, BufferedStreams.BufferedInputStream{IOStream}}}}}, param::AlignParam, lib::GraphLib, quant::GraphLibQuant{SGAlignPaired, JointBiasCounter}, multi::MultiMapping{SGAlignPaired, JointBiasCounter}, mod::JointBiasMod; bufsize::Int64, sam::Bool, qualoffset::Int64)
@ Whippet /disk/work/users/bd1/softwares/Whippet.jl/src/reads.jl:103
[10] macro expansion
@ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5 [inlined]
[11] main()
@ Main ~/tools/Whippet.jl/bin/whippet-quant.jl:143
[12] top-level scope
@ /disk/work/users/bd1/softwares/Whippet.jl/src/timer.jl:5
in expression starting at /home/bd1/tools/Whippet.jl/bin/whippet-quant.jl:185
Could this error be caused by the format of my FASTQ files?
My FASTQ files look like this:
@SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
GTCTTTGGTCTTTTTGACTAAACCTCTTTTATAACATGTTCAATAAAAAGCTGAACTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCGTGCTAA
+SRR6781235.1.1 D87PMJN1:270:C34YGACXX:3:1101:1351:1935 length=101
@C@DFFFFDFHHHJJIGIGHJ>GHHGIIII<FEIGIJIGIIIIJJJIJE<H@FHIIHIDACGIJJJDCDBDDDBB<BDDDDDDDDD###############
@SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
TAATTTTTCTTTTCGTATTTTTTTAGAGATGGGATTTTTCCATATTGCTCAGTGTGGTCTTAAACTCCTGAGCTCAGGCAATCCACCTGCCTTGGCCTCTC
+SRR6781235.2.1 D87PMJN1:270:C34YGACXX:3:1101:1499:1984 length=101
CCCFFFFFHHHHHJJHIJJJJJJIIJIJIJJJJHIJJJJJJJJJJJIJJJJJIIIIJHHIJJJJJJHHHHHFFFFFEEEDEDDDDDDDDDDDDCDDDDDDD
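For context, 78 is the ASCII code for `N`, so the message suggests that some read contains an `N` base call, which a 2-bit DNA alphabet (A, C, G, T only) cannot represent. A minimal sketch in Python to check for this, assuming the standard four-lines-per-record FASTQ layout shown above; `count_non_acgt_reads` is a hypothetical helper written for illustration and is not part of Whippet:

```python
import gzip
import sys


def count_non_acgt_reads(fastq_gz_path):
    """Return (total_reads, reads_with_non_acgt) for a gzipped FASTQ file.

    Assumes four lines per record, with the sequence on the second line.
    The check is case-insensitive.
    """
    allowed = set("ACGT")
    total = 0
    flagged = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 1:  # keep only the sequence line of each record
                continue
            total += 1
            if not set(line.strip().upper()) <= allowed:
                flagged += 1
    return total, flagged


if __name__ == "__main__":
    # Usage: python check_fastq.py reads_1.fastq.gz reads_2.fastq.gz
    for path in sys.argv[1:]:
        total, flagged = count_non_acgt_reads(path)
        print(f"{path}: {flagged} of {total} reads contain non-ACGT characters")
```

If the count is non-zero for either file, ambiguous bases in the reads are a plausible trigger for the error above, although this check alone does not confirm it.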
Thanks, Benedek