No reads pseudoalign when reads are the same length as transcripts in the index, or of length 3?

Open winni2k opened this issue 5 years ago • 2 comments

I have put a small example in [this gist](https://gist.github.com/winni2k/64efa2e354a70a72d8a70a5ac373cc49).

When I run run.sh, I get the following output:

0 reads pseudoalign

[build] loading fasta file transcripts.fa
[build] k-mer length: 3
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 3 contigs and contains 3 k-mers 


[quant] fragment length distribution is truncated gaussian with mean = 4, sd = 0.1
[index] k-mer length: 3
[index] number of targets: 2
[index] number of k-mers: 3
[index] number of equivalence classes: 3
[quant] running in single-end mode
[quant] will process file 1: single.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 8 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds


0 reads pseudoalign in this case as well

[quant] fragment length distribution is truncated gaussian with mean = 4, sd = 0.1
[index] k-mer length: 3
[index] number of targets: 2
[index] number of k-mers: 3
[index] number of equivalence classes: 3
[quant] running in single-end mode
[quant] will process file 1: single_v3.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 8 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds


8 reads pseudoalign

[build] loading fasta file transcripts_v2.fa
[build] k-mer length: 3
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done 
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 3 contigs and contains 5 k-mers 


[quant] fragment length distribution is truncated gaussian with mean = 4, sd = 0.1
[index] k-mer length: 3
[index] number of targets: 2
[index] number of k-mers: 5
[index] number of equivalence classes: 3
[quant] running in single-end mode
[quant] will process file 1: single.fq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 8 reads, 8 reads pseudoaligned
[   em] quantifying the abundances ... done
[   em] the Expectation-Maximization algorithm ran for 52 rounds

Summary:

  • In the first case I have reads of length 4 and transcripts of length 4
  • In the second case I have reads of length 3 and transcripts of length 4
  • In the third case I have reads of length 4 and transcripts of length 5

I don't understand why the reads don't pseudoalign in the first two cases. Is this a bug or a feature?

winni2k, Sep 22 '18 09:09

I don't know the internals of what causes this, but it seems to be related to how the fragment length is used to constrain possible alignments. Setting the fragment length to 1 removes the constraint. I checked, and with that setting all reads pseudoalign in each of the cases you provided.
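For anyone wanting to try this, here is a rough sketch of how the suggested rerun could be assembled from Python. It only builds and prints the command; it does not run kallisto. The index name `transcripts.idx` and the output directory `out` are placeholders I made up (the gist's `run.sh` may use different names), and the sd of 0.1 is simply carried over from the log output above.

```python
import shlex


def kallisto_quant_cmd(index, out_dir, fastq, fragment_length=1, sd=0.1):
    """Build a single-end `kallisto quant` argument list (nothing is executed)."""
    return [
        "kallisto", "quant",
        "-i", index,                  # index file (placeholder name below)
        "-o", out_dir,                # output directory (placeholder name below)
        "--single",                   # single-end mode, as in the logs
        "-l", str(fragment_length),   # estimated mean fragment length
        "-s", str(sd),                # estimated fragment length sd
        fastq,
    ]


# Hypothetical file names; adjust to whatever run.sh actually uses.
cmd = kallisto_quant_cmd("transcripts.idx", "out", "single.fq")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run`, to compare the number of pseudoaligned reads against the `mean = 4` runs.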

mfansler, Dec 03 '18 04:12

I think k-mer length must be smaller than read length?
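To make that comparison concrete, a small sketch (plain Python, not kallisto code) that counts how many k-mers a sequence of a given length contains, using the read and transcript lengths from the summary and k = 3 from the logs. The helper names are my own.

```python
def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def num_kmers(length, k):
    """A sequence of the given length contains length - k + 1 k-mers (0 if shorter than k)."""
    return max(0, length - k + 1)


K = 3  # k-mer length reported by [build] and [index]

# (read length, transcript length) for the three cases in the summary
cases = {
    "case 1": (4, 4),
    "case 2": (3, 4),
    "case 3": (4, 5),
}

for name, (read_len, tx_len) in cases.items():
    print(name,
          "k-mers per read:", num_kmers(read_len, K),
          "k-mers per transcript:", num_kmers(tx_len, K))
```

By this count every read contains at least one 3-mer in all three cases, and in the first case k is already smaller than the read length, so this condition on its own may not account for that run.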

Zepeng-Mu, Feb 07 '21 01:02