Long running time generating index.idx for snDrop-seq with kb ref

Open jingliu1700 opened this issue 2 years ago • 2 comments

Hi there,

I have a question about the following kallisto command: is it appropriate for generating the index and reference files for single-nucleus Drop-seq (snDrop-seq)?

“kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow nucleus Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz Homo_sapiens.GRCh38.107.gtf.gz”

I was able to generate the t2g.txt, cdna.fa, intron.fa, cdna_t2c.txt and intron_t2c.txt files on my home computer. However, building index.idx has been running for over 24 hours and still has not finished, which I guess is because I chose nucleus as the --workflow.

I then tried building the index with kallisto index from the Ensembl cdna.all file instead, as I would for bulk RNA-seq. However, "kb count" then produced spliced.mtx and unspliced.mtx files of the same size, which makes me suspicious; the index file is probably wrong.
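
One way to check that suspicion is to compare the two matrices directly instead of relying on file size. The following is a minimal bash sketch, not part of kb itself: a Matrix Market coordinate file has, after its %-prefixed header and comment lines, a size line giving rows, columns and the number of stored entries, so the sketch prints that line for each file and uses cmp to test whether the files are byte-identical. The function names mtx_dims and compare_mtx and the counts_unfiltered/ paths in the example are assumptions; point them at your own kb count output.

```shell
#!/usr/bin/env bash
# Sketch: compare two Matrix Market (.mtx) count files, e.g. from kb count.
# Function names and example paths are hypothetical, not kb features.

# Print "rows cols nnz": the first line that is not a comment
# (the %%MatrixMarket banner and comments both start with '%').
mtx_dims() {
  awk '!/^%/ { print $1, $2, $3; exit }' "$1"
}

# Show the size line of each matrix, then report whether the files
# are byte-for-byte identical.
compare_mtx() {
  local a="$1" b="$2"
  echo "$a: $(mtx_dims "$a") (rows cols nnz)"
  echo "$b: $(mtx_dims "$b") (rows cols nnz)"
  if cmp -s "$a" "$b"; then
    echo "identical"
  else
    echo "different"
  fi
}

# Example (paths assumed):
# compare_mtx counts_unfiltered/spliced.mtx counts_unfiltered/unspliced.mtx
```

Matching byte sizes alone prove little; identical content, or identical nonzero counts with very different expected biology, is a stronger hint that the index did not separate spliced from unspliced sequence.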

I also tried adding "-n 8", following one of the kallisto bus tutorials, to split the index into 8 pieces; however, my version of "kb ref" does not recognize "-n".

Many thanks!

Jing Jing Liu

jingliu1700 · Aug 01 '22 17:08

The command looks correct. Running the nucleus index takes a lot of RAM, so your home computer may not have enough memory.
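
Since the reply points to RAM as the bottleneck, a quick pre-flight check on Linux is to read MemTotal and MemAvailable from /proc/meminfo before starting the nucleus index. This is a minimal bash sketch under stated assumptions: it only works where /proc/meminfo exists, and the thread gives no figure for how much memory the nucleus index needs, so the required amount passed to the hypothetical has_memory function is a placeholder you must choose yourself.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): check memory before a long "kb ref --workflow nucleus" run.
# /proc/meminfo reports sizes in kB (KiB); dividing by 1048576 gives GiB.
# mem_gib and has_memory are hypothetical helper names.

# Print one /proc/meminfo field in whole GiB (rounded down).
# $1 = field name (e.g. MemTotal, MemAvailable); $2 = optional meminfo file.
mem_gib() {
  awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1048576 }' "${2:-/proc/meminfo}"
}

# Succeed if MemAvailable is at least the requested number of GiB.
# $1 = required GiB (placeholder, NOT a documented kb requirement).
has_memory() {
  local required="$1" avail
  avail=$(mem_gib MemAvailable "${2:-/proc/meminfo}")
  echo "total: $(mem_gib MemTotal "${2:-/proc/meminfo}") GiB, available: ${avail} GiB, requested: ${required} GiB"
  [ "${avail:-0}" -ge "$required" ]
}

# Example (threshold is a placeholder):
# has_memory 64 || echo "not enough free memory; consider an HPC node"
```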

Yenaled · Aug 01 '22 18:08

OK, I see. I will switch to an HPC cluster then, thanks!

jingliu1700 · Aug 01 '22 18:08