Trying to make an index but twopaco_tmp doesn't change or grow, seems stuck in Round0, Salmon doesn't use swap
Hello,
I'm trying to create an index with salmon (version 1.4.0) following the selective-alignment tutorial (https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/). I would like to find isoform switches using IsoformSwitchAnalyzeR, and the salmon files I already have didn't seem to work. The salmon documentation (https://salmon.readthedocs.io/en/latest/salmon.html) recommends this tutorial for preparing transcriptome indices (mapping-based mode).
Since my samples are from humans, I replaced the mouse-files with the (still) current Gencode files for human (v37), and everything seemed to work well until the last step:
salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i salmon_index --gencode
I left my machine alone for almost 8 h (no other programs running, nothing to disturb it). When I finally had a look at it, the terminal was still in Round 0, the system monitor still showed some activity, and the system was still responsive (no freeze or anything). In the index directory, no files had changed since the sub-directory "twopaco_tmp" was created, and this directory contained only a file called bifurcations.bin, which was 0 bytes (after 8 h of computing time).
Therefore, I rebooted my system (in case something had gone wrong that I couldn't see) and tried changing the parameters.
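The check on the temporary file can be repeated from a terminal. This is a minimal sketch, assuming GNU `stat` (as shipped with Ubuntu) and the paths from this post; `file_status` is just an illustrative helper name, not part of salmon.

```shell
#!/usr/bin/env bash
# Print size and last-modification time of a file (GNU stat syntax).
file_status() {
    local path=$1
    if [ -e "$path" ]; then
        stat -c '%n: %s bytes, last modified %y' "$path"
    else
        echo "$path: not found"
    fi
}

# Path taken from the post: index directory "salmon_index", sub-directory "twopaco_tmp".
file_status salmon_index/twopaco_tmp/bifurcations.bin
```

Running it a few minutes apart shows whether the size or the modification time changes at all.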
- I changed the number of threads to -p 6, since my machine is rather old and -p 12 may have been too much.
- Someone who seemed to have a similar problem was going to try changing the filter size next, so I tried that by adding --filterSize 2^39. At the same time, I also added --keepDuplicates, because I want to use the data to find differentially expressed isoforms later on.
salmon index -t gentrome.fa.gz -d decoys.txt -p 6 --keepDuplicates --filterSize 2^39 -i salmon_index --gencode
However, this didn't work and got killed.
I thought it might be due to the --filterSize argument and changed it to 39 (because maybe the 2^ is assumed automatically and only the exponent is needed).
This run got killed, too:
salmon index -t gentrome.fa.gz -d decoys.txt -p 6 --filterSize 39 -i salmon_index --gencode
[2021-04-02 08:34:22.664] [puff::index::jointLog] [info] Replaced 151,122,967 non-ATCG nucleotides
[2021-04-02 08:34:22.670] [puff::index::jointLog] [info] Clipped poly-A tails from 1,833 transcripts
wrote 233807 cleaned references
Threads = 6
Vertex length = 31
Hash functions = 5
Filter size = 549755813888
Capacity = 2
Files:
salmon_index/ref_k31_fixed.fa
Killed
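The log above reports "Filter size = 549755813888", which is exactly 2^39, so the value 39 was taken as an exponent. As a rough sanity check, here is a small sketch that converts such an exponent to GiB. It ASSUMES the filter size is counted in bits (my reading of how the TwoPaCo Bloom filter is sized, not something the log states); `filter_gib` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Convert a filter-size exponent to GiB.
# ASSUMPTION: the filter size is a number of bits, so bytes = 2^exponent / 8.
filter_gib() {
    local exponent=$1
    local bits=$(( 1 << exponent ))
    echo $(( bits / 8 / 1024 / 1024 / 1024 ))
}

echo "2^39 = $(( 1 << 39 )) -> $(filter_gib 39) GiB"
echo "2^36 = $(( 1 << 36 )) -> $(filter_gib 36) GiB"
```

Under that assumption, 2^39 corresponds to 64 GiB, and the 2^36 that salmon picks automatically in the log further down corresponds to 8 GiB, which is slightly more than the 7.7 GiB of RAM in this machine.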
After that, I stopped experimenting and went back to the original command, with only 6 threads:
salmon index -t gentrome.fa.gz -d decoys.txt -p 6 -i salmon_index --gencode
This seemed to be working fine and got to Round 0 without any problems. It created the "twopaco_tmp" directory about 5 minutes after starting and created a "bifurcations.bin" inside this directory.
However, 4 hours have passed, and neither the file nor the directory has been modified since (and the file is still a staggering 0 bytes). I also read that indexing takes quite a lot of memory, but there should be some TB left on the hard drive. I set up about 21 GiB of swap, which shows up as 21.4 GiB in total in the System Monitor Resources tab. The first hard drive has a dual installation of Windows 10 and Ubuntu 20.04 plus the 2 GiB swap from the installation (I thought that would be enough at the time and tried to save some space ^^). The second hard drive has a 20 GiB swap partition with the highest priority of all swaps. The swap works fine despite being on another physical hard drive: I've already seen it at 60%, so it is active and recognized.
The system is still responding, and the System Monitor seems to be working fine (salmon is an active process using 12% CPU, 6.9 GiB memory, 13.5 GiB Disk read total, N/A Disk write total, Disk read 1.2 MiB/s (sometimes 1.1 or 1.3), Disk write N/A, Priority normal) and the Resources Tab shows that
- out of 8 available CPUs, one is working at 100% (the others 5% at max) and
- Memory 99.3% (which is only 7.7 GiB)
- swap 9.5% (which is only 2 GiB of 21.4 GiB available)
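The same memory and swap figures can also be read without the System Monitor. A minimal sketch, assuming a Linux system with /proc/meminfo (values there are in KiB); the helper names `meminfo_kib` and `kib_to_gib` are only illustrative.

```shell
#!/usr/bin/env bash
# Report RAM and swap usage in GiB from /proc/meminfo (Linux only).

# Print the value (in KiB) of one /proc/meminfo field, e.g. MemTotal.
meminfo_kib() {
    LC_ALL=C awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# Convert KiB to GiB with one decimal place.
kib_to_gib() {
    LC_ALL=C awk -v kib="$1" 'BEGIN { printf "%.1f", kib / 1024 / 1024 }'
}

mem_total=$(meminfo_kib MemTotal)
mem_avail=$(meminfo_kib MemAvailable)
swap_total=$(meminfo_kib SwapTotal)
swap_free=$(meminfo_kib SwapFree)

echo "RAM:  $(kib_to_gib $(( mem_total - mem_avail ))) of $(kib_to_gib "$mem_total") GiB used"
echo "Swap: $(kib_to_gib $(( swap_total - swap_free ))) of $(kib_to_gib "$swap_total") GiB used"
```

Wrapping the script in `watch -n 60` would repeat the readout every minute while the index is being built.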
So here are (finally) my questions:
- How can I get Salmon to use the swap? (I set swappiness to 100, and it still doesn't seem to care about the swap.)
- Is it normal that there is nothing "visible" happening during Round0?
- How can I see or check if it is still working (and it might be worthwhile to wait longer)?
- Is what I'm doing correct to create an index file for running Salmon on human RNA-seq data that is supposed to "end up" in sleuth?
- Is there another way to create an index (one that might need less computational power) that I can use with Salmon to produce input data for sleuth from RNA-seq data?
The last lines of the output are:
[2021-04-02 08:46:48.282] [puff::index::jointLog] [warning] Entry with header [ENST00000634174.1|ENSG00000282732.1|OTTHUMG00000191398.1|OTTHUMT00000487783.1|AC073539.7-201|AC073539.7|28|unprocessed_pseudogene|], had length less than equal to the k-mer length of 31 (perhaps after poly-A clipping)
[2021-04-02 08:48:32.206] [puff::index::jointLog] [warning] Removed 833 transcripts that were sequence duplicates of indexed transcripts.
[2021-04-02 08:48:32.207] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the --keepDuplicates flag
[2021-04-02 08:48:33.535] [puff::index::jointLog] [info] Replaced 151,122,967 non-ATCG nucleotides
[2021-04-02 08:48:33.536] [puff::index::jointLog] [info] Clipped poly-A tails from 1,833 transcripts
wrote 233807 cleaned references
[2021-04-02 08:50:46.347] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2021-04-02 08:51:54.414] [puff::index::jointLog] [info] ntHll estimated 2628453213 distinct k-mers, setting filter size to 2^36
Threads = 6
Vertex length = 31
Hash functions = 5
Filter size = 68719476736
Capacity = 2
Files:
salmon_index/ref_k31_fixed.fa
Round 0, 0:68719476736 Pass Filling Filtering
Thank you very much and happy Easter holidays :)