chromap
chromosome length
Hello.
Is there any limit to the length of chromosomes in chromap?
I'm working on a contact map for a genome with chromosomes longer than 2GB, and some other alignment tools seem to have length limitations. I then found chromap, so I would like to know whether it has such a limit.
Best wishes.
What is the size of your genome? We know from the earlier issue #9 that Chromap can fail when the genome is huge (e.g., 17GB). In most cases, it should work.
The total genome size is about 10G, and two chromosomes are longer than 2G.
bwa and bowtie2 both failed because of these two long chromosomes.
It should work if the max chromosome length is less than 4GB. You may give it a try.
OK, thanks. I will give it a try.
Unfortunately, it fails.
Build index for the reference.
Kmer length: 27, window size: 14
Reference file: ref.fasta
Output file: ./index
Loaded all sequences successfully in 3.18s, number of sequences: 0, number of bases: 0.
Collected 0 minimizers.
Sorted minimizers.
chromap: src/index.cc:31: void chromap::Index::Construct(uint32_t, const chromap::SequenceBatch&): Assertion `num_minimizers != 0 && num_minimizers <= 0x7fffffff' failed.
/public/home/mzhliu/.lsbatch/1650634248.54319695: line 8: 400412 Aborted (core dumped) chromap -i -r ref.fasta -o ./index -k 27 -w 14
And this is the output of head ref.fasta.fai:
Chr1 2378143669 6 2378143669 2378143670
Chr2 2277712715 2378143682 2277712715 2277712716
Chr3 2135057119 4655856404 2135057119 2135057120
Chr4 1869254315 6790913530 1869254315 1869254316
Chr5 1839510873 8660167852 1839510873 1839510874
Scaffold1 31492 10499678737 31492 31493
Scaffold2 122381 10499710241 122381 122382
Scaffold3 188277 10499832634 188277 188278
Scaffold4 1022717 10500020923 1022717 1022718
Scaffold5 606318 10501043652 606318 606319
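For anyone checking their own reference, here is a minimal sketch that scans a .fai listing and flags sequences longer than the signed 32-bit maximum (2^31 - 1). The helper name `classify_fai` and the labels are made up for illustration, and only the first two columns (name and length) are used. Whether this boundary is what actually triggers the failure is not established by the listing itself.

```python
INT32_MAX = 2**31 - 1   # 2147483647
UINT32_MAX = 2**32 - 1  # 4294967295

# First rows of the .fai listing posted above (name, length, offset, ...).
FAI_HEAD = """\
Chr1 2378143669 6 2378143669 2378143670
Chr2 2277712715 2378143682 2277712715 2277712716
Chr3 2135057119 4655856404 2135057119 2135057120
Chr4 1869254315 6790913530 1869254315 1869254316
Chr5 1839510873 8660167852 1839510873 1839510874
Scaffold1 31492 10499678737 31492 31493
"""

def classify_fai(fai_text):
    """Map each sequence name to (length, label) based on 32-bit limits."""
    result = {}
    for line in fai_text.strip().splitlines():
        fields = line.split()
        name, length = fields[0], int(fields[1])
        if length > UINT32_MAX:
            label = "exceeds uint32"
        elif length > INT32_MAX:
            label = "exceeds int32, fits uint32"
        else:
            label = "fits int32"
        result[name] = (length, label)
    return result

for name, (length, label) in classify_fai(FAI_HEAD).items():
    print(f"{name}\t{length}\t{label}")
```

On this listing, only Chr1 and Chr2 are longer than 2^31 - 1; Chr3 is just below it.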
Loaded all sequences successfully in 3.18s, number of sequences: 0, number of bases: 0.
This is weird. It looks like your ref file is empty. Is there a way for me to get the ref and test?
And which Chromap version were you using? Can you try building an index for the small test genome provided at https://github.com/haowenz/chromap/blob/master/test/ref.fa?
Chromap version is 0.2.2-r388.
I split my genome and indexed the chromosomes separately. The two long chromosomes failed, and the other chromosomes succeeded. So I think chromap itself is OK.
I'm not familiar with C, but uint32_t may be causing this problem.
The reference is too big to transfer. Could you simulate a genome with a single chromosome of length ~2.5G?
Best wishes!
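In case it helps with reproducing this, below is a rough sketch of how such a test genome could be simulated: one FASTA record of random A/C/G/T bases. The function name `write_random_fasta`, the record name, and all parameters are hypothetical. Pure Python will be slow at 2.5G bases, so treat it as an illustration rather than a tuned generator.

```python
import io
import random

def write_random_fasta(out, name, length, line_width=60, chunk_lines=10000, seed=0):
    """Write one FASTA record of `length` random bases to the file object `out`."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    out.write(f">{name}\n")
    remaining = length
    while remaining > 0:
        # Generate whole lines per chunk so only the last line can be short.
        n = min(remaining, line_width * chunk_lines)
        bases = "".join(rng.choices("ACGT", k=n))
        for i in range(0, n, line_width):
            out.write(bases[i:i + line_width] + "\n")
        remaining -= n

# Small in-memory demo; a real test would write to disk, for example:
#   with open("sim.fa", "w") as fh:
#       write_random_fasta(fh, "SimChr1", 2_500_000_000)
demo = io.StringIO()
write_random_fasta(demo, "SimChr1", 125)
print(demo.getvalue())
```

The chunked loop keeps memory use bounded regardless of the total length requested.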
Thank you. I will try a genome >2GB. Can you post your error message for those two genomes here?
The maximum value of uint32_t is actually around 4G (2^32 - 1 = 4,294,967,295). That's why I assumed it could work.
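To make the 2G versus 4G distinction concrete: a length such as 2378143669 (Chr1 above) fits in an unsigned 32-bit integer but not in a signed one, and the 0x7fffffff in the assertion message is the signed 32-bit maximum. The sketch below only illustrates that arithmetic using Python's `ctypes`. It is not taken from Chromap's source, and whether a signed 32-bit value is actually involved in this failure is an assumption that nothing in this thread confirms.

```python
import ctypes

INT32_MAX = 0x7fffffff   # 2147483647, the bound that appears in the assertion
UINT32_MAX = 0xffffffff  # 4294967295

def as_int32(value):
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    return ctypes.c_int32(value).value

def as_uint32(value):
    """Reinterpret the low 32 bits of `value` as an unsigned 32-bit integer."""
    return ctypes.c_uint32(value).value

chr1_len = 2378143669  # Chr1 length from the .fai listing

print(chr1_len <= UINT32_MAX)  # the length fits in uint32_t
print(chr1_len <= INT32_MAX)   # but not in int32_t
print(as_uint32(chr1_len))     # unchanged when stored unsigned
print(as_int32(chr1_len))      # wraps to a negative number when stored signed
```

If some step in the loading path stored a sequence length in a signed 32-bit variable, a chromosome of this size would wrap to a negative value there, even though uint32_t itself can hold it.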
Yes. The error message for the two long chromosomes was the same:
Build index for the reference.
Kmer length: 17, window size: 7
Reference file: out1
Output file: index1
Loaded all sequences successfully in 4.54s, number of sequences: 0, number of bases: 0.
Collected 0 minimizers.
Sorted minimizers.
chromap: src/index.cc:31: void chromap::Index::Construct(uint32_t, const chromap::SequenceBatch&): Assertion `num_minimizers != 0 && num_minimizers <= 0x7fffffff' failed.
Aborted (core dumped)
I also changed -w and -k, and those runs failed as well.