chromosome length

Open pxxiao-hz opened this issue 3 years ago • 9 comments

Hello.

Is there any limit to the length of chromosomes in chromap?

I'm working on a contact map for a genome with chromosomes longer than 2GB, and some other alignment tools seem to have length limits. Then I found chromap, so I want to know whether it has such a limit.

Best wishes.

pxxiao-hz avatar Apr 22 '22 01:04 pxxiao-hz

What is the size of your genome? It is known from previous issue #9 that when your genome is huge (e.g., 17GB), it could fail. In most cases, it should work.

haowenz avatar Apr 22 '22 02:04 haowenz

The total genome size is about 10G, and two chromosomes are longer than 2G. bwa and bowtie2 both failed because of these two long chromosomes.

pxxiao-hz avatar Apr 22 '22 02:04 pxxiao-hz

It should work if the max chromosome length is less than 4GB. You may give it a try.

haowenz avatar Apr 22 '22 04:04 haowenz

Ok, thanks. I will give it a try.

pxxiao-hz avatar Apr 22 '22 06:04 pxxiao-hz

Unfortunately, it fails.

Build index for the reference.
Kmer length: 27, window size: 14
Reference file: ref.fasta
Output file: ./index
Loaded all sequences successfully in 3.18s, number of sequences: 0, number of bases: 0.
Collected 0 minimizers.
Sorted minimizers.
chromap: src/index.cc:31: void chromap::Index::Construct(uint32_t, const chromap::SequenceBatch&): Assertion num_minimizers != 0 && num_minimizers <= 0x7fffffff' failed.
/public/home/mzhliu/.lsbatch/1650634248.54319695: line 8: 400412 Aborted (core dumped) chromap -i -r ref.fasta -o ./index -k 27 -w 14

And this is head ref.fasta.fai:

Chr1 2378143669 6 2378143669 2378143670
Chr2 2277712715 2378143682 2277712715 2277712716
Chr3 2135057119 4655856404 2135057119 2135057120
Chr4 1869254315 6790913530 1869254315 1869254316
Chr5 1839510873 8660167852 1839510873 1839510874
Scaffold1 31492 10499678737 31492 31493
Scaffold2 122381 10499710241 122381 122382
Scaffold3 188277 10499832634 188277 188278
Scaffold4 1022717 10500020923 1022717 1022718
Scaffold5 606318 10501043652 606318 606319

pxxiao-hz avatar Apr 22 '22 13:04 pxxiao-hz
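The .fai listing above can be checked mechanically against the two sizes discussed in this thread. The following is a minimal Python sketch, not part of chromap: the function names `parse_fai` and `flag_long_sequences` are hypothetical, and the two thresholds are assumptions taken from the assertion constant (`0x7fffffff`) and from the "less than 4GB" remark, not confirmed causes of the failure. It assumes the standard five-column FASTA .fai layout (name, length, offset, bases per line, bytes per line).

```python
# Sketch only: thresholds are assumptions drawn from this thread,
# not from chromap's source code.
INT32_MAX = 0x7FFFFFFF   # 2,147,483,647, the constant in the assertion message
UINT32_MAX = 0xFFFFFFFF  # 4,294,967,295, the "around 4GB" uint32_t ceiling

# First rows of ref.fasta.fai as posted in the issue.
FAI_TEXT = """\
Chr1 2378143669 6 2378143669 2378143670
Chr2 2277712715 2378143682 2277712715 2277712716
Chr3 2135057119 4655856404 2135057119 2135057120
Chr4 1869254315 6790913530 1869254315 1869254316
Chr5 1839510873 8660167852 1839510873 1839510874
Scaffold1 31492 10499678737 31492 31493
"""


def parse_fai(text):
    """Parse .fai rows into dicts (name, length, offset, linebases, linewidth)."""
    records = []
    for line in text.strip().splitlines():
        name, length, offset, linebases, linewidth = line.split()[:5]
        records.append({
            "name": name,
            "length": int(length),
            "offset": int(offset),
            "linebases": int(linebases),
            "linewidth": int(linewidth),
        })
    return records


def flag_long_sequences(records):
    """Report, per sequence, which assumed 32-bit thresholds its length exceeds."""
    return [
        {
            "name": rec["name"],
            "over_int32": rec["length"] > INT32_MAX,
            "over_uint32": rec["length"] > UINT32_MAX,
            # True when the whole sequence sits on one FASTA line.
            "single_line": rec["linebases"] == rec["length"],
        }
        for rec in records
    ]


if __name__ == "__main__":
    for row in flag_long_sequences(parse_fai(FAI_TEXT)):
        print(row)
```

On the posted rows, only Chr1 and Chr2 exceed the signed 32-bit value, and none exceeds the unsigned one. The listing also shows that every sequence is stored on a single line (the bases-per-line column equals the sequence length), which may be worth keeping in mind when reproducing the problem.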

Loaded all sequences successfully in 3.18s, number of sequences: 0, number of bases: 0.

This is weird. It looks like your ref file is empty. Is there a way for me to get the ref and test?

And which Chromap version were you using? Can you try building an index for the small test genome provided at https://github.com/haowenz/chromap/blob/master/test/ref.fa?

haowenz avatar Apr 22 '22 14:04 haowenz

Chromap version is 0.2.2-r388.

I split my genome and indexed each chromosome separately. Indexing failed for the two long chromosomes and succeeded for the others, so I think chromap itself is OK. I'm not familiar with C, but uint32_t may be what leads to this problem.

The ref is too big to transfer. Could you simulate a genome with a single chromosome of length ~2.5G?

Best wishes!

pxxiao-hz avatar Apr 22 '22 16:04 pxxiao-hz
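A test reference like the one requested above could be generated rather than transferred. The following is a minimal Python sketch under stated assumptions: `write_random_fasta` and all of its parameters are hypothetical names, the bases are uniformly random (so not biologically realistic), and writing ~2.5G bases this way will be slow. With `line_width=None` the sequence is written on a single line, which matches the layout implied by the posted .fai, where bases per line equals sequence length.

```python
import os
import random
import tempfile


def write_random_fasta(path, name, length, line_width=None, seed=0,
                       chunk_size=1 << 20):
    """Write one random ACGT sequence of `length` bases to `path`.

    line_width=None puts the whole sequence on a single line.
    Bases are generated in chunks so memory use stays bounded.
    """
    rng = random.Random(seed)
    if line_width is None:
        line_width = length
    written = 0
    col = 0  # number of bases already written on the current line
    with open(path, "w") as out:
        out.write(f">{name}\n")
        while written < length:
            n = min(chunk_size, length - written, line_width - col)
            out.write("".join(rng.choices("ACGT", k=n)))
            written += n
            col += n
            if col == line_width:
                out.write("\n")
                col = 0
        if col != 0:
            out.write("\n")  # terminate a final partial line


if __name__ == "__main__":
    # Small demonstration; for the case in this thread one would pass
    # length=2_500_000_000 instead.
    demo_path = os.path.join(tempfile.mkdtemp(), "sim.fasta")
    write_random_fasta(demo_path, "ChrSim", length=1000, line_width=60)
    print(demo_path, os.path.getsize(demo_path))
```

Running the generator twice, once with a fixed `line_width` and once with `line_width=None`, would give two references of identical content that differ only in line layout.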

Thank you. I will try a genome >2GB. Can you post your error message for those two genomes here?

The max of uint32_t is actually around 4GB. That's why I assumed it could work.

haowenz avatar Apr 22 '22 20:04 haowenz
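The numbers in this exchange can be made concrete. The following is a minimal Python sketch of the arithmetic only: `assertion_holds`, `as_int32` and `as_uint32` are hypothetical helpers that mimic the condition printed in the error message and generic 32-bit storage. They say nothing about which integer types chromap or its FASTA reader actually use for sequence lengths.

```python
INT32_MAX = 2**31 - 1   # 2,147,483,647 == 0x7fffffff
UINT32_MAX = 2**32 - 1  # 4,294,967,295

CHR1_LENGTH = 2378143669  # from the posted ref.fasta.fai


def assertion_holds(num_minimizers):
    """Mirror the condition shown in the assertion message."""
    return num_minimizers != 0 and num_minimizers <= 0x7FFFFFFF


def as_uint32(value):
    """Value as stored in an unsigned 32-bit integer (wraps modulo 2**32)."""
    return value & 0xFFFFFFFF


def as_int32(value):
    """Value as stored in a signed 32-bit integer (two's complement wrap)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


if __name__ == "__main__":
    # The log reports "Collected 0 minimizers", so the `!= 0` half fails.
    print(assertion_holds(0))
    # Chr1's length fits an unsigned 32-bit integer but not a signed one.
    print(as_uint32(CHR1_LENGTH) == CHR1_LENGTH)
    print(as_int32(CHR1_LENGTH))
```

Two points follow from the posted output alone. First, the assertion is about the number of minimizers, and with zero sequences loaded it fails on the `!= 0` check rather than on the upper bound. Second, a length of 2378143669 is below the unsigned 32-bit maximum but above the signed one, so any signed 32-bit step along the way would see it as a negative number.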

Yes. The error message for the two long chromosomes was the same:

Build index for the reference.
Kmer length: 17, window size: 7
Reference file: out1
Output file: index1
Loaded all sequences successfully in 4.54s, number of sequences: 0, number of bases: 0.
Collected 0 minimizers.
Sorted minimizers.
chromap: src/index.cc:31: void chromap::Index::Construct(uint32_t, const chromap::SequenceBatch&): Assertion num_minimizers != 0 && num_minimizers <= 0x7fffffff' failed.
Aborted (core dumped)

I also changed -w and -k, and those runs failed as well.

pxxiao-hz avatar Apr 23 '22 05:04 pxxiao-hz