
Integer overflow on large dataset

Open lczech opened this issue 4 months ago • 1 comments

Hi there!

I am running into what seems to be essentially the same issue as #48 and #73, where my output looks like this:

F1	14753176821204
F0	9223372036854775808
1	9223372036854775808
2	9223372036854775808
3	9223372036854775808
4	9223372036854775808
5	9223372036854775808
...

using

ntcard -k `seq -s, 16 31` -p counts -t 48 @ref_cat_fasta_list.txt

This is on a rather large dataset though (~10TB), so it does not seem to be caused by too few sequences. It only occurs for k >= 22. I'm using ntCard 1.2.2 installed via conda.
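
As a quick sanity check: the repeated value 9223372036854775808 is exactly 2^63, one more than the largest signed 64-bit integer (2^63 - 1). That would be consistent with an overflow or an out-of-range conversion rather than a real count, though I have not confirmed this in the ntCard source. Below is a minimal sketch for flagging such rows. The helper `find_saturated` is a hypothetical name of mine and not part of ntCard, and the input format is assumed to be the two-column layout shown above.

```python
# Hypothetical helper, not part of ntCard: flag histogram rows whose count
# equals 2**63, the value that fills the output shown above.
SATURATED = 2**63  # 9223372036854775808, one past the signed 64-bit maximum


def find_saturated(lines):
    """Return the labels (F0, F1, or a multiplicity) whose count equals 2**63."""
    flagged = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        label, value = parts
        if value.isdigit() and int(value) == SATURATED:
            flagged.append(label)
    return flagged


# First rows of the output from this report (whitespace-separated columns).
sample = [
    "F1\t14753176821204",
    "F0\t9223372036854775808",
    "1\t9223372036854775808",
    "2\t9223372036854775808",
]
print(find_saturated(sample))
```

On the rows above, everything except F1 is flagged, which matches what I see for all k >= 22.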

Please let me know how to fix this, or if that is a bug in ntCard that needs fixing first.

Cheers and thanks,
Lucas

lczech, Aug 22 '25 17:08

For added context, here is what the histograms look like:

Image

It seems to me that the upward trend of the lines for larger values of k is related to the numerical issue here, right? They should not go up like that, should they?

lczech, Aug 22 '25 17:08