ntCard
Integer overflow on large dataset
Hi there!
I am running into what seems to be essentially the same issue as #48 and #73, where my output looks like this
F1 14753176821204
F0 9223372036854775808
1 9223372036854775808
2 9223372036854775808
3 9223372036854775808
4 9223372036854775808
5 9223372036854775808
...
when running:
ntcard -k `seq -s, 16 31` -p counts -t 48 @ref_cat_fasta_list.txt
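For reference, the shell expands the backtick substitution before ntCard runs. The command above therefore requests all 16 values k = 16, ..., 31 in a single run, and is equivalent to passing `-k 16,17,...,31` by hand. A quick check, assuming GNU coreutils `seq` as on a typical Linux/conda setup:

```shell
# With GNU seq, -s, joins the generated numbers with commas (no trailing comma).
ks=$(seq -s, 16 31)
echo "$ks"    # prints 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
```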
This is on a rather large dataset, though (~10 TB), so it does not seem to be caused by too few sequences. It only occurs for k >= 22. I'm using ntCard 1.2.2 installed via conda.
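For reference, the repeated value 9223372036854775808 is exactly 2^63 (0x8000000000000000, a 1 followed by 63 zero bits). On x86-64 this is also the value produced when a NaN or out-of-range floating-point number is truncated to a 64-bit integer, so it may be a sentinel for a failed estimate rather than a real count. That is a guess, not a confirmed diagnosis. The sketch below prints the hex form and defines a hypothetical helper, `find_sentinel_ks`, that lists which k values are affected. It assumes ntCard writes one file per k named `<prefix>_k<k>.hist` (e.g. `counts_k22.hist` for `-p counts`); adjust the pattern if your files are named differently.

```shell
# 9223372036854775808 printed in hex is a single high bit: 2^63.
printf '%X\n' 9223372036854775808    # prints 8000000000000000

# Hypothetical helper: print each k in [kmin, kmax] whose histogram contains 2^63.
# ASSUMPTION: per-k output files are named "<prefix>_k<k>.hist".
find_sentinel_ks() {
  prefix=$1
  kmin=$2
  kmax=$3
  sentinel=9223372036854775808
  for k in $(seq "$kmin" "$kmax"); do
    f="${prefix}_k${k}.hist"
    [ -f "$f" ] || continue          # skip k values with no output file
    # -w matches the whole number, not a substring of a longer one
    if grep -qw "$sentinel" "$f"; then
      echo "$k"
    fi
  done
}

# Run in the directory containing the ntCard output.
find_sentinel_ks counts 16 31
```

If the overflow really starts at k = 22, this should print 22 through 31 and nothing for 16 to 21.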
Please let me know how to fix this, or whether this is a bug in ntCard that needs to be fixed first.
Cheers and thanks,
Lucas
For added context, here is what the histograms look like:
It seems to me that the upward trend of the lines for larger values of k is related to this numerical issue, right? They should not go up like that, should they?