cgmlst-dists
Problem with large input
We have investigated the issue with cgmlst-dists when handling large input files (the error was reported with 80k Lm samples). The tool crashes with a segmentation fault. The bug is due to an incorrect memory allocation for the distance vector: its size is calculated as nrow * nrow, which causes an integer overflow for a large nrow when the product is computed in 32 bits (line 219 in the original version). The maximum value that can be stored in a 32-bit int variable is 2,147,483,647; in our case, the final dist vector needs 80,000 * 80,000 = 6,400,000,000 elements, which exceeds that limit. The product gets this large because the tool stores the whole distance matrix in a single flat vector and indexes it as a matrix, which is a nice optimization.
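The arithmetic above can be checked with a small sketch. It is not the code from cgmlst-dists; the function names are hypothetical, and an unsigned 32-bit type is used to show the wrap-around because overflowing a signed int is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits: unsigned arithmetic wraps modulo 2^32.
   (The signed overflow in an int expression is undefined behavior,
   so the unsigned type is used here to show the wrap safely.) */
uint32_t cells_32bit(uint32_t nrow)
{
    return nrow * nrow;
}

/* Same product, with both operands widened to 64 bits first. */
uint64_t cells_64bit(uint32_t nrow)
{
    return (uint64_t)nrow * (uint64_t)nrow;
}

/* Returns 1 if the true cell count fits in a 32-bit signed int, else 0. */
int fits_in_int32(uint32_t nrow)
{
    uint64_t cells = cells_64bit(nrow);
    assert(cells >= nrow); /* a 64-bit product of 32-bit values cannot wrap */
    return cells <= (uint64_t)INT32_MAX;
}
```

For nrow = 80000, the 64-bit product is 6400000000, while the 32-bit product wraps to 2105032704, so a buffer sized from the 32-bit value would be far smaller than the matrix that is later indexed.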
We included the inttypes.h header so that the size is computed with 64-bit integers, which avoids the overflow. We have successfully tested the fix on 80,000 samples and 1,748 loci.
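As a minimal sketch of the idea (not the actual patch), the flat vector can be sized and indexed with 64-bit values as shown here. The names `flat_index` and `alloc_dist`, the row-major layout, and the `int` element type are assumptions made for illustration.

```c
#include <assert.h>
#include <inttypes.h>  /* also provides the <stdint.h> fixed-width types */
#include <stdlib.h>

/* Offset of cell (row, col) in an nrow x nrow matrix stored row-major
   in one flat vector. All operands are 64-bit, so the product does not
   wrap for sizes such as nrow = 80000. */
uint64_t flat_index(uint64_t row, uint64_t col, uint64_t nrow)
{
    assert(row < nrow && col < nrow);
    return row * nrow + col;
}

/* Allocate a zero-initialised nrow x nrow matrix of int (assumed type).
   Returns NULL if nrow is 0, if the cell count does not fit in size_t,
   or if the allocation itself fails. */
int *alloc_dist(uint64_t nrow)
{
    if (nrow == 0 || nrow > SIZE_MAX / nrow)
        return NULL;
    return calloc((size_t)(nrow * nrow), sizeof(int));
}
```

With nrow = 80000, the last cell has offset `flat_index(79999, 79999, 80000)`, which is 6399999999 and only representable with a 64-bit index. Whether `alloc_dist(80000)` succeeds depends on the memory available on the machine.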
We look forward to your feedback on this.
Best,
Adriano