snps
Use Pandas nullable integer for position in normalized snps dataframe
Along the same lines as #108, has it been considered to use the Pandas nullable integer datatype (pd.Int64Dtype()) for pos? More details here. We've seen a number of files that fail to parse because the position is missing for a small number of rows (for example, for an RSID with multiple possible locations).
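A minimal sketch of the failure mode described above, assuming a dataframe with `rsid` and `pos` columns. The rows are made up for illustration and this is not the library's actual parsing code: a plain NumPy integer cast rejects the missing position, while the nullable dtype keeps it as `<NA>`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second RSID has no position.
df = pd.DataFrame(
    {"rsid": ["rs1", "rs2", "rs3"], "pos": [752566.0, np.nan, 798959.0]}
)

# A plain NumPy integer dtype cannot represent a missing value,
# so the cast raises (a ValueError or a subclass of it).
try:
    df["pos"].astype(np.uint32)
    cast_failed = False
except ValueError:
    cast_failed = True

# The nullable extension dtype stores the gap as <NA> instead.
nullable_pos = df["pos"].astype(pd.Int64Dtype())
```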
Interesting. Yes, a nullable integer dtype would be a good way to handle these cases. But let's go with pd.UInt32Dtype(), which would use less memory than pd.Int64Dtype().
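As a rough illustration of the memory point, the sketch below builds the same made-up positions with both nullable dtypes and compares their footprint. The values are hypothetical; the only assumption is that positions fit in an unsigned 32-bit integer (maximum 4294967295).

```python
import numpy as np
import pandas as pd

# Hypothetical positions with one missing entry.
values = [752566, None, 798959]

pos_u32 = pd.Series(values, dtype=pd.UInt32Dtype())
pos_i64 = pd.Series(values, dtype=pd.Int64Dtype())

# Both keep the missing entry as <NA>; only the storage width differs.
u32_bytes = pos_u32.memory_usage(index=False)
i64_bytes = pos_i64.memory_usage(index=False)

# Largest position representable by UInt32.
u32_max = np.iinfo(np.uint32).max
```

The trade-off is the smaller range: any position above `u32_max` would not fit, so this choice relies on positions staying below that bound.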