pixy
Code crashes when chromosome IDs are digit-only
When running pixy on datasets where chromosome IDs consist only of digits and start with zeros (e.g. ["0001", [...], "0016"]), the program breaks when reading from the temp files: the IDs are automatically converted to numeric values, which strips the leading zeros (pd.read_csv, line 328, main.py) and results in a KeyError on lines 363 and 370.
This is very much an edge case, as most chromosome IDs are not digit-only, but some tools do output numeric-only chromosome IDs.
The bug occurs regardless of the pixy command, the populations file, and the system architecture, and it is very easy to reproduce.
A line from an exemplary VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1 [...] XY
0001 1 . A . 100 . DP=[...] GT:[...]
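The type inference behind the crash can be reproduced without pixy. The following is a minimal sketch, not pixy's actual code: the temp-file contents and the `per_chrom` dictionary are hypothetical stand-ins, and the only assumption carried over from the report is that the chromosome ID sits in column 3 of a tab-separated, header-less file.

```python
import io
import pandas as pd

# Hypothetical stand-in for a pixy temp file (tab-separated, no header).
# The column layout is an assumption; only column 3 (chromosome ID) matters.
temp_file = io.StringIO(
    "pi\tpop1\tx\t0001\t1\t10000\n"
    "pi\tpop1\tx\t0016\t1\t10000\n"
)

# Without an explicit dtype, pandas infers column 3 as integers,
# so "0001" becomes 1 and "0016" becomes 16.
outpanel = pd.read_csv(temp_file, sep="\t", header=None)
chroms = outpanel[3].tolist()

# Hypothetical lookup keyed by the original string IDs from the VCF.
per_chrom = {"0001": "first", "0016": "last"}

try:
    per_chrom[chroms[0]]
    lookup_failed = False
except KeyError:
    # The integer 1 is not a key; only the string "0001" is.
    lookup_failed = True
```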
I suggest the following fix at line 328, main.py:
< outpanel = pandas.read_csv(temp_file, sep='\t', header=None)
---
> outpanel = pandas.read_csv(temp_file, sep='\t', header=None, dtype={3: 'string'})
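To check that forcing the column to a string dtype preserves the zero-padded IDs, here is a small sketch. The helper `read_panel` and the sample rows are hypothetical and only mimic the shape described above; the `dtype={3: 'string'}` argument is the one proposed in the fix.

```python
import io
import pandas as pd

# Hypothetical sample rows; column 3 holds the chromosome ID.
SAMPLE = (
    "pi\tpop1\tx\t0001\t1\t10000\n"
    "pi\tpop1\tx\t0016\t1\t10000\n"
)

def read_panel(text, ids_as_text):
    """Read the sample, optionally forcing column 3 to pandas' string dtype."""
    dtype = {3: "string"} if ids_as_text else None
    return pd.read_csv(io.StringIO(text), sep="\t", header=None, dtype=dtype)

before = read_panel(SAMPLE, ids_as_text=False)[3].tolist()
after = read_panel(SAMPLE, ids_as_text=True)[3].tolist()
```

With the explicit dtype, the values in `after` remain the strings read from the file, so dictionary lookups keyed by the original IDs would succeed. Other columns are still inferred as before.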