
Code crashes when chromosome IDs are digit-only

Open • jmoellmann opened this issue 3 months ago • 1 comment

When running pixy on datasets whose chromosome IDs consist only of digits and start with zeros (e.g. ["0001", [...], "0016"]), the program breaks when reading from the temp files. pandas.read_csv (line 328, main.py) automatically converts the IDs to numbers, stripping the leading zeros, so the IDs no longer match and the code raises a KeyError on lines 363 and 370.
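A minimal reproduction of the type inference described above, outside of pixy. The temp-file contents are made up for illustration: only the assumption that the chromosome ID sits in column 3 is taken from the suggested fix, and the dictionary is a hypothetical stand-in for the lookups at lines 363 and 370, whose exact code is not reproduced here.

```python
import io

import pandas

# Hypothetical temp-file contents: tab-separated, no header row,
# chromosome ID in column 3 (the column index used in the suggested fix).
temp_text = "pi\tpopA\tNA\t0001\t1\t100\n" "pi\tpopA\tNA\t0016\t1\t100\n"

# Default read: pandas infers an integer dtype for column 3.
outpanel = pandas.read_csv(io.StringIO(temp_text), sep="\t", header=None)

chrom_ids = outpanel[3].tolist()
print(chrom_ids)  # [1, 16] -- the leading zeros are gone

# Hypothetical lookup keyed by the original string IDs, as they appear in the VCF.
results_by_chrom = {"0001": [], "0016": []}
try:
    results_by_chrom[chrom_ids[0]]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 1
```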

This is admittedly an edge case, since most chromosome IDs are not digit-only, but some tools do output numeric-only chromosome IDs.

The bug occurs regardless of the pixy command, the populations file, or the system architecture, and it is easy to reproduce.

Lines from an example VCF (header line and first data line):

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1 [...] XY
    0001 1 . A . 100 . DP=[...] GT:[...]

I suggest the following fix at line 328 of main.py:

    < outpanel = pandas.read_csv(temp_file, sep='\t', header=None)
    ---
    > outpanel = pandas.read_csv(temp_file, sep='\t', header=None, dtype={3: 'string'})
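A quick check that the suggested dtype override keeps the IDs intact. The temp-file contents below are hypothetical; they only assume, as the fix does, that the chromosome ID is in column 3. The dictionary is an illustrative stand-in for pixy's own lookups.

```python
import io

import pandas

# Hypothetical temp-file contents: tab-separated, no header row,
# chromosome ID in column 3.
temp_text = "pi\tpopA\tNA\t0001\t1\t100\n" "pi\tpopA\tNA\t0016\t1\t100\n"

# Forcing column 3 to pandas' "string" dtype keeps the IDs exactly as written;
# all other columns are still type-inferred as before.
outpanel = pandas.read_csv(
    io.StringIO(temp_text), sep="\t", header=None, dtype={3: "string"}
)

chrom_ids = outpanel[3].tolist()
print(chrom_ids)  # ['0001', '0016']

# Lookups keyed by the original string IDs now succeed.
results_by_chrom = {"0001": ["window data"], "0016": ["window data"]}
for chrom in chrom_ids:
    print(chrom, results_by_chrom[chrom])
```

The "string" alias requires pandas 1.0 or later; passing dtype={3: str} instead would also preserve the leading zeros (as object dtype) on older versions.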

jmoellmann, Oct 31 '24 20:10