Need an option to skip bad lines, like pandas' on_bad_lines='skip'
I am working with log files of about 78 GB.
Even with fill=True, fread fails with this error:

IOError: Too many fields on line 104485: expected 20 but more are present.
This is caused by a single bad line that has one extra column. If I set columns=list(range(21)), fill=True, I get a different error: ValueError: Input contains 20 columns, whereas columns parameter specifies only 21 columns
These log files are generated every day, so I can't edit them by hand before passing them to datatable.
So I would like an option such as error='skip' that drops lines which cannot be parsed.
Environment: Python 3.9.6 x64, datatable 1.0.0, vscode 1.58.2
I have since found a way to avoid this error by using ripgrep-all as a filter that keeps only lines with exactly 20 tab-separated fields: fread(cmd=r'rga "^([^\t]*\t){19}[^\t]*$" c:\Users\Acer\Desktop\20210817pc')
However, it slows down the reading procedure...
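For anyone without rga, the same filter can be sketched in plain Python. This is only a rough sketch, not something datatable provides: the helper names well_formed_lines and write_filtered_copy are made up for illustration, and it assumes a tab-separated file with no quoted fields that contain tabs. Like the regex above, it also drops lines with too few fields, not just lines with too many.

```python
import tempfile
from typing import Iterator


def well_formed_lines(path: str, n_fields: int = 20, sep: bytes = b"\t") -> Iterator[bytes]:
    """Yield only the lines of `path` that have exactly `n_fields` fields.

    Hypothetical helper: assumes fields never contain the separator.
    """
    with open(path, "rb") as src:
        for line in src:
            # A line with n fields contains exactly n - 1 separators.
            if line.count(sep) == n_fields - 1:
                yield line


def write_filtered_copy(path: str, n_fields: int = 20, sep: bytes = b"\t") -> str:
    """Stream the well-formed lines into a temporary file and return its path.

    The caller is responsible for deleting the temporary file afterwards.
    """
    with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as dst:
        dst.writelines(well_formed_lines(path, n_fields, sep))
        return dst.name
```

The returned path can then be read as usual, e.g. dt.fread(write_filtered_copy(path), sep="\t"). It streams line by line, so memory use stays small, but it needs disk space for the filtered copy and I have not measured whether it is any faster than the rga approach on a 78 GB file.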