
Need an option to skip bad lines, like pandas' on_bad_lines='skip'

Open Binger-cn opened this issue 4 years ago • 1 comments

I'm working with log files of about 78 GB. Even with fill=True, I get an error:

IOError: Too many fields on line 104485: expected 20 but more are present.

This happens because a single bad line has one extra column. If I set columns=list(range(21)) together with fill=True, I get a different error: ValueError: Input contains 20 columns, whereas columns parameter specifies only 21 columns
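To confirm that only a few lines have the wrong width, a field-count histogram can help. This is a minimal sketch using only the standard library; the helper name `field_count_histogram` is made up for illustration, and it assumes tab-separated fields with no quoted tabs inside them.

```python
import io
from collections import Counter

def field_count_histogram(lines, sep="\t"):
    """Count how many lines have each number of fields.

    Assumption: fields never contain the separator (no quoting),
    so the field count is simply the separator count plus one.
    """
    counts = Counter()
    for line in lines:
        counts[line.rstrip("\r\n").count(sep) + 1] += 1
    return counts

# Small in-memory sample: two 3-field lines and one 4-field line.
sample = io.StringIO("a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n")
print(field_count_histogram(sample))  # Counter({3: 2, 4: 1})
```

For a real log, pass an open file object instead of the `io.StringIO` sample; the file is streamed line by line, so it does not have to fit in memory.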

These log files are generated every day, so I can't edit them by hand before passing them to datatable.

So I'd like an option such as error='skip' that simply skips bad lines.

Environment: Python 3.9.6 x64, datatable 1.0.0, vscode 1.58.2

Binger-cn avatar Jul 20 '21 03:07 Binger-cn

I have now found a workaround: use ripgrep-all as a filter so that only lines with exactly 20 tab-separated fields reach fread: dt.fread(cmd=r'rga "^([^\t]*\t){19}[^\t]*$" c:\Users\Acer\Desktop\20210817pc')

However, it slows down the reading procedure...
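The same filtering idea can be written in plain Python, which at least removes the external tool from the pipeline. This is only a sketch: `keep_well_formed` and `filter_file` are hypothetical helper names, fields are assumed to be tab-separated without quoting, and I have not measured whether it is any faster than the ripgrep-all filter.

```python
import io

def keep_well_formed(lines, expected_fields, sep="\t"):
    """Yield only the lines that have exactly `expected_fields` fields."""
    for line in lines:
        if line.rstrip("\r\n").count(sep) + 1 == expected_fields:
            yield line

def filter_file(src_path, dst_path, expected_fields, sep="\t", encoding="utf-8"):
    """Stream src_path into dst_path, dropping malformed lines.

    Returns the number of lines kept. newline="" leaves line endings untouched.
    """
    kept = 0
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in keep_well_formed(src, expected_fields, sep):
            dst.write(line)
            kept += 1
    return kept

# In-memory demonstration: the 3-field line is dropped.
sample = io.StringIO("a\tb\n1\t2\n3\t4\t5\n6\t7\n")
print(list(keep_well_formed(sample, expected_fields=2)))

# Assumed follow-up, not run here: read the cleaned file with datatable,
# e.g. dt.fread(dst_path, sep="\t") after `import datatable as dt`.
```

The cleaned copy costs extra disk space, which matters at 78 GB, so this only makes sense if the same file is read more than once.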

Binger-cn avatar Aug 19 '21 03:08 Binger-cn