
read_delim parsing issue with compressed file

Open rvalieris opened this issue 11 months ago • 0 comments

Reading the attached file (f.log.gz) with read_delim produces the following parsing warning:

r$> a = read_delim('f.log.gz', delim=' | ',col_names=F,col_types='cccc')
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

r$> problems(a)
# A tibble: 1 × 5
    row   col expected  actual    file
  <int> <int> <chr>     <chr>     <chr>
1  1494     3 4 columns 3 columns ""

r$> a[1494,]
# A tibble: 1 × 4
  X1    X2              X3                                        X4
  <chr> <chr>           <chr>                                     <chr>
1 15:26 07 | 三特東喰赤 いのけん(+33) まあぷ(-10) 陸奥陽之助(-23)       NA

However, the indicated line does have 4 columns (note the | left inside the X2 column, where the delimiter was not split), and if I decompress the file before calling read_delim, it parses fine.
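A minimal sketch of that workaround, done from within R rather than on the command line. `decompress_to_temp` and `read_log` are hypothetical helper names, not part of readr; the sketch assumes that decompressing with base R's `gzfile` gives the same bytes as uncompressing the file by hand, and it copies raw bytes so the non-ASCII text is not re-encoded.

```r
# Hypothetical helper (not part of readr): decompress a .gz file into a
# temporary plain-text file using base R connections only.
decompress_to_temp <- function(gz_path) {
  con <- gzfile(gz_path, open = "rb")       # reads decompressed bytes
  on.exit(close(con))
  out_path <- tempfile(fileext = ".log")
  out <- file(out_path, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    chunk <- readBin(con, what = "raw", n = 65536L)
    if (length(chunk) == 0L) break          # end of compressed stream
    writeBin(chunk, out)                    # copy bytes unchanged
  }
  out_path
}

# Hypothetical wrapper: same read_delim arguments as in the report above,
# but applied to the decompressed copy instead of the .gz file.
read_log <- function(gz_path) {
  plain_path <- decompress_to_temp(gz_path)
  readr::read_delim(plain_path, delim = " | ",
                    col_names = FALSE, col_types = "cccc")
}
```

With the attached file this would be called as `a <- read_log('f.log.gz')`, after which `problems(a)` can be checked again. It only sidesteps the compressed-input path; it does not explain why that path splits row 1494 differently.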

I was not able to reduce the file further than this and still reproduce the issue, so it seems the issue is not related to that specific line.

Env info: Linux R 4.3.3 readr 2.1.5

rvalieris • Mar 23 '24 14:03