r4ds

`challenge.csv` now reads without problems

AnttiRask opened this issue 2 years ago (2 comments)

This is what the code and its output look like in the chapter:

`challenge <- read_csv(readr_example("challenge.csv"))`
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   x = col_double(),
#>   y = col_logical()
#> )
#> Warning: 1000 parsing failures.
#>  row col           expected     actual                                                           file
#> 1001   y 1/0/T/F/TRUE/FALSE 2015-01-16 '/Users/runner/work/_temp/Library/readr/extdata/challenge.csv'
#> 1002   y 1/0/T/F/TRUE/FALSE 2018-05-18 '/Users/runner/work/_temp/Library/readr/extdata/challenge.csv'
#> 1003   y 1/0/T/F/TRUE/FALSE 2015-09-05 '/Users/runner/work/_temp/Library/readr/extdata/challenge.csv'
#> 1004   y 1/0/T/F/TRUE/FALSE 2012-11-28 '/Users/runner/work/_temp/Library/readr/extdata/challenge.csv'
#> 1005   y 1/0/T/F/TRUE/FALSE 2020-01-13 '/Users/runner/work/_temp/Library/readr/extdata/challenge.csv'
#> .... ... .................. .......... ..............................................................
#> See problems(...) for more details.

This is what it looks like when I do the same in R (readr version 2.1.2):

`challenge <- read_csv(readr_example("challenge.csv"))`
#> Rows: 2000 Columns: 2
#> -- Column specification ---------------------------------------------------
#> Delimiter: ","
#> dbl  (1): x
#> date (1): y
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
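For reference, the message above points to `spec()` and `show_col_types`. A minimal sketch of using both, assuming readr 2.x: read the file quietly, look at the guessed specification, then pin those types with `col_types` so the result no longer depends on type guessing. The `col_date()` choice for `y` simply mirrors the guess shown above.

```r
library(readr)

# Read with guessed types, silencing the column-specification message
challenge <- read_csv(readr_example("challenge.csv"), show_col_types = FALSE)

# Retrieve the specification readr guessed for this data
spec(challenge)

# Pin the guessed types explicitly so later readr versions can't change them
challenge <- read_csv(
  readr_example("challenge.csv"),
  col_types = cols(x = col_double(), y = col_date())
)
```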

So it seems that the parsing problem described in the chapter no longer occurs. That's a good problem to have, but it's still confusing for the reader.

I also looked for a mention of this change in the readr changelog, but I couldn't find one.

P.S. This is the first GitHub issue I've submitted, so please forgive me if I didn't do it right. Thank you!

AnttiRask, Apr 10 '22 10:04

A good problem to have indeed, thanks!

@hadley We should decide whether the solution is to remove the discussion of the challenge dataset from the chapter or to replace the challenge dataset in readr. (Full disclosure: I'm leaving this note without having chased down the relevant changes in readr, so a decision may already have been made there.)

mine-cetinkaya-rundel, May 09 '22 18:05

I think we'll need to do a bit of both: as of readr 2.0.0, `read_csv()` and friends now guess column types from 1000 rows spread throughout the dataset rather than just the first 1000 rows, so you're much less likely to see this sort of problem. But we'll obviously still want to introduce problems somehow.
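One way to still show readers parsing problems, sketched here under the assumption of readr 2.x, is to supply the column types that the chapter's old output reports (`y` as logical, which was guessed from the empty first 1000 rows) instead of relying on guessing. The dates in the later rows then fail to parse, and `problems()` lists them. This is only an illustration of the idea, not necessarily how the chapter ended up handling it.

```r
library(readr)

# Supply the types the chapter's old output shows; y as logical can't hold
# the dates that appear later in the file, so those values become parsing failures
challenge <- read_csv(
  readr_example("challenge.csv"),
  col_types = cols(x = col_double(), y = col_logical())
)

# One row per value that could not be parsed, with its row, column,
# expected type, and actual value
probs <- problems(challenge)
head(probs)
```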

hadley, May 11 '22 02:05

Now fixed 😄

hadley, Nov 18 '22 22:11