
skipto breaks if there is a quote in the skipped rows

Open enderw88 opened this issue 2 years ago • 3 comments

CSV.jl version 0.10.9

When the rows being skipped contain a single, unpaired " character, parsing fails completely.

Using the file below, CSV.File("Ephemeris/199_horizons_results.txt"; header = false, footerskip=2, skipto = 14)

returns

0-element CSV.File

Note the lone double-quote character on line 12 of the file, where an angle is given in arc-seconds (11.0").

If you remove the quote, or add another quote, the file parses as expected.
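Until the behaviour changes, one possible workaround is to trim the leading and trailing lines yourself before parsing. The sketch below uses only Base Julia; `trim_lines` is a hypothetical helper (not part of CSV.jl), and the commented-out final call assumes that CSV.File accepts an IO source.

```julia
# Hypothetical helper, not part of CSV.jl: keep lines `skipto` through
# (number of lines - footerskip), counting plain newlines and ignoring quotes.
function trim_lines(path::AbstractString; skipto::Integer=1, footerskip::Integer=0)
    lines = readlines(path)              # splits on newlines only
    stop = length(lines) - footerskip
    kept = lines[skipto:stop]            # an empty range yields an empty vector
    return IOBuffer(join(kept, '\n'))
end

# Small demonstration file with an unpaired quote in the skipped header.
path, io = mktemp()
write(io, "diam = 11.0\"\nSTART\n1,2\n3,4\nEND\n")
close(io)

trimmed = read(trim_lines(path; skipto=3, footerskip=1), String)
println(trimmed)

# Assumed usage with CSV.jl (requires `using CSV`):
# CSV.File(trim_lines("Ephemeris/199_horizons_results.txt"; skipto=14, footerskip=2); header=false)
```

This reads the whole file into memory, so it is only reasonable for small files like the one in this report.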

Example file (this is JPL ephemeris data from https://ssd.jpl.nasa.gov/horizons/app.html#/):

START HERE********* Revised: April 12, 2021 Mercury 199 / 1

PHYSICAL DATA (updated 2021-Apr-12): Vol. Mean Radius (km) = 2440+-1 Density (g cm^-3) = 5.427 Mass x10^23 (kg) = 3.302 Volume (x10^10 km^3) = 6.085
Sidereal rot. period = 58.6463 d Sid. rot. rate (rad/s)= 0.00000124001 Mean solar day = 175.9421 d Core radius (km) = ~1600 Geometric Albedo = 0.106 Surface emissivity = 0.77+-0.06 GM (km^3/s^2) = 22031.86855 Equatorial radius, Re = 2440 km GM 1-sigma (km^3/s^2) = Mass ratio (Sun/plnt) = 6023682 Mom. of Inertia = 0.33 Equ. gravity m/s^2 = 3.701
Atmos. pressure (bar) = < 5x10^-15 Max. angular diam. = 11.0"
$$SOE
2460054.500000000, A.D. 2023-Apr-20 00:00:00.0000, -5.905515598394141E+07, 1.095406506579951E+04, 5.338424508122876E+06, -1.024822519185871E+01, -4.663625465158221E+01, -2.869521897808202E+00, 1.977900176172211E+02, 5.929595554933000E+07, 9.939648483777580E+00,
2460084.500000000, A.D. 2023-May-20 00:00:00.0000, 7.125054943081480E+05, -6.914571093648924E+07, -5.791097786304295E+06, 3.892607881523086E+01, 3.918031422263903E+00, -3.248684184801126E+00, 2.314649740053875E+02, 6.939145349798119E+07, -3.233345953068147E+00,
2460115.500000000, A.D. 2023-Jun-20 00:00:00.0000, 4.078941695378204E+07, 2.420438251063379E+07, -1.834172799806722E+06, -3.395318213887084E+01, 4.426155006862578E+01, 6.733053498281190E+00, 1.583285391126107E+02, 4.746570191211869E+07, -6.867202743261015E+00,
2460145.500000000, A.D. 2023-Jul-20 00:00:00.0000, -6.039733719901565E+07, -1.222642038283616E+07, 4.474262616182682E+06, -4.364750049603895E-01, -4.566244905792482E+01, -3.689867534588014E+00, 2.060914018993717E+02, 6.178464794807850E+07, 9.195500960923743E+00,
$$EOE


enderw88, Apr 20 '23 05:04

Yeah, I think we should change this behaviour.

Discussed this with @quinnj and @drvi and we think skipping a header / footer in data should be quote-unaware.

skipto and footerskip are intended for skipping over arbitrary data (that's not necessarily comma-separated values) at the start/end of a file. For this skipping we should just skip past the requested number of newlines, and not do the clever lexing we do for actual CSV data, where newlines in quoted (string) data aren't "real" newlines indicating a new row of data.
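To illustrate the difference, here is a minimal sketch in plain Julia. Both functions are hypothetical and are not CSV.jl's actual implementation; they only contrast counting raw newlines with counting newlines outside quotes, assuming " is the quote character and \n is the row terminator.

```julia
# Hypothetical: byte position where row `skipto` starts, treating every
# '\n' as a row end regardless of quote characters.
function skip_lines_quote_unaware(data::AbstractVector{UInt8}, skipto::Integer)
    pos = 1
    row = 1
    while row < skipto && pos <= length(data)
        if data[pos] == UInt8('\n')
            row += 1
        end
        pos += 1
    end
    return pos
end

# Hypothetical: same, but newlines between a pair of '"' do not end a row.
function skip_lines_quote_aware(data::AbstractVector{UInt8}, skipto::Integer)
    pos = 1
    row = 1
    inquote = false
    while row < skipto && pos <= length(data)
        b = data[pos]
        if b == UInt8('"')
            inquote = !inquote
        elseif b == UInt8('\n') && !inquote
            row += 1
        end
        pos += 1
    end
    return pos
end

# Two header lines, the first containing an unpaired quote, then two data rows.
text = "angle = 11.0\"\nnote\n1,2\n3,4\n"
data = codeunits(text)

unaware = skip_lines_quote_unaware(data, 3)
aware = skip_lines_quote_aware(data, 3)

println(repr(text[unaware:end]))  # the two data rows remain
println(repr(text[aware:end]))    # nothing remains: the quote never closes
```

With the unpaired quote, the quote-aware version treats the rest of the input as one quoted field and runs off the end, which matches the empty result reported above.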

nickrobinson251, Apr 20 '23 16:04

I would appreciate that. I took a look through the code to see if I could do that with a local repo, and my Julia-fu is not strong enough. The docs don't mention any processing of lines before the skipto line or after the footerskip line. It took a while to figure out exactly what was going on. This is probably a corner case I ran into.

enderw88, Apr 21 '23 02:04

Probably same as https://github.com/JuliaData/CSV.jl/issues/1012

aplavin, May 18 '23 09:05