
Frame.ReadCsv with inferred types silently ignores small/scientific values

Open marklam opened this issue 9 years ago • 0 comments

If column types are inferred from a CSV file and none of the rows used for inference contains a small value written in scientific notation, then any later cell written in scientific notation is marked as a missing value. Take the following CSV (d:\temp\scientific.csv):

#,X,Y
Happy,7.058365954,5.754336636
Grumpy,6.62148607,9.51E-05

And this code to read Grumpy's Y value. (In the inferTypes=true case, inferRows is limited to 1 only to keep the sample CSV short; imagine that it's not limited and the 'bad' value is on line 500 instead.)

open Deedle

let (p : float) = Frame.ReadCsv(@"D:\temp\scientific.csv", hasHeaders=true, inferTypes=false)
                  |> Frame.indexRowsString("#")
                  |> Frame.getCol "Y"
                  |> Series.get "Grumpy"
printfn "Without inferTypes : %f" p

let (q : float) = Frame.ReadCsv(@"D:\temp\scientific.csv", hasHeaders=true, inferTypes=true, inferRows = 1)
                  |> Frame.indexRowsString("#")
                  |> Frame.getCol "Y"
                  |> Series.get "Grumpy"
printfn "With inferTypes : %f" q

You'll get the following output:

Without inferTypes : 0.000095
Deedle.MissingValueException: Value at the key Grumpy is missing
   at Deedle.Series`2.Get(K key) in c:\Tomas\Public\bmc\Deedle\src\Deedle\Series.fs:line 311
   at Deedle.SeriesModule.Get[K,T](K key, Series`2 series) in c:\Tomas\Public\bmc\Deedle\src\Deedle\SeriesModule.fs:line 275
   at <StartupCode$FSI_0002>.$FSI_0002.main@() 
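
For anyone who wants to reproduce this without creating the file by hand, here is a minimal self-contained sketch of the same repro. It assumes Deedle is already referenced in the F# Interactive session, and it writes the sample to the system temp folder rather than D:\temp (that location is my choice, not part of the original repro). It uses Series.tryGet instead of Series.get so that a missing value is reported rather than thrown.

```fsharp
// Assumption: Deedle is already referenced in this session
// (for example via #r "nuget: Deedle" in a recent F# Interactive).
open System.IO
open Deedle

// Write the two-row sample CSV to the temp folder (arbitrary location).
let path = Path.Combine(Path.GetTempPath(), "scientific.csv")
File.WriteAllLines(path,
    [| "#,X,Y"
       "Happy,7.058365954,5.754336636"
       "Grumpy,6.62148607,9.51E-05" |])

// Same pipeline as the failing case above, but tryGet returns None
// for a missing value instead of raising MissingValueException.
let y : float option =
    Frame.ReadCsv(path, hasHeaders=true, inferTypes=true, inferRows=1)
    |> Frame.indexRowsString "#"
    |> Frame.getCol "Y"
    |> Series.tryGet "Grumpy"

match y with
| Some v -> printfn "Grumpy's Y = %g" v
| None -> printfn "Grumpy's Y was read as missing"
```

If the behaviour described above is present, the second branch is the one that should be hit.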

marklam, Jan 26 '16 11:01