FSharp.Data
Package does not work with #r nuget (requires ResolutionFolder=__SOURCE_DIRECTORY__)
I think this package is doing some custom stuff, because it simply doesn't work with #r "nuget".
Firstly, Load simply does not work as advertised. Consider the following script:
#r "nuget: FSharp.Data"
open FSharp.Data
type Stocks = CsvProvider<"data/MSFT.csv">
let msft = Stocks.Load("data/MSFT.csv")
This fails at runtime because data/MSFT.csv is resolved relative not to the script's location, but to the location of the temporary project file where the package is restored:
System.IO.DirectoryNotFoundException: Could not find a part of the path '/var/folders/jt/zl19fbpd387_btngqwry6c5h0000gn/T/nuget/5312--c70c53f0-8e79-44c4-be8e-9157262e6715/data/MSFT.csv'
Secondly, it actually doesn't work even when you've loaded it. At design-time you will get correct names for columns, but at runtime it has no idea what they are:
#r "nuget: FSharp.Data"
open FSharp.Data
type Stocks = CsvProvider<"data/MSFT.csv">
let location = __SOURCE_DIRECTORY__ + "/data/MSFT.csv"
let msft = Stocks.Load(location)
let firstRow = msft.Rows |> Seq.head
firstRow.``Adj Close``
This fails with the following:
/Users/phillip/scratch/test.fsx(23,10): error FS0039: The type 'Row' does not define the field, constructor or member 'Adj Close'.
Third, GetSample does not work:
#r "nuget: FSharp.Data"
open FSharp.Data
type Stocks = CsvProvider<"data/MSFT.csv">
let msft = Stocks.GetSample()
msft.Rows |> Seq.head
In this case, it fails to find any rows at all, despite the data being passed as a static parameter to the provider.
I experienced the same issue with F# Interactive version 11.0.0.0 for F# 5.0.
Code that worked fine in a Jupyter notebook fails when run from an .fsx file in the above environment.
Regarding "Secondly, it actually doesn't work even when you've loaded it. At design-time you will get correct names for columns, but at runtime it has no idea what they are":
I have also observed this. I can load the file from the local file system when passing in the full path, but only the first column is read, with no header information.
Looking further into this, I can see the two problems are distinct from one another.
- Using #r "nuget seems to modify the default path, which breaks relative paths when running in FSI. Design time still seems okay. By design, if the runtime is FSI the provider uses the DefaultResolutionFolder, which appears to be hardcoded as an empty string for the CsvProvider.
member x.Resolve(uri:Uri) =
    if uri.IsAbsoluteUri then
        uri, isWeb uri
    else
        let root =
            match x.ResolutionType with
            | DesignTime -> if String.IsNullOrEmpty x.ResolutionFolder
                            then x.DefaultResolutionFolder
                            else x.ResolutionFolder
            | RuntimeInFSI -> x.DefaultResolutionFolder
            | Runtime -> AppDomain.CurrentDomain.BaseDirectory.TrimEnd('\\', '/')
        Uri(Path.Combine(root, uri.OriginalString), UriKind.Absolute), false
- Using #r "nuget with an absolute file path finds the file okay, but there is definitely another gremlin lurking that causes the file to be read incorrectly.
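To make the first point concrete, here is a small standalone model of the root selection in the Resolve member quoted above. It is only a sketch: the ResolutionType union, the resolveRoot function and both folder names are hypothetical stand-ins, and it assumes that under #r "nuget" the default folder ends up pointing at the package restore folder, which is what the error messages in this thread suggest.

```fsharp
open System
open System.IO

// Hypothetical, simplified stand-in for the provider's resolution modes.
type ResolutionType =
    | DesignTime
    | RuntimeInFSI
    | Runtime

// Mirrors the `let root = match ...` part of the quoted Resolve member.
let resolveRoot resolutionType (resolutionFolder: string) (defaultResolutionFolder: string) =
    match resolutionType with
    | DesignTime ->
        if String.IsNullOrEmpty resolutionFolder then defaultResolutionFolder
        else resolutionFolder
    | RuntimeInFSI -> defaultResolutionFolder
    | Runtime -> AppDomain.CurrentDomain.BaseDirectory.TrimEnd('\\', '/')

// Hypothetical folders: the script's own folder and the temporary restore folder.
let scriptDir = "/home/me/scripts"
let restoreDir = "/tmp/nuget/1234"

// At design time an explicit ResolutionFolder wins, so the sample file is found.
let designTimePath = Path.Combine(resolveRoot DesignTime scriptDir restoreDir, "data/MSFT.csv")

// At runtime in FSI the explicit folder is ignored and the default folder is used.
let fsiPath = Path.Combine(resolveRoot RuntimeInFSI scriptDir restoreDir, "data/MSFT.csv")

printfn "design time: %s" designTimePath
printfn "runtime FSI: %s" fsiPath
```

In this model the same relative path lands in two different folders depending on the mode, which matches the symptom where column names show up in the editor but Load fails when the script runs.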
I am sorry to report that this project never worked. See: https://github.com/fsharp/FSharp.Data/issues/1306. This has never been addressed. Given the importance of type providers for F# as a language, this is, I think, a very bad thing.
So I never use type providers, and resort to old-fashioned string splitting instead.
@cartermp @kevinransom I'm concerned that #r "nuget: ..." package referencing of packages containing type providers is not setting TypeProviderConfig's ResolutionFolder correctly when actually instantiating the type providers. It should be set to the folder of the script containing the #r, but it is likely being set to the temporary folder.
Setting ResolutionFolder explicitly works, e.g.
#r "nuget: FSharp.Data"
open FSharp.Data
type Stocks = CsvProvider<"data/MSFT.csv", ResolutionFolder= __SOURCE_DIRECTORY__ >
let msft = Stocks.GetSample()
msft.Rows |> Seq.head
I'll take a look through the code in dotnet/fsharp to try to understand where the fix belongs and how we can get it under test.
The F# fix is here: https://github.com/dotnet/fsharp/pull/10866
Until then you should set ResolutionFolder explicitly when using the FSharp.Data type providers with relative resources.
Am I correct that the ResolutionFolder workaround does not work for CsvProvider.Load?
> type TiingoCsv = CsvProvider<"../data-cache/tiingo-sample.csv",ResolutionFolder=__SOURCE_DIRECTORY__>
- TiingoCsv.GetSample().Rows |> Seq.take 2 |> Seq.iter (printfn "%A") // works
- TiingoCsv.Load("../data-cache/tiingo-sample.csv")
- ;;
(10/1/2020 12:00:00 AM, 9.77M, 10.25M, 9.69M, 10.09M, 4554055, 9.77M, 10.25M,
9.69M, 10.09M, 4554055, 0.0M, 1.0M)
(10/2/2020 12:00:00 AM, 9.39M, 9.78M, 9.3M, 9.38M, 4340484, 9.39M, 9.78M, 9.3M,
9.38M, 4340484, 0.0M, 1.0M)
System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Users\nicho\AppData\Local\Temp\nuget\data-cache\tiingo-sample.csv'.
at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle)
at System.IO.FileStream.CreateFileOpenHandle(FileMode mode, FileShare share, FileOptions options)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
at [email protected](Unit unitVar) in C:\GitHub\dsyme\FSharp.Data\src\CommonRuntime\IO.fs:line 219
at Microsoft.FSharp.Control.AsyncPrimitives.CallThenInvoke[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 386
at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 123
--- End of stack trace from previous location ---
at Microsoft.FSharp.Control.AsyncResult`1.Commit() in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 337
at Microsoft.FSharp.Control.AsyncPrimitives.RunSynchronouslyInCurrentThread[a](CancellationToken cancellationToken, FSharpAsync`1 computation) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 858
at Microsoft.FSharp.Control.AsyncPrimitives.RunSynchronously[T](CancellationToken cancellationToken, FSharpAsync`1 computation, FSharpOption`1 timeout) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 878
at Microsoft.FSharp.Control.FSharpAsync.RunSynchronously[T](FSharpAsync`1 computation, FSharpOption`1 timeout, FSharpOption`1 cancellationToken) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 1142
at <StartupCode$FSI_0025>.$FSI_0025.main@()
Stopped due to error
However, this works:
> TiingoCsv.Load(Directory.GetCurrentDirectory() + "/../data-cache/tiingo-sample.csv");;
val it : CsvProvider<...> =
FSharp.Data.Runtime.CsvFile`1[System.Tuple`8[System.DateTime,System.Decimal,System.Decimal,System.Decimal,System.Decimal,System.Int32,System.Decimal,System.Tuple`6[System.Decimal,System.Decimal,System.Decimal,System.Int32,System.Decimal,System.Decimal]]]
{Headers = Some
[|"date"; "close"; "high"; "low"; "open"; "volume";
"adjClose"; "adjHigh"; "adjLow"; "adjOpen"; "adjVolume";
"divCash"; "splitFactor"|];
NumberOfColumns = 13;
Quote = '"';
Rows = seq
[(10/1/2020 12:00:00 AM, 9.77M, 10.25M, 9.69M, 10.09M, 4554055,
9.77M, 10.25M, 9.69M, 10.09M, 4554055, 0.0M, 1.0M);
(10/2/2020 12:00:00 AM, 9.39M, 9.78M, 9.3M, 9.38M, 4340484,
9.39M, 9.78M, 9.3M, 9.38M, 4340484, 0.0M, 1.0M);
(10/5/2020 12:00:00 AM, 9.46M, 9.59M, 9.2502M, 9.44M, 2804969,
9.46M, 9.59M, 9.2502M, 9.44M, 2804969, 0.0M, 1.0M);
(10/6/2020 12:00:00 AM, 9.13M, 9.835M, 9.1M, 9.56M, 4535421,
9.13M, 9.835M, 9.1M, 9.56M, 4535421, 0.0M, 1.0M); ...];
Separators = ",";}
@nhirschey, I think that is a bug in the CSV provider.
Regarding "Am I correct that the ResolutionFolder workaround does not work for CsvProvider.Load?":
That's correct: that's a runtime behaviour setting, so you'll need to do what you did.
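For anyone landing here, a minimal sketch of that runtime workaround: build an absolute path yourself before calling Load. The helper name scriptRelative is hypothetical and not part of FSharp.Data. It anchors on __SOURCE_DIRECTORY__, as in the earlier script, rather than Directory.GetCurrentDirectory(), so the result does not depend on where FSI was started.

```fsharp
open System.IO

/// Hypothetical helper: turn a path relative to this script's folder
/// into a normalized absolute path.
let scriptRelative (relativePath: string) =
    Path.GetFullPath(Path.Combine(__SOURCE_DIRECTORY__, relativePath))

// Absolute path to hand to Load instead of the relative one.
let csvPath = scriptRelative "data/MSFT.csv"
printfn "%s" csvPath

// With the provider type from this thread in scope, the call would then be:
//   let msft = Stocks.Load(csvPath)
```

The ResolutionFolder static parameter is still needed on the type itself so that the sample file is found at design time; the helper only covers the Load call.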