
Package does not work with #r nuget (requires ResolutionFolder=__SOURCE_DIRECTORY__)

Open cartermp opened this issue 3 years ago • 8 comments

I think this package is doing some custom stuff, because it simply doesn't work with #r "nuget".

Firstly, Load simply does not work as advertised. Consider the following script:

#r "nuget: FSharp.Data"

open FSharp.Data

type Stocks = CsvProvider<"data/MSFT.csv">
let msft = Stocks.Load("data/MSFT.csv")

This fails at runtime because data/MSFT.csv is resolved relative not to the script's location but to the location of the temporary project file where the package is restored:

System.IO.DirectoryNotFoundException: Could not find a part of the path '/var/folders/jt/zl19fbpd387_btngqwry6c5h0000gn/T/nuget/5312--c70c53f0-8e79-44c4-be8e-9157262e6715/data/MSFT.csv'

Secondly, it actually doesn't work even when you've loaded it. At design-time you will get correct names for columns, but at runtime it has no idea what they are:

#r "nuget: FSharp.Data"

open FSharp.Data

type Stocks = CsvProvider<"data/MSFT.csv">

let location = __SOURCE_DIRECTORY__ + "/data/MSFT.csv"
let msft = Stocks.Load(location)

let firstRow = msft.Rows |> Seq.head

firstRow.``Adj Close``

This fails with the following:

/Users/phillip/scratch/test.fsx(23,10): error FS0039: The type 'Row' does not define the field, constructor or member 'Adj Close'.

Third, GetSample does not work:

#r "nuget: FSharp.Data"

open FSharp.Data

type Stocks = CsvProvider<"data/MSFT.csv">
let msft = Stocks.GetSample()

msft.Rows |> Seq.head

In this case, it fails to find any rows at all, despite the data being passed as a static parameter to the provider.

cartermp, Dec 29 '20 22:12

I experienced the same issue with F# Interactive version 11.0.0.0 for F# 5.0.

Code that worked fine in a Jupyter notebook fails when run from an .fsx file in the above environment.

"Secondly, it actually doesn't work even when you've loaded it. At design-time you will get correct names for columns, but at runtime it has no idea what they are."

I have also observed this: I can load the file from the local file system when passing in the full path, but the result contains only the first column, with no header information.

auslavs, Jan 08 '21 00:01

Looking further into this, I can see the two problems are distinct from one another.

  1. Using #r "nuget: ..." seems to modify the default path, which breaks relative paths when running in FSI. Design time still seems okay. By design, if the runtime is FSI the provider uses the DefaultResolutionFolder, which appears to be hardcoded as an empty string for the CsvProvider.
member x.Resolve(uri:Uri) =
  if uri.IsAbsoluteUri then
    uri, isWeb uri
  else
    let root =
      match x.ResolutionType with
      | DesignTime -> if String.IsNullOrEmpty x.ResolutionFolder
                      then x.DefaultResolutionFolder
                      else x.ResolutionFolder
      | RuntimeInFSI -> x.DefaultResolutionFolder
      | Runtime -> AppDomain.CurrentDomain.BaseDirectory.TrimEnd('\\', '/')
    Uri(Path.Combine(root, uri.OriginalString), UriKind.Absolute), false
  2. Using #r "nuget: ..." with an absolute file path finds the file okay, but there is definitely another gremlin lurking that causes the file to be read incorrectly.
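
To make the first point concrete, here is a minimal standalone sketch of the combine step. It is not the FSharp.Data implementation: `combineWithRoot` is a hypothetical helper, and it only assumes that the relative branch of Resolve boils down to a `Path.Combine` of a root folder and the relative path, as in the snippet above.

```fsharp
open System.IO

// Hypothetical helper: combine a root folder with a relative path, as the
// non-absolute branch of Resolve does with Path.Combine.
let combineWithRoot (root: string) (relative: string) =
    Path.Combine(root, relative)

let relative = "data/MSFT.csv"

// Root = the script's own folder: the file is looked up next to the script.
let fromScript = combineWithRoot __SOURCE_DIRECTORY__ relative

// Root = empty string: Path.Combine returns the relative path unchanged,
// so where it finally points is decided by whoever resolves it later.
let fromEmptyRoot = combineWithRoot "" relative

printfn "script root: %s" fromScript
printfn "empty root:  %s" fromEmptyRoot
```

With an empty root the path stays relative, which would be consistent with it later being resolved against the temporary restore folder rather than the script folder.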

auslavs, Jan 08 '21 10:01

I am sorry to report that this project never worked. See: https://github.com/fsharp/FSharp.Data/issues/1306. This has never been addressed. Given the importance of type providers for F# as a language, I think this is a very bad thing.

So I never use type providers and resort to old-fashioned string splitting instead.

halcwb, Jan 09 '21 09:01

@cartermp @kevinransom I'm concerned that #r "nuget: ..." referencing of packages containing type providers is not setting TypeProviderConfig's ResolutionFolder correctly when actually instantiating the type providers. It should be set to the folder of the script containing the #r, but it is likely being set to the temporary folder.

Setting ResolutionFolder explicitly works, e.g.:

#r "nuget: FSharp.Data"

open FSharp.Data

type Stocks = CsvProvider<"data/MSFT.csv", ResolutionFolder= __SOURCE_DIRECTORY__ >
let msft = Stocks.GetSample()

msft.Rows |> Seq.head

I'll take a look through the code in dotnet/fsharp to try to understand where the fix belongs and how we can get it under test.

dsyme, Jan 13 '21 14:01

The F# fix is here: https://github.com/dotnet/fsharp/pull/10866

Until then you should set ResolutionFolder explicitly when using the FSharp.Data type providers with relative resources.

dsyme, Jan 13 '21 17:01

Am I correct that the ResolutionFolder workaround does not work for CsvProvider.Load?

> type TiingoCsv = CsvProvider<"../data-cache/tiingo-sample.csv",ResolutionFolder=__SOURCE_DIRECTORY__>
- TiingoCsv.GetSample().Rows |> Seq.take 2 |> Seq.iter (printfn "%A") // works
- TiingoCsv.Load("../data-cache/tiingo-sample.csv")
- ;;
(10/1/2020 12:00:00 AM, 9.77M, 10.25M, 9.69M, 10.09M, 4554055, 9.77M, 10.25M,
 9.69M, 10.09M, 4554055, 0.0M, 1.0M)
(10/2/2020 12:00:00 AM, 9.39M, 9.78M, 9.3M, 9.38M, 4340484, 9.39M, 9.78M, 9.3M,
 9.38M, 4340484, 0.0M, 1.0M)
System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Users\nicho\AppData\Local\Temp\nuget\data-cache\tiingo-sample.csv'.
   at System.IO.FileStream.ValidateFileHandle(SafeFileHandle fileHandle)
   at System.IO.FileStream.CreateFileOpenHandle(FileMode mode, FileShare share, FileOptions options)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at [email protected](Unit unitVar) in C:\GitHub\dsyme\FSharp.Data\src\CommonRuntime\IO.fs:line 219
   at Microsoft.FSharp.Control.AsyncPrimitives.CallThenInvoke[T,TResult](AsyncActivation`1 ctxt, TResult result1, FSharpFunc`2 part2) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 386
   at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 123
--- End of stack trace from previous location ---
   at Microsoft.FSharp.Control.AsyncResult`1.Commit() in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 337
   at Microsoft.FSharp.Control.AsyncPrimitives.RunSynchronouslyInCurrentThread[a](CancellationToken cancellationToken, FSharpAsync`1 computation) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 858
   at Microsoft.FSharp.Control.AsyncPrimitives.RunSynchronously[T](CancellationToken cancellationToken, FSharpAsync`1 computation, FSharpOption`1 timeout) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 878
   at Microsoft.FSharp.Control.FSharpAsync.RunSynchronously[T](FSharpAsync`1 computation, FSharpOption`1 timeout, FSharpOption`1 cancellationToken) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\async.fs:line 1142
   at <StartupCode$FSI_0025>.$FSI_0025.main@()
Stopped due to error

However, this works:

> TiingoCsv.Load(Directory.GetCurrentDirectory() + "/../data-cache/tiingo-sample.csv");;
val it : CsvProvider<...> =
  FSharp.Data.Runtime.CsvFile`1[System.Tuple`8[System.DateTime,System.Decimal,System.Decimal,System.Decimal,System.Decimal,System.Int32,System.Decimal,System.Tuple`6[System.Decimal,System.Decimal,System.Decimal,System.Int32,System.Decimal,System.Decimal]]]
    {Headers = Some
                 [|"date"; "close"; "high"; "low"; "open"; "volume";
                   "adjClose"; "adjHigh"; "adjLow"; "adjOpen"; "adjVolume";
                   "divCash"; "splitFactor"|];
     NumberOfColumns = 13;
     Quote = '"';
     Rows = seq
              [(10/1/2020 12:00:00 AM, 9.77M, 10.25M, 9.69M, 10.09M, 4554055,
                9.77M, 10.25M, 9.69M, 10.09M, 4554055, 0.0M, 1.0M);
               (10/2/2020 12:00:00 AM, 9.39M, 9.78M, 9.3M, 9.38M, 4340484,
                9.39M, 9.78M, 9.3M, 9.38M, 4340484, 0.0M, 1.0M);
               (10/5/2020 12:00:00 AM, 9.46M, 9.59M, 9.2502M, 9.44M, 2804969,
                9.46M, 9.59M, 9.2502M, 9.44M, 2804969, 0.0M, 1.0M);
               (10/6/2020 12:00:00 AM, 9.13M, 9.835M, 9.1M, 9.56M, 4535421,
                9.13M, 9.835M, 9.1M, 9.56M, 4535421, 0.0M, 1.0M); ...];
     Separators = ",";}

nhirschey, Feb 03 '21 18:02

@nhirschey, I think that is a bug in the CSV provider.

KevinRansom, Feb 03 '21 19:02

"Am I correct that the ResolutionFolder workaround does not work for CsvProvider.Load?"

That's correct. That's a runtime behaviour setting, so you'll need to do what you did and pass a full path to Load.
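
Putting the two pieces together, a minimal sketch of a script that covers both the compile-time sample and the runtime Load might look like the following. It assumes the data/MSFT.csv layout from the original report (a file sitting in a data folder next to the script), and `csvPath` is just an illustrative name.

```fsharp
#r "nuget: FSharp.Data"

open System.IO
open FSharp.Data

// Compile time: ResolutionFolder tells the provider where to find the sample.
type Stocks = CsvProvider<"data/MSFT.csv", ResolutionFolder=__SOURCE_DIRECTORY__>

// Runtime: Load is given an absolute path built from the script's folder
// (assumes data/MSFT.csv exists next to this script).
let csvPath = Path.Combine(__SOURCE_DIRECTORY__, "data", "MSFT.csv")
let msft = Stocks.Load(csvPath)

msft.Rows |> Seq.head |> printfn "%A"
```

Building the path from `__SOURCE_DIRECTORY__` rather than `Directory.GetCurrentDirectory()` keeps the script independent of the directory FSI happens to be started from.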

dsyme, Feb 04 '21 16:02