DataFrame.LoadCsv should consider supporting csv files living on Azure Blob Storage

Open • pgovind opened this issue 3 years ago • 1 comment

Somewhat related to https://github.com/dotnet/machinelearning/issues/5905 that I filed.

Just something to consider. I'm not sure if this is something the ML.NET/library team can or should do on their own (or together with the Azure C# SDK folks), but it'd be a nice integration to have. I can see a use case where a data pipeline writes to blob storage, and the last step of the pipeline is a regular job that reads the CSV file from blob storage and does some transforms/processing on the DataFrame.

pgovind avatar Jan 28 '22 07:01 pgovind

Right now the user can get a Stream using the Azure C# SDK and pass that stream into DataFrame.LoadCsv, and it should work since we accept a stream input (though I haven't tested it directly). If we wanted to add blob storage support directly to ML.NET/DataFrame, we would also have to deal with all the authentication, and that would drastically complicate things.

For now we think it's best to let the end user get the stream however they want (blob/file/uri/etc.), but we will add this to our backlog, and if we get more requests for it we can consider adding it in the future.
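
The stream-based workaround described above might look roughly like the following. This is a minimal, untested sketch, not an official ML.NET sample. It assumes the Azure.Storage.Blobs and Microsoft.Data.Analysis NuGet packages, a connection string in the AZURE_STORAGE_CONNECTION_STRING environment variable, and a container and blob named "pipeline-output" and "data.csv"; all of those names are hypothetical placeholders. It also copies the blob into a MemoryStream first, as a conservative choice so that the stream handed to DataFrame.LoadCsv is seekable regardless of what the SDK's own stream supports; that assumes the file fits in memory.

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;
using Microsoft.Data.Analysis;

public static class BlobCsvExample
{
    public static void Main()
    {
        // Assumption: authentication uses a connection string read from an
        // environment variable. Any other credential type the Azure SDK
        // supports could be used to construct the BlobClient instead.
        string connectionString =
            Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING")
            ?? throw new InvalidOperationException("Connection string is not set.");

        // Hypothetical container and blob names, for illustration only.
        var blob = new BlobClient(connectionString, "pipeline-output", "data.csv");

        // Download the whole blob into memory so LoadCsv receives a
        // seekable stream positioned at the start of the data.
        using var buffer = new MemoryStream();
        blob.DownloadTo(buffer);
        buffer.Position = 0;

        // LoadCsv has an overload that accepts a Stream; the remaining
        // parameters (separator, header, and so on) keep their defaults.
        DataFrame df = DataFrame.LoadCsv(buffer);

        Console.WriteLine($"Loaded {df.Rows.Count} rows and {df.Columns.Count} columns.");
    }
}
```

Keeping authentication in the caller's code, as here, is the trade-off the comment above describes: the DataFrame API stays agnostic about where the stream comes from.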

michaelgsharp avatar Feb 01 '22 21:02 michaelgsharp