machinelearning
DataFrame.LoadCsv should consider supporting csv files living on Azure Blob Storage
Somewhat related to https://github.com/dotnet/machinelearning/issues/5905 that I filed.
Just something to consider. I'm not sure if this is something the ML.NET/library team can/should do on their own (or with the Azure C# SDK folks), but it'd be a nice integration to have. I can see a use case where a data pipeline writes to blob storage, and the last step of the pipeline is a regular job that reads the CSV file from blob storage and does some transforms/processing of the DataFrame.
Right now the user can get a Stream using the Azure C# SDK and pass that stream into DataFrame.LoadCsv, and it should work since we accept a stream input (though I haven't tested it directly). If we wanted to add support for blob storage directly in ML.NET/DataFrame, we would have to deal with all the authentication as well, and that would drastically complicate things.
For now we think it's best to let the end user get the stream however they want (blob/file/uri/etc.), but we will add this to our backlog, and if we get more requests for it we can consider adding it in the future.
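For anyone landing here, the stream-based workaround described above might look roughly like the sketch below. It is untested against a real storage account and rests on a few assumptions: the `Azure.Storage.Blobs` and `Microsoft.Data.Analysis` packages are referenced, the connection string lives in an environment variable named `AZURE_STORAGE_CONNECTION_STRING`, and the container and blob names are hypothetical placeholders. The blob is copied into a `MemoryStream` first because `LoadCsv` may need a seekable stream; for very large files that buffering may not be acceptable.

```csharp
using System;
using System.IO;
using Azure.Storage.Blobs;
using Microsoft.Data.Analysis;

// Assumption: the connection string is supplied through this environment variable.
string connectionString =
    Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");

// Hypothetical container and blob names; replace with your own.
var blob = new BlobClient(connectionString, "my-container", "pipeline/output.csv");

// Download the blob into memory so the stream handed to LoadCsv is seekable.
using var buffer = new MemoryStream();
blob.DownloadTo(buffer);
buffer.Position = 0; // rewind before reading

// LoadCsv accepts any Stream; the defaults assume a comma separator and a header row.
DataFrame df = DataFrame.LoadCsv(buffer);

Console.WriteLine($"Loaded {df.Rows.Count} rows and {df.Columns.Count} columns.");
```

Authentication stays entirely on the Azure SDK side here (connection string, SAS, or a token credential), which is the part that would complicate a built-in integration.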