chicago-crimes

Add .NET Interactive .ipynb CSV data loading example with C#

Open RandomFractals opened this issue 3 years ago • 4 comments

Use the .NET Interactive Notebooks extension: https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode

and the Microsoft.Data.Analysis API: https://learn.microsoft.com/en-us/dotnet/api/microsoft.data.analysis.dataframe?view=ml-dotnet-preview

RandomFractals, Oct 28 '22 12:10

Something is off when loading the smaller 2022 crimes CSV data file with the Microsoft DataFrame:

[Screenshot: chicago-crimes-dotnet-csv-read]

RandomFractals, Oct 28 '22 12:10

@colombod from the .NET Interactive team suggested trying the latest preview version of the ML.NET libraries using:

#i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json"
#r "nuget:Microsoft.Data.Analysis,0.20.0-preview.22514.1"

This uses a daily build of the DataFrame NuGet package that will be released soon.

Sample ML project notebook:

https://github.com/microsoft/dotnetconf-studentzone/blob/main/Using%20ML.NET%20for%20Machine%20Learning/WaterConsumptionMLproject.ipynb

RandomFractals, Nov 01 '22 10:11

Updated the .NET Interactive notebooks setup to use the new Polyglot Notebooks extension:

https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode

Changed imports to the ML.NET preview NuGet packages listed above.

Still getting a CSV data load error, even for the smaller 33 MB file:

[Screenshot: crimes-dotnet-load-csv-error]

RandomFractals, Nov 04 '22 20:11

The ML.NET NuGet package is still very much in beta and can't yet parse CSV files with missing data fields.

The devs suggested trying a third-party Parquet library instead:

https://github.com/G-Research/ParquetSharp.DataFrame

RandomFractals, Nov 22 '22 18:11