machinelearning Easier way to create dynamic DataViews

Is your feature request related to a problem? Please describe. In my company we want to add ML blocks to our arsenal (made with Blockly) with which you could train and run models. I've read in the docs and in some issues that the model must be defined beforehand by declaring a class with some attributes, and apparently it's not easy to create a dynamic model. Our idea is to feed SQL DataSets to the trainer.

Describe the solution you'd like As a user I would like to define the model based on the shape of the input data, for example a SQL DataSet, a CSV, etc. After that, each column's metadata could be added programmatically.

Describe alternatives you've considered Both seem overcomplicated to me: https://stackoverflow.com/questions/56761728/add-custom-column-to-idataview-in-ml-net https://stackoverflow.com/questions/66893993/ml-net-create-prediction-engine-using-dynamic-class/66913705#66913705

Additional context Experienced C# developer, new to ML.NET

Jul 31 '21 20:07 vgb1993

@luisquintanilla @briacht I have seen this come up several times now, so it's obviously something people want. Not sure how it will fit with our priorities, but it's definitely something we should look into more.

@vgb1993 could you give an example of what you would like to see/what you are thinking?

Aug 19 '21 21:08 michaelgsharp

Yes, I've seen this request quite a few times now too! I think this would be good to investigate

Aug 19 '21 22:08 briacht

Here's some raw brainstorming:

1. Create a sample showing how to build a dynamic model programmatically with the existing tools.
2. Create a new fluent API to complement (rather than replace) the attribute configuration, like in EF Core. Expose it as a NuGet package and document it.
3. Reference the columns by name or by index instead of through strongly typed property access.
4. Create a NuGet package that encapsulates the solutions from the Stack Overflow links above.
5. Create a connector for simple cases, like SQL datasets and CSVs. The connector could figure out the input schema types on its own based on the data.
6. Create a schema specification using JSON. Perhaps even for the pipeline?

Any thoughts? Any preferences? Any drawbacks? I'm not familiar with the internal implementation of ML.NET, so perhaps someone has better ideas.

At the end of the day, what we want is to create and run ML models at runtime. If we can define a dynamic model, we can build software on top of it, which ultimately makes ML.NET more accessible.
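
For reference, the first and third ideas above can be roughly approximated with the existing TextLoader API, by building the column list at runtime instead of declaring a class. This is only a minimal sketch: the sample file, its column names, and the assumption that every column is numeric are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical sample file; in practice the path and header come from the user's data.
string path = Path.Combine(Path.GetTempPath(), "dynamic-schema-sample.csv");
File.WriteAllLines(path, new[]
{
    "Size,Rooms,Price",
    "70,3,210000",
    "55,2,160000",
    "120,5,390000",
});

// Read the header at runtime and build one loader column per field.
// Assumption: every column is numeric; a real connector would have to infer or look up each type.
string[] names = File.ReadLines(path).First().Split(',');
TextLoader.Column[] columns = names
    .Select((name, index) => new TextLoader.Column(name, DataKind.Single, index))
    .ToArray();

var mlContext = new MLContext();
IDataView view = mlContext.Data.LoadFromTextFile(path, new TextLoader.Options
{
    Columns = columns,
    HasHeader = true,
    Separators = new[] { ',' },
});

// Columns are addressed by name, with no strongly typed row class involved.
float[] prices = view.GetColumn<float>("Price").ToArray();
Console.WriteLine(string.Join(", ", prices));
```

The same column array could be built from any other runtime description of the data, such as a list of names and types stored alongside a scenario definition.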

Aug 27 '21 03:08 vgb1993

Yes! I ran into this problem today.

For @vgb1993's points above,

1 - Yes; 3 - Yes; 5 - Yes, CSV for us

Others are maybe/sure.

Feb 15 '22 04:02 jberd126

Currently the easiest way to get a dynamic DataView is to use Microsoft.Data.Analysis.DataFrame, because it can load a text file, create the schema automatically, and then be used directly in ML.NET. Take a look at this for an example.

That being said, some of the other approaches mentioned above are things we are considering, but don't have a timeline for them as of yet.
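
A minimal sketch of the DataFrame route, assuming the Microsoft.Data.Analysis and Microsoft.ML packages are referenced. The sample file and its columns are hypothetical. DataFrame implements IDataView, so it can be handed to ML.NET without declaring an input class.

```csharp
using System;
using System.IO;
using Microsoft.Data.Analysis;
using Microsoft.ML;

// Hypothetical sample file standing in for the user's CSV.
string path = Path.Combine(Path.GetTempPath(), "dataframe-sample.csv");
File.WriteAllLines(path, new[]
{
    "Size,Rooms,Price",
    "70,3,210000",
    "55,2,160000",
    "120,5,390000",
});

// LoadCsv reads the header and guesses each column's type from the first rows.
DataFrame frame = DataFrame.LoadCsv(path);

// No conversion step is needed: a DataFrame is already an IDataView.
IDataView view = frame;

// Print the schema that was inferred at runtime.
foreach (DataViewSchema.Column column in view.Schema)
{
    Console.WriteLine($"{column.Name}: {column.Type}");
}
```

Note that the whole file is loaded into memory here, so this sketch only suits data sets that fit in RAM.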

Feb 15 '22 18:02 michaelgsharp

@michaelgsharp, I agree that Microsoft.Data.Analysis.DataFrame may be a solution. The issue is that it cannot be streamed: the entire DataFrame has to fit into memory to use it.

Are you aware of any partitioning work on dataframes such as the Python dask library for pandas?

Feb 16 '22 14:02 jberd126

Personally I am not, but I haven't really looked into it much. @luisquintanilla are you aware of anything?

Yeah, the memory thing can be an issue. I'm not aware of a workaround for now, but this is something we are keeping in mind for future work.

Feb 18 '22 03:02 michaelgsharp

@michaelgsharp we have the same issue with dynamic data loading (mostly from a SQL database) and with dynamically creating models for each labeling/value prediction scenario. I guess that is the main problem: we need a separate model for each scenario, for example:

Predict C from A, B
Predict B from C, A
Predict X from D, F

Every combination of form fields that a user could potentially want is a new scenario and needs its own classes, model, and project.

Here are the ideas so far:

Building classes and models using "dynamic"/runtime types.
Reflection.
Generating C# code dynamically (StringBuilder etc.), then compiling it and saving the output model to the database.

I guess all of the above should work, but all of them are ridiculous... Do you know of any API update, or an ETA, for generic model creation?

Aug 04 '22 10:08 rzechu

I was hoping this would have been added in 2.0.0. I'm currently using one huge class that contains all my potential features, loading my data from stored procedures, and then using ML.Transforms.DropColumns to remove the fields that were not found before training. It works, but it's far from ideal and has become stifling. Has anyone found a better workaround?
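
For anyone trying the same workaround, here is a rough sketch of the idea, not the code described above. WideRow, its properties, and the wanted column set are hypothetical stand-ins for the "one huge class" and for whatever a stored procedure actually returned. The list of columns to drop is computed from the schema at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext();

// Stand-in for rows returned by a stored procedure.
var rows = new List<WideRow>
{
    new WideRow { Age = 31f, Income = 52000f, Score = 0.7f, Label = 1f },
    new WideRow { Age = 45f, Income = 61000f, Score = 0.2f, Label = 0f },
};
IDataView wide = mlContext.Data.LoadFromEnumerable(rows);

// Hypothetical: the columns the current scenario actually uses, decided at runtime.
var wanted = new HashSet<string> { "Age", "Score", "Label" };

// Everything in the wide schema that the scenario does not use gets dropped.
string[] toDrop = wide.Schema
    .Select(column => column.Name)
    .Where(name => !wanted.Contains(name))
    .ToArray();

IDataView trimmed = mlContext.Transforms.DropColumns(toDrop).Fit(wide).Transform(wide);
Console.WriteLine(string.Join(", ", trimmed.Schema.Select(column => column.Name)));

// Hypothetical wide class holding every potential feature.
public class WideRow
{
    public float Age { get; set; }
    public float Income { get; set; }
    public float Score { get; set; }
    public float Label { get; set; }
}
```

This still needs the wide class to exist at compile time, so it narrows the schema per scenario but does not remove the underlying limitation.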

Nov 09 '22 23:11 danmcmyler

I am using only Microsoft.AutoML, and I was able to work around this by loading the data dynamically from SQL and using the SQL column names as the input/label column names. I also use SQL "placeholder" columns to fix the data types: SELECT '0' as InputOrLabelColumn /* for text */ SELECT CAST(0 as real) as InputOrLabelColumn /* for numbers */

The IDataView is then loaded from a normal SQL query, mapping the SQL columns to ML.NET data types for the IDataView: SELECT Text, CAST(intcol as real) intcol, Result as FROM Table

For prediction you just need to match the data types with a fake one-row SQL SELECT statement, for example SELECT 'text', CAST(0 as real) intcol, to get Result.

This works for both training and prediction with the AutoML API. I haven't tried the regular Featurize/Fit/Transform methods yet.

Nov 10 '22 08:11 rzechu

Not sure how many people have struggled with this, but I managed to solve one of the runtime type problems without using JSON, CSV, or text files as the source of my DataView: I build a dynamic DataView at runtime from a live IEnumerable of objects.

In my case I already have defined classes, but I have hundreds of them. They are all used in the same way, but having to write a separate training method for each one, without being able to just implement a common interface, meant essentially rewriting my whole program each time a new class was added. With this approach, each time I hard-code a new type I just implement an interface and it can train using the same method as my existing types, and my columns, labels, etc. are generated dynamically. Any common variables I don't want in my DataView I can put in the interface and use reflection to ignore, by comparing the type's properties to the interface's properties.

https://github.com/BurnOutTrader/DynamicMLPipeline/tree/main
git@github.com:BurnOutTrader/DynamicMLPipeline.git

A similar approach could probably be taken to create objects from .csv files etc. and apply schema properties dynamically, but I think ML.NET has that built in now.

Nov 09 '23 05:11 BurnOutTrader
