machinelearning issues

Tokenizers Library Design

8

LLM tokenizers are a crucial component in Large Language Models (LLMs) like GPT-3 or BERT. They are responsible for the tokenization process, which involves breaking down natural language text into...

tarekgh

Tokenizers

api-approved

blocking

Added error handling, removed unwanted null check and enhanced readability

2

The most significant changes include the removal of the null check for "args" in the "Main" method, the addition of a "try-catch" block to handle exceptions during the execution of...

ravibaghel

community-contribution

Make ML.NET tests target net8.0 and net9.0 instead of net6.0

3

This is a portion of the work from https://github.com/dotnet/machinelearning/pull/6749 This moves the tests forward to net8.0, and cleans up RemoteExecutor (which I noticed in @tarekgh's PR). I minimized the product...

ericstj

Modify IDataView in AutoML Experiment After Transform and Before Evaluate

2

**Is your feature request related to a problem? Please describe.** No **Describe the solution you'd like** Add the option to modify the `IDataView` (such as in preFeaturizer) but a "postFeaturizer"...

superichmann

enhancement

AutoML.NET

untriaged

Dll version of Microsoft.ML.OnnxRuntime.dll is 0.0.0.0

1

**System Information (please complete the following information):** - OS & Version: either Windows 10 or 11 - ML.NET Version: versions 1.7 and 3.0 - .NET Version: .NET Framework 4.7 **Describe the...

sportbilly21

untriaged

Accessing data by column after adding columns to a DataFrame returns error data

1

fix #7135 Describe the bug Accessing data by column after adding columns to a DataFrame returns error data ```` var df = DataFrame.LoadCsvFromString("a1,a2\n1,2\n3,4"); var dc0 = DataFrameColumn.Create("a0", new int[] {...

feiyun0112

community-contribution

[Tokenizers] Question regarding performance

Hi, thanks for the effort put into the Microsoft.ML.Tokenizers! I'm the author of the last performance improvements in `SharpToken` library. Since MLTokenizers are faster now than SharpToken I looked into...

r-Larch

question

untriaged

Tokenizers

Allow developers to supply their own function to infer column data types from data while loading CSVs

**Is your feature request related to a problem? Please describe.** Currently when you use `LoadCsv` or `LoadCsvFromString` without supplying data types for each column, the code will try to guess...

sevenzees

enhancement

untriaged

Handle null/empty values better for training and consumption

9

**System Information (please complete the following information):** - Model Builder Version: ml.net CLI 16.1.1 - Visual Studio Version 8.7.6 for macs **Describe the bug** Tried to change model input to...

bizbizzz

Get Loss During Training for Visualization (Learning Curve Graph)

**Is your feature request related to a problem? Please describe.** I need a way to visualize how my model is learning during training, which is a comparison between training loss...

AutumnEvans418

enhancement

untriaged

Evaluation
