machinelearning

Bug: ApplyWordEmbedding with custom path not working

Open · ErwanL08 opened this issue · 0 comments

System Information (please complete the following information):

  • OS & Version: [e.g. Windows 11]
  • ML.NET Version: [e.g. ML.NET v3.0.0]
  • .NET Version: [e.g. .NET 8.0]

Describe the bug

I am trying to embed a list of French sentences. The main goal is to generate an embedded dataset so that I can then apply cosine similarity. The default FastTextWikipedia300D model is trained on the English Wikipedia, so I downloaded the French one from https://fasttext.cc/docs/en/pretrained-vectors.html (wiki.fr.vec is in the build output directory, set to "Copy always"). I have tried a lot of code but cannot figure out why it is not working; I also tried what is described in issue 5532. The generated output is always the same:

[Screenshot of the output]
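For the stated goal (comparing sentences by cosine similarity), the `Features` vectors returned by the prediction engine can be compared with a small helper. This is a minimal sketch; `EmbeddingMath.CosineSimilarity` is a hypothetical name, not an ML.NET API. It also makes the symptom easy to quantify: identical non-zero vectors score 1 (up to rounding), so if every sentence gets the same vector, every pair scores 1.

```csharp
using System;

// Hypothetical helper (not part of ML.NET): cosine similarity between two
// embedding vectors, e.g. the Features arrays produced for two sentences.
public static class EmbeddingMath
{
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a is null) throw new ArgumentNullException(nameof(a));
        if (b is null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        // Accumulate in double to limit rounding error on long vectors.
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // An all-zero vector has no direction; return 0 instead of dividing by zero.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Returning 0 for a zero vector is a design choice for this sketch; throwing instead would make empty embeddings fail loudly.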

After some investigation, I noticed that if wiki.en.vec is placed manually in the folder "AppData\Local\mlnet-resources\WordVectors", it works when I use FastTextWikipedia300D.

So there is an issue when the full path to a custom model file is set manually in ApplyWordEmbedding.
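One way to rule out path problems is to confirm which file the full path actually resolves to and that it starts with the expected fastText header. The repro passes `@"c:/wiki.fr.vec"`, while the description says the file is copied to the build output directory, so it is worth checking that a file really exists at the path given. A minimal sketch, assuming the `.vec` text layout described on the fastText pretrained-vectors page (first line: word count and vector dimension, 300 for the wiki vectors); `VecFileCheck.ReadHeader` is a hypothetical helper, not an ML.NET API.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper (not part of ML.NET): resolves the model path
// and parses the fastText .vec header line "<word count> <dimension>".
public static class VecFileCheck
{
    public static (long Words, int Dimension) ReadHeader(string path)
    {
        // Resolve relative paths against the current working directory.
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
            throw new FileNotFoundException($"Embedding file not found: {fullPath}", fullPath);

        using var reader = new StreamReader(fullPath);
        var header = reader.ReadLine();
        if (header is null)
            throw new InvalidDataException("Embedding file is empty.");

        // Expect exactly two integers separated by whitespace.
        string[] parts = header.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2
            || !long.TryParse(parts[0], out long words)
            || !int.TryParse(parts[1], out int dimension))
            throw new InvalidDataException($"Unexpected header line: '{header}'");

        return (words, dimension);
    }
}
```

For example, `VecFileCheck.ReadHeader(@"c:/wiki.fr.vec")` would be expected to report a dimension of 300 if the French wiki vectors are at that path.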

To Reproduce

Steps to reproduce the behavior:

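using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// allDataEnumerable: the source records (not shown in this report), each with a TextCleaned string.
var cast = allDataEnumerable.Select(x => new TextData() { Text = x.TextCleaned }).ToList();
var dataView = mlContext.Data.LoadFromEnumerable(cast);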

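// Normalize the "Text" column in place (lowercase by default; NormalizeText also removes diacritics by default, keepDiacritics: false).
var pipeline = mlContext.Transforms.Text.NormalizeText("Text")
    // Split the normalized text into a vector of word tokens.
    .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens", "Text"))
    // Look up each token in the custom fastText file passed by full path.
    .Append(mlContext.Transforms.Text.ApplyWordEmbedding("Features", @"c:/wiki.fr.vec", "Tokens"));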


var transformer = pipeline.Fit(dataView);
var transformedData = transformer.Transform(dataView);


var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData, TextFeatures>(transformer);

foreach (var item in allDataEnumerable)
{
    var prediction = predictionEngine.Predict(new TextData() { Text = item.TextCleaned});

    Console.WriteLine($"Number of Features: {prediction.Features.Length}");

    // Print the embedding vector.
    Console.Write("Features: ");
    foreach (var f in prediction.Features)
        Console.Write($"{f:F4} ");

    Console.WriteLine(); 
}
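// Input schema, inferred from how TextData is used above.
public class TextData
{
    public string Text { get; set; }
}

public class TextFeatures
{
    // ApplyWordEmbedding concatenates the min, average and max of the token vectors,
    // so a 300-dimensional embedding yields 3 × 300 = 900 values, not 300.
    public float[] Features { get; set; }
}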
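If the vectors all come out the same, one thing worth checking is whether the tokens the pipeline produces (after NormalizeText and TokenizeIntoWords) actually appear in the file's vocabulary. A minimal sketch, assuming the fastText `.vec` text layout (a header line, then one word followed by its vector values per line); `VocabularyCoverage.FindKnownTokens` is a hypothetical helper, not part of ML.NET. It streams the file rather than loading it, since the wiki vector files are large.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical diagnostic helper (not part of ML.NET): streams a fastText .vec
// file and returns the subset of the given tokens that have a vector in it.
public static class VocabularyCoverage
{
    public static HashSet<string> FindKnownTokens(string vecPath, IEnumerable<string> tokens)
    {
        // Ordinal comparison: a token must match the vocabulary entry exactly.
        var wanted = new HashSet<string>(tokens, StringComparer.Ordinal);
        var found = new HashSet<string>(StringComparer.Ordinal);

        using var reader = new StreamReader(vecPath);
        reader.ReadLine(); // skip the "<word count> <dimension>" header line

        // Stop early once every wanted token has been seen.
        for (var line = reader.ReadLine();
             line != null && found.Count < wanted.Count;
             line = reader.ReadLine())
        {
            // The word is everything before the first space.
            int space = line.IndexOf(' ');
            if (space <= 0) continue;
            string word = line.Substring(0, space);
            if (wanted.Contains(word)) found.Add(word);
        }

        return found;
    }
}
```

Because the comparison is exact, a token that differs from the vocabulary entry in case or accents counts as missing.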

ErwanL08, Dec 20 '23 22:12