Cannot process Chinese correctly

Open TomoakiChenSinica opened this issue 1 year ago • 0 comments

Language: Chinese

Describe the bug: Catalyst does not process Chinese sentences correctly. The whole sentence comes back as a single token tagged PROPN.

To Reproduce

  1. I ran code like the block under Screenshots.
  2. I got a result like this:
{"Language":"zh","Length":5,"Value":"往前走五步","TokensData":[[{"Bounds":[0,4],"Tag":"PROPN"}]]}
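
A minimal sketch for reading that output, assuming `Bounds` holds inclusive start and end character offsets into `Value` (the report does not say so, but `[0,4]` on a five-character string suggests it). It uses only `System.Text.Json`, not Catalyst, and prints each token's text next to its tag.

```csharp
using System;
using System.Text.Json;

// JSON copied verbatim from the result above.
const string json =
    "{\"Language\":\"zh\",\"Length\":5,\"Value\":\"往前走五步\"," +
    "\"TokensData\":[[{\"Bounds\":[0,4],\"Tag\":\"PROPN\"}]]}";

using var parsed = JsonDocument.Parse(json);
JsonElement root = parsed.RootElement;
string text = root.GetProperty("Value").GetString() ?? "";

// TokensData is a list of spans; each span is a list of tokens.
foreach (JsonElement span in root.GetProperty("TokensData").EnumerateArray())
{
    foreach (JsonElement token in span.EnumerateArray())
    {
        JsonElement bounds = token.GetProperty("Bounds");
        int begin = bounds[0].GetInt32();
        int end = bounds[1].GetInt32(); // assumption: inclusive end offset
        string tag = token.GetProperty("Tag").GetString() ?? "";

        // Slice the token text out of the original sentence.
        Console.WriteLine($"{text.Substring(begin, end - begin + 1)}\t{tag}");
    }
}
```

With the JSON above this prints a single line, the full sentence `往前走五步` tagged `PROPN`, which shows that no word segmentation took place.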

Expected behavior: The sentence is split into word tokens and each token is tagged correctly.

Screenshots

Here is my code:

using System;
using Catalyst;
using Mosaik.Core;

// Each language must be registered first (and its NuGet package installed).
Catalyst.Models.Chinese.Register();

// Models are stored in this local folder.
Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.Chinese);
var doc = new Document("諸葛亮是三國時代著名軍師", Language.Chinese);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

Additional context: Thank you for your help!

TomoakiChenSinica, Jul 18 '23 05:07