NTextCat: How to use your library?

Could you give a small example of using your library?

Windows 7 x64, Visual Studio 2017

I installed NTextCat through NuGet. I need to determine the language of the text entered in textBox2.Text and output the result in textBox1.Text. The input may be in European languages, languages written with ideographic scripts (Chinese, Japanese), and others.

I found sample code, but I get an error on this line: var identifier = factory.Load("NTextCat 0.2.1.1\\LanguageModels\\Core14.profile.xml");

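Code:

using System;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using NTextCat;

namespace rsh
{
    public partial class Form2 : Form
    {
        public Form2()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // A relative path is resolved against the current working directory,
            // so build the path from the application folder instead.
            // Assumes Core14.profile.xml is copied next to the .exe
            // (for example via "Copy to Output Directory" in the project).
            var profilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Core14.profile.xml");
            if (!File.Exists(profilePath))
            {
                textBox1.Text = "Profile not found: " + profilePath;
                return;
            }

            var factory = new RankedLanguageIdentifierFactory();
            var identifier = factory.Load(profilePath);
            var languages = identifier.Identify(textBox2.Text);
            var mostCertainLanguage = languages.FirstOrDefault();

            // FirstOrDefault returns null when nothing was identified.
            textBox1.Text = mostCertainLanguage != null
                ? mostCertainLanguage.Item1.Iso639_3
                : "unknown";
        }
    }
}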

How to solve the problem?

[Screenshot: 2018-10-14_18-48-10]

Oct 14 '18 15:10 it19862

How can I make NTextCat report text in an unsupported language as unknown instead of as some other language? For example, "Aţi văzut ce moacă a făcut?" is Romanian, but NTextCat detects it as English.

Apr 06 '20 17:04 mohammad-khoddami

I don't understand the problem from the description. If your code works correctly, textBox1.Text should contain the language code (for example, eng for English). Perhaps you get an error? If so, could you post a screenshot of it?

May 06 '20 16:05 ivanakcheurov

@mohammad-khoddami, you can check how confident NTextCat is about the detected language:

var factory = new RankedLanguageIdentifierFactory();
var identifier = factory.Load("Core14.profile.xml");
var languages = identifier.Identify("some text");
var mostCertainLanguage = languages.FirstOrDefault();

var languageCode = mostCertainLanguage.Item1.Iso639_3;
var confidenceLevel = mostCertainLanguage.Item2;
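Building on the snippet above, one way to get "unknown" instead of a wrong language is to compare the top two candidates and refuse to answer when their scores are nearly tied. The sketch below is a hypothetical helper, not part of NTextCat. It assumes only that Identify returns candidates best first, which is what the FirstOrDefault usage above relies on; it does not assume whether lower or higher scores are better. The name LanguageGuess and the minRelativeGap default of 0.02 are made up for illustration and would need tuning on real texts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of NTextCat): turns a ranked result list into
// a language code, or "unknown" when the top two candidates score too closely.
public static class LanguageGuess
{
    public const string Unknown = "unknown";

    // results: (language code, score) pairs, best candidate first.
    // minRelativeGap: hypothetical tuning value; calibrate it on your own texts.
    public static string Decide(IEnumerable<Tuple<string, double>> results,
                                double minRelativeGap = 0.02)
    {
        var top = results.Take(2).ToList();
        if (top.Count == 0) return Unknown;       // nothing was identified
        if (top.Count == 1) return top[0].Item1;  // no runner-up to compare with

        double best = top[0].Item2;
        double runnerUp = top[1].Item2;

        // Relative difference between the first two scores. The absolute value
        // avoids assuming whether lower or higher scores mean a better match.
        double gap = Math.Abs(runnerUp - best) / Math.Max(Math.Abs(best), 1e-9);

        return gap >= minRelativeGap ? top[0].Item1 : Unknown;
    }
}
```

Usage, assuming the `identifier` from the snippet above: `LanguageGuess.Decide(identifier.Identify(text).Select(t => Tuple.Create(t.Item1.Iso639_3, t.Item2)))`. This cannot make NTextCat recognize a language that has no model in the loaded profile; it only turns ambiguous results into "unknown".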

May 06 '20 16:05 ivanakcheurov

How is the confidence level measured? I get values like 3495.569 for a long Spanish text that is detected correctly.

But I get values like 3924.144 for Czech text, which is incorrectly detected as English:

Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu, tanga a quickstepu.

Or 3928.28 for Bulgarian text, which is incorrectly detected as Russian:

Ах чудна българска земьо, полюшвай цъфтящи жита.

I suppose the models are not very accurate?

I've tried Wiki82.profile.xml and Wiki280.profile.xml. I get better results with Wiki82.profile.xml, because with Wiki280.profile.xml the texts are often detected as aa (Afar).

One thing I've noticed is that the detected ISO language code is not always correct. With Core14.profile.xml I get a proper 3-letter code in mostCertainLanguage.Item1.Iso639_3, but with Wiki82.profile.xml or Wiki280.profile.xml I get a 2-letter code there, which is wrong for an ISO 639-3 field.
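Until the profiles are fixed, a caller could normalize whatever lands in Iso639_3 with a small lookup table. The sketch below is hypothetical and not part of NTextCat: the class name LanguageCodes and the table contents are illustrative, and it assumes (as reported above) that the Wiki profiles put two-letter codes into that field. Labels the table does not know are passed through unchanged.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of NTextCat). Assumes, as reported above, that
// the Wiki82/Wiki280 profiles put two-letter codes into the Iso639_3 field.
public static class LanguageCodes
{
    // Small illustrative table (two-letter code -> ISO 639-3 code);
    // extend it with the languages you actually need.
    private static readonly Dictionary<string, string> TwoToThree =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = "eng", ["es"] = "spa", ["cs"] = "ces", ["bg"] = "bul",
            ["ru"] = "rus", ["ro"] = "ron", ["de"] = "deu", ["fr"] = "fra",
            ["zh"] = "zho", ["ja"] = "jpn",
        };

    // Returns the three-letter code when the table knows the label; otherwise
    // returns the label unchanged (including codes that are already 3 letters).
    public static string ToThreeLetter(string code)
    {
        if (string.IsNullOrEmpty(code)) return code;
        string three;
        return TwoToThree.TryGetValue(code, out three) ? three : code;
    }
}
```

A hand-maintained table is used here because it is explicit about which labels are covered; anything unexpected stays visible instead of being silently converted.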

Apr 12 '22 10:04 diegosasw

@ivanakcheurov

Hello, thank you very much for your work.

May I ask about the profiles as well?

As asked above, what do the weight numbers mean? As I understand it, the closer they are to 4000, the less accurate the result is, but what cut-off value (3700? 3500?) separates results we can consider accurate from the rest?
I'm using wiki82.profile.xml, and sometimes I get "simple" or "new" as the detected language for plain English text. What do these labels mean?
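The labels "simple" and "new" look like Wikipedia edition codes (Simple English Wikipedia and the Newar / Nepal Bhasa Wikipedia), which would fit a profile built from Wikipedia text; that is an inference, not something confirmed in this thread. If your input can only be in a known set of languages, one workaround is to walk down the ranked list and take the first label you actually expect. The sketch below is a hypothetical helper (CandidateFilter is not part of NTextCat) and assumes the candidates arrive best first, as with Identify above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of NTextCat): skips labels you never expect
// (such as "simple" or "new") and returns the best remaining candidate.
public static class CandidateFilter
{
    // results: (label, score) pairs, best candidate first.
    // allowed: the labels your application can actually receive.
    // Returns null when no allowed label appears in the results.
    public static string BestAllowed(IEnumerable<Tuple<string, double>> results,
                                     ISet<string> allowed)
    {
        var hit = results.FirstOrDefault(r => allowed.Contains(r.Item1));
        return hit == null ? null : hit.Item1;
    }
}
```

Restricting the candidate set trades coverage for precision: a language missing from `allowed` can never be reported, so keep the set in line with what your users really type.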

Oct 23 '23 11:10 andreyka26-git

I suppose this library is abandoned. Any luck, @andreyka26-git?

Nov 29 '23 15:11 diegosasw
