
Set language

Open · ran-j opened this issue 5 years ago · 9 comments

Hello.

Instead of using a separate stemmer for each language, as in this table from the documentation:

Portugese | X |   |   | PorterStemmerPt
Russian | X |   |   | PorterStemmerRu
Swedish | X |   |   | PorterStemmerSv

Could you add a language configuration instead? For example:

natural.Language('PT-BR');

and then just use:

'strings'.stem()

ran-j, Aug 31 '18 20:08

I'm not sure I understand what you mean. At the moment, the stemmers for different languages are separate functions. Do you mean that you set natural's language once and then use the stemmers like that? So if I call natural.setLanguage('Pt'), all modules (stemmer, tokenizer, sentiment analyzer) are switched to Portuguese?

Interesting idea. Natural's language support needs improving anyway, so this may be the way to go.

Hugo-ter-Doest, Sep 01 '18 08:09

This is a side issue, but it's a good moment to request a change in the documentation: "Portugese" is a misspelling of "Portuguese".

(And yes, a general language configuration is a good idea.)

PauloQuerido, Sep 01 '18 11:09

What I mean is: one function for all languages, plus a setting for the language you want to use. For example:

// set the PT language
natural.Language('PT-BR');

// and then use the function:
'strings'.stem()

That way I can switch the language dynamically:

// set the PT language
natural.Language('PT-BR');

// and then use the function:
'strings'.stem()

// set the EN language
natural.Language('EN-ES');

// and then use the function:
'strings'.stem()

ran-j, Sep 01 '18 20:09
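The proposal (set the language once, then call `'strings'.stem()`) can be sketched as follows. This is a minimal, self-contained illustration, not natural's API: `setLanguage` is a hypothetical stand-in for the proposed `natural.Language(...)`, and the suffix-stripping "stemmers" are toy placeholders for real ones such as `PorterStemmerPt`. The point it shows is that `stem()` looks up the active language on every call, so switching languages changes the result of existing call sites.

```javascript
// Hypothetical registry mapping a language code to a stemming function.
// These suffix-stripping rules are toy placeholders, NOT real Porter stemmers.
const stemmers = {
  en: (word) => word.replace(/(ing|ed|s)$/, ""),
  pt: (word) => word.replace(/(mente|s)$/, ""),
};

// Module-level state holding the active language (hypothetical).
let currentLanguage = "en";

// Hypothetical stand-in for the proposed natural.Language('PT-BR').
function setLanguage(code) {
  // Keep only the primary subtag, so "PT-BR" selects "pt".
  const lang = String(code).split(/[-_]/)[0].toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(stemmers, lang)) {
    throw new Error(`No stemmer registered for language "${code}"`);
  }
  currentLanguage = lang;
}

// Add stem() to every string. The registry lookup happens on each call,
// so a later setLanguage() changes what existing call sites return.
Object.defineProperty(String.prototype, "stem", {
  value: function () {
    return stemmers[currentLanguage](String(this));
  },
  configurable: true,
  writable: true,
});

setLanguage("PT-BR");
console.log("livros".stem()); // livro

setLanguage("EN-US");
console.log("walking".stem()); // walk
```

Patching `String.prototype` gives the short call site from the proposal (natural's existing `attach()` helpers take a similar approach), at the cost of modifying a built-in. A plain `stem(word)` function reading the same setting would avoid that. Either way, one process-wide setting is shared by all callers, so code serving several languages at once has to switch it before each call.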

It seems you are using ISO language codes: PT-BR is Brazilian Portuguese. But what is EN-ES? Do you mean EN-US?

Hugo-ter-Doest, Sep 02 '18 14:09
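Tags like `PT-BR` and `EN-US` combine a language code with a region code, so a language setting would probably need to normalise them before choosing a module. A minimal sketch, assuming a hypothetical list of supported keys in which Brazilian Portuguese has its own entry: try the full tag first, then fall back to the primary language. Under this scheme a tag such as `EN-ES` (English as used in Spain) simply resolves to `en`.

```javascript
// Hypothetical set of language keys a config module might support.
const SUPPORTED = ["en", "pt", "pt-BR", "ru", "sv"];

// Normalise a language-region tag: language lowercase, region uppercase,
// accepting "-" or "_" as separator (e.g. "PT_br" -> "pt-BR").
// Only simple language or language-region tags are handled.
function normalizeTag(tag) {
  const [language, region] = tag.trim().split(/[-_]/);
  return region
    ? `${language.toLowerCase()}-${region.toUpperCase()}`
    : language.toLowerCase();
}

// Prefer an exact match ("pt-BR"), then fall back to the primary
// language ("pt"); return null when nothing is supported.
function resolveLanguage(tag) {
  const normalized = normalizeTag(tag);
  if (SUPPORTED.includes(normalized)) return normalized;
  const primary = normalized.split("-")[0];
  return SUPPORTED.includes(primary) ? primary : null;
}

console.log(resolveLanguage("PT-BR")); // pt-BR
console.log(resolveLanguage("EN-US")); // en
console.log(resolveLanguage("EN-ES")); // en
console.log(resolveLanguage("de"));    // null
```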

Yes, haha, I meant EN-US; it was just an example.

Do you get the idea?

ran-j, Sep 02 '18 15:09

Yes, I get the idea. It will take quite a bit of refactoring of the modules that support multiple languages. Also, I'm afraid we would have to introduce a global setting visible to all modules. Something like:

// Store the active language on Node's global object so every module can read it
function setLanguage(l) {
  global.language = l;
}

Also, a default language must be set in natural's index file.

Hugo-ter-Doest, Sep 02 '18 19:09
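Hugo's snippet stores the setting on Node's `global` object. An alternative, sketched here as a small hypothetical config module (the names and the supported-language list are assumptions, not natural's code), keeps the value in module scope and sets the default when the module is first loaded. Because Node caches modules, every file that requires the same config module sees the same value, without adding a property to `global`.

```javascript
// Hypothetical config module; names and language list are illustrative only.
const DEFAULT_LANGUAGE = "en";
const SUPPORTED_LANGUAGES = new Set(["en", "pt", "ru", "sv"]);

// Module-scoped state, initialised to the default when the module first loads.
let language = DEFAULT_LANGUAGE;

// Change the active language, rejecting anything not supported.
function setLanguage(code) {
  const lang = String(code).toLowerCase();
  if (!SUPPORTED_LANGUAGES.has(lang)) {
    throw new RangeError(`Unsupported language: ${code}`);
  }
  language = lang;
}

// Read the active language (what stemmers, tokenizers, etc. would consult).
function getLanguage() {
  return language;
}

// Restore the default, e.g. between unit tests.
function resetLanguage() {
  language = DEFAULT_LANGUAGE;
}

module.exports = { setLanguage, getLanguage, resetLanguage, DEFAULT_LANGUAGE };
```

Validating in `setLanguage` makes a typo fail immediately instead of surfacing later inside a stemmer, and a reset helper keeps tests independent of the order in which they run.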

Yes, like that. The code will be better organized, and changing the language will be easy. I'm asking because I'm creating a chatbot, and that would be helpful for me.

ran-j, Sep 02 '18 22:09

I did some work on runtime language support; please have a look at this branch. There is a section about language support at the top of the README, and I refactored the Porter stemmer. Some tests were also added for the config module and the generic Porter stemmer.

Hugo-ter-Doest, Sep 11 '18 20:09

Yes, like that. How can I help improve the Portuguese (PT) language functions (sentiment, stemmer, tokenizer)?

ran-j, Sep 11 '18 21:09