parsedatetime

Add the ability to have multiple spellings of a month name

Open axsapronov opened this issue 10 years ago • 5 comments

In Russian, the name of a month has several grammatical forms (it declines by case). For example:

25 сентября 2006 23:05 and 25 сентябрь 2006 23:05

Can you change the variable self.Months from a list to a dict?

self.Months = {
    '1' : ['января', 'январь'],
    '2' : ....
    '12' : ...
}
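
A minimal sketch of how such a mapping might be consumed, assuming integer keys instead of the string keys above. The names `MONTH_SPELLINGS` and `build_month_lookup` are hypothetical and are not part of parsedatetime; only the spellings quoted in this issue are listed.

```python
# Hypothetical layout: month number -> accepted spellings.
# Only the forms quoted in this issue are included; other months are left out.
MONTH_SPELLINGS = {
    1: ["января", "январь"],
    9: ["сентября", "сентябрь"],
}


def build_month_lookup(spellings):
    """Invert the mapping so that every spelling points at its month number."""
    lookup = {}
    for month, names in spellings.items():
        for name in names:
            lookup[name.lower()] = month
    return lookup


MONTH_LOOKUP = build_month_lookup(MONTH_SPELLINGS)
```

With this layout, both `MONTH_LOOKUP["сентября"]` and `MONTH_LOOKUP["сентябрь"]` resolve to the same month number, which is the behaviour the request is after.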

axsapronov, Sep 18 '15 05:09

hmm, that is an interesting change - we should be able to do that as these variables are what generate the regex...
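
As a rough illustration of what "generating the regex" from such a mapping could look like: the sketch below is an assumption, not parsedatetime's actual code. `MONTH_SPELLINGS`, `build_month_regex`, and the day/month/year pattern are all hypothetical.

```python
import re

# Hypothetical mapping, restricted to the spellings quoted in this issue.
MONTH_SPELLINGS = {
    1: ["января", "январь"],
    9: ["сентября", "сентябрь"],
}


def build_month_regex(spellings):
    """Compile a day/month/year pattern that accepts every listed spelling."""
    names = [name for forms in spellings.values() for name in forms]
    # Try longer names first so a short form never shadows a longer one.
    names.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    pattern = (
        r"(?P<day>\d{1,2})\s+(?P<month>" + alternation + r")\s+(?P<year>\d{4})"
    )
    return re.compile(pattern, re.IGNORECASE)


MONTH_RE = build_month_regex(MONTH_SPELLINGS)

for text in ("25 сентября 2006 23:05", "25 сентябрь 2006 23:05"):
    match = MONTH_RE.search(text)
    print(match.group("day"), match.group("month"), match.group("year"))
```

Because the alternation is built from the dict values, adding another spelling to a month only changes the data, not the pattern-building code.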

bear, Sep 18 '15 16:09

There's another approach that may help solve such problems: using WordNet (English) / RussNet (Russian) to lemmatize words.

philiptzou, Sep 18 '15 23:09

I could see WordNet being used for this as a secondary method similar to how ICU is used but not as the primary - adding 12+ megs to the package (that is the current size of WordNet for English) doesn't feel right.

Now if WordNet could be used to help build other parts of the regex constants and also start us down the path of getting a lexical graph... that would be awesome

bear, Sep 19 '15 18:09

@bear totally agreed. I'd treat WordNet as an opt-in feature: if you want better results, install nltk and WordNet.
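
One possible shape for the opt-in, sketched under assumptions: `make_normalizer` is a hypothetical helper, not an existing parsedatetime function. It falls back to returning the word unchanged when nltk or its WordNet data is missing. Note that nltk's WordNet lemmatizer targets English, so Russian forms would need a different resource such as the RussNet mentioned above.

```python
try:
    # nltk is treated as an optional dependency in this sketch.
    from nltk.stem import WordNetLemmatizer
except ImportError:
    WordNetLemmatizer = None


def make_normalizer():
    """Return a callable that lemmatizes a word when nltk is usable.

    Without nltk, or without the WordNet corpus data, the callable
    returns its input unchanged.
    """
    if WordNetLemmatizer is None:
        return lambda word: word

    lemmatizer = WordNetLemmatizer()

    def normalize(word):
        try:
            return lemmatizer.lemmatize(word)
        except LookupError:
            # The WordNet corpus has not been downloaded.
            return word

    return normalize


normalize = make_normalizer()
```

Callers can apply `normalize` to a token before looking it up, so the package keeps working with no extra install and simply gains coverage when nltk is present.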

philiptzou, Sep 19 '15 20:09

Needed for v2 - see Issue #121

bear, Sep 21 '15 03:09