parsedatetime
Add the ability to have multiple spellings of a month name
In Russian, a month name has several forms because nouns are declined by case. For example, both of these are valid:
25 сентября 2006 23:05 and 25 сентябрь 2006 23:05
Could you change the variable self.Months from a list to a dict, like this?
self.Months = {
    '1': ['января', 'январь'],
    '2': ...,
    ...
    '12': ...,
}
Hmm, that is an interesting change. We should be able to do it, since these variables are what generate the regexes.
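A minimal sketch of why the dict form helps, assuming the month regex is built by joining all known names into one alternation. The helper names `build_month_lookup`, `build_month_regex` and `find_month` are hypothetical and not parsedatetime's API. Every spelling feeds the same regex, and a reverse lookup maps whichever spelling matched back to its month number.

```python
import re

# Hypothetical month table in the proposed dict form:
# month number (as a string) -> list of accepted spellings.
# Only two months are filled in for illustration.
MONTHS = {
    '1': ['января', 'январь'],
    '9': ['сентября', 'сентябрь'],
}


def build_month_lookup(months):
    """Map every lower-cased spelling to its month number as an int."""
    lookup = {}
    for number, spellings in months.items():
        for spelling in spellings:
            lookup[spelling.lower()] = int(number)
    return lookup


def build_month_regex(months):
    """Compile one case-insensitive alternation matching any spelling.

    Spellings are sorted longest first (then alphabetically) so the
    pattern is deterministic and prefers longer forms.
    """
    spellings = sorted(
        {s.lower() for forms in months.values() for s in forms},
        key=lambda s: (-len(s), s),
    )
    pattern = r'\b(?P<month>' + '|'.join(map(re.escape, spellings)) + r')\b'
    return re.compile(pattern, re.IGNORECASE)


MONTH_LOOKUP = build_month_lookup(MONTHS)
MONTH_RE = build_month_regex(MONTHS)


def find_month(text):
    """Return the month number of the first month name in text, or None."""
    match = MONTH_RE.search(text)
    if match is None:
        return None
    return MONTH_LOOKUP[match.group('month').lower()]
```

With this layout, `find_month('25 сентября 2006 23:05')` and `find_month('25 сентябрь 2006 23:05')` both resolve to month 9. Adding another inflected form only means appending it to the list for that month.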
There is another approach that may help solve such problems: using WordNet (English) or RussNet (Russian) to lemmatize words.
I could see WordNet being used for this as a secondary method, similar to how ICU is used, but not as the primary one: adding 12+ MB to the package (the current size of WordNet for English) doesn't feel right.
Now, if WordNet could also help build other parts of the regex constants and start us down the path toward a lexical graph, that would be awesome.
@bear Totally agreed. I'll treat WordNet as an opt-in feature: if you want better results, install nltk and WordNet.
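One way the opt-in could look, sketched under assumptions: NLTK's `WordNetLemmatizer` (from `nltk.stem`) is used only when nltk is importable and the WordNet data has been downloaded. Otherwise the lemmatizer is the identity function and the plain spelling table does all the work. `MONTH_FORMS`, `make_lemmatizer` and `month_number` are hypothetical names. WordNet covers English only, so Russian forms would still depend on the table, or on a RussNet-based lemmatizer, which is not shown here.

```python
# Hypothetical opt-in hook: nltk is optional, so a failed import simply
# leaves the feature switched off.
try:
    from nltk.stem import WordNetLemmatizer
except ImportError:
    WordNetLemmatizer = None

# Hypothetical flat table: every known spelling -> month number.
MONTH_FORMS = {'январь': 1, 'января': 1, 'сентябрь': 9, 'сентября': 9}


def make_lemmatizer():
    """Return a word -> lemma function, or the identity if WordNet is unavailable."""
    if WordNetLemmatizer is None:
        return lambda word: word
    lemmatizer = WordNetLemmatizer()

    def lemmatize(word):
        try:
            return lemmatizer.lemmatize(word)
        except LookupError:  # nltk installed, but the WordNet corpus is not downloaded
            return word

    return lemmatize


def month_number(token, lemmatize):
    """Look the token up directly; fall back to its lemma only on a miss."""
    word = token.lower()
    if word in MONTH_FORMS:
        return MONTH_FORMS[word]
    return MONTH_FORMS.get(lemmatize(word))
```

The table is consulted first, so results never depend on whether nltk is installed for spellings that are already listed. The lemmatizer only gets a chance to rescue forms the table misses.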
Needed for v2; see issue #121.