stemmer/fr: restored & completed carry.js from original publication

Open drzraf opened this issue 7 years ago • 2 comments

raw from the PDF

  • step 3 was completed
  • tweaks were automatically removed. Maybe another way to add them back in a non-confusing way could be found (a custom "talisman" step?)
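One way the custom "talisman" step idea could look is sketched below. Everything in it is hypothetical: `buildStemmer`, `paperStep` and `extraStep` are illustrative names, the steps are toy stand-ins (the real rules carry measure conditions that are omitted here), and `extraStep` is a placeholder rather than one of the removed tweaks. It only shows how the published steps and any library-specific additions could live in separate lists.

```javascript
// Compose a stemmer from the published steps plus optional extra steps.
// Each step is a function taking a word and returning the rewritten word.
function buildStemmer(steps, extraSteps = []) {
  const pipeline = steps.concat(extraSteps);
  return word => pipeline.reduce((current, step) => step(current), word);
}

// Toy stand-in for a step taken from the paper (no measure check here).
const paperStep = word => {
  const suffix = 'issaient';
  return word.endsWith(suffix) ? word.slice(0, -suffix.length) : word;
};

// Placeholder for a library-specific tweak; purely illustrative.
const extraStep = word => word.replace(/s$/, '');

// "strict" follows only the published steps; "revisited" appends the tweaks.
const strict = buildStemmer([paperStep]);
const revisited = buildStemmer([paperStep], [extraStep]);
```

With this layout the paper-faithful variant never sees the tweaks, so the two behaviours can be tested and documented independently.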

drzraf, Oct 15 '17 04:10

Hello @drzraf. Thanks for your PR. Love your script to convert from the PDF to the rules :).

If I remember correctly, I left out some of the later STEP3 rules because I thought (probably wrongly, it seems) that they were an erroneous repetition of earlier rules. For instance, (m > 0) issaient → ε is also in STEP1, and it never occurred to me that re-running it could be useful.
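To make the rule notation concrete, here is a minimal sketch of how a rule such as (m > 0) issaient → ε might be applied. It is not the code from `carry.js`: the `[minimum measure, suffix, replacement]` rule format, the helper names, the vowel set, and the first-match-wins policy in `applyStep` are all assumptions, and `measure` counts vowel-to-consonant transitions in the Porter style.

```javascript
// Assumed vowel set for French; the paper's exact definition may differ.
const VOWELS = new Set('aeiouyâàëéêèïîôûù');

// Porter-style measure: number of vowel -> consonant transitions in the stem.
function measure(stem) {
  let m = 0;
  let previousIsVowel = false;
  for (const ch of stem) {
    const isVowel = VOWELS.has(ch);
    if (previousIsVowel && !isVowel) m++;
    previousIsVowel = isVowel;
  }
  return m;
}

// Apply one rule; returns the rewritten word, or null if the rule does not fire.
function applyRule(word, [minMeasure, suffix, replacement]) {
  if (!word.endsWith(suffix)) return null;
  const stem = word.slice(0, -suffix.length);
  if (measure(stem) <= minMeasure) return null; // condition is m > minMeasure
  return stem + replacement;
}

// Apply the first rule of a step that fires (assumed policy).
function applyStep(word, rules) {
  for (const rule of rules) {
    const result = applyRule(word, rule);
    if (result !== null) return result;
  }
  return word;
}

// The same rule object can be listed in more than one step.
const ISSAIENT = [0, 'issaient', ''];
const STEP1 = [ISSAIENT];
const STEP3 = [ISSAIENT];
```

Under these assumptions, `applyStep('finissaient', STEP1)` strips the suffix because the remaining stem `fin` has measure 1. Listing the rule again in STEP3 only matters for words that reach STEP3 still ending in the suffix, for example because a different STEP1 rule fired first.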

By the way, did it fix your issue with the word tristesse? Could you please add some unit tests covering the newly handled cases? It also seems that some existing test cases are now broken.

Concerning my edits, I will add another version side by side which will be called revisited or whatever.

Thanks

Yomguithereal, Oct 15 '17 10:10

According to a reply from the author of the algorithm, this is to be expected from any desuffixation algorithm. As with Porter, these are part of the expected edge cases. Other examples he gave: "perte", "mort", "éléments", "order", ...

drzraf, Oct 16 '17 02:10