sanskrit_parser

Not all rules work with spaces

Open · kmadathil opened this issue 6 years ago · 1 comment

E.g., an input containing a space, such as "ity api", doesn't work as of now.

We have two options:

  1. Add (optional) spaces to the adeSa rules, as in 7be2be230c9a08526e64cd9079c4d63e576ed7f5.
  2. Find a better way.

I think option 2 is feasible. We could do it like this:

  1. Find all spaces in the string, remember their positions
  2. Remove all spaces
  3. Build a list of forced break positions (where spaces would've been).
  4. While doing the recursive split, use this list to break at the right spots.
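
The four steps above can be sketched roughly as follows. This is only an illustration of the idea, not code from sanskrit_parser: the function names `strip_spaces_with_breaks` and `split_candidates` are hypothetical, and the toy splitter enumerates plain substrings without applying any sandhi rules.

```python
def strip_spaces_with_breaks(text):
    """Steps 1-3: return (text without spaces, list of forced break offsets).

    Each offset indexes into the space-free string and marks a position
    where a space used to be. Hypothetical helper, for illustration only.
    """
    chars = []
    breaks = []
    for ch in text:
        if ch == " ":
            # Ignore leading spaces and collapse runs of spaces into one break.
            if chars and (not breaks or breaks[-1] != len(chars)):
                breaks.append(len(chars))
        else:
            chars.append(ch)
    joined = "".join(chars)
    # A trailing space does not force a break inside the string.
    if breaks and breaks[-1] == len(joined):
        breaks.pop()
    return joined, breaks


def split_candidates(s, breaks, start=0):
    """Step 4: yield every split of s[start:] that respects the forced breaks.

    No piece may extend across a forced break, so every forced break
    ends up as a boundary between two pieces.
    """
    if start == len(s):
        yield []
        return
    # The next forced break after `start` caps how far this piece may extend.
    limit = min((b for b in breaks if b > start), default=len(s))
    for end in range(start + 1, limit + 1):
        for rest in split_candidates(s, breaks, end):
            yield [s[start:end]] + rest


joined, breaks = strip_spaces_with_breaks("ity api")
candidates = list(split_candidates(joined, breaks))
```

A real splitter would undo sandhi at each boundary instead of cutting substrings, but the forced-break bookkeeping would stay the same.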

kmadathil · Nov 15 '18 02:11

Actually, a simple fix based on a minor change to sandhi.py (removing spaces while checking sandhi candidates) seems to fix things. Please review the PR.
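
To illustrate what "removing spaces while checking sandhi candidates" could look like, here is a toy sketch. It is an assumption for illustration and not the actual change in sandhi.py: `join` and `is_candidate` are hypothetical names, and the single rule (final "i" + initial "a" becomes "ya") stands in for the real rule set.

```python
def join(left, right):
    """Toy forward sandhi: final "i" + initial "a" -> "ya" (illustrative rule only)."""
    if left.endswith("i") and right.startswith("a"):
        return left[:-1] + "y" + right
    return left + right


def is_candidate(observed, left, right):
    """Check whether joining left + right reproduces the observed text.

    Spaces are stripped from the observed text before comparing, so
    "ity api" and "ityapi" are treated the same way.
    """
    return join(left, right) == observed.replace(" ", "")


matches_with_space = is_candidate("ity api", "iti", "api")
matches_without_space = is_candidate("ityapi", "iti", "api")
```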

kmadathil · Nov 17 '18 00:11