sanskrit_parser
Not all rules work with spaces
E.g., "ity api" does not work as of now.
We have two options:
1. Add optional spaces to the adeSa rules, as in commit 7be2be230c9a08526e64cd9079c4d63e576ed7f5.
2. Find a better way.
I think option 2 is feasible. We could do the following:
- Find all spaces in the string, remember their positions
- Remove all spaces
- Build a list of forced break positions (where spaces would've been).
- While doing the recursive split, use this list to break at the right spots.
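
The steps above can be sketched roughly as follows. This is only an illustration, not the code in sanskrit_parser: `strip_spaces`, `splits_with_forced_breaks`, and the `is_word` predicate are hypothetical names, and the splitter does plain segmentation with no sandhi rules applied.

```python
def strip_spaces(text):
    """Return (text without spaces, forced-break offsets in the stripped text)."""
    kept = []
    breaks = []
    for ch in text:
        if ch == " ":
            pos = len(kept)  # the break falls before the next kept character
            # Ignore leading spaces and collapse runs of spaces into one break.
            if pos > 0 and (not breaks or breaks[-1] != pos):
                breaks.append(pos)
        else:
            kept.append(ch)
    stripped = "".join(kept)
    # A trailing space would give a break at the very end; drop it.
    breaks = [b for b in breaks if b < len(stripped)]
    return stripped, breaks


def splits_with_forced_breaks(s, breaks, is_word, start=0):
    """Yield every segmentation of s[start:] in which no word spans a forced break.

    is_word is a hypothetical stand-in for whatever lexical check the real
    splitter uses.
    """
    if start == len(s):
        yield []
        return
    # The current word may extend at most to the next forced break.
    limit = min((b for b in breaks if b > start), default=len(s))
    for end in range(start + 1, limit + 1):
        word = s[start:end]
        if is_word(word):
            for rest in splits_with_forced_breaks(s, breaks, is_word, end):
                yield [word] + rest
```

For the input "ity api", `strip_spaces` returns the string "ityapi" with a single forced break at offset 3, so the recursive split never proposes a word that straddles the original space.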
Actually, a simple fix based on a minor change to sandhi.py (removing spaces while checking sandhi candidates) seems to fix things. Please review the PR.