sanskrit_parser
Parsers for Sanskrit / संस्कृतम्
Opening a new issue to discuss scoring approaches to find the most likely split, such as those raised in https://github.com/kmadathil/sanskrit_parser/issues/84#issuecomment-393878940, https://github.com/kmadathil/sanskrit_parser/issues/84#issuecomment-393893866 Regarding @drdhaval2785's comment - Lexeme frequency based scoring is...
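As a rough illustration of lexeme-frequency scoring, the sketch below ranks candidate splits by the sum of smoothed log unigram probabilities of their words. It is not the project's scoring code: `score_split`, `rank_splits`, the toy counts, and the candidate splits are all hypothetical, and add-one smoothing is just one simple choice.

```python
import math
from collections import Counter


def score_split(split, freq, total, vocab_size):
    """Sum of add-one smoothed log unigram probabilities of the words in a split."""
    return sum(
        math.log((freq.get(word, 0) + 1) / (total + vocab_size))
        for word in split
    )


def rank_splits(splits, freq):
    """Return the candidate splits sorted from most to least likely."""
    total = sum(freq.values())
    vocab_size = len(freq)
    return sorted(
        splits,
        key=lambda s: score_split(s, freq, total, vocab_size),
        reverse=True,
    )


# Hypothetical lexeme counts (SLP1 transliteration), for illustration only.
freq = Counter({"rAmaH": 50, "gacCati": 30, "vanam": 20})

# Hypothetical candidate splits of the same input.
candidates = [["rA", "maH", "gacCati"], ["rAmaH", "gacCati"]]

ranked = rank_splits(candidates, freq)
print(ranked[0])
```

Because every word contributes a negative log probability, this score tends to favour splits with fewer words; whether that bias is desirable, or needs length normalization, is part of what the issue would have to settle.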
Locate the DCS10K and DCS4K datasets mentioned in [this paper](https://www.aclweb.org/anthology/D18-1276.pdf). Also, look at the [larger dataset mentioned in this later paper](https://www.aclweb.org/anthology/W17-2214.pdf). From these, create a set of testcases for...
The UI page contains no information on what the three different options mean, or what output to expect. It would be good to add a few examples on how to...
Recent publications use the [sandhikosh](https://github.com/sanskrit-sandhi/SandhiKosh) described in [this paper](https://www.aclweb.org/anthology/L18-1712/) as a benchmark. Let's add it to our testing and see where we stand. (Related to #84)
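To see "where we stand" on such a benchmark, one option is to report top-1 and top-k accuracy over (input, expected split) pairs. The sketch below assumes a hypothetical `splitter` callable that returns a ranked list of candidate splits. It does not use the real sanskrit_parser API or the actual SandhiKosh file format, and the toy data is made up.

```python
def evaluate(cases, splitter, k=10):
    """Compute top-1 and top-k accuracy of a splitter over benchmark cases.

    cases    -- iterable of (text, expected_split) pairs
    splitter -- callable returning a ranked list of candidate splits (hypothetical)
    """
    cases = list(cases)
    top1 = topk = 0
    for text, expected in cases:
        candidates = splitter(text)[:k]
        if candidates and candidates[0] == expected:
            top1 += 1
        if expected in candidates:
            topk += 1
    n = len(cases)
    return {
        "n": n,
        "top1": top1 / n if n else 0.0,
        "topk": topk / n if n else 0.0,
    }


def toy_splitter(text):
    """Stand-in splitter backed by a lookup table; replace with a real one."""
    table = {
        "ab": [["a", "b"]],
        "cd": [["x", "y"], ["c", "d"]],
    }
    return table.get(text, [])


# Made-up benchmark entries, for illustration only.
cases = [("ab", ["a", "b"]), ("cd", ["c", "d"]), ("ef", ["e", "f"])]
report = evaluate(cases, toy_splitter, k=2)
print(report)
```

Reporting both numbers separates "the correct split is found at all" from "the correct split is ranked first", which ties this benchmark back to the scoring discussion in #84.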
Solved this issue by using `Devanagari` instead of `DEVANAGARI`. Have other problems, though. (Abhijit)

    # Option 1
    from sanskrit_parser.base.sanskrit_base import SanskritObject, DEVANAGARI
    # Option 2
    parser = Parser(input_encoding="DEVANAGARI", output_encoding=output_encoding, replace_ending_visarga='s')...
Side effect of the new API: the right help message needs to come up too.
```
$ scripts/sanskrit_parser vakya --help
unable to import 'smart_open.gcs', disabling that module
usage:...
```
DCS word frequencies have been publicly available for a while now, [here](https://github.com/sanskrit-coders/stardict-sanskrit/tree/master/sa-kAvya/dcs-frequency) and also on [couchdb](https://github.com/sanskrit-coders/dict-api). You might find them useful for paring down possible sandhi splits, etc.
कविका तु खलीनोऽस्त्री कविकं कर्षणीत्यपि ।_Split: Not a single correct sandhi split is returned. Message sent by @vvasuki: The word खलीन does not appear in our dictionaries at all. As a result, the split of this sentence is also done incorrectly...
1. Update pass/fail list generators
2. Change test names to reflect new names
3. Add Vakya Analyzer tests (supersedes #46)
> Simplify the usage of SanskritxxxString. I think the user of the library should only need to use/know about one of them, say SanskritObject, which handles normalization, etc. and the...