sanskrit_parser issues

Finding the most likely split using scoring

10

Opening a new issue to discuss scoring approaches to find the most likely split, such as those raised in https://github.com/kmadathil/sanskrit_parser/issues/84#issuecomment-393878940 and https://github.com/kmadathil/sanskrit_parser/issues/84#issuecomment-393893866

Regarding @drdhaval2785's comment: Lexeme frequency based scoring is...
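To make the discussion concrete, here is a minimal sketch of one possible lexeme-frequency scoring scheme: each candidate split is scored by the sum of add-one-smoothed log probabilities of its words, and candidates are ranked by that score. This is an illustration only, not the approach the project settled on. The counts, the candidate splits, and the names `LEXEME_COUNTS`, `split_score` and `rank_splits` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical lexeme counts; a real run would load corpus frequencies instead.
LEXEME_COUNTS = Counter(
    {"rAmaH": 120, "vanam": 80, "gacCati": 60, "rAma": 15, "avanam": 2}
)


def split_score(words, counts=LEXEME_COUNTS):
    """Sum of add-one-smoothed unigram log probabilities for one split."""
    total = sum(counts.values())
    vocab = len(counts)
    # The extra +1 in the denominator reserves probability mass for unseen words.
    return sum(math.log((counts[w] + 1) / (total + vocab + 1)) for w in words)


def rank_splits(candidates, counts=LEXEME_COUNTS):
    """Return the candidate splits ordered from most to least likely."""
    return sorted(candidates, key=lambda ws: split_score(ws, counts), reverse=True)


# Made-up candidate splits of the same input, for illustration only.
candidates = [
    ["rAma", "avanam", "gacCati"],
    ["rAmaH", "vanam", "gacCati"],
]
ranked = rank_splits(candidates)
best = ranked[0]
print(best)
```

One caveat with this kind of score: because every word contributes a negative log probability, it tends to favour splits with fewer words, so some length normalization may be needed in practice.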

avinashvarna

Create tests for Vakya Analyzer

17

Locate the DCS10K and DCS4K datasets mentioned in [this paper](https://www.aclweb.org/anthology/D18-1276.pdf). Also, look at the [larger dataset mentioned in this later paper](https://www.aclweb.org/anthology/W17-2214.pdf). From these, create a set of testcases for...

kmadathil

Add some examples on options in UI

2

The UI page contains no information on what the three different options mean, or what output to expect. It would be good to add a few examples on how to...

avinashvarna

help wanted

good first issue

Add Sandhikosh to testing

11

Recent publications use the [sandhikosh](https://github.com/sanskrit-sandhi/SandhiKosh) described in [this paper](https://www.aclweb.org/anthology/L18-1712/) as a benchmark. Let's add it to our testing and see where we stand. (Related to #84)
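A rough sketch of what such a benchmark run could look like: count a case as a hit when the expected split appears among the splitter's candidates, and report the fraction of hits along with the failing inputs. The function `split_accuracy`, the `toy_splitter` stand-in, and the `(joined, expected)` case format are assumptions for illustration; the actual SandhiKosh file layout and the parser call would need to be plugged in.

```python
def split_accuracy(splitter, cases):
    """Return (accuracy, failures) for a splitter over (joined, expected) cases.

    `splitter` is any callable mapping a joined string to a list of candidate
    splits, each candidate being a list of words.
    """
    hits = 0
    failures = []
    for joined, expected in cases:
        candidates = {tuple(c) for c in splitter(joined)}
        if tuple(expected) in candidates:
            hits += 1
        else:
            failures.append(joined)
    accuracy = hits / len(cases) if cases else 0.0
    return accuracy, failures


def toy_splitter(text):
    """Hypothetical stand-in for the real parser, backed by a lookup table."""
    table = {"ab": [["a", "b"]], "cd": [["cd"], ["c", "d"]]}
    return table.get(text, [])


# Placeholder cases; real ones would be read from the benchmark files.
cases = [("ab", ["a", "b"]), ("cd", ["c", "d"]), ("ef", ["e", "f"])]
accuracy, failures = split_accuracy(toy_splitter, cases)
print(f"accuracy={accuracy:.2f}, failures={failures}")
```

Keeping the list of failing inputs makes it easy to turn them into a pass/fail list later.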

avinashvarna

Input Encoding choices

4

SOLVED THIS ISSUE by using Devanagari instead of DEVANAGARI. Have other problems, though. (Abhijit)

Option 1: `from sanskrit_parser.base.sanskrit_base import SanskritObject, DEVANAGARI`

Option 2: `parser = Parser(input_encoding="DEVANAGARI", output_encoding=output_encoding, replace_ending_visarga='s')`...

am4096

Help message is uninformative

3

Side effect of the new API - need to make the right help message come up too

```
$ scripts/sanskrit_parser vakya --help
unable to import 'smart_open.gcs', disabling that module
usage:...
```
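For whoever picks this up, a generic `argparse` sketch of how a subcommand can carry its own help text, so that `vakya --help` describes the `vakya` options rather than the top-level ones. This is not the project's actual CLI code: the descriptions and the `text` and `--input-encoding` arguments below are hypothetical placeholders.

```python
import argparse


def build_parser():
    """Build a top-level parser with a `vakya` subcommand that has its own help."""
    parser = argparse.ArgumentParser(
        prog="sanskrit_parser",
        description="Top-level command (placeholder description).",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subparser gets its own description and arguments, which is what
    # `sanskrit_parser vakya --help` prints.
    vakya = subparsers.add_parser(
        "vakya",
        help="Analyze a sentence (placeholder summary).",
        description="Sentence analysis (placeholder description).",
    )
    vakya.add_argument("text", help="Input sentence (hypothetical argument).")
    vakya.add_argument(
        "--input-encoding",
        default=None,
        help="Encoding of the input text (hypothetical option).",
    )
    return parser, vakya


parser, vakya = build_parser()
vakya_help = vakya.format_help()
print(vakya_help)

args = parser.parse_args(["vakya", "some text", "--input-encoding", "SLP1"])
```

The `help=` string shows up in the top-level listing of subcommands, while `description=` and the per-argument help appear in the subcommand's own `--help` output.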

kmadathil

help wanted

good first issue

Use word frequencies for trimming split graphs?

15

DCS word frequencies have been publicly available for a while now: [here](https://github.com/sanskrit-coders/stardict-sanskrit/tree/master/sa-kAvya/dcs-frequency) and also on [couchdb](https://github.com/sanskrit-coders/dict-api). You might find them useful for paring down possible sandhi splits, etc.
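As a sketch of the idea, assume the split graph is a mapping from a start position to the `(word, end position)` edges leaving it. Edges whose word falls below a frequency threshold can then be dropped before enumerating paths. The graph shape, the toy frequencies, and the names `prune_graph` and `enumerate_splits` are assumptions for illustration and do not reflect the parser's internal representation.

```python
def prune_graph(graph, freq, min_count=1):
    """Drop edges whose word occurs fewer than `min_count` times in `freq`."""
    return {
        start: [(word, end) for word, end in edges if freq.get(word, 0) >= min_count]
        for start, edges in graph.items()
    }


def enumerate_splits(graph, start, goal):
    """Yield every path from `start` to `goal` as a list of words."""
    if start == goal:
        yield []
        return
    for word, end in graph.get(start, []):
        for rest in enumerate_splits(graph, end, goal):
            yield [word] + rest


# Toy split graph over a 4-character input; keys are start positions.
graph = {
    0: [("ab", 2), ("a", 1)],
    1: [("b", 2)],
    2: [("cd", 4), ("c", 3)],
    3: [("d", 4)],
}
# Hypothetical word frequencies standing in for the DCS counts.
freq = {"ab": 50, "cd": 40, "a": 3, "b": 7, "c": 0, "d": 9}

before = list(enumerate_splits(graph, 0, 4))
pruned = prune_graph(graph, freq, min_count=5)
after = list(enumerate_splits(pruned, 0, 4))
print(len(before), after)
```

Pruning this way can remove every path for an input that contains a rare but valid word, so a fallback to the unpruned graph when no split survives is probably worth having.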

vvasuki

कविका तु खलीनोऽस्त्री कविकं कर्षणीत्यपि

9

कविका तु खलीनोऽस्त्री कविकं कर्षणीत्यपि ।_Split

Not even one correct sandhi split is obtained. Message sent by @vvasuki: the word खलीन does not appear in our dictionaries at all. Hence the split of this sentence is also incorrect...

kmadathil

Revamp tests

1. Update pass/fail list generators
2. Change test names to reflect new names
3. Add Vakya Analyzer tests (supersedes #46)

kmadathil

help wanted

Revamp SanskritObject/SanskritString class hierarchy

3

> Simplify the usage of SanskritxxxString. I think the user of the library should only need to use/know about one of them, say SanskritObject, which handles normalization, etc. and the...

kmadathil
