sanskrit_parser
Create tests for Vakya Analyzer
Locate the DCS10K and DCS4K datasets mentioned in this paper. Also, look at the larger dataset mentioned in this later paper.
From these, create a set of test cases for the vakya analyzer.
Also, figure out their actual definitions of Precision, Recall, and F-score.
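While we still need to confirm the papers' exact formulations, the standard word-level definitions used in segmentation evaluation look like the minimal sketch below (bag-of-words matching is an assumption here; the papers may use position-sensitive matching):

```python
# Word-level precision/recall/F-score for one segmented sentence.
# Assumes bag-of-words matching; the papers' exact (possibly
# position-sensitive) definitions still need to be confirmed.
from collections import Counter

def prf(predicted, gold):
    """Return (precision, recall, f_score) for one sentence."""
    matched = sum((Counter(predicted) & Counter(gold)).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# One wrong word out of three on each side -> P = R = F = 2/3
print(prf(["asti", "uttara", "diSi"], ["asti", "uttarasyAm", "diSi"]))
```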
An even better option may be the smaller 1300-sentence test set found in this even later paper. An advantage is that it is available on GitHub.
The paper's author has provided another publication with a better description of this work.
Thanks for the investigation. I've been a bit busy but will try to do some catching up this weekend by reading the papers.
The KISS paper has some superficial similarities. I found the supplementary material helpful in understanding the methodology, but probably need to read it a few more times to completely understand it.
We should probably plan out a proper sequence of next steps (based on priorities). Once we are ready to discuss this step, it may be helpful to have a call.
I have received the DCS10K and KISS datasets from Amrith Krishna. KISS has been committed into the DB. DCS10K will be added after I figure out how to organize it (too many directories).
I have added basic test infrastructure and a test_parser.py. I will close this after I get the KISS tests working.
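For reference, the KISS-backed tests I have in mind would look roughly like this sketch (the case data is a placeholder, not an actual dataset entry, and the Parser entry point is assumed from the README):

```python
# Hypothetical shape of a KISS-derived segmentation test.
import pytest

KISS_CASES = [
    # (sentence in SLP1, expected gold segmentation)
    ("tattvamasi", ["tat", "tvam", "asi"]),
]

@pytest.mark.parametrize("sentence, gold", KISS_CASES)
def test_kiss_segmentation(sentence, gold):
    from sanskrit_parser import Parser  # assumed public entry point
    parser = Parser(input_encoding="SLP1", output_encoding="SLP1")
    result = parser.parse(sentence)
    # Assumes splits stringify as Python-style lists of words
    splits = [str(s) for s in result.splits(max_splits=10)]
    assert str(gold) in splits
```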
https://zenodo.org/record/803508# is from 2017, so there has been a DCS update after it.
> smaller 1300-sentence test set
There is also the set of sentences that G. Huet trained on.
https://kmadathil.github.io/sanskrit_parser/ui/index.html?api_url_base=https://sanskrit-parser.appspot.com/ What did I do wrong? Nothing was ever returned.
The web service was shut down because it did not have many users. The README was also updated recently to say so explicitly, as far as I remember, though I am not able to locate that now.
@drdhaval2785 Actually, we created a different web service on Google App Engine which is always enabled.
@gasyoun Thanks for reporting this issue. Looking at the logs, it does seem to be related to the parsing. I see logs of the form:
```
ERROR:sanskrit_parser.parser.datastructures:Partition 4: eva went to zero length!
```
@kmadathil can you please take a look to see if this works from the command line? I can also take a look, but probably over the weekend.
@gasyoun Please try a different input. This is an error condition that is somehow hanging the API.
Actually, sorry. The log I was looking at was for a slightly shorter input than the one in the reported issue. It appears that this input causes the parse to take > 30s (which is the time limit on App Engine), so the process gets killed. GAE instances are not super-high performance, so we may need further optimizations.
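For anyone probing this from the client side, a request timeout slightly above the GAE limit separates slow parses from network problems. A minimal sketch follows; only the base URL (from the UI link above) is real, while the actual parse endpoint and its parameters are assumptions that would need to be filled in:

```python
import requests

# Base URL from the UI link above; the parse endpoint path and its
# query parameters are not shown in this thread.
BASE = "https://sanskrit-parser.appspot.com/"

try:
    # GAE kills requests after ~30 s, so time out slightly above that
    r = requests.get(BASE, timeout=35)
    print(r.status_code)
except requests.Timeout:
    print("Request exceeded the App Engine time limit")
```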
> It appears that this input is causing the parse to take > 30s
How many words can I input?
I've sped this case up using on_the_fly constraint checking (explained in the Sphinx document). This case now takes about 8 seconds on my computer:
```
$ time python scripts/sanskrit_parser vakya "sA tu mahASvetAyA eva muKam avalokitavatI" --input SLP1 --min-cost --max-paths 10
...
real    0m8.508s
user    0m8.256s
sys     0m0.248s
```
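The same measurement can be taken from Python, if you'd rather not rely on the shell's time builtin (the command line mirrors the one above):

```python
import subprocess
import time

cmd = ["python", "scripts/sanskrit_parser", "vakya",
       "sA tu mahASvetAyA eva muKam avalokitavatI",
       "--input", "SLP1", "--min-cost", "--max-paths", "10"]
start = time.perf_counter()
subprocess.run(cmd, check=True)  # prints the parser's own output
print(f"elapsed: {time.perf_counter() - start:.3f}s")
```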
@avinashvarna - thanks for the idea! Please update appspot to v0.2.3
I updated, but the online version still times out for this input (runs in a container after all).
So no way to test the scripts on the web, only locally?
Please hold on while we update the web service. We are working through a deployment issue with the sped-up code. It should work for you after that.
Oh, ok, I can wait for a few hours anyway ))
> So no way to test the scripts on the web, only locally?
If you are comfortable with Python notebooks, you can use Binder and modify this notebook with your input to test it out online.
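The cell you would edit boils down to something like this (entry points follow the project README at the time of writing; treat the exact names as assumptions if the API has moved):

```python
from sanskrit_parser import Parser  # as in the linked notebook/README

parser = Parser(input_encoding="SLP1", output_encoding="SLP1")
result = parser.parse("sA tu mahASvetAyA eva muKam avalokitavatI")
for split in result.splits(max_splits=10):
    print(split)
```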
I would ask for a video intro, if possible, please.