Create tests for Vakya Analyzer

Open kmadathil opened this issue 4 years ago • 17 comments

Locate the DCS10K and DCS4K datasets mentioned in this paper. Also, look at the larger dataset mentioned in this later paper.

From these, create a set of testcases for the vakya analyzer.

Also, figure out their actual definitions for Precision, Recall and F-Score
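Until the papers' own definitions are pinned down, a rough baseline could use the standard token-level definitions. The sketch below is only that: it assumes predicted and gold segmentations are plain lists of word strings compared as multisets, which may differ from what the papers actually measure. The function name `precision_recall_f1` and the example tokens are hypothetical.

```python
from collections import Counter


def precision_recall_f1(predicted, gold):
    """Token-level precision, recall and F1 for a single sentence.

    Assumption: tokens are compared as multisets (order is ignored),
    which is a common convention but not necessarily the papers' one.
    """
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection: each token counts at most as often as it
    # appears in both the prediction and the gold segmentation.
    matched = sum((pred_counts & gold_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up illustration only, not taken from any dataset.
gold = ["sA", "tu", "mahASvetAyA", "eva"]
predicted = ["sA", "tu", "mahA", "SvetAyA", "eva"]
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging these per-sentence values over a test set (macro-average) and pooling the counts before dividing (micro-average) give different numbers, so that choice also needs to be checked against the papers.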

kmadathil avatar Feb 03 '21 01:02 kmadathil

An even better option may be the smaller 1,300-sentence test set found in this even later paper. Its advantage is that it is available on GitHub.

The paper's author has provided another publication with a better description of this work.

kmadathil avatar Feb 03 '21 01:02 kmadathil

Thanks for the investigation. I've been a bit busy but will try to do some catching up this weekend by reading up on the papers.

The KISS paper has some superficial similarities. I found the supplementary material helpful in understanding the methodology, but probably need to read it a few more times to completely understand it.

We should probably plan out a proper sequence of next steps (based on priorities). Once we are ready to discuss this step, it may be helpful to have a call.

avinashvarna avatar Feb 05 '21 16:02 avinashvarna

I have received the DCS10K and KISS datasets from Amrith Krishna. KISS has been committed into the DB. DCS10K will be added once I figure out how to do so (it has too many directories).

I have added basic test infrastructure and a test_parser.py. I will close this after I get the KISS tests working.

kmadathil avatar Feb 11 '21 22:02 kmadathil

https://zenodo.org/record/803508# is from 2017, so there has been a DCS update after it.

smaller 1300 sentence testset

There is this set of sentences that J. Huet trained on as well.

apte-verified.txt

gasyoun avatar Mar 09 '21 04:03 gasyoun

https://kmadathil.github.io/sanskrit_parser/ui/index.html?api_url_base=https://sanskrit-parser.appspot.com/ What did I do wrong? Nothing was ever returned.

gasyoun avatar Mar 16 '21 15:03 gasyoun

The web service was shut down because it had few users. As far as I remember, the README was also updated recently to say so explicitly, but I am not able to locate that now.

drdhaval2785 avatar Mar 16 '21 15:03 drdhaval2785

@drdhaval2785 Actually, we created a different web service on Google App Engine which is always enabled.

@gasyoun Thanks for reporting this issue. Looking at the logs, it does seem to be related to the parsing. I see logs of the form:

ERROR:sanskrit_parser.parser.datastructures:Partition 4: eva went to zero length!

@kmadathil can you please take a look to see if this works from the command line? I can also take a look, but probably over the weekend.

avinashvarna avatar Mar 16 '21 16:03 avinashvarna

@gasyoun Please try a different input. This is an error condition that is somehow hanging the API.

kmadathil avatar Mar 16 '21 17:03 kmadathil

Actually, sorry. The log I was looking at was for a slightly shorter input than what was in the reported issue. It appears that this input is causing the parse to take > 30s (which is the time limit on App Engine), and the process gets killed. GAE instances are not super-high performance, so we may need further optimizations.
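To check locally whether a given input would fit within such a limit, one option is to run the command under the same time budget. This is a minimal sketch using only the Python standard library. The helper name `run_with_deadline` is hypothetical, the demo commands are stand-ins rather than real parser invocations, and local timings will not match App Engine hardware.

```python
import subprocess
import sys
import time


def run_with_deadline(cmd, seconds):
    """Run cmd as a child process and kill it if it exceeds the deadline.

    Returns (finished, elapsed_seconds). subprocess.run() kills the
    child and raises TimeoutExpired when the timeout is reached.
    """
    start = time.perf_counter()
    try:
        subprocess.run(cmd, check=False, capture_output=True, timeout=seconds)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in commands: one that returns at once, one that overruns.
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_deadline(quick, 30))
    print(run_with_deadline(slow, 0.5))
```

Replacing the stand-in command with the actual parser command line and a 30 second deadline would show whether an input is likely to be killed by the service.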

avinashvarna avatar Mar 16 '21 17:03 avinashvarna

It appears that this input is causing the parse to take > 30s

How many words can I input?

gasyoun avatar Mar 16 '21 22:03 gasyoun

I've sped this case up using on_the_fly constraint checking (explained in the Sphinx document). This case now takes about 8 seconds on my computer:

time python scripts/sanskrit_parser vakya "sA tu mahASvetAyA eva muKam avalokitavatI" --input SLP1  --min-cost --max-paths 10
...
real    0m8.508s
user    0m8.256s
sys     0m0.248s
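For readers unfamiliar with the idea, the general technique of checking constraints while candidate paths are being built, rather than after all of them have been generated, can be sketched as follows. This is a toy illustration and not the sanskrit_parser implementation (the Sphinx document describes that). The functions, the candidate labels and the `compatible` rule are all hypothetical.

```python
from itertools import product


def filter_at_end(candidates, compatible):
    """Enumerate every full combination, then keep the valid ones."""
    explored = 0  # number of full combinations enumerated
    valid = []
    for combo in product(*candidates):
        explored += 1
        if all(compatible(a, b) for a, b in zip(combo, combo[1:])):
            valid.append(combo)
    return valid, explored


def check_on_the_fly(candidates, compatible):
    """Grow partial combinations one position at a time.

    A partial combination is dropped as soon as its newest element
    violates the constraint, so it is never extended any further.
    """
    explored = 0  # number of single-step extensions attempted
    partial = [()]
    for options in candidates:
        extended = []
        for path in partial:
            for option in options:
                explored += 1
                if not path or compatible(path[-1], option):
                    extended.append(path + (option,))
        partial = extended
    return partial, explored


# Toy data: four positions with three candidate labels each.
candidates = [[f"{pos}{i}" for i in (1, 2, 3)] for pos in "abcd"]


def compatible(left, right):
    # Toy constraint: neighbouring labels must share the same digit.
    return left[-1] == right[-1]


late, late_count = filter_at_end(candidates, compatible)
early, early_count = check_on_the_fly(candidates, compatible)
print(late == early, late_count, early_count)
```

Both functions return the same valid combinations, but the second never builds the combinations that are already known to fail, which matters more as the number of positions grows.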

@avinashvarna - thanks for the idea! Please update appspot to v0.2.3

kmadathil avatar Mar 20 '21 03:03 kmadathil

I updated, but the online version still times out for this input (runs in a container after all).

avinashvarna avatar Mar 21 '21 22:03 avinashvarna

I updated, but the online version still times out for this input (runs in a container after all).

So no way to test the scripts on the web, only locally?

gasyoun avatar Mar 23 '21 21:03 gasyoun

Please hold on while we update the web service. We are working through some deployment issue with the sped-up code. It should work for you after that.

kmadathil avatar Mar 23 '21 21:03 kmadathil

It should work for you after that.

Oh, ok, I can wait for a few hours anyway ))

gasyoun avatar Mar 23 '21 23:03 gasyoun

So no way to test the scripts on the web, only locally?

If you are comfortable with Python notebooks, you can use Binder and modify this notebook for your input to test it out online.

avinashvarna avatar Mar 24 '21 03:03 avinashvarna

python notebooks, you can use Binder and modify this notebook

Would ask for a video intro, if possible, please.

gasyoun avatar Apr 01 '21 11:04 gasyoun