
Paninian Generator

Open kmadathil opened this issue 4 years ago • 8 comments

FYI - I have begun coding a Paninian generator. The goal is to implement the Ashtadhyayi, plus vartikas as needed. So far, a basic skeleton that handles some pada-sandhi rules has been committed. Over time, I hope to add more rules and extend the process backward, eventually covering the following steps.

  1. Semantic tag input
  2. Prakriti + Pratyaya selection
  3. Prakriti + Pratyaya transformations
  4. Anga Transformation
  5. Samhita - intra pada
  6. Samhita - inter pada
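
A minimal sketch of how the six steps above could be chained, assuming each step is a function from a list of parts to a list of parts. The names `Stage`, `run_stages`, and `join_parts` are hypothetical and are not taken from the generator branch; `join_parts` is a placeholder that only concatenates and applies no sutras.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Stage(IntEnum):
    """The six steps listed above, in pipeline order (names are hypothetical)."""
    SEMANTIC_TAGS = 1
    SELECTION = 2
    PRAKRITI_PRATYAYA = 3
    ANGA = 4
    SAMHITA_INTRA_PADA = 5
    SAMHITA_INTER_PADA = 6


Handler = Callable[[List[str]], List[str]]


def run_stages(parts: List[str],
               handlers: Dict[Stage, Handler],
               start: Stage = Stage.SEMANTIC_TAGS) -> List[str]:
    """Run every registered handler from `start` onward, in stage order.

    Stages without a handler pass their input through unchanged, so the
    pipeline can be filled in from the last stage backward.
    """
    for stage in Stage:
        if stage < start:
            continue
        handler = handlers.get(stage)
        if handler is not None:
            parts = handler(parts)
    return parts


def join_parts(parts: List[str]) -> List[str]:
    # Placeholder only: plain concatenation, no sandhi rules applied.
    return ["".join(parts)]


result = run_stages(["rAmA", "s"],
                    {Stage.SAMHITA_INTRA_PADA: join_parts},
                    start=Stage.SAMHITA_INTRA_PADA)
print(result)  # ['rAmAs']
```

The `start` parameter models the plan of beginning with the samhita stages and moving earlier over time: earlier stages can be registered later without changing the driver.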

Take a look at the generator branch - the sandhi.yaml file encodes the sutras I have so far, and process_yaml.py turns them into executable code. prakriya.py is the skeleton execution engine.
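
To illustrate the idea of encoding a sutra as data and turning it into executable code, here is a rough sketch. The schema below (`left_final`, `right_initial`) is invented for this example and is not the format used in sandhi.yaml; a plain dict stands in for the parsed YAML, and `compile_rule` is a hypothetical helper, not the project's process_yaml.py.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical data encoding of 6.1.77 (iko yaRaci) in SLP1:
# a final i/u/f/x vowel becomes y/v/r/l before a following vowel.
RULE = {
    "id": "6.1.77",
    "name": "iko yaRaci",
    "left_final": {"i": "y", "I": "y", "u": "v", "U": "v",
                   "f": "r", "F": "r", "x": "l"},
    "right_initial": "aAiIuUfFxXeEoO",
}

SandhiFn = Callable[[str, str], Optional[Tuple[str, str]]]


def compile_rule(rule: Dict) -> SandhiFn:
    """Turn a rule description into a function on (left, right) strings."""
    substitutes = rule["left_final"]
    followers = set(rule["right_initial"])

    def apply(left: str, right: str) -> Optional[Tuple[str, str]]:
        if not left or not right:
            return None
        if left[-1] in substitutes and right[0] in followers:
            # Simplification: a real engine would let 6.1.101 take over
            # when the two vowels are savarna (e.g. i + i).
            return left[:-1] + substitutes[left[-1]], right
        return None  # rule does not apply

    return apply


iko_yan_aci = compile_rule(RULE)
print(iko_yan_aci("iti", "api"))  # ('ity', 'api')
print(iko_yan_aci("rAma", "as"))  # None
```

Keeping the conditions in data means a new sutra is a new entry rather than a new function, which is the benefit a YAML encoding is after.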

Run cd sanskrit_parser/generator ; python test.py to try it out.

kmadathil avatar Oct 03 '20 18:10 kmadathil

I think @drdhaval2785 has implemented similar generators. See https://github.com/drdhaval2785/SanskritVerb which I believe now has the older Subanta generation repo merged in. It includes a sandhi generator as well. Should we look at leveraging it before reimplementing?

avinashvarna avatar Oct 03 '20 19:10 avinashvarna

Would be happy to help.

drdhaval2785 avatar Oct 04 '20 07:10 drdhaval2785

Sure, we should. @drdhaval2785 - I had looked at this, and I remember we'd discussed this briefly as well. Is this completely in PHP, or is there a python version available? I remember you mentioning that this is a linear application of sutras based on the SK order - do I recollect it right? What would be the best way to leverage this?

kmadathil avatar Oct 05 '20 16:10 kmadathil

This is purely in PHP. No Python version is available, and I do not have the time to convert it to Python. I will go through your code and let you know what bottlenecks I ran into, so that you can make better design decisions. I came to regret some of my choices, but by then it was too late.

drdhaval2785 avatar Oct 06 '20 01:10 drdhaval2785

Thank you very much. It would be great if you could point to the parts of your PHP that you think are best to reuse (I'm sure there are a lot). We can take up the conversion. The architecture I've tried to pick is classic Paninian, rather than SK-based - so not a linear run of sutras.

kmadathil avatar Oct 06 '20 16:10 kmadathil

Current status

  • YAML format for Sutras defined and parser implemented. This allows Sutras to be coded easily. This is way better than coding directly in Python, but I'm not 100% happy with the format yet
  • Implemented ~300 sutras.
  • Paninian Prakriya Engine implemented (with some current limitations, such as nitya/anitya tests)
  • Can generate prakriya for ajanta pum/strI/napum prAtipadikas.
  • Basic test suite added, with manual and pytest versions
    • pytest suite takes too much memory while the manual version (same underlying code) takes very little.

Eventually, this will allow us to replace the INRIA/Sanskrit_data databases with our own pada generator. Also, it will allow us to solve the overgeneration problem in the sandhi splitter by validating output splits with this generator.
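
A sketch of the validation idea, under the assumption that the generator can be wrapped as a predicate answering "can this form be generated?". `filter_splits` and `KNOWN_FORMS` are hypothetical names; the set is a stand-in for real generator output, not actual data.

```python
from typing import Callable, List


def filter_splits(candidates: List[List[str]],
                  can_generate: Callable[[str], bool]) -> List[List[str]]:
    """Keep only the splits in which every pada passes the generator check."""
    return [split for split in candidates
            if all(can_generate(pada) for pada in split)]


# Hypothetical stand-in for the generator: a set of forms it can produce.
KNOWN_FORMS = {"rAmAs"}

# Two candidate splits as a splitter might over-generate them (toy data).
candidates = [["rAmAs"], ["rAm", "As"]]

valid = filter_splits(candidates, KNOWN_FORMS.__contains__)
print(valid)  # [['rAmAs']]
```

Passing the check as a callable keeps the splitter independent of how the generator is implemented or cached.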

kmadathil avatar Jan 03 '21 02:01 kmadathil

$ time python ../../scripts/sanskrit_generator -t rAma -p jas --verbose
unable to import 'smart_open.gcs', disabling that module
INFO     Inputs [rAma, as]
INFO     rAma ['prAtipadika', 'pum']
INFO     as ['pratyaya', 'svAdi', 'sup', 'jas', 'suw', 'bahuvacana', 'praTamA', 'viBakti']
INFO     End Inputs

Prakriya
Input ['rAma', 'as']
Root
Prakriya Node
0 Prakriya Start ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
1 1.1.43 : suqanapuMsakasya  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
1.4.17 : svAdizvasarvanAmasTAne 
1.4.18 : yaci Bam 
1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam 
End
Child
Prakriya Node
2 1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam  ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
3 7.3.109: jasi ca  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.102: praTamayoH pUrvasavarRaH 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
4 6.1.102: praTamayoH pUrvasavarRaH  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
5 6.1.101: akaH savarRe dIrGaH  ['rAma', 'as'] 0-> ['rAmA', 's']
End
Child
Prakriya Node
6 6.1.105.1: dIrGAjjasi ca  ['rAmA', 's'] 0-> ['rAmA', 's']
End
Leaf Node
Final Output [['rAmA', 's']] = ['rAmAs']


Output: ['rAmAs']

real    0m10.504s
user    0m10.268s
sys     0m0.232s
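
The "triggered but did not win" entries in the trace above suggest a loop of roughly this shape: collect every sutra whose condition matches, pick one winner, apply it, repeat. The sketch below is a simplified illustration, not the code in prakriya.py. The integer `priority` is an arbitrary stand-in for the real conflict-resolution tests, chosen here so that the result matches node 5 of the trace, and the two rule bodies are toy approximations.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Parts = List[str]


@dataclass(frozen=True)
class Rule:
    id: str
    priority: int                       # hypothetical stand-in for real tests
    triggers: Callable[[Parts], bool]   # does the rule's condition match?
    apply: Callable[[Parts], Parts]     # transformation if the rule wins


def derive(parts: Parts, rules: List[Rule],
           max_steps: int = 20) -> Tuple[Parts, List[Tuple[str, List[str]]]]:
    """Repeatedly apply the winning rule; record (winner, losers) per step."""
    steps = []
    used = set()  # each rule fires at most once, which guarantees termination
    for _ in range(max_steps):
        candidates = [r for r in rules
                      if r.id not in used and r.triggers(parts)]
        if not candidates:
            break
        winner = max(candidates, key=lambda r: r.priority)
        losers = [r.id for r in candidates if r is not winner]
        parts = winner.apply(parts)
        used.add(winner.id)
        steps.append((winner.id, losers))
    return parts, steps


def a_meets_a(p: Parts) -> bool:
    # Toy condition: short 'a' at the boundary on both sides (SLP1).
    return len(p) == 2 and p[0].endswith("a") and p[1].startswith("a")


RULES = [
    # Toy 6.1.97 (ato guRe): drop the first 'a', keep the second.
    Rule("6.1.97", 1, a_meets_a, lambda p: [p[0][:-1], p[1]]),
    # Toy 6.1.101 (akaH savarRe dIrGaH): merge a + a into long 'A'.
    Rule("6.1.101", 2, a_meets_a, lambda p: [p[0][:-1] + "A", p[1][1:]]),
]

final, steps = derive(["rAma", "as"], RULES)
print(final)  # ['rAmA', 's']
print(steps)  # [('6.1.101', ['6.1.97'])]
```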

kmadathil avatar Jan 03 '21 02:01 kmadathil

replace the INRIA/Sanskrit_data databases with our own pada generator

Have you seen P. Scharf's code? Based on it, a picture like this can be generated:

[Image: KVfpnPuQMCc]

gasyoun avatar Apr 01 '21 12:04 gasyoun