Scribe-Data
Expand Scribe-Data Hindi queries
Terms
- [X] I have searched open and closed feature requests
- [X] I agree to follow Scribe-Data's Code of Conduct
Description
This issue would expand the queries for Hindi that are found in src/scribe_data/language_data_extraction/Hindustani/Hindi. As of now the nouns query is likely fairly good, but we need to add verb conjugations to the verbs query as is done in other languages :)
Data types to include:
- [x] Nouns
- [ ] Verbs
- [x] Adjectives
- [x] Adverbs
- [ ] Prepositions
- [x] Emoji keywords
Contribution
Happy to support with this and answer any questions that come up! Also happy to review when it's time 😊
Hi @andrewtavis while I am working on this, do you think you can assign this issue to me so there's no confusion later?
Yes definitely, @KesharwaniArpita :) Thanks for your willingness to help!
Hi @andrewtavis, I’ve raised a PR for expanding the Hindi verb extraction query. Initially, I was working with verb tenses, but after digging into the data on Wikidata through lexeme IDs, I realized that for Hindi, the available data focuses more on "कारक" (Kārak) forms (which express the relationships between words in a sentence) rather than tenses. I believe the same is going to be the case with other languages (like Urdu and Bengali).
So, I shifted gears and built a modified query based on these Kārak forms—things like direct case, gerund, intransitive phase, and more. I’ve tested the updated query, and saved the results too.
Would love to get your thoughts on this approach and any suggestions you might have!
Also, can I check out the other languages?
Thanks so much for your hard work, @KesharwaniArpita! Do you want to put a note about this into the Urdu issue and check Bengali as well? I'm pretty sure that Bengali verbs are modeled the way that they should be, as the Wikidata Bengali community is very good 😊
By all means check out other languages as you already have!
We'll get to the review in the coming days :)
@andrewtavis Sure. Thank you!!!! Should I raise the issue for Bengali?
The Bengali verbs have already been checked a while ago, so maybe you can check that query and see if you'd change/expand it in any way :)
CC @mhmohona, who originally wrote the Bengali query :)
Sure!!! I'll look into that.
I want to participate in this issue too, can I do that? I am new to SPARQL but could collaborate and contribute. Thanks! 😊
Hi @SethiShreya! 😊 I'd love to collaborate. I'm new to SPARQL too. Let's work together! Looking forward to your thoughts. Thanks! 🙌
Yeah, that would be great, let's connect sometime to collaborate further
It would be helpful if you could tell me how much progress has been made on this issue, and which features still need to be added
So far, I have worked on verbs (enhanced the query here) and adjectives (created the query here) in Hindi. You can check them out. I was just looking at query_nouns.sparql; maybe we could extend the query to include gender too. As of now, it only includes number.
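For illustration only, here is a rough sketch of how gender might be added alongside number in a nouns query. It is not the actual query_nouns.sparql from the repository. The function name `build_noun_gender_query` is hypothetical, and the Wikidata identifiers used (Q1568 for Hindi, Q1084 for noun, Q110786 and Q146786 for singular and plural, P5185 for grammatical gender) are assumptions that should be checked against Wikidata before use.

```python
# Hypothetical sketch: NOT the query_nouns.sparql file from the repository.
# Assumed Wikidata identifiers (verify before use):
#   Q1568 = Hindi, Q1084 = noun, Q110786 = singular, Q146786 = plural,
#   P5185 = grammatical gender.


def build_noun_gender_query(language_qid: str = "Q1568") -> str:
    """Return a SPARQL query string for nouns with number forms and gender."""
    # Doubled braces are literal braces in the f-string; only
    # {language_qid} is interpolated.
    return f"""
SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
  ?singular
  ?plural
  ?gender

WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
    wikibase:lexicalCategory wd:Q1084 .

  # Singular form, if the lexeme has one.
  OPTIONAL {{
    ?lexeme ontolex:lexicalForm ?singularForm .
    ?singularForm ontolex:representation ?singular ;
      wikibase:grammaticalFeature wd:Q110786 .
  }}

  # Plural form, if the lexeme has one.
  OPTIONAL {{
    ?lexeme ontolex:lexicalForm ?pluralForm .
    ?pluralForm ontolex:representation ?plural ;
      wikibase:grammaticalFeature wd:Q146786 .
  }}

  # Grammatical gender is stored on the lexeme itself, not on its forms.
  OPTIONAL {{
    ?lexeme wdt:P5185 ?nounGender .
    ?nounGender rdfs:label ?gender .
    FILTER(LANG(?gender) = "en")
  }}
}}
""".strip()


if __name__ == "__main__":
    # Print the query so it can be pasted into the Wikidata Query Service.
    print(build_noun_gender_query())
```

Keeping gender in its own OPTIONAL block means nouns without a recorded gender would still be returned, just with an empty `?gender` column.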
Thanks for sharing, I will look into it @KesharwaniArpita
As discussed with @KesharwaniArpita, there are things that we can expand for Hindi: gender for nouns, adjectives, prepositions, adverbs, etc. We have agreed to collaborate, so I will be working on gender for nouns and she will work on adjectives. Is that correct, @andrewtavis?
Sounds great, @SethiShreya! Thank you both for the collaboration and coordination!
@KesharwaniArpita I reviewed the files on Hindi nouns and gender is already done, right?
@andrewtavis I want to work on queries for Punjabi (an Indian language). Can you please make an issue for that?
@SethiShreya, you could try working on conjunctions or prepositions and their cases, if you'd like.
Just added a list of data types that we want to include to this issue :) Have marked those that are already done or have PRs open, and we can work on the others 😊 If the data type can't work, then we can move to the others and open up specific issues later :)
Ok so I've updated the issue :) Hindi and Urdu are now under Hindustani. We'll figure out how to run things soon, but I realized that work was being duplicated in an unhelpful way: the queries for each were slightly different, or one had info that the other didn't. The big thing here is that we need to expand the query process to get all the verbs across a couple of files.
Discussions for now should be across #212 and #238 :)
Thanks all!
Checking this, looks like we're all done here until we expand the data types 😊 Thanks all!