Expand Scribe-Data Hindi queries

Open andrewtavis opened this issue 1 year ago • 20 comments

Description

This issue would expand the queries for Hindi that are found in src/scribe_data/language_data_extraction/Hindustani/Hindi. As of now the nouns query is likely fairly good, but we need to add verb conjugations to the verbs query as is done in other languages :)

Data types to include:

  • [x] Nouns
  • [ ] Verbs
  • [x] Adjectives
  • [x] Adverbs
  • [ ] Prepositions
  • [x] Emoji keywords

Contribution

Happy to support with this and answer any questions that come up! Also happy to review when it's time 😊

andrewtavis avatar Oct 03 '24 08:10 andrewtavis

Hi @andrewtavis while I am working on this, do you think you can assign this issue to me so there's no confusion later?

KesharwaniArpita avatar Oct 03 '24 11:10 KesharwaniArpita

Yes definitely, @KesharwaniArpita :) Thanks for your willingness to help!

andrewtavis avatar Oct 03 '24 11:10 andrewtavis

Hi @andrewtavis, I’ve raised a PR for expanding the Hindi verb extraction query. Initially, I was working with verb tenses, but after digging into the data on Wikidata through lexeme IDs, I realized that for Hindi, the available data focuses more on "कारक" (Kārak) forms (which express the relationships between words in a sentence) rather than tenses. I believe the same is going to be the case with other languages (like Urdu and Bengali).

So, I shifted gears and built a modified query based on these Kārak forms: things like direct case, gerund, intransitive phase, and more. I’ve tested the updated query and saved the results too.

Would love to get your thoughts on this approach and any suggestions you might have!

Also, can I check out the other languages?

KesharwaniArpita avatar Oct 04 '24 00:10 KesharwaniArpita
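
For readers following along, the form-based approach described above could be sketched roughly as follows. This is only an illustration and not the query from the PR: the helper name `build_verb_forms_query`, the output variable names, and the placeholder feature QIDs are hypothetical, and the QIDs assumed for Hindi and for "verb" should be checked on Wikidata before use. The prefixes (`wd:`, `wikibase:`, `ontolex:`, `dct:`) are assumed to be the ones predefined by the Wikidata Query Service.

```python
from typing import Mapping

# Assumed QIDs; verify them on Wikidata before relying on them.
HINDI_QID = "Q1568"  # assumed: Hindi
VERB_QID = "Q24905"  # assumed: verb


def build_verb_forms_query(features: Mapping[str, str]) -> str:
    """Build SPARQL text with one OPTIONAL block per grammatical feature.

    `features` maps an output variable name (for example "directCase")
    to the QID of the grammatical feature a lexeme form must carry.
    """
    select_vars = ["?lexeme", "?verb"] + [f"?{name}" for name in features]

    optional_blocks = []
    for name, qid in features.items():
        # OPTIONAL keeps verbs in the results even when this form is missing.
        optional_blocks.append(
            "  OPTIONAL {\n"
            f"    ?lexeme ontolex:lexicalForm ?{name}Form .\n"
            f"    ?{name}Form ontolex:representation ?{name} ;\n"
            f"      wikibase:grammaticalFeature wd:{qid} .\n"
            "  }"
        )

    lines = [
        "SELECT " + " ".join(select_vars),
        "WHERE {",
        f"  ?lexeme dct:language wd:{HINDI_QID} ;",
        f"    wikibase:lexicalCategory wd:{VERB_QID} ;",
        "    wikibase:lemma ?verb .",
        *optional_blocks,
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder QIDs: replace with the real grammatical feature items.
    print(build_verb_forms_query({"directCase": "Q0000001", "gerund": "Q0000002"}))
```

Keeping each form in its own OPTIONAL block means a verb with only some of the forms recorded still shows up in the results, which seems to fit data that is unevenly filled in.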

Thanks so much for your hard work, @KesharwaniArpita! Do you want to put a note about this into the Urdu issue and check Bengali as well? I'm pretty sure that Bengali verbs are modeled the way that they should be, as the Wikidata Bengali community is very good 😊

By all means check out other languages as you already have!

We'll get to the review in the coming days :)

andrewtavis avatar Oct 04 '24 08:10 andrewtavis

@andrewtavis Sure. Thank you!!!! Should I raise the issue for Bengali?

KesharwaniArpita avatar Oct 04 '24 08:10 KesharwaniArpita

The Bengali verbs were already checked a while ago, so maybe you can look at that query and see if you'd change/expand it in any way :)

CC @mhmohona, who originally wrote the Bengali query :)

andrewtavis avatar Oct 04 '24 08:10 andrewtavis

Sure!!! I'll look into that.

KesharwaniArpita avatar Oct 04 '24 08:10 KesharwaniArpita

I want to participate in this issue too, can I do that? I am new to SPARQL but could collaborate and contribute. Thanks! 😊

SethiShreya avatar Oct 05 '24 11:10 SethiShreya

Hi @SethiShreya! 😊 I'd love to collaborate. I'm new to SPARQL too. Let's work together! Looking forward to your thoughts. Thanks! 🙌

KesharwaniArpita avatar Oct 05 '24 11:10 KesharwaniArpita

Yeah, that would be great. Let's connect sometime to collaborate further.

SethiShreya avatar Oct 05 '24 11:10 SethiShreya

It would be helpful if you could tell me how much progress has been made on this issue, and which features still need to be added.

SethiShreya avatar Oct 05 '24 11:10 SethiShreya

So far, I have worked on verbs (enhanced the query here) and adjectives (created the query here) in Hindi. You can check them out. I was just looking at query_nouns.sparql; maybe we can extend the query to include gender too. As of now, it only includes number.

KesharwaniArpita avatar Oct 05 '24 11:10 KesharwaniArpita

Thanks for sharing, I will look into it @KesharwaniArpita

SethiShreya avatar Oct 05 '24 11:10 SethiShreya

As discussed with @KesharwaniArpita, there are things that we can expand for the Hindi language: gender for nouns, adjectives, prepositions, adverbs, etc. We have discussed collaborating, so I will be working on gender for nouns and she will work on adjectives. Is this correct, @andrewtavis?

SethiShreya avatar Oct 05 '24 16:10 SethiShreya

Sounds great, @SethiShreya! Thank you both for the collaboration and coordination!

andrewtavis avatar Oct 05 '24 18:10 andrewtavis

@KesharwaniArpita I reviewed the files for Hindi nouns, and gender is already done, right?

SethiShreya avatar Oct 06 '24 14:10 SethiShreya

@andrewtavis I want to work on a query for Punjabi (an Indian language). Can you please make an issue for that?

SethiShreya avatar Oct 06 '24 15:10 SethiShreya

@SethiShreya, you can try working on conjunctions or prepositions and their cases, if you'd like.

KesharwaniArpita avatar Oct 06 '24 18:10 KesharwaniArpita

Just added a list of data types that we want to include to this issue :) Have marked those that are already done or have PRs open, and we can work on the others 😊 If the data type can't work, then we can move to the others and open up specific issues later :)

andrewtavis avatar Oct 09 '24 08:10 andrewtavis

Ok, so I've updated the issue :) Hindi and Urdu are now under Hindustani. We'll figure out how to run things soon, but I was realizing that work was getting duplicated in an unhelpful way, where the queries for either language would be slightly different or have info that the other didn't. The big thing here is that we need to expand the query process so that all the verbs can be retrieved across a couple of files.

Discussions for now should be across #212 and #238 :)

Thanks all!

andrewtavis avatar Oct 10 '24 21:10 andrewtavis

Checking this, looks like we're all done here until we expand the data types 😊 Thanks all!

andrewtavis avatar Oct 16 '24 12:10 andrewtavis