Scribe-Data
Expand Urdu data queries
Terms
- [X] I have searched open and closed feature requests
- [X] I agree to follow Scribe-Data's Code of Conduct
Description
This issue would look into expanding the src/scribe_data/language_data_extraction/Hindustani/Urdu files with as much data as possible from what is currently on Wikidata. We can start from the code used to get data for other languages, and from there check the Urdu data on Wikidata to see which conjugations are available. We can then expand the query with optional selections of certain forms, as is done in other SPARQL queries. The query can be tried in the Wikidata Query Service UI during development :)
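As a rough illustration of the "optional selections of certain forms" idea, here is a minimal Python sketch that assembles a SPARQL query with one OPTIONAL block per form. The helper name `build_form_query` and the variable names are hypothetical and not part of Scribe-Data, and the QIDs in the usage example (Q1617 for Urdu, Q24905 for verb, Q179230 for infinitive) are assumptions that should be checked against Wikidata before use.

```python
def build_form_query(language_qid, category_qid, forms):
    """Build a lexeme query with one OPTIONAL block per requested form.

    forms maps a SPARQL variable name to the list of grammatical feature
    QIDs that a form must carry. Hypothetical helper, for illustration only.
    """
    select_vars = " ".join(f"?{name}" for name in forms)

    optional_blocks = []
    for name, feature_qids in forms.items():
        features = ", ".join(f"wd:{qid}" for qid in feature_qids)
        # OPTIONAL keeps lexemes in the results even if this form is missing.
        optional_blocks.append(
            "  OPTIONAL {\n"
            f"    ?lexeme ontolex:lexicalForm ?{name}Form .\n"
            f"    ?{name}Form ontolex:representation ?{name} ;\n"
            f"      wikibase:grammaticalFeature {features} .\n"
            "  }"
        )

    return (
        f"SELECT ?lexeme ?lemma {select_vars}\n"
        "WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        f"    wikibase:lexicalCategory wd:{category_qid} ;\n"
        "    wikibase:lemma ?lemma .\n"
        + "\n".join(optional_blocks)
        + "\n}"
    )


# Assumed QIDs (verify on Wikidata): Q1617 = Urdu, Q24905 = verb,
# Q179230 = infinitive.
query = build_form_query("Q1617", "Q24905", {"infinitive": ["Q179230"]})
print(query)
```

The printed query can be pasted into the Wikidata Query Service UI to see which forms actually return data; forms that come back empty for Urdu can then be left out of the final query file.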
Data types to include:
- [x] Nouns
- [ ] Verbs
- [x] Adjectives https://github.com/scribe-org/Scribe-Data/issues/242
- [x] Adverbs
- [ ] Prepositions
- [ ] Emoji keywords
- [x] Move Hindi and Urdu to Hindustani sub-directories
Contribution
Happy to support with any answers to questions as well as a PR review once one is opened with the changes 😊
Hi Andrew, Could this issue be assigned to me?
Assigned, @OkpePhillips! Please let us know if you need any assistance :)
@andrewtavis, @OkpePhillips Can I also join in? I am working on Hindi and I have noticed that the words for both languages are the same (the languages are pretty similar too). And as I can read a little bit of Urdu, I have also started on it. Would that be okay?
Let's make sure that @OkpePhillips can make the contribution, @KesharwaniArpita, as you already have a few issues, but feel free to support and discuss 😊
Yep, definitely. Actually, when I started expanding queries for Hindi I realized that Urdu and Hindi, both being Hindustani languages (and me having some basic knowledge of Urdu), have a similar construct, so their verbs are classified in almost the same categories. If @OkpePhillips likes, they can refer to the Hindi expansions, which might help in understanding the construct.
Thank you @KesharwaniArpita, I will do that. I have been taking my time to understand the verb structure in Urdu.
Could I try this issue as well?
There's already a PR up for this one, @hibaa03. There will be more issues made soon!
Just added a list of data types that we want to include to this issue :) Have marked those that are already done or have PRs open, and we can work on the others 😊 If the data type can't work, then we can move to the others and open up specific issues later :)
Feel free to work on prepositions or emoji keywords, @hibaa03 :)
Ok, so I've updated the issue :) Hindi and Urdu are now under Hindustani. We'll figure out how to run things soon, but I was realizing that things were getting duplicated in an unhelpful way, where queries from either would be slightly different or have info that the other didn't. The big thing here is that we need to expand the query process to get all the verbs across a couple of files.
Discussions for now should be across #212 and #238 :)
Thanks all!
I think we can call this all finished up for now 😊 Thanks all for the hard work here!