Edit and expand Arabic data processes

Open andrewtavis opened this issue 1 year ago • 8 comments

Terms

Description

This issue would check and expand the queries and related data processes found in the scribe_data/extract_transform/languages/Arabic directory. It would be great to start by expanding the queries found in the nouns and verbs directories, and from there we can discuss a formatting process :) The query_nouns.sparql and query_verbs.sparql files can be expanded using the similar queries found for other languages. Work on formatting should wait until the formatting process itself is expanded to focus on individual lexemes.

Contribution

Happy to support someone who has interest in working on this! 😊

andrewtavis avatar Mar 20 '24 22:03 andrewtavis

CC @mrbazzan who had interest in this! Please write in the issue and I'll assign! Also let me know if there are any questions :)

andrewtavis avatar Mar 20 '24 22:03 andrewtavis

I would like to start with this one.

mrbazzan avatar Mar 23 '24 13:03 mrbazzan

Fantastic, @mrbazzan! Let me know if there's anything we can do to help :)

andrewtavis avatar Mar 23 '24 14:03 andrewtavis

All I can think of right now is to copy some of the optional queries from German's query_nouns.sparql to Arabic's own but I don't really understand what that does or what's going on.

Is there anything else to be considered?

mrbazzan avatar Mar 23 '24 14:03 mrbazzan

Look into some of the Arabic lexemes and see if there are other properties or statements that could be added. There might not be though, so feel free to send along the basic files converted from German to Arabic!

andrewtavis avatar Mar 23 '24 15:03 andrewtavis

> Look into some of the Arabic lexemes and see if there are other properties or statements that could be added.

What do you mean? I tried a couple of queries and they contain a lot of duplicate data.

mrbazzan avatar Mar 27 '24 10:03 mrbazzan

Duplicate how, @mrbazzan? So take the query below for Arabic verbs:

# All Arabic (Q13955) verbs.
# Enter this query at https://query.wikidata.org/.

SELECT
  ?lexeme
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") as ?lexemeID)
  ?verb

WHERE {
  ?lexeme a ontolex:LexicalEntry ;
    dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q24905 ;
    wikibase:lemma ?verb .
}

From there you can check out a Lexeme like this one that also has statements for verb conjugations. Could you expand the query by referencing the verb queries for other languages so that it also gets the conjugations for the verbs?

andrewtavis avatar Mar 27 '24 11:03 andrewtavis
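
Editorial sketch (not from the thread): one way the expansion suggested above might look, and why expanded queries can appear to return "duplicate" data. Joining a lexeme to its forms yields one result row per form and grammatical feature, so the same lemma repeats across rows. The query uses the Wikidata predicates ontolex:lexicalForm, ontolex:representation and wikibase:grammaticalFeature; the variable names, the helper group_rows_by_lexeme and the sample rows are all hypothetical and are not taken from Scribe-Data or from #127.

```python
# Hypothetical expansion of the Arabic verbs query: one row per
# (lexeme, form, grammatical feature). Variable names are illustrative.
ARABIC_VERB_FORMS_QUERY = """
SELECT
  (REPLACE(STR(?lexeme), "http://www.wikidata.org/entity/", "") AS ?lexemeID)
  ?verb
  ?formRepresentation
  ?feature

WHERE {
  ?lexeme a ontolex:LexicalEntry ;
    dct:language wd:Q13955 ;
    wikibase:lexicalCategory wd:Q24905 ;
    wikibase:lemma ?verb .

  # Forms are optional so that verbs without any forms are still returned.
  OPTIONAL {
    ?lexeme ontolex:lexicalForm ?form .
    ?form ontolex:representation ?formRepresentation ;
      wikibase:grammaticalFeature ?feature .
  }
}
"""

ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def group_rows_by_lexeme(bindings):
    """Collapse SPARQL JSON result rows into one entry per lexeme.

    `bindings` is the list found under results -> bindings in the standard
    SPARQL JSON results format. Unbound OPTIONAL variables are simply
    absent from a row.
    """
    grouped = {}
    for row in bindings:
        lexeme_id = row["lexemeID"]["value"]
        entry = grouped.setdefault(
            lexeme_id, {"verb": row["verb"]["value"], "forms": {}}
        )
        if "formRepresentation" not in row:
            continue  # lexeme has no matching forms
        representation = row["formRepresentation"]["value"]
        feature = row["feature"]["value"].replace(ENTITY_PREFIX, "")
        entry["forms"].setdefault(representation, set()).add(feature)
    return grouped


# Made-up placeholder rows (not real Wikidata results) showing the repetition:
# lexeme L0001 appears twice because its single form has two features.
SAMPLE_BINDINGS = [
    {
        "lexemeID": {"value": "L0001"},
        "verb": {"value": "verb-one"},
        "formRepresentation": {"value": "form-a"},
        "feature": {"value": ENTITY_PREFIX + "Q0001"},
    },
    {
        "lexemeID": {"value": "L0001"},
        "verb": {"value": "verb-one"},
        "formRepresentation": {"value": "form-a"},
        "feature": {"value": ENTITY_PREFIX + "Q0002"},
    },
    {"lexemeID": {"value": "L0002"}, "verb": {"value": "verb-two"}},
]
```

Calling group_rows_by_lexeme(SAMPLE_BINDINGS) gives one entry per lexeme ID, with each form mapped to the set of its feature IDs. Existing Scribe-Data queries for other languages may instead pin specific grammatical features in separate OPTIONAL blocks, which is a different design from the generic grouping sketched here.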

Hello, @andrewtavis. Sorry for the late reply (holidays). I'll submit a draft PR just to show what I've been doing.

mrbazzan avatar Apr 03 '24 14:04 mrbazzan

Closing this as the Arabic queries are in quite good shape after #127 and other changes. Thanks for this, @mrbazzan!

andrewtavis avatar Jun 04 '24 21:06 andrewtavis