WiktionaryParser

Supporting German as base language

Open hjorthjort opened this issue 5 years ago • 12 comments

I want to use this project, but with the German Wiktionary. I intend to fork the project and make the required adaptations. Is there any interest in merging the result back via a PR? It would require some structural changes, but it might make adding more languages easier later.

hjorthjort avatar Jan 13 '20 19:01 hjorthjort

Sure, a PR to support German would be great! You can fetch results in your local language from the English Wiktionary though

suyashb95 avatar Jan 16 '20 15:01 suyashb95

I'm aware I can get English definitions of words in other languages. The problem is that the English Wiktionary has far fewer German words than the German one. I also think there is value in using the language you're learning for learning, once you reach that level of maturity, which is why I think being able to use different language versions is nice.

What I'm learning is that German Wiktionary structures its content very differently from English Wiktionary, so I think I will need to reinvent the wheel. Will make a PR when I'm done!

hjorthjort avatar Jan 16 '20 15:01 hjorthjort

Yeah I'd wrongly assumed that the page structures for different wikis would be somewhat similar. Good luck with the PR! Let me know if I can help in any way.

suyashb95 avatar Jan 16 '20 15:01 suyashb95

Any update on this? I would be very interested in that code as well. Would not want to code it if someone else already did :D

felixvor avatar May 08 '20 11:05 felixvor

Started but got distracted. Don't have much. Code away!

hjorthjort avatar May 08 '20 12:05 hjorthjort

@DieseKartoffel I haven't started working on this, feel free to go ahead!

suyashb95 avatar May 16 '20 10:05 suyashb95

Just noticed this recently... I've been doing some work on a local fork to support other languages, though I'm only interested in definitions. The original code assumes there's a Table of Contents (TOC) and parses the page data from that. Pages for German words don't always have one, so I build one manually by checking the nested headers in the page and looking for ones that match the language code and then the part of speech.

https://github.com/rroessler1/WiktionaryParser/commit/ae8fb901e87caea947db718999bf8146c050a0fd
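As a rough illustration of the header-scanning idea (this is not the code from the commit above), the sketch below collects the headings of a page and keeps the part-of-speech headings nested under the one that names the language. It uses only the standard library; `HeadingCollector`, `build_toc`, and the sample markup are hypothetical, and real Wiktionary pages may nest their headings differently.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6>, in document order."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # heading level while inside a heading, else None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def build_toc(html, language, parts_of_speech):
    """Return part-of-speech headings nested under the heading naming `language`."""
    collector = HeadingCollector()
    collector.feed(html)
    toc = []
    language_level = None
    for level, text in collector.headings:
        if language_level is None:
            if language in text:
                language_level = level   # entered the language section
        elif level <= language_level:
            break                        # next language section starts here
        elif any(pos in text for pos in parts_of_speech):
            toc.append(text)
    return toc


# Made-up sample markup, only loosely modelled on a Wiktionary page.
sample = (
    "<h2>essen (Deutsch)</h2><h3>Verb</h3><p>...</p>"
    "<h3>Substantiv, n</h3><h2>essen (Englisch)</h2><h3>Verb</h3>"
)
toc = build_toc(sample, "Deutsch", ["Verb", "Substantiv"])
```

Matching on substrings keeps the sketch short; a real parser would probably want a stricter comparison against a known list of language names and parts of speech.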

I think it'd be cleaner to have a base parsing class and then override certain methods for different languages, but for now I'm taking the lazy approach.
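A minimal sketch of what a base parsing class with per-language overrides could look like. Everything here is hypothetical: the class names, the `sections` dict (heading text mapped to a list of lines), and the assumption that German definition lines start with a marker such as `[1]`. None of it comes from the existing WiktionaryParser API.

```python
import re


class BasePageParser:
    """Shared parsing flow; subclasses override the language-specific hooks."""

    # Heading under which definitions are expected (assumed for illustration).
    definitions_label = "Definitions"

    def parse(self, sections):
        # `sections` is a hypothetical dict: heading text -> list of raw lines.
        lines = sections.get(self.definitions_label, [])
        return [self.clean_definition(line) for line in lines if line.strip()]

    def clean_definition(self, line):
        return line.strip()


class GermanPageParser(BasePageParser):
    definitions_label = "Bedeutungen"

    def clean_definition(self, line):
        # Assumes each definition starts with a marker such as "[1] ".
        return re.sub(r"^\[\d+\]\s*", "", line.strip())


german = GermanPageParser().parse(
    {"Bedeutungen": ["[1] feste Nahrung zu sich nehmen", "   "]}
)
english = BasePageParser().parse({"Definitions": ["  to eat  "]})
```

The shared `parse` method stays language-agnostic, and each language only supplies its label and its cleanup rule.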

I'm happy to clean it up a bit and submit a PR, but I think it would need a slightly larger design discussion of the best way to support multiple languages going forward, which to my knowledge hasn't happened yet.

rroessler1 avatar Jun 10 '20 04:06 rroessler1

@rroessler1 I initially made this project for use in a Telegram dictionary bot (which ended up using a different dictionary service) and didn't think of supporting other languages. It certainly needs design changes, which I think should handle different types of pages rather than specific languages. There could be words in languages other than German that don't have a ToC, for example. I was thinking of handling parsing in stages, where the first stage tries to figure out the structure of the page from a ToC, or by checking the nested headers if a ToC isn't found. For now we could add your changes to this stage and incrementally support more languages. What do you think?
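The first stage could be sketched roughly as below: try to read the structure from a ToC, and fall back to the headers when no ToC is found. The function names are hypothetical, the `toctext` class reflects the legacy MediaWiki ToC markup as an assumption, and regular expressions are used only to keep the example short (they are fragile on real HTML).

```python
import re


def structure_from_toc(html):
    """Stage 1a: section names taken from ToC entries (assumed `toctext` spans)."""
    return re.findall(r'<span class="toctext">(.*?)</span>', html, re.S)


def structure_from_headers(html):
    """Stage 1b: section names taken from the <h2>..<h6> headers themselves."""
    headers = re.findall(r"<h[2-6][^>]*>(.*?)</h[2-6]>", html, re.S)
    # Drop any inline markup inside the header, keeping only its text.
    return [re.sub(r"<[^>]+>", "", header).strip() for header in headers]


def discover_structure(html):
    """Return the section names from the first strategy that finds any."""
    for strategy in (structure_from_toc, structure_from_headers):
        sections = strategy(html)
        if sections:
            return sections
    return []


# Made-up sample pages, one with a ToC and one without.
with_toc = (
    '<div id="toc"><ul><li><span class="toctext">English</span></li>'
    '<li><span class="toctext">Verb</span></li></ul></div>'
    "<h2>English</h2><h3>Verb</h3>"
)
without_toc = '<h2><span class="mw-headline">essen (Deutsch)</span></h2><h3>Verb</h3>'
```

Later stages would then work from the returned section names without caring which strategy produced them.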

suyashb95 avatar Jun 10 '20 06:06 suyashb95

Agreed, I like the idea of handling it in stages and keeping it language-agnostic if possible.

But I think eventually there will have to be language-specific code as well. For example, in German the meanings, origins, and synonyms are all listed under one header under <p> tags, so I think the parser would just have to search for and extract the text under "Bedeutungen:", which would be a German-specific bit of code. I can't think of any way around this at the moment, unless you pass the responsibility off to the client. (example: essen)
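To make the "Bedeutungen:" case concrete, here is a hedged sketch that groups entries under the most recent label paragraph. It assumes each label is a `<p>` whose text ends with a colon and that the entries follow in `<dd>` elements; that markup, `LabelledBlockParser`, and `extract_under_label` are assumptions for illustration, not a verified description of German Wiktionary pages.

```python
from html.parser import HTMLParser


class LabelledBlockParser(HTMLParser):
    """Group <dd> text under the most recent <p> whose text ends with a colon."""

    def __init__(self):
        super().__init__()
        self.blocks = {}     # label -> list of entry strings
        self._tag = None     # "p" or "dd" while inside one, else None
        self._parts = []
        self._label = None   # label of the block currently being filled

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "dd"):
            self._tag = tag
            self._parts = []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return           # closing tag of inline markup such as <a>
        text = "".join(self._parts).strip()
        if tag == "p" and text.endswith(":"):
            self._label = text[:-1]
            self.blocks.setdefault(self._label, [])
        elif tag == "dd" and self._label is not None and text:
            self.blocks[self._label].append(text)
        self._tag = None


def extract_under_label(html, label="Bedeutungen"):
    """Return the entries listed under `label`, or an empty list if absent."""
    parser = LabelledBlockParser()
    parser.feed(html)
    return parser.blocks.get(label, [])


# Made-up sample markup for illustration only.
sample = (
    "<p>Bedeutungen:</p><dl><dd>[1] <a href='#'>Nahrung</a> zu sich nehmen</dd></dl>"
    "<p>Herkunft:</p><dl><dd>mittelhochdeutsch ezzen</dd></dl>"
)
meanings = extract_under_label(sample)
```

Only the label string is German-specific here, so the same helper could serve other labels such as "Herkunft" or "Synonyme".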

rroessler1 avatar Jun 13 '20 01:06 rroessler1

Are there any updates on this? Maybe a side branch or something?

johnnybigoode-zz avatar Jun 20 '21 23:06 johnnybigoode-zz

@johnnybigoode I haven't been working on supporting other languages but maybe @rroessler1 has a fork that works?

suyashb95 avatar Jun 23 '21 15:06 suyashb95

I have a fork that supports Spanish, French, and German.

https://github.com/rroessler1/WiktionaryParser

It definitely works, but I haven't looked at it in a year so I'm not sure if it's missing new updates.

I would be happy to try and get it merged into here but definitely don't have time until August at the earliest.

rroessler1 avatar Jun 24 '21 03:06 rroessler1