wiktextract

Unable to find all parts of speech for words

Open meetDeveloper opened this issue 2 years ago • 2 comments

I was browsing https://kaikki.org/dictionary/English/meaning/h/he/hello.html and noticed that only Interjection is present, whereas the Wiktionary entry https://en.wiktionary.org/wiki/hello also has a noun and a verb. Can you look into this? Thank you.

meetDeveloper avatar Feb 12 '22 04:02 meetDeveloper

Sorry, I've been working mostly on other things for the last couple of weeks. This seems to be an error in wikitextprocessor, which incorrectly expands the translation "{{tt+|fr|!}}" for French under "is anyone there?". The problem disappears if I remove the trans-top or multitrans templates, so it has something to do with template nesting. I'm still working to find the root cause and actually fix it. (I've also been hunting down another issue in wikitextprocessor recently, and there is some possibility these are connected.)

tatuylonen avatar Feb 20 '22 23:02 tatuylonen

Found another example, but this one looks a little different, so I'm not sure whether the cause is the same.

For the word real, Etymology 3 has a noun part of speech with two different noun senses. I am unable to find the following part of speech in the dictionary here.

real (plural reais or reals)

  1. A unit of currency used in Brazil since 1994. Symbol: R$.
  2. A coin worth one real.

meetDeveloper avatar Mar 06 '22 04:03 meetDeveloper

@tatuylonen @kristian-clausal Is there any progress regarding this?

meetDeveloper avatar Sep 24 '22 12:09 meetDeveloper

@meetDeveloper Sorry, no. There's a bunch of fires going on elsewhere that Tatu is trying to figure out (issues that come up at the kaikki.org scale, so it's slow going), and I'm not qualified to look at this particular issue yet. It's on the list, it's a bug, but it's not at the top because template-parsing and nesting might have something to do with Lua, and we have a lot of issues with Lua in particular.

kristian-clausal avatar Sep 28 '22 06:09 kristian-clausal

Looking at the specific examples, the translation templates in the French section are borked because of an automated script 'bug' in https://en.wiktionary.org/w/index.php?title=hello&diff=54532137&oldid=54531846 which I don't even want to call a bug because it's a simple case of garbage-in-garbage-out.

* French: {{t|fr|eh oh}} ?, {{t|fr|oh eh}} ?, {{t+|fr|allô}} ?, {{t+|fr|ohé}} ! --> * French: {{t|fr|eh oh}}, {{t|fr|?}}, {{t|fr|oh eh}}, {{t|fr|?}}, {{t|fr|allô}}, {{t|fr|?}}, {{t|fr|ohé}}, {{t|fr|!}}

Obviously this needs to be corrected on Wiktionary's side. This issue might or might not be duplicated elsewhere; I'll try to see if I can find other examples of it in the dump. If the issue is that a single "!" breaks the template expansion (it wouldn't be strange; currently we're handling stuff like {{!}} as a special case), then the fix might be just to find all the examples of the above kind of bot-script artifact and fix them on Wiktionary.
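As a rough illustration of what hunting for this kind of bot-script artifact could look like, here is a minimal sketch that scans raw wikitext for translation templates whose term is nothing but punctuation. It assumes the page text is available as a plain string; the function name and the regex are hypothetical and are not part of wiktextract or wikitextprocessor.

```python
import re

# Hypothetical helper, not part of wiktextract or wikitextprocessor.
# Matches {{t|..}}, {{t+|..}}, {{tt|..}} and {{tt+|..}} templates whose term
# (the second positional argument) consists only of punctuation marks.
PUNCT_ONLY_TRANSLATION = re.compile(
    r"\{\{(tt?\+?)\|([^|{}]+)\|([!?.,;:]+)\s*(?:\|[^{}]*)?\}\}"
)


def find_punctuation_only_translations(wikitext):
    """Return (template name, language code, term) for each suspicious template."""
    return [m.groups() for m in PUNCT_ONLY_TRANSLATION.finditer(wikitext)]


# The post-edit French line quoted above.
line = (
    "* French: {{t|fr|eh oh}}, {{t|fr|?}}, {{t|fr|oh eh}}, {{t|fr|?}}, "
    "{{t|fr|allô}}, {{t|fr|?}}, {{t|fr|ohé}}, {{t|fr|!}}"
)
hits = find_punctuation_only_translations(line)
for name, lang, term in hits:
    print(name, lang, term)
```

Any hits would still need manual review, since a punctuation mark can be a legitimate translation term in some entries.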

kristian-clausal avatar Oct 17 '22 07:10 kristian-clausal

There is a finite number of templates that contain |!| or |!}}, but they all seem to be legitimate, meaningful uses of ! as a parameter. The most common is simply {{en-noun|!}}, which just marks that the word has no attested plural. I did not find any {{t}}, {{tt}} or {{tt+}} examples similar to the one in this thread, except a Catalan example with a legitimate use in the article for interrobang (I think; I'm using the cache for grepping, so context is lost easily).
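For reference, a survey like this can be approximated with a short script. The sketch below is only an assumption about how one might do it, not the grep that was actually run: it counts, per template name, the non-nested templates that carry a bare ! parameter (that is, contain |!| or |!}}). The regex and function name are illustrative.

```python
import re
from collections import Counter

# Illustrative only. Matches a template without nested braces that has a
# bare "!" as one of its parameters, i.e. contains "|!|" or "|!}}".
BANG_PARAM = re.compile(
    r"\{\{([^|{}]+)(?:\|[^|{}]*)*?\|!(?=\||\}\})[^{}]*\}\}"
)


def count_bang_templates(wikitext):
    """Count templates with a bare '!' parameter, keyed by template name."""
    return Counter(m.group(1).strip() for m in BANG_PARAM.finditer(wikitext))


sample = "{{en-noun|!}} {{t|fr|!}} {{tt+|fr|!}} {{t|fr|ohé}} {{en-noun|!}}"
counts = count_bang_templates(sample)
for name, n in counts.most_common():
    print(name, n)
```

Sorting by count makes rare names such as {{t}} or {{tt+}} stand out from common, legitimate ones like {{en-noun}}.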

A quick check on the article for daegeum seems to indicate that other templates with !-parameters don't break. If this is an issue that needs both template nesting and some kind of trigger like the !-parameter, this might be a really rare bug.

EDIT: The article for hello is now fixed, so it should have sensible output after we process the next Wiktionary dump.

kristian-clausal avatar Oct 17 '22 07:10 kristian-clausal

Separated the real-related post to its own issue.

kristian-clausal avatar Oct 17 '22 07:10 kristian-clausal

It seems the original issue is now handled (at least "hello" is parsed correctly), and the reais issue was separated into its own thread (and then closed), so closing this.

kristian-clausal avatar Nov 29 '22 11:11 kristian-clausal