Support for Swedish language

Open EmilStenstrom opened this issue 2 years ago • 6 comments

Hi!

I don't see Swedish in the list of supported languages. It's a small language, but it has a very big Wikipedia with ~2.5M articles. Can entity-fishing be trained on Swedish, or is there some deeper reason it's not included?

EmilStenstrom avatar Aug 15 '22 15:08 EmilStenstrom

Hi @EmilStenstrom !

Thank you for the request. Swedish should indeed work well given the size of its Wikipedia. I think it's the largest one not yet supported by entity-fishing, along with Dutch. I will try to include it in the next batch of supported languages.

kermitt2 avatar Aug 17 '22 12:08 kermitt2

That sounds awesome! Looking forward to testing it! :)

EmilStenstrom avatar Aug 18 '22 19:08 EmilStenstrom

Screenshot from 2023-01-20 18-44-34

kermitt2 avatar Jan 20 '23 17:01 kermitt2

Nice! Happy to see it disambiguate Swedish. Looking at that specific example, the things it mentions are not entities, but they are “concepts”. Translated: “year”, “consumption”, “health”. Is that intentional?

EmilStenstrom avatar Jan 21 '23 19:01 EmilStenstrom

Yes, that's the goal: every Wikidata entity is disambiguated, based on the Wikipedia anchors. Wikidata calls both the concepts and their instances "entities". We can then refine based on the statements P279 (subclass of) and P31 (instance of) to select what's wanted for a given task/application. One more example:

Screenshot from 2023-01-21 21-16-59

kermitt2 avatar Jan 21 '23 20:01 kermitt2

Awesome! Using Wikidata statements to select what you want is super powerful. Eager to try this out when 0.0.6 is released! :)

EmilStenstrom avatar Jan 21 '23 20:01 EmilStenstrom
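
For anyone landing here later, the P31/P279 refinement mentioned above could look roughly like the sketch below. It is not entity-fishing's API: the `statements` mapping, the helper names, and the toy data are all assumptions made for illustration, and the superclass ID for "year" is a made-up placeholder. The idea is only that an entity with a P31 (instance of) statement pointing at a wanted class is kept, while an entity that only carries P279 (subclass of) is treated as a concept.

```python
from typing import Dict, Iterable, List, Set

P31 = "P31"    # Wikidata property: instance of
P279 = "P279"  # Wikidata property: subclass of

# Hypothetical shape: entity ID -> property ID -> list of target entity IDs.
Statements = Dict[str, Dict[str, List[str]]]

# Toy statements for illustration only, not fetched from Wikidata.
TOY_STATEMENTS: Statements = {
    "Q42": {P31: ["Q5"]},                     # Douglas Adams, instance of human
    "Q34": {P31: ["Q6256"]},                  # Sweden, instance of country
    "Q577": {P279: ["Q_PLACEHOLDER_CLASS"]},  # year, a concept (placeholder superclass)
}


def is_concept(entity_id: str, statements: Statements) -> bool:
    """True if the entity has at least one P279 (subclass of) statement."""
    return bool(statements.get(entity_id, {}).get(P279))


def keep_instances_of(
    entity_ids: Iterable[str],
    statements: Statements,
    wanted_classes: Set[str],
) -> List[str]:
    """Keep entities whose P31 targets intersect wanted_classes, preserving order."""
    kept = []
    for entity_id in entity_ids:
        targets = statements.get(entity_id, {}).get(P31, [])
        if wanted_classes.intersection(targets):
            kept.append(entity_id)
    return kept


disambiguated = ["Q42", "Q34", "Q577"]
humans = keep_instances_of(disambiguated, TOY_STATEMENTS, {"Q5"})
concepts = [e for e in disambiguated if is_concept(e, TOY_STATEMENTS)]
```

This only checks direct P31 targets. A real filter would probably also walk the P279 chain of each target class, so that instances of a subclass of the wanted class are kept as well.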