Remove Wikibase extension from all OSM wikis

Open Firefishy opened this issue 3 years ago • 71 comments

The Wikibase extension causes us endless compatibility issues with MediaWiki versions.

It is overly complex to install and requires specific knowledge to manage.

I propose we remove the extension and restore the wiki back to standard functionality as best as possible.

Firefishy avatar Oct 30 '22 03:10 Firefishy

I think the two main data consumers of the OSM wiki wikibase data are the iD editor and Sophox. The iD editor displays descriptions and images from the data items as follows (when you click on the i):

image

and the :pen: links to the respective data item (in this case Q98). Sophox on the other hand has apparently not been importing updates to the data items in nearly 1.5 years, see https://github.com/Sophox/sophox/issues/27. Note that taginfo does not use the Wikibase data at all (and was not planning on supporting it either), see https://github.com/taginfo/taginfo/issues/248.

While I am a big fan of structured data and think that we should embrace it as much as possible for the documentation of our tagging conventions, I do think that Wikibase is a poor fit for our use case because it requires data items to have numeric IDs ... which does make sense for Wikidata but not for tags because keys and values are already unique identifiers. Unfortunately Wikibase really seems to be primarily targeted at Wikidata. Another example being that we cannot disable the label fields for data items and they're often mistakenly translated as well.

Lastly the current state of the data items is quite a mess because there are absolutely no mechanisms in place to synchronize template data with the data items ... it's all done manually. And when you create a new page on the wiki there's no data item created for the page so many tag pages don't even have data items.

So with these caveats in mind I guess it makes sense to abandon Wikibase if it is a maintenance burden. I think we'd definitely need to migrate the tag description translations somewhere else ... IMHO regular wiki pages are definitely not an option ... we'd need some service with a similarly userfriendly translation UI.

I think ideally we could switch to some Wikibase alternative that better fit our use case but I don't think there is any.

not-my-profile avatar Oct 30 '22 07:10 not-my-profile

there are absolutely no mechanisms in place to synchronize template data with the data items

IIRC, @nyurik used a bot called Yurikbot to do this kind of synchronization, but for some reason it's no longer active: https://wiki.openstreetmap.org/wiki/User_talk:Yurik#Yurikbot_2

mmd-osm avatar Oct 30 '22 11:10 mmd-osm

I would strongly be opposed to losing the knowledge base built up in the data items. Despite being hampered by opposition from one prolific and vocal wiki contributor and a loss of interest from Yuri, they are in better shape than the difficult-to-maintain arguments to the wiki templates. I also used data items effectively to stop the wiki renderer from choking.

If Wikibase is too hard to manage the most sensible alternative would be to remove tag documentation from the wiki entirely and move it to a new custom website.

AndrewHain avatar Oct 30 '22 13:10 AndrewHain

I would strongly be opposed to losing the knowledge base built up in the data items.

@AndrewHain Can you explain what this knowledge base is and how people can use it, to add to what @not-my-profile wrote above? This probably isn't best done here, but an OSM diary entry might be a good place for it.

SomeoneElseOSM avatar Oct 30 '22 13:10 SomeoneElseOSM

I appreciate the extra detail being provided.

Firefishy avatar Oct 30 '22 14:10 Firefishy

This ticket should not be considered a 100% decision point... It is more a cry of pain needing a solution 😜

Firefishy avatar Oct 30 '22 15:10 Firefishy

The Wikibase extension causes us endless compatibility issues with MediaWiki versions.

For the benefit of those coming into this discussion, can you elaborate on the specific issues you anticipate going forward if we keep Wikibase around? Are these compatibility issues with MediaWiki itself or with other extensions we have installed? Do we have a staging environment to test our MediaWiki configuration with Wikibase before deployment, to catch issues before they disrupt ordinary wiki usage?

I think the two main data consumers of the OSM wiki wikibase data are the iD editor and Sophox.

In addition, the wiki itself depends on data items in various ways. For example, when you search for a key that doesn’t have an article yet or land on a 404 page for that key, the wiki displays an infobox synthesized from that key’s data item. If it’s a compound key, the 404 page also includes a breakdown of the key based on the data item of each component.

This functionality was added in response to concern about having to maintain articles in each language about arbitrary combinations of key components. An alternative would be to populate these articles using a bot, but the descriptions would have to be maintained somewhere more machine-readable than infoboxes in wikitext, essentially reinventing data items.

iD consumes data items through the MediaWiki API. It’s a public API, so we don’t know for sure what other software (QA tools?) rely on it for descriptions, images, or statements about which element types are valid for a given tag. A breaking change of this magnitude needs to be discussed broadly, in a similar manner as if taginfo or Nominatim were to introduce a breaking change in its API.
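For readers unfamiliar with what such a consumer does, here is a minimal sketch (not iD's actual code) of looking up a data item's description. It assumes the wiki's API endpoint is `https://wiki.openstreetmap.org/w/api.php` and uses Wikibase's `wbgetentities` action; the function names and the sample description text are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint of the OSM wiki's MediaWiki API.
API_URL = "https://wiki.openstreetmap.org/w/api.php"


def build_entity_url(item_id, language="en"):
    """Build a wbgetentities request URL for one data item (e.g. "Q98")."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels|descriptions|claims",
        "languages": language,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_description(response, item_id, language="en"):
    """Pull the description text out of a decoded wbgetentities response."""
    entity = response.get("entities", {}).get(item_id, {})
    description = entity.get("descriptions", {}).get(language)
    return description["value"] if description else None


# Hypothetical response in the standard Wikibase JSON shape.
sample = {
    "entities": {
        "Q98": {
            "descriptions": {
                "en": {"language": "en", "value": "Placeholder description."}
            }
        }
    }
}
print(build_entity_url("Q98"))
print(extract_description(sample, "Q98"))
```

Any tool doing something along these lines would stop working if the data items went away, which is why the change needs a broad heads-up.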

Sophox on the other hand has apparently not been importing updates to the data items in nearly 1.5 years, see https://github.com/Sophox/sophox/issues/27.

This is inaccurate. Sophox/sophox#27 is tracking a bug that Sophox has been omitting many OSM elements, mostly nodes. It is also behind on ingesting OSM planet data overall. However, it’s up-to-date with respect to data items, modulo a potential problem with dropping older data items: Sophox/sophox#31. I don’t think this seeming regression should be a major factor in taking an irreversible step like discontinuing data items.

I think ideally we could switch to some Wikibase alternative that better fit our use case but I don't think there is any.

The main alternative is Semantic MediaWiki, but introducing it would make Wikibase feel like a walk in the park, not only for @Firefishy but also for wiki contributors and data consumers.

Can you explain what this knowledge base is and how people can use it

https://wiki.openstreetmap.org/wiki/Data_items is a good starting point. If you find anything missing there, please ask on the talk page.

1ec5 avatar Oct 30 '22 17:10 1ec5

Dear all,

I second the comments against losing the knowledge and practices that came with Wikibase. All we need is to improve it, not abandon it.

Lastly the current state of the data items is quite a mess because there are absolutely no mechanisms in place to synchronize template data with the data items ... it's all done manually

And there shouldn't be any. Tags structured data should be put once in data items with no redundancy in wiki. Redundancy is a temporary bridge while waiting for a more robust integration. It seems this bridge has lasted long enough to become an argument for abandoning the whole architecture, which is not good. Let's finish the work before challenging it in the middle of the journey.

The UX of the Wikibase editor has been criticized for years, with not much involvement to make it better. This is a huge amount of work, but how can we manage to move the whole documentation to a brand new system (which we don't know yet) that the OSM community will maintain on its own, if we're unable to improve existing and common tools?

flacombe avatar Oct 30 '22 18:10 flacombe

The problem with wikibase is basically as Firefishy said, that it requires significant configuration to make it work and that configuration frequently changes and is poorly documented.

As far as I know very few people (other than wikidata obviously) actually run wikibase and as such it's not well tested unless you happen to be running the exact same bleeding edge version as wikidata - also they run in a separate mediawiki instance while we try and run it all in the one instance.

The result is that it frequently breaks when we upgrade mediawiki and I have to ask Yuri for help and he has to ask the wikidata people and they mostly shrug and say they have no idea if the release version we're using will work and it's all just a shit show.

tomhughes avatar Oct 30 '22 19:10 tomhughes

https://github.com/Sophox/sophox/issues/27 is tracking a bug that Sophox has been omitting many OSM elements

Ah yeah my bad ... I got confused by the different Sophox issues.

However, it’s up-to-date with respect to data items, modulo a potential problem with dropping older data items:

I do not think that it's up to date with regards to data items. E.g. this query yields nothing despite the data item existing for 11 days.

The main alternative is Semantic MediaWiki,

I do have experience with SMW and am certain that it's not an improvement over Wikibase.

All we need is to improve it, not abandon it.

I don't think that we can address the limitations of the Wikibase software (such as numeric ids being required).

Tags structured data should be put once in data items with no redundancy in wiki.

This does not work because taginfo does not support it and the taginfo maintainer is not interested in supporting it. Hence the redundancy. And I can understand their point because the current data item situation really is a mess.

How can we manage to move the whole documentation to a brand new system (which we don't know yet) that the OSM community will maintain on its own, if we're unable to improve existing and common tools?

Migrating data to a compatible system would be easy ... it's just that there is no such suitable system afaik. Some system written specifically with OSM in mind could work much better for us. You cannot adapt a system if your use case is not in the scope of the project ... unless you fork the project but Wikibase is way too complicated to be forked.

Even if we do not see a solution to the situation right now, I think it's good for us to have this discussion.

not-my-profile avatar Oct 30 '22 19:10 not-my-profile

I got the point about maintenance and updates. I'll try to forward it to Wikimedia people we know at OSM France chapter.

As far as I know very few people (other than wikidata obviously) actually run wikibase

Wikidata + OSM is still twice as many instances as OSM alone on custom software.

I don't think that we can address the limitations of the Wikibase software (such as numeric ids being required).

How is the numeric id a limitation?

Can't the items be reached through P19 https://wiki.openstreetmap.org/wiki/Property:P19 ?

This does not work because taginfo does not support it and the taginfo maintainer is not interested in supporting it. Hence the redundancy. And I can understand their point because the current data item situation really is a mess.

How could that change if we don't manage to change ourselves?

Some system written specifically with OSM in mind could work much better for us

Shouldn't we address needs on the core osm.org website or API instead? No one else could provide our API or core software, but it would be clever to take advantage of Wikipedia's experience.

You cannot adapt a system if your use case is not in the scope of the project ...

Come on, can't we talk to be part of this scope?

flacombe avatar Oct 30 '22 19:10 flacombe

How is the numeric id a limitation?

We do not need numeric ids but we have to use them ... that's a limitation.

Can't the items be reached through P19?

Yes of course but it's very much an unnecessary layer of indirection / source of confusion/inconvenience.
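To illustrate the indirection: a consumer that starts from a tag has to map it to a Q-ID before it can fetch anything. A rough sketch, assuming P19 holds the tag as a plain `key=value` string and that entities arrive in the standard Wikibase JSON shape; the Q-IDs and tags below are invented for the example.

```python
def first_string_claim(entity, property_id):
    """Return the first string value of a property on an entity, if any."""
    for statement in entity.get("claims", {}).get(property_id, []):
        datavalue = statement.get("mainsnak", {}).get("datavalue", {})
        if datavalue.get("type") == "string":
            return datavalue["value"]
    return None


def index_by_tag(entities, property_id="P19"):
    """Build the extra lookup table a consumer needs: tag string -> Q-ID."""
    index = {}
    for item_id, entity in entities.items():
        tag = first_string_claim(entity, property_id)
        if tag is not None:
            index[tag] = item_id
    return index


def string_claim(value):
    """Helper producing one hypothetical string statement."""
    return {"mainsnak": {"snaktype": "value",
                         "datavalue": {"type": "string", "value": value}}}


# Hypothetical entities; real Q-IDs and tags will differ.
entities = {
    "Q9001": {"claims": {"P19": [string_claim("highway=residential")]}},
    "Q9002": {"claims": {"P19": [string_claim("amenity=bench")]}},
    "Q9003": {"claims": {}},  # an item without a P19 statement
}

tag_index = index_by_tag(entities)
print(tag_index["amenity=bench"])
```

With keys and values as identifiers, this table would not be needed at all, which is the point about the numeric IDs being an unnecessary layer.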

Shouldn't we address needs on core osm.org website or API instead?

This is not an either or scenario. I do consider documentation to be a very important topic.

Come on, can't we talk to be part of this scope?

Have a look at T202676, where a comparatively very small (but useful) change was rejected. Numeric ids on the other hand are fundamental to Wikibase, this is not something that can be easily changed and I am very certain that this won't change ... so this is not really a talking matter.

not-my-profile avatar Oct 30 '22 20:10 not-my-profile

@tomhughes @Firefishy Are there any technical reasons for the Wikibase central point not to be a dedicated place, let's say, https://base.openstreetmap.org ? I'm not talking about the name, but mostly about simplifying the infrastructure, because it is possible for the two not to be the same instance.

This doesn't solve 100% of the obvious challenge (which I'm already aware of from the issues with upgrading the OpenStreetMap Wiki; others, please read at least https://github.com/openstreetmap/operations/issues/760#issuecomment-1280802385). However, even Wikipedia has a dedicated Wikibase installation (https://www.wikidata.org/) to isolate the chaos from all the wikis. This suggests that OSM's use of Wikibase is actually likely to be harder to maintain (including backup size, scalability issues, etc.) than Wikipedia's approach.

And like I wrote on the Wiki:Talk, I agree that this issue should be raised as one that gets in the way of smoother upgrades of the wiki.

fititnt avatar Oct 30 '22 20:10 fititnt

Wikipedia has a dedicated Wikibase instance because there are many different Wikipedia instances (e.g. en.wikipedia.org, de.wikipedia.org, es.wikipedia.org etc.). I am pretty sure that moving the Wikibase instance of the OSM to a different domain would not solve or improve anything. The wikibase client extension would still need to be kept in sync with the wikibase repository extension on the different instance. Even worse this would mean you have different authentication sessions etc ... so definitely no.

not-my-profile avatar Oct 30 '22 20:10 not-my-profile

The result is that it frequently breaks when we upgrade mediawiki and I have to ask Yuri for help and he has to ask the wikidata people and they mostly shrug and say they have no idea if the release version we're using will work and it's all just a shit show.

There is an uphill climb for third-party installations of MediaWiki, but for what it’s worth, Wikimedia’s own sister projects have often faced the same hurdles when it comes to maintaining extensions that Wikipedia doesn’t use.[^sister] Local chapters such as Wikimedia Deutschland and Wikimedia Italia have been instrumental to supporting non-Wikipedia use cases, so I applaud @flacombe’s outreach to Wikimedia France.

Perhaps the OSMF should explore joining the Wikibase Stakeholder Group, which sponsors improvements to Wikibase to make it more reusable independently of WMF infrastructure. Judging from the member list, we aren’t alone in needing this support structure. In parallel, the Wikibase Community User Group advocates for increased reusability and is open to individual participation.

However, even Wikipedia has a dedicated Wikibase installation (https://www.wikidata.org/) to isolate the chaos from all the wikis

The reason Wikidata exists is to centralize data from myriad Wikimedia wikis, not to isolate Wikibase from those wikis. Quite the contrary: all the other wikis have Wikibase Client installed, and Wikimedia Commons has even overlaid another Wikibase instance to track structured data about media files.

Separating Wikibase from the main wiki wouldn’t by itself reduce maintenance overhead. There are hosted solutions like wikibase.cloud, but even if we manage to outsource Wikibase, I think the OSM Wiki would still need to install Wikibase Client and get existing data consumers to transition over to the new instance. Being a client of a separate Wikimedia Commons media repository has been its own headache…

[^sister]: For example, Proofread Page is what keeps the lights on at Wikisource, but at one point it fell into such disrepair that I and some other Wikisource contributors had to rush patches to fix parts of the site. Fortunately, things are much better with this extension these days, but I think the lesson is to consider MediaWiki like any other open-source project depending on volunteers to keep the software operational.

1ec5 avatar Oct 30 '22 21:10 1ec5

Structured and semantic documentation is a next step for the OSM project. But losing the data items without a replacement looks to me like a step backward.

There are probably not a lot of users of data items yet. But I personally have many projects that would use them (professional and hobby ones, like Osmose-QA). What is missing, from my point of view, is a working and up-to-date Sophox instance.

I am already in touch with @nyurik to “reboot” the Sophox instance. It is already on the way.

Removing data items would be a big loss for the future.

frodrigo avatar Oct 31 '22 14:10 frodrigo

I'd feel a lot more comfortable if we were to split off wikibase to a separate instance of mediawiki, splitting it from wiki.openstreetmap.org.

Firefishy avatar Oct 31 '22 14:10 Firefishy

@Firefishy there is very little point in splitting wikibase because it consists of two parts -- server and client, and the client must live in the osm wiki to be of any reasonable use -- which means you would actually increase the number of problems (cross-site integration) rather than keeping it relatively simple. This is exactly how it is done with all other wiki installations that chose to use wikibase. It was also mentioned by a few other people above.

nyurik avatar Oct 31 '22 15:10 nyurik

@Firefishy there is very little point in splitting wikibase because it consists of two parts -- server and client, and the client must live in the osm wiki to be of any reasonable use -- which means you would actually increase the number of problems (cross-site integration) rather than keeping it relatively simple. This is exactly how it is done with all other wiki installations that chose to use wikibase. It was also mentioned by a few other people above.

I mean, run as a completely separate unconnected instance of mediawiki. No connection between base.osm.org (eg) <-> wiki.osm.org.

Firefishy avatar Oct 31 '22 15:10 Firefishy

@Firefishy did you try upgrading the wiki to 1.39, and did Wikibase produce errors? Is this why this issue exists?

lectrician1 avatar Oct 31 '22 15:10 lectrician1

We haven't even gone to 1.38 yet (when I looked a week or so back 1.39 wasn't out yet) because I got burnt so badly by the last upgrade that I can't face trying to do it again. We're still on 1.37 for the main wiki.

tomhughes avatar Oct 31 '22 15:10 tomhughes

In fact 1.39 is still not out, but we do want to move to it once it is, for PHP 8 support.

tomhughes avatar Oct 31 '22 15:10 tomhughes

And there shouldn't be any. Tags structured data should be put once in data items with no redundancy in wiki.

Note that it should be proposed, discussed and agreed on OSM Wiki before actually doing this.

when you search for a key that doesn’t have an article yet or land on a 404 page for that key, the wiki displays an infobox synthesized from that key’s data item. If it’s a compound key, the 404 page also includes a breakdown of the key based on the data item of each component.

That is quite useful! Though none of the other uses mentioned actually requires data items or Wikibase.

matkoniecz avatar Oct 31 '22 15:10 matkoniecz

@tomhughes did the previous migration issue come about because the upgrade was done in production without testing it on the staging servers first? As far as I know, that was the only incident with Wikibase. Or is that a different issue?

nyurik avatar Oct 31 '22 15:10 nyurik

We don't have a staging server...

The recent incident that happened with the 1.37 upgrade led to https://github.com/openstreetmap/chef/commit/ec0bc46d1275fc39a116736fc40372b0f6784fd0 as my first attempt to fix it and then a few days later after I got you involved https://github.com/openstreetmap/chef/commit/586ac89854fe37ae4d9a0b8dcda6797535cafb49 which was what finally got it working again.

I think there was at least one previous occasion where I had to get your help after an upgrade but I can't recall the details.

tomhughes avatar Oct 31 '22 16:10 tomhughes

If Wikibase is too hard to manage the most sensible alternative would be to remove tag documentation from the wiki entirely and move it to a new custom website.

The wiki worked before data items were added and will continue working (with some very minor functionality missing and some other functionality replaced by other solutions) if Wikibase is disabled.

I get that some people invested a lot of effort into data items but they are not irreplaceable. Please do not present OSM Wiki shutdown as a possible consequence. That is simply misleading.

I think we'd definitely need to migrate the tag description translations somewhere else ... IMHO regular wiki pages are definitely not an option ...

Regular wiki pages and preset translations fulfil this need well; data item translations are of very dubious use.

Quality is dubious, and existing uses can be easily replaced by extracting data from infoboxes and serving that as an API (maybe there would be a greater delay in updates, which would be worth it for the lower risk of definitions being changed without any explanation or oversight).
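As a rough idea of what "extracting data from infoboxes" could look like, here is a small sketch that pulls the parameters out of a `{{KeyDescription}}` call in page wikitext. The function name and the sample page are invented, and it assumes well-formed wikitext with named parameters only; a real extractor would need a proper wikitext parser.

```python
import re


def parse_infobox(wikitext, template="KeyDescription"):
    """Return the named parameters of the first matching template call."""
    start = re.search(r"\{\{\s*" + re.escape(template) + r"\s*\|", wikitext)
    if not start:
        return None

    parts, current = [], []
    depth = 1  # we are inside the outer template
    i = start.end()
    while i < len(wikitext):
        two = wikitext[i:i + 2]
        if two in ("{{", "[["):
            # Nested template or link: pipes inside do not split parameters.
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            if depth == 1 and two == "}}":
                break  # end of the outer template
            if depth > 1:
                depth -= 1
            current.append(two)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    parts.append("".join(current))

    params = {}
    for part in parts:
        name, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            params[name.strip()] = value.strip()
    return params


# Hypothetical page content.
page = """{{KeyDescription
|key=example
|description=An [[Example|illustrative]] key.
|onNode=yes
}}
Some article text."""

info = parse_infobox(page)
print(info["description"])
```

The result could then be cached and served as JSON by a small service, at the cost of the update delay mentioned above.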

iD consumes data items through the MediaWiki API. It’s a public API, so we don’t know for sure what other software (QA tools?) rely on it for descriptions, images, or statements about which element types are valid for a given tag. A breaking change of this magnitude needs to be discussed broadly, in a similar manner as if taginfo or Nominatim were to introduce a breaking change in its API.

Definitely. I posted info at OSM Wiki to notify interested people ( https://wiki.openstreetmap.org/wiki/Talk:Wiki#Proposal_to_remove_data_item_for_technical_reasons ).

matkoniecz avatar Oct 31 '22 16:10 matkoniecz

Note that it should be proposed, discussed and agreed on OSM Wiki before actually doing this.

This was said from a state-of-the-art IT perspective. Every change will be discussed prior to being released.

existing uses can be easily replaced by extracting data from infoboxes and serving that as an API

This doesn't solve the inconsistency in translations issue the wikibase is intended to solve. This is the only place I know of where crawling inconsistent text is preferred over a structured database (Wikibase or another one). The current taginfo already offers this API but fails to restore consistency (which should actually be restored by people with appropriate tools). This is not said to blame the maintainers, who achieve great things, but the messy architecture we collectively encourage.

flacombe avatar Oct 31 '22 16:10 flacombe

We don't have a staging server...

Upgrading in production without staging first sounds like a deeper problem. Today the obstacles are Wikibase and MultiMaps; tomorrow it may be something else in our configuration unrelated to these extensions. A staging server won’t magically fix extension incompatibilities, but without this important workflow, any issue becomes a fire drill, and the mere threat of such fire drills leads to deletionist proposals.

Regular wiki pages and preset translations fulfil this need well; data item translations are of very dubious use.

How can we be confident that this is not a minority opinion?

Definitely. I posted info at OSM Wiki to notify interested people.

You’re more optimistic than I am that every consumer of the OSM Wiki’s MediaWiki API instance follows the Talk:Wiki page or this repository. Given Hyrum’s law (relevant xkcd), it’s inevitable that something would be broken in any transition of data items off the wiki, no matter whom we contact; it’s only a question of the extent to which we care. But as a courtesy, there probably should be a more visible heads-up on one of the mailing lists, ideally focused on the immediate technical issues – @Firefishy’s plea for help – rather than running a victory lap around data items.

1ec5 avatar Oct 31 '22 16:10 1ec5

Well are we expected to run a second staging server for every service we run?

I don't really think it's practical to get rid of wikidata at this point. I do wish I'd never let Yuri persuade me to add it though.

tomhughes avatar Oct 31 '22 16:10 tomhughes

This doesn't solve the inconsistency in translations issue the wikibase is intended to solve.

Maybe Wikibase was intended to solve it, but it is not solving it at all and instead makes it worse (the lack of edit descriptions and ineffective watchlisting aggravate the problem).

structured database

Infoboxes are also structured, and Wikibase is inferior in many ways: not only in how annoying it is for sysadmins but also in interface quality and broken watchlisting.

How can we be confident that this is not a minority opinion?

Not very sure. But in general data items are doomed to have low-quality translations due to the inferior editing interface (yes, it manages to be worse for data quality than even editing the parameters of an infobox).

matkoniecz avatar Oct 31 '22 16:10 matkoniecz