Introducing dataSourceRef checking is a breaking/noisy change
The dataSourceRef checking introduced in https://github.com/NeTEx-CEN/NeTEx/pull/454 is generating a lot of feedback from our users.
Firstly, the change was made in the master branch, but it is a breaking change for users who use dataSourceRef (not a majority, admittedly). These users now have to deal with several million validation errors because of this single change. We're trying to convince users that "updating is life, updating is safe", but this time, it wasn't ...
The dataSourceRef has been explained to these NeTEx users as an attribute to "tag" data origins. Now it has to be a real concept, a real model. If somebody has identified a real usage, great, but could it be introduced in the next branch? With a documented usage?
At first, we looked for a "computer science" solution, in particular by automatically populating DataSource models in NeTEx files. But the XSD constraint leaves no option other than defining, in each XML file, a DataSource model that matches the dataSourceRef value (yes, I know, but ZIP files are the main usage). That requires creating hundreds of (empty) DataSource models across the ZIP archive, which makes the technical solution very verbose and very ugly :(
Could the XSD constraint introduced in https://github.com/NeTEx-CEN/NeTEx/pull/454 be reverted? At least as a first step, so that we can share the real expectations behind the DataSource concept? 🙏
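To give an idea of what the constraint asks for in each file, here is a minimal Python sketch that lists the dataSourceRef values with no matching DataSource in the same XML document. It is only an illustration, not the actual XSD check: the function name and the sample document are hypothetical, and it assumes the usual NeTEx namespace and that a DataSource is identified by its id attribute.

```python
import xml.etree.ElementTree as ET

# Assumption: the usual NeTEx namespace URI.
NETEX_NS = "http://www.netex.org.uk/netex"


def unresolved_data_source_refs(xml_text: str) -> set[str]:
    """Return dataSourceRef values that have no DataSource with the same id
    in this single XML document (hypothetical helper, not the XSD rule)."""
    root = ET.fromstring(xml_text)
    # ids of every DataSource element declared in this file
    declared = {
        el.get("id")
        for el in root.iter(f"{{{NETEX_NS}}}DataSource")
        if el.get("id")
    }
    # every dataSourceRef attribute value used in this file
    referenced = {
        el.get("dataSourceRef") for el in root.iter() if el.get("dataSourceRef")
    }
    return referenced - declared


# Hypothetical, heavily simplified sample document.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <dataObjects>
    <ResourceFrame id="rf" version="1">
      <dataSources>
        <DataSource id="src:A" version="1"/>
      </dataSources>
    </ResourceFrame>
    <ServiceFrame id="sf" version="1">
      <lines>
        <Line id="l1" version="1" dataSourceRef="src:A"/>
        <Line id="l2" version="1" dataSourceRef="src:B"/>
      </lines>
    </ServiceFrame>
  </dataObjects>
</PublicationDelivery>"""

print(unresolved_data_source_refs(SAMPLE))
```

Because the lookup is per document, every file of a ZIP archive that uses a dataSourceRef would need its own DataSource entry, which is where the hundreds of empty models come from.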
If fake dataSourceRef values were added, just empty them. Problem solved.
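For reference, "emptying" could look roughly like the Python sketch below, which drops every dataSourceRef attribute from a document. The function name and sample are hypothetical, the usual NeTEx namespace is assumed, and defaultDataSourceRef is deliberately left untouched.

```python
import xml.etree.ElementTree as ET

# Assumption: the usual NeTEx namespace URI; registering it as the default
# namespace keeps the serialized output free of generated "ns0:" prefixes.
NETEX_NS = "http://www.netex.org.uk/netex"
ET.register_namespace("", NETEX_NS)


def strip_data_source_refs(xml_text: str) -> tuple[str, int]:
    """Remove every dataSourceRef attribute; return the new XML text and
    the number of attributes removed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    removed = 0
    for el in root.iter():
        if el.attrib.pop("dataSourceRef", None) is not None:
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical, heavily simplified sample document.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <dataObjects>
    <ServiceFrame id="sf" version="1">
      <lines>
        <Line id="l1" version="1" dataSourceRef="src:A"/>
        <Line id="l2" version="1" dataSourceRef="src:B"/>
      </lines>
    </ServiceFrame>
  </dataObjects>
</PublicationDelivery>"""

cleaned, count = strip_data_source_refs(SAMPLE)
print(count)
```

Note that this throws away the origin "tag" the attribute was carrying.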
I'm afraid that we cannot break datasets that were previously validating, especially in master
I don't agree. We did not change the schema, we added more validation rules. If this shows that producers were producing non-standard implementations, that is on them.
Many NeTEx users and profiles (like the French Profile) promote a dataSourceRef / defaultDataSourceRef usage but don't make a related DataSource mandatory. Is that bad? Is it good? Whatever. That's the current usage. A lot of software is in production with this logic.
The change made by https://github.com/NeTEx-CEN/NeTEx/pull/454 makes it impossible for these NeTEx users to validate their NeTEx files (which were 100% valid before this PR). And it doesn't create two or three errors, but several million errors on large files.
The NeTEx XSD is a helper and a reference for the NeTEx community. I don't see how this project can evolve without integrating this kind of community feedback.
Great work has been done in the next branch to improve validation rules. We're just asking to move this dataSourceRef change to the next branch. The same approach has been used for other validation rules. It will allow the impacted users to understand the need for DataSources and join the discussion around it.
Many producers generate incorrect EPIP that does not validate. Is that bad or good? It is bad, and it does not support adoption.
For now I guess your producers have used a specific NeTEx XSD version, likely the version before EPIAP 1.3.1. If you want to upgrade to master and be stricter, that is your right as a consumer. If "tomorrow" master contains the content of next, you will break anyhow.
Valid != Correct. And that is the problem: https://github.com/NeTEx-CEN/NeTEx/issues/463 shows all the NeTEx elements that do not even have constraints defined for them. Claiming that such an element is in valid use just because the schema obviously lacks those checks is stupid. It is similar to keeping "interbational" even though it should have been obvious to everyone that this is a typo.
With the introduction of more validation in the master branch (for that specific version of the schema, now being 1.3.x), implementers can (finally) see why their software does not behave as described in the documentation.
The issue pointed out by @albanpeignier is not about the change itself, which is obviously a step in the right direction, but about the fact that it was done in the master branch: we can't break compatibility in master, even for good improvements.
- Nothing broke. Publications (finally) got validated.
- Master is master and receives fixes.
- It is obvious that when next is merged into master things will break.