Critical: Cannot dump XML on fedora package
This package: python-pyABF-2.3.6-2.fc38 ( https://koji.fedoraproject.org/koji/buildinfo?buildID=2067454 )
Causes createrepo_c to output:
C_CREATEREPOLIB: Critical: Cannot dump XML for python3-pyABF (3ab9a1aafc8cc037c279fad85f5d34726a4491f763871b80df494aa9f2856e14): Forbidden control chars found (ASCII values <32 except 9, 10 and 13). 17:15:51: CACHE HIT python3-pyasn1-modules-0.4.8-11.fc37.noarch.rpm
The only thing that looks at all out of the ordinary is a changelog entry:
- added patch [200~9cd568066b5ff8d6079e2d8cf8af388066cc9b79~
Indeed it is caused by that changelog entry.
Looking at the specfile extracted from source rpm it contains the Escape character: 0x1B ascii.
That really shouldn't be there.
This will be fixed in the package: https://src.fedoraproject.org/rpms/python-pyABF/pull-request/1
I’m leaving this bug open for now. It can be closed if the behavior of createrepo_c is expected under the circumstances.
IMHO it would be nice if createrepo_c filtered out Forbidden control chars and just warned on them. (Or had a option to fail on them).
We could/should also look at filtering them out with a git hook.
This was already discussed here: https://github.com/rpm-software-management/createrepo_c/issues/104, since then --error-exit-val became the default and I think failing is the correct behavior in this case.
Dropping the chars might be fine from a changelog but if they were in a provide or a filename it could cause quite a bit of mysterious trouble.
We could possibly introduce an opposite option like --no-error-exit-val. In this case that would drop the package and printed a warning. Although the same could be achieved by manually excluding the package.
I'm not sure a soft-failure that removes the package entirely is a great idea, the warnings may be missed (think automated scripts) and the package in question could well contain some important update, or else break dependency chains.
Probably best to fail earlier in that case than let things break later on.
In general we should reject requests to tolerate package issues and somehow proceed with remaining elements. I understand that it is more expected from DNF to be more tolerant for problems, but at the level of createrepo it is expected to not finish the request and report an issue. What we can discuss whether we should exit at the first problem (fail fast) or scan remaining data and report all problems. Failing has one advantage - the problem is discovered before repositories are distributed. It is much easier to debug the issue with one repo maintainer then with thousands of repo users.
Given that the failing changelog entry was autogenerated - shouldn't there be a check for such errors in the generator?
The changelog is generated from the git history. The esc is actually in git history, so to completely fix this we would have to edit history and break the repo. ;( But in any case I understand the logic here. We should probibly look at a hook that just rejects changelog entries/comments with control chars in the first place.
rpmbuild should probably reject control characters in requirement names, filenames & package names completely, and strip them from changelogs
createrepo_c should probably warn + strip
Alternatively, I believe it is possible to use control characters in XML but only if you encode them as XML character refeferences.
The consensus is to fail early (to preserve current behavior). Therefore I am closing the issue, thank you for the discussion.
This is causing unexpectedly painful headaches in Copr: https://github.com/fedora-copr/copr/issues/3868
We could possibly introduce an opposite option like --no-error-exit-val
Can we mimic what's being done in PULP? I tested it, and PULP still generates the repository (even though that breaks the repoquery --changelogs queries, the package is present and installs fine); see the Copr issue above.
Ok, I think we can be more tolerant for the changelogs. While it doesn't feel right to me I understand that createrepo_c is not the right step to halt the pipeline.
I believe that ideally it would be prevented during rpmbuild: https://github.com/rpm-software-management/rpm/issues/2742 but even with it fixed it probably couldn't be updated everywhere.