Advice needed: on the way to removing postprocessing
One major challenge for new users and developers of the SwiftGtk project is the wrapper generation itself. In the case of both rhx's and mikolasstuchlik's scripts, wrapper generation performs three major tasks:
- a script needs to be able to run the generation for all packages (this is currently done either by recursive execution or by dumping the Swift package)
- a script needs to call gir2swift with the correct prerequisites (this is currently done either by writing the prerequisites by hand or by trusting that the Swift package dependencies are correct, but it could also be done automatically during generation, since each .gir file contains references to its prerequisites)
- a script needs to run postprocessing on the generated files, which eliminates edge cases and fixes other incorrectly generated code.
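Since each .gir file declares its prerequisites as top-level `<include name="..." version="..."/>` elements, the second task could in principle be derived from the .gir file itself. Below is a minimal sketch under that assumption, using Foundation's `XMLParser` (which lives in `FoundationXML` on Linux). The names `GirInclude`, `GirIncludeCollector` and `girIncludes(in:)` are hypothetical and not part of gir2swift.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML
#endif

/// A prerequisite declared by a .gir file (hypothetical type).
struct GirInclude: Equatable {
    let name: String
    let version: String

    /// Conventional file name of the referenced repository, e.g. "GLib-2.0.gir".
    var girFileName: String { "\(name)-\(version).gir" }
}

/// Collects every `<include name="..." version="..."/>` element of a .gir document.
final class GirIncludeCollector: NSObject, XMLParserDelegate {
    private(set) var includes: [GirInclude] = []

    func parser(_ parser: XMLParser,
                didStartElement elementName: String,
                namespaceURI: String?,
                qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Without namespace processing, C headers are reported as "c:include" and skipped here.
        guard elementName == "include",
              let name = attributeDict["name"],
              let version = attributeDict["version"] else { return }
        includes.append(GirInclude(name: name, version: version))
    }
}

/// Returns the declared prerequisites, or nil if the XML could not be parsed.
func girIncludes(in data: Data) -> [GirInclude]? {
    let parser = XMLParser(data: data)
    let collector = GirIncludeCollector()  // kept alive locally; the delegate reference is weak
    parser.delegate = collector
    return parser.parse() ? collector.includes : nil
}

// Hypothetical, heavily trimmed .gir content for illustration.
let sampleGir = """
<?xml version="1.0"?>
<repository version="1.2" xmlns="http://www.gtk.org/introspection/core/1.0" xmlns:c="http://www.gtk.org/introspection/c/1.0">
  <include name="GLib" version="2.0"/>
  <include name="GObject" version="2.0"/>
  <c:include name="gio/gio.h"/>
</repository>
"""

if let includes = girIncludes(in: Data(sampleGir.utf8)) {
    for include in includes {
        print(include.girFileName)
    }
}
```

Resolving these names to actual files (and recursing into their own includes) would still need a search path, which this sketch leaves out.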
The third step (postprocessing) is a major component. A lot of work and research was put into developing and maintaining it. At the same time, the postprocessing has a lot of weak spots:
- the intent is not clear: the postprocessing is not documented, so nobody but the person who created a given correction knows what the original intention was and whether the correction is even needed
- the postprocessing is the sole reason for maintaining the scripts at all; otherwise we could handle generation with a fairly simple script, or even with gir2swift itself
- awk and sed are yet another prerequisite for an already complicated project: a developer already needs an above-average understanding of Swift, C, GLib/GObject, .gir/g-ir, bash, the structure of gir2swift itself, etc.
I would like to integrate the work done by postprocessing into gir2swift and replace the awk and sed scripts and gir2swift-manifest.sh with a single gir2swift-manifest.json file that would contain all the required information.
I would like to start a discussion about the structure of such a JSON file, and I would like to ask the following questions:
- What kinds of postprocessing are being performed? For example: renaming Swift types, renaming C types, adding annotations to specific function arguments, removing specified substrings, etc.
- Are the tasks platform-dependent? Do we need an expression system? For example: { "platform": "Linux", "distributionContains": "Ubuntu" }, { "pkg-config-library": "glib", "until-version": "2.0.0" }, etc.
- Could the edge cases be summarized? For example, some .gir files do not use the "priv" attribute on private properties, and some .gir files fail to fill in the correct type getter (see GirRecord.swift:96). I would like to handle cases like this in the JSON files too, instead of in gir2swift. (This could be fixed by "preprocessing" the XML file.)
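To make the first question concrete, here is one possible shape for such a file, sketched with Swift's `Codable` so that gir2swift could decode it with `JSONDecoder`. The type names, JSON keys and the four operation kinds are hypothetical, taken only from the examples listed in the question; the entries in the sample JSON are placeholders, not real quirks.

```swift
import Foundation

/// Hypothetical schema for one entry in a quirks/manifest file.
struct QuirkEntry: Codable, Equatable {
    /// Hypothetical operation kinds, mirroring the examples in the question.
    enum Kind: String, Codable {
        case renameSwiftType
        case renameCType
        case annotateArgument
        case removeSubstring
    }

    let kind: Kind
    /// A type name, a "function.argument" path, or a literal substring, depending on `kind`.
    let target: String
    /// New name or annotation (e.g. "@escaping"); omitted for removals, so it decodes as nil.
    let value: String?
}

/// Hypothetical top-level document: one manifest per module.
struct QuirkManifest: Codable {
    let module: String
    let quirks: [QuirkEntry]
}

// Placeholder content for illustration only.
let exampleJSON = """
{
  "module": "Example-1.0",
  "quirks": [
    { "kind": "renameSwiftType", "target": "OldName", "value": "NewName" },
    { "kind": "annotateArgument", "target": "example_connect.handler", "value": "@escaping" },
    { "kind": "removeSubstring", "target": "override " }
  ]
}
"""

let manifest = try JSONDecoder().decode(QuirkManifest.self, from: Data(exampleJSON.utf8))
for quirk in manifest.quirks {
    print(quirk.kind.rawValue, quirk.target, quirk.value ?? "-")
}
```

Using a closed `Kind` enum means a typo in an operation name fails at decode time instead of being silently ignored, which is one argument for a typed schema over free-form scripts.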
Yes, it would be nice to integrate the post-processing steps directly into gir2swift. In my experience, there are a number of different reasons why post-processing is needed, and they vary a bit in terms of what is required. Some examples I can think of:
- inadequate code generation in `gir2swift`. Post-processing here often fixes syntax errors or inserts keywords/attributes that Swift requires (such as `@escaping`, or adding/removing `override`).
  - ideally, these should be fixed by making gir2swift smarter, so that they are no longer needed
  - however, this can be a lot of work for only a few edge cases
- errors in the relevant `.gir` files.
  - these are often version-specific (so they should be taggable)
  - ideally, they should be upstreamed to ensure they eventually get fixed (so it would be nice to tag them with an upstream PR number)
- errors in the relevant header files.
  - sometimes what is defined varies between platforms or versions
  - examples include a `#define` that is visible to Swift in some cases and invisible in others
- syntactic sugar
  - there are cases where `gir2swift` generates correct code, but small changes can make the generated code more Swift-friendly
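The categories above could be recorded as metadata on each quirk, so that, for example, `.gir` errors that have not yet been reported upstream are easy to list. A minimal sketch; `QuirkCategory`, `QuirkTag`, the quirk ids and the reference string are all hypothetical, and version tagging is deliberately left out.

```swift
/// Hypothetical categories mirroring the reasons for post-processing listed above.
enum QuirkCategory: String {
    case generatorLimitation  // gir2swift emits invalid or incomplete Swift
    case girError             // the upstream .gir file is wrong
    case headerError          // C headers differ between platforms or versions
    case syntacticSugar       // correct output that could be more Swift-friendly
}

/// Hypothetical metadata attached to every quirk.
struct QuirkTag {
    let id: String
    let category: QuirkCategory
    /// Free-form reference to an upstream issue or PR, if one exists.
    let upstreamReference: String?

    /// Only wrong .gir files are flagged for upstreaming in this sketch.
    var needsUpstreamReport: Bool {
        category == .girError && upstreamReference == nil
    }
}

// Placeholder tags for illustration only.
let tags = [
    QuirkTag(id: "escaping-callback", category: .generatorLimitation, upstreamReference: nil),
    QuirkTag(id: "missing-priv-flag", category: .girError, upstreamReference: nil),
    QuirkTag(id: "wrong-type-getter", category: .girError, upstreamReference: "example-upstream-pr"),
]

let unreported = tags.filter { $0.needsUpstreamReport }.map { $0.id }
print(unreported)  // ["missing-priv-flag"]
```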
So what's a good way forward? Initially, it might be good to integrate the existing functionality directly into gir2swift. The sed scripts are probably the low-hanging fruit for a first step, as they provide simple search-and-replace based on regular expressions. Rather than a generic `gir2swift-manifest.json` file, I would suggest using a module-specific file (similar to the `.preamble` and other files we already have), e.g. `GObject` might have a `GObject-2.0.quirks` file.
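As a rough illustration of why the sed part is low-hanging fruit, a sed-style `s/pattern/template/g` rule maps directly onto Foundation's `NSRegularExpression`. This is a minimal sketch, assuming the rules have already been read from a quirks file; `Substitution`, `applySubstitutions(_:to:)` and the sample rules are hypothetical.

```swift
import Foundation

/// One sed-style `s/pattern/template/g` rule expressed as data (hypothetical type).
struct Substitution {
    let pattern: String
    /// NSRegularExpression templates use `$1` for capture groups, where sed uses `\1`.
    let template: String
}

/// Applies each substitution to the whole generated file, in order, replacing all matches.
func applySubstitutions(_ substitutions: [Substitution], to source: String) throws -> String {
    var result = source
    for substitution in substitutions {
        // .anchorsMatchLines makes ^ and $ match at line boundaries, as they do in sed.
        let regex = try NSRegularExpression(pattern: substitution.pattern, options: [.anchorsMatchLines])
        let range = NSRange(result.startIndex..., in: result)
        result = regex.stringByReplacingMatches(in: result, options: [], range: range,
                                                withTemplate: substitution.template)
    }
    return result
}

// Hypothetical generated line and rules, for illustration only.
let generated = "override func connect(handler: () -> Void)"
let rules = [
    Substitution(pattern: "(\\w+): \\(\\) -> Void", template: "$1: @escaping () -> Void"),
    Substitution(pattern: "^override ", template: ""),
]
let fixed = try applySubstitutions(rules, to: generated)
print(fixed)  // func connect(handler: @escaping () -> Void)
```

Existing sed scripts would need their back-references rewritten from `\1` to `$1`, and any sed-specific regex dialect differences checked, when porting.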
Also, since these files need to be written by humans, I would not use JSON as the format. We probably want an expression system for versions and platforms (to avoid complex shell scripts such as the GLib one, which needs to handle differently versioned .sed and .awk files), so something as simple as the sed format would not be enough. Maybe something like YAML/StrictYAML would strike the right balance. We will also eventually need something that can replace the state-based awk scripts (ideally with some domain knowledge that awk doesn't have but gir2swift does, e.g. "the XYZ pointer type in the convenience initialisers implemented by functions containing new_abc_label that are part of the ABC struct").
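Whatever file format is chosen, the expression system for versions and platforms could be parsed into a small tree that gir2swift evaluates against the build host. A minimal sketch; `Version`, `BuildEnvironment`, `Condition` and its cases are hypothetical, parsing from YAML or any other format is out of scope, and the version numbers are placeholders.

```swift
/// Dotted version such as "2.56.4"; trailing zero components are dropped so "2.0" == "2.0.0".
struct Version: Comparable {
    let components: [Int]

    init(_ string: String) {
        var parts = string.split(separator: ".").map { Int($0) ?? 0 }
        while parts.last == 0 { parts.removeLast() }
        components = parts
    }

    static func < (lhs: Version, rhs: Version) -> Bool {
        lhs.components.lexicographicallyPrecedes(rhs.components)
    }
}

/// What the generator knows about the build host (hypothetical).
struct BuildEnvironment {
    let platform: String                     // e.g. "Linux" or "macOS"
    let libraryVersions: [String: Version]   // keyed by pkg-config name
}

/// A hypothetical expression tree answering "when does this quirk apply?".
indirect enum Condition {
    case platform(String)
    case atLeast(library: String, Version)   // inclusive lower bound
    case below(library: String, Version)     // exclusive upper bound, like "until-version"
    case allOf([Condition])
    case anyOf([Condition])
    case not(Condition)

    func isSatisfied(by env: BuildEnvironment) -> Bool {
        switch self {
        case .platform(let name):
            return env.platform == name
        case .atLeast(let library, let version):
            guard let installed = env.libraryVersions[library] else { return false }
            return installed >= version
        case .below(let library, let version):
            guard let installed = env.libraryVersions[library] else { return false }
            return installed < version
        case .allOf(let conditions):
            return conditions.allSatisfy { $0.isSatisfied(by: env) }
        case .anyOf(let conditions):
            return conditions.contains { $0.isSatisfied(by: env) }
        case .not(let condition):
            return !condition.isSatisfied(by: env)
        }
    }
}

// Hypothetical condition: apply only on Linux with GLib older than 2.58.
let condition = Condition.allOf([
    .platform("Linux"),
    .below(library: "glib-2.0", Version("2.58")),
])
let oldLinux = BuildEnvironment(platform: "Linux", libraryVersions: ["glib-2.0": Version("2.56.4")])
let mac = BuildEnvironment(platform: "macOS", libraryVersions: ["glib-2.0": Version("2.74.0")])
print(condition.isSatisfied(by: oldLinux))  // true
print(condition.isSatisfied(by: mac))       // false
```

Keeping conditions as a closed set of cases, rather than embedding a general scripting language, keeps the quirks files declarative and easy to validate.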