Feature Request: Provide machine readable API definitions with SDL3
Heya!
I’m the author of SDL.zig, an attempt to create a Zig binding for SDL2.
As auto-translating the headers does not convey enough information about the expected types, a lot of APIs are hand-adjusted to actually fit the intent of the SDL api. One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).
Now, with the beginning of SDL3 development: is the SDL project open to providing a machine-readable abstract definition of the SDL APIs that allows precise generation of C headers, Zig bindings and possibly bindings for other languages (C#, Rust, Nim, …), so there's only one authoritative source for the APIs that conveys enough information to satisfy all target languages?
Regards
- xq
PS.: I'm willing to spend time and effort on this, and I'm also happy to write both the generator and the definitions.
Conceptually this is fine with me, as long as it doesn't decrease readability of the headers by end users. If it does, then I would suggest a separate API definition file that's machine readable.
Can you give a sample of what a small header like SDL_sensor.h might look like?
Can you give a sample of what a small header like SDL_sensor.h might look like?
Just a heads up: I'm working on that, it just happens that I'm at a conference right now. I will definitely post results next week.
GNOME's GObject-Introspection is in the same general space as this, and GNOME-adjacent libraries use it to generate bindings, either at compile-time for compiled languages (Vala, C++, Rust) or at runtime for dynamic languages (Python, JavaScript, Perl).
SDL probably can't usefully use GObject-Introspection directly, because GObject-Introspection is designed for GLib's object model, but it's worth looking at GObject-Introspection and seeing what sort of information they needed in order to autogenerate the bindings. It uses magic comments containing annotations; the most important one is usually transfer, which marks whether ownership is transferred between caller and callee.
Another very useful annotation is whether a char * is UTF-8 (like in GTK widgets), the OS's unspecified string encoding (like in Unix filenames and environment variables), or binary data (like in memmove()).
One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).
In GObject-Introspection, this distinction would be something like:
/**
 * @colors: (array length=n_colors): the palette
 *
 * Set a palette of variable size that is passed as a pointer to (the first element of) an array.
 */
void Example_SetPalette(Picture *self, SDL_Color *colors, size_t n_colors);
/**
 * @colors: (array fixed-size=16): pointer to exactly 16 colors
 *
 * Set a palette of fixed size that is passed as a pointer to (the first element of) an array.
 */
void Example_SetVgaPalette(Picture *self, SDL_Color *colors);
/**
 * @which: an index within the palette
 * @color: (in) (transfer none): the color
 *
 * Change one member of the palette by copying the given color, which is passed by reference.
 */
void Example_SetPaletteEntry(Picture *self, int which, SDL_Color *color);
/**
 * @which: an index within the palette
 * @color: (out caller-allocates): the color
 *
 * Get one member of the palette and store it by overwriting the contents of a struct that is passed by reference.
 */
void Example_GetColorByIndex(Picture *self, int which, SDL_Color *color_out);
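To give a feel for how a binding generator might consume annotations like the ones above, here is a minimal Python sketch. It is not how GObject-Introspection itself parses comments: the `parse_param_annotations` helper and its regular expression are made up for illustration, and they only cover the `@name: (annotation ...): description` shape used in these examples.

```python
import re

# Matches lines such as " * @colors: (array length=n_colors): the palette".
# Hypothetical, simplified pattern; the real GObject-Introspection scanner
# handles many more cases than this.
PARAM_RE = re.compile(r"^\s*\*\s*@(\w+):\s*((?:\([^)]*\)\s*)*)(?::\s*)?(.*)$")


def parse_param_annotations(comment: str) -> dict:
    """Return {param: {"annotations": {...}, "doc": str}} for one doc comment."""
    params = {}
    for line in comment.splitlines():
        match = PARAM_RE.match(line)
        if not match:
            continue  # not a "@param:" line
        name, raw_annotations, doc = match.groups()
        annotations = {}
        # Each parenthesised group is one annotation: first word is the key,
        # the remaining words are kept verbatim as its options.
        for group in re.findall(r"\(([^)]*)\)", raw_annotations):
            words = group.split()
            if words:
                annotations[words[0]] = words[1:]
        params[name] = {"annotations": annotations, "doc": doc}
    return params


EXAMPLE_COMMENT = """/**
 * @colors: (array length=n_colors): the palette
 * @color: (in) (transfer none): the color
 * @which: an index within the palette
 */"""

if __name__ == "__main__":
    for param, info in parse_param_annotations(EXAMPLE_COMMENT).items():
        print(param, info)
```

A generator could then look up `annotations["array"]` to decide whether a pointer parameter becomes a slice or array type in the target language.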
My proposal wouldn't go that far, and in particular it wouldn't use C as the ground truth for the data. I can hopefully finish my example later today, as the above code doesn't contain even remotely enough data to generate nice Zig or C# code. Ownership transfer is a good point, though!
One insight from GNOME which might be equally useful in SDL is that the most convenient API/ABI for C is not necessarily convenient for bindings. A reasonable number of API entry points in GLib/GTK end up having two versions: one that is convenient for C programmers and marked as not visible to bindings (for example using varargs), and one that is convenient for binding programmers but de-emphasized for C programmers (for example always using an (array,length) pair even if that's not the most natural C representation). Usually one of them calls the other internally, or they both call into a common internal implementation.
@slouken: I created an example here: https://github.com/MasterQ32/SDL3-Api-Generator-Example
It implements the minimum needed to render the Sensor API to both Zig and C. The generated code is not yet at the level that I want, but it's pretty close.
One thing that's still missing is the ability to abstract things like function macros, which are not part of the linked API but of the compiled-in API.
One cool thing that is possible: the API generator can later parse the documentation comments and translate them into the language-specific documentation format, which means everyone will get nice code comments in their IDE.
Important note: I chose Lua for the implementation just because it allows for a quick-and-dirty implementation. For an official API generator, I'd probably move to C, as that lets us remove dependencies.
@smcv:
One insight from GNOME which might be equally useful in SDL is that the most convenient API/ABI for C is not necessarily convenient for bindings.
That is true. I think we can model something like that.
Another very useful annotation is whether a char * is UTF-8 (like in GTK widgets), the OS's unspecified string encoding (like in Unix filenames and environment variables), or binary data (like in memmove()).
This one is actually a pretty cool idea. Your comments aren't incorporated into the API generator/data format yet, but that should not be hard. Array lengths are also a pretty cool annotation: they would allow Zig users to use slices ([]T, a pointer + length type) in the exposed API, with the C API hidden from the user.
Maybe my binding generators are of interest to the discussion, outlined here:
https://floooh.github.io/2020/08/23/sokol-bindgen.html
TL;DR: I run my C headers through clang ast-dump, parse the resulting JSON output into a reduced 'intermediate JSON', and then generate language bindings from this (now automated via Github Actions: https://github.com/floooh/sokol/actions/runs/3122773475).
Depending on the target language I'm injecting special cases (e.g. helper functions like this: https://github.com/floooh/sokol-zig/blob/680d37ebcde09794e66380ff30867ca3dafb9f2f/src/sokol/gfx.zig#L4-L26). I think it's important that the final code generator can give special treatment to specific declarations; for instance, printf()-like functions with variable argument lists usually can't be mapped directly to the target language. For such 'complicated cases' I don't attempt to find a generic solution, but simply inject a manually written function (in some cases not even calling the original function, but 'emulating' it in the target language; for instance, here's such a 'formatted print replacement': https://github.com/floooh/sokol-zig/blob/680d37ebcde09794e66380ff30867ca3dafb9f2f/src/sokol/debugtext.zig#L29-L50 ).
Clang ast-dump works ok for my case, because I can control the input C APIs (there's a blurb about "binding friendly APIs" in the blog post). The ast-dump output format isn't guaranteed to remain fixed, but so far (for just parsing declarations) it hasn't changed.
A more robust solution is probably a "proper" tool based on libclang.
In any case, here's all the python for the binding generation:
https://github.com/floooh/sokol/blob/master/bindgen/
Start at gen_all.py, then look at gen_ir.py (takes the verbose output of clang ast-dump and turns it into a much simplified JSON), and then gen_zig.py, gen_nim.py and gen_odin.py which take the intermediate JSON and generate the bindings.
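As a rough illustration of the "reduce the AST dump to an intermediate JSON" step, here is a minimal Python sketch. It is not taken from gen_ir.py. It assumes the node shape that clang's JSON AST dump uses for declarations (`kind`, `name`, `type.qualType`, `inner`); the sample input is hand-written rather than real clang output, and the `reduce_ast` helper and its output layout are hypothetical.

```python
import json

# Hand-written sample imitating declaration nodes of a clang JSON AST dump.
SAMPLE_AST = {
    "kind": "TranslationUnitDecl",
    "inner": [
        {
            "kind": "FunctionDecl",
            "name": "SDL_NumSensors",
            "type": {"qualType": "int (void)"},
        },
        {
            "kind": "FunctionDecl",
            "name": "SDL_SensorGetDeviceName",
            "type": {"qualType": "const char *(int)"},
            "inner": [
                {
                    "kind": "ParmVarDecl",
                    "name": "device_index",
                    "type": {"qualType": "int"},
                }
            ],
        },
        {"kind": "TypedefDecl", "name": "SDL_SensorID", "type": {"qualType": "Sint32"}},
    ],
}


def reduce_ast(ast: dict, prefix: str = "SDL_") -> dict:
    """Keep only prefixed function declarations, with a much smaller schema."""
    functions = []
    for node in ast.get("inner", []):
        if node.get("kind") != "FunctionDecl":
            continue
        if not node.get("name", "").startswith(prefix):
            continue
        # The return type is everything before the first "(" of the signature.
        # This simplification breaks for functions returning function pointers.
        signature = node["type"]["qualType"]
        return_type = signature.split("(", 1)[0].strip()
        params = [
            {"name": child.get("name", ""), "type": child["type"]["qualType"]}
            for child in node.get("inner", [])
            if child.get("kind") == "ParmVarDecl"
        ]
        functions.append(
            {"name": node["name"], "returns": return_type, "params": params}
        )
    return {"functions": functions}


if __name__ == "__main__":
    print(json.dumps(reduce_ast(SAMPLE_AST), indent=2))
```

Per-language generators would then only need to understand this small schema, not the full clang output.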
PS: the most 'interesting' problem seems to be "how to deal with strings". The currently supported languages can all consume zero-terminated C strings directly, and all language-specific 'structs' map directly to their C counterparts (i.e. they are 'memory-layout-compatible'). For other languages this will be more tricky and may require a proper 'marshalling layer' between the target language and the C APIs.
Hope this makes sense :)
On the subject of strings, SDL2# ended up doing its own UTF8 marshaling:
https://github.com/flibitijibibo/SDL2-CS/blob/master/src/SDL2.cs
Aside from that we're pretty faithful to the original API, and it wouldn't be hard to annotate what type of string marshaling is necessary. Having a way to generate this would be nice to have, and after 10 years of maintaining SDL2# by hand I think we have enough information to automate this.
Speaking up as a Rust user of SDL2, and as someone that's made both hand-written and generator-written Rust bindings for SDL2 and GL, all of this is basically a good idea.
I don't have too much to add at the moment in terms of what would help from a Rust perspective. The one thing would be that I'd like if function arguments in the machine readable definition always used integers of fixed sizes, rather than C's default numeric types that vary by platform. However, if this can't be done it's still basically fine.
While I've not used the Rust bindings much, would it make sense to tweak SDL's API to make it more directly map to Rust?
e.g., the Rust bindings make up the concept of a "Canvas" in SDL_Renderer, in order to have something with the right lifetime. (As well as things like a TextureCreator?)
These of course aren't documented in SDL (other than the Rust bindings docs), and won't appear in any other SDL tutorials, etc. If we can find a closer match between SDL and what Rust needs, so the SDL bindings don't feel so much like a different library in places, I think that'd be much more pleasant to deal with on both sides.
I actually have my own separate crates called fermium (raw bindings) and beryllium (rust-friendly wrappers). I've never looked too closely at what the sdl2 crate is doing or what any of their internal logic for stuff is.
Jumping in here as I have experimented with this problem from a different angle with C# with some major pains and then some minor success. I have crossed friendly paths with @floooh for generating bindings in C# for sokol using libclang.
I have documented all my knowledge / findings in the README and other documentation over at https://github.com/bottlenoselabs/c2cs. Any constructive corrections or call-outs are extremely welcome. I am probably on mount stupid.
My auto-generated bindings for SDL can be found here: https://github.com/bottlenoselabs/SDL-cs. There are challenges with the SDL API which make automatic bindgen not so "friendly" when it comes to C#. I am free to discuss this in more detail, which is probably the most value I can bring to this discussion.
I use the c2cs tool I created to automatically generate the C# bindings for the FNA C dependencies for my fork of FNA called Katabasis; I sponsor @flibitijibibo. The purpose of this fork is to explore my own curiosity about the XNA/MonoGame APIs in a way that is organic and makes sense (I have a strong love-hate relationship with Microsoft).
EDIT:
I forgot to mention what's interesting about my solution is that I use libclang to extract a minimal necessary .json Abstract Syntax Tree for purposes of generating C# code. Technically speaking this .json file could also be used to generate code for Python or other languages but I have not experimented down this path due to my limited time.
I forgot to mention what's interesting about my solution is that I use libclang to extract a minimal necessary .json Abstract Syntax Tree for purposes of generating C# code. Technically speaking this .json file could also be used to generate code for Python or other languages but I have not experimented down this path due to my limited time.
The problem with this approach is that C sadly doesn't convey even remotely enough information to generate good APIs from. That's why I'm proposing a (not yet specified, but extensible) format to document all the requirements of an API. For example, char * foo in C doesn't say whether I can pass NULL or not. It also doesn't say whether the pointer is NUL-terminated, or whether it expects only a single char or a fixed number of them. If we can express this information in a file and generate the code from there, we can create way better bindings for most languages (consider C# ref Point vs Point[] in marshalling).
One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).
I don't know Zig at all, but from some googling it seems that colors: [*c]SDL_Color can be used for automated translation (although it is as unsafe as C code is).
By the way: my LuaJIT SDL binding in https://github.com/sonoro1234/LuaJIT-SDL2
I don't know Zig at all, but from some googling it seems that colors: [*c]SDL_Color can be used for automated translation (although it is as unsafe as C code is).
Yes, that is correct. This conveys basically the following information:
- This pointer can optionally be NULL
- This pointer can point to a single item
- This pointer can point to many items
- This pointer can point to a sequence terminated by a NUL element
Whereas *SDL_Color conveys this information:
- The pointer cannot be NULL
- The pointer points to a single element
and [*]SDL_Color conveys:
- The pointer cannot be NULL
- The pointer points to an unknown/externally defined number of elements, ranging from 0 ... (limit-1)
This means we can translate a *SDL_Color to a C# ref SDL_Color or out SDL_Color parameter, whereas [*]SDL_Color can be translated to SDL_Color[], at least in a marshalling context.
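The mapping described above can be sketched as a small Python model of what an API description might record per pointer parameter. The `PointerKind` and `PointerParam` names are hypothetical, and using `IntPtr` as the C# fallback for an unannotated pointer is an assumption of this sketch, not something proposed in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class PointerKind(Enum):
    ONE = "one"          # points to exactly one element
    MANY = "many"        # points to an externally defined number of elements
    UNKNOWN = "unknown"  # all that a plain C declaration tells us


@dataclass(frozen=True)
class PointerParam:
    name: str
    element: str
    kind: PointerKind
    nullable: bool = False


def to_zig(param: PointerParam) -> str:
    """Render the parameter as a Zig declaration."""
    if param.kind is PointerKind.UNKNOWN:
        # [*c]T is the catch-all C pointer: nullable, one or many.
        return f"{param.name}: [*c]{param.element}"
    sigil = "*" if param.kind is PointerKind.ONE else "[*]"
    optional = "?" if param.nullable else ""
    return f"{param.name}: {optional}{sigil}{param.element}"


def to_csharp(param: PointerParam) -> str:
    """Render the parameter as a C# marshalling-style declaration."""
    if param.kind is PointerKind.ONE:
        return f"ref {param.element} {param.name}"
    if param.kind is PointerKind.MANY:
        return f"{param.element}[] {param.name}"
    return f"IntPtr {param.name}"  # assumed fallback when nothing is known


if __name__ == "__main__":
    colors = PointerParam("colors", "SDL_Color", PointerKind.MANY)
    color = PointerParam("color", "SDL_Color", PointerKind.ONE)
    print(to_zig(colors), "|", to_csharp(colors))
    print(to_zig(color), "|", to_csharp(color))
```

The point of the sketch is that one annotated description is enough to pick the right type in each target language, which the bare C declaration cannot do.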
@MasterQ32 I agree with you; I have encountered this problem, and so have the Silk.NET folks and many others. There appears to be a need for some form of annotations which can be used to direct bindgen more accurately.
Like @smcv mentioned earlier, the use of magic comments is one possible solution. This has advantages and disadvantages.
What I have noticed in experimentation is that libclang exposes getting any Clang attributes for a cursor. Another path forward is to direct bindgen using Clang attributes.
However, the path I'm choosing to go down myself is neither. I decided to just accept that C just does not expose enough information. Instead of trying to add more information to C code (via magic comments or attributes), I'm using auxiliary code to direct bindgen using a plugin mechanism. This works well for my use case because I don't have control over SDL, or sokol, or flecs, etc.
For example, the pattern of SDL_Color* being an array; that can be transformed appropriately to C# via auxiliary code in the form of a plugin. In your other example, of ref Point vs Point[], this pattern would also be handled by auxiliary code in the form of a plugin. Side note: using Point[] would probably not be the best idea and Span<Point> would probably be a better fit; something which I already do for fixed buffers.
Dear imgui apparently just released something like this, probably has a lot of work for C++ wrangling but still might be good for the other aspects of metadata generation: https://github.com/dearimgui/dear_bindings
gendynapi parses all the SDL headers to generate the DYNAPI files. I've tried a rewrite in Python to fix some bugs and make improvements (#6783).
It's been very easy to add a JSON dump of the whole SDL API, which can be useful for generating bindings. Of course, the extra tags you would need for allocation and pointers are missing, but they should be easy to parse once they are added and specified.
I know this is the inverse of using a "unique source" that generates the headers, but at least it can help to generate the "unique source" from all the headers, if that approach is chosen.
Yeah, this seems like a reasonable approach: we generate an API description from the headers that can be marked up with more detail by people who are implementing language bindings.
At this point it might also be worth adding code to handle APIs that have been removed, or at least to produce a checklist that someone can go through. It won't matter once we've finalized the ABI, but it might be useful now.
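A checklist of removed APIs could be produced by comparing two API dumps. The following Python sketch assumes each dump is a list of entries with a `name` field; that layout, the `removed_apis` helper and the sample data are all hypothetical, and the sample says nothing about what SDL3 actually removes.

```python
def removed_apis(old_dump: list, new_dump: list) -> list:
    """Return the sorted names present in old_dump but missing from new_dump."""
    old_names = {entry["name"] for entry in old_dump}
    new_names = {entry["name"] for entry in new_dump}
    return sorted(old_names - new_names)


def as_checklist(names: list) -> str:
    """Format the names as an unchecked to-do list, one line per API."""
    return "\n".join(f"[ ] {name}" for name in names)


# Made-up sample data for illustration only.
OLD_DUMP = [
    {"name": "SDL_CreateRenderer"},
    {"name": "SDL_ExampleOldCall"},
    {"name": "SDL_AnotherOldCall"},
]
NEW_DUMP = [
    {"name": "SDL_CreateRenderer"},
    {"name": "SDL_ExampleNewCall"},
]

if __name__ == "__main__":
    print(as_checklist(removed_apis(OLD_DUMP, NEW_DUMP)))
```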
The ast-dump output format isn't guaranteed to remain fixed, but so far (for just parsing declarations) it hasn't changed.
A more robust solution is probably a "proper" tool based on libclang.
the common lisp binding generation relies on c2ffi.
i'm not sure how c2ffi relates to clang ast-dump, and what justifies its existence (because i don't know much about ast-dump).
A very plain XML file might be best, like GL and Vulkan do.
The problem with this approach is that C sadly doesn't convey even remotely enough information to generate good APIs from.
my strategy is that i have the generated API in one package. it only deals with the basics, like string conversions/encoding, error return codes thrown as exceptions, etc. whatever can be done based on the info formally encoded in the C model.
then i have another package that is built on top of the generated one, and contains hand written "lispy" constructs that may use the full power of the host language.
FTR, this is a related feature request: https://github.com/libsdl-org/SDL/issues/2059 (typedef for error return codes).
Just to capture the suggestion in https://github.com/libsdl-org/SDL/issues/2059, if we generate separate API binding metadata, we could mark the SDL functions that return int as returning SDL_ReturnCode, which is defined to be 0 on success or < 0 on error, and SDL_GetError() would return useful information about what went wrong.
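To illustrate what a binding could do with such a marker, here is a minimal Python sketch that turns a negative return code into an exception carrying the error text. The functions below are stand-ins and do not call SDL; `check_return_code`, `SDLError`, `fake_get_error` and `fake_set_palette` are hypothetical names, and a real binding would wrap the actual C functions and call SDL_GetError() instead.

```python
import functools


class SDLError(RuntimeError):
    """Raised when a wrapped call reports failure through a negative return code."""


def check_return_code(get_error):
    """Decorator factory: raise SDLError when the wrapped function returns < 0."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            code = func(*args, **kwargs)
            if code < 0:
                raise SDLError(f"{func.__name__} failed: {get_error()}")
            return code
        return wrapper
    return decorator


# Stand-ins for the C library: a last-error slot and one failing function.
_state = {"error": ""}


def fake_get_error() -> str:
    return _state["error"]


@check_return_code(fake_get_error)
def fake_set_palette(n_colors: int) -> int:
    """Pretend C call: 0 on success, negative on error."""
    if n_colors > 256:
        _state["error"] = "too many colors"
        return -1
    return 0


if __name__ == "__main__":
    fake_set_palette(16)
    try:
        fake_set_palette(1000)
    except SDLError as exc:
        print(exc)
```

A generator would apply the wrapper only to functions whose metadata marks the return value as an error code, leaving other int-returning functions untouched.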
@slouken please keep in mind that the current infrastructure for binding generation works based on the C domain (clang based AST walker).
in e.g. common lisp, it's trivial to map a return code of a specific C type to be thrown as an exception when the value is negative. whatever custom binding machinery is introduced by SDL will require extra work, i.e. probably remain unsupported. i, for one, will not put in the extra work to add support for something that is unique for SDL.
therefore, whatever can be encoded cheaply in the C domain, is more useful when encoded there, not in some machinery that is unique to the SDL library.
I'm looking at creating some nodejs SDL bindings, because all the existing ones I can find on NPM are abandoned, out of date to the point of not even compiling, awful, incomplete, or all of the above. Having a machine-readable API definition maintained by the SDL team so that my binding-generation work is just "run regenerate.sh" would make that a lot easier, and I'd be happy to contribute to making it happen :)
(For context, I'm working on a gameboy emulator in a bunch of different languages, which also happens to be exercising a whole load of different language SDL bindings, if that's any use to anyone - https://github.com/shish/rosettaboy )
@shish, if you want to work on an API definition, feel free to submit one! It sounds like you have a real-world use case, so if you want to use that as a basis, go for it!
I would suggest enhancing src/dynapi/gendynapi.py to automatically create the basic definition file, and then add comments to that letting people know what additional markup can be added to fine tune the binding generation.
Note that you can run ./gendynapi.py --dump and it creates a "sdl.json" file with all SDL API inside.
e.g., a list of entries like this:
    {
        "comment": "the full raw comment",
        "header": "SDL_render.h",
        "name": "SDL_CreateRenderer",
        "parameter": [
            "SDL_Window *REWRITE_NAME",
            "const char *REWRITE_NAME",
            "Uint32 REWRITE_NAME"
        ],
        "parameter_name": [
            "window",
            "name",
            "flags"
        ],
        "retval": "SDL_Renderer*"
    },
this one matches the function:
extern DECLSPEC SDL_Renderer *SDLCALL SDL_CreateRenderer(
    SDL_Window *window, 
    const char *name, 
    Uint32 flags);
the output format can be changed/improved if needed
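As a small example of consuming such an entry, the following Python sketch rebuilds a C prototype from it by substituting each `REWRITE_NAME` placeholder with the matching `parameter_name`. The `to_prototype` helper is hypothetical, and emitting `void` for an empty parameter list is an assumption, since the dump's representation of parameterless functions is not shown here.

```python
import json

# One entry in the format shown above.
ENTRY_JSON = """
{
    "comment": "the full raw comment",
    "header": "SDL_render.h",
    "name": "SDL_CreateRenderer",
    "parameter": [
        "SDL_Window *REWRITE_NAME",
        "const char *REWRITE_NAME",
        "Uint32 REWRITE_NAME"
    ],
    "parameter_name": ["window", "name", "flags"],
    "retval": "SDL_Renderer*"
}
"""


def to_prototype(entry: dict) -> str:
    """Rebuild a plain C prototype from one dumped API entry."""
    params = [
        declaration.replace("REWRITE_NAME", name)
        for declaration, name in zip(entry["parameter"], entry["parameter_name"])
    ]
    # Assumption: an empty parameter list is rendered as "void".
    param_list = ", ".join(params) or "void"
    return f"{entry['retval']} {entry['name']}({param_list});"


if __name__ == "__main__":
    print(to_prototype(json.loads(ENTRY_JSON)))
```

The same loop over entries could emit declarations for another language instead of C; only the string templates would change.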
@shish, if you want to work on an API definition, feel free to submit one! It sounds like you have a real-world use case, so if you want to use that as a basis, go for it!
@slouken so this means you're open to the idea of having such an "official file" in the SDL repository?
@shish: I'm happy to support you with this task, if you want to tackle it. I'm taking a look at the gendynapi implementation.