
Feature Request: Provide machine readable API definitions with SDL3

Open ikskuh opened this issue 3 years ago • 20 comments

Heya!

I’m the author of SDL.zig, an attempt to create a Zig binding for SDL2.

As auto-translating the headers does not convey enough information about the expected types, a lot of APIs are hand-adjusted to actually fit the intent of the SDL API. One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

Now, with SDL3 development beginning: is the SDL project open to providing a machine-readable, abstract definition of the SDL APIs that allows precise generation of C headers, Zig bindings and possibly other languages (C#, Rust, Nim, …), so that there's only one authoritative source for the APIs, conveying enough information to satisfy all target languages?
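To make the proposal concrete, here is a minimal Python sketch of one definition driving two generators. The record layout (params, pointer set to "one" or "many", const) is a hypothetical format invented for this illustration, not an existing SDL file; the function mirrors SDL2's SDL_SetPaletteColors, and the Zig type table covers only what the example needs.

```python
# Hypothetical machine-readable definition (not an existing SDL format).
# Pointer parameters record whether they point to one element or many,
# which the C prototype alone cannot express.
SET_PALETTE_COLORS = {
    "name": "SDL_SetPaletteColors",
    "returns": "int",
    "params": [
        {"name": "palette", "type": "SDL_Palette", "pointer": "one"},
        {"name": "colors", "type": "SDL_Color", "pointer": "many", "const": True},
        {"name": "firstcolor", "type": "int"},
        {"name": "ncolors", "type": "int"},
    ],
}

# C scalar types that need renaming in Zig; everything else passes through.
ZIG_SCALARS = {"int": "c_int", "void": "void"}


def c_param(p):
    """Render one parameter as C, e.g. 'const SDL_Color *colors'."""
    const = "const " if p.get("const") else ""
    sep = " *" if "pointer" in p else " "
    return f"{const}{p['type']}{sep}{p['name']}"


def zig_type(p):
    """Render one parameter type as Zig: *T for 'one', [*]T for 'many'."""
    base = ZIG_SCALARS.get(p["type"], p["type"])
    if "pointer" not in p:
        return base
    const = "const " if p.get("const") else ""
    prefix = "*" if p["pointer"] == "one" else "[*]"
    return f"{prefix}{const}{base}"


def emit_c(fn):
    params = ", ".join(c_param(p) for p in fn["params"])
    return f"extern DECLSPEC {fn['returns']} SDLCALL {fn['name']}({params});"


def emit_zig(fn):
    params = ", ".join(f"{p['name']}: {zig_type(p)}" for p in fn["params"])
    ret = ZIG_SCALARS.get(fn["returns"], fn["returns"])
    return f"pub extern fn {fn['name']}({params}) {ret};"


print(emit_c(SET_PALETTE_COLORS))
print(emit_zig(SET_PALETTE_COLORS))
```

The definition only records facts that C cannot express (here, which pointer refers to many elements); each generator decides how to spell them in its own language.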

Regards

xq

PS: I'm willing to spend time and effort on this, and I'm also happy to write both the generator and the definitions.

ikskuh, Oct 05 '22 13:10

Conceptually this is fine with me, as long as it doesn't decrease readability of the headers by end users. If it does, then I would suggest a separate API definition file that's machine readable.

Can you give a sample of what a small header like SDL_sensor.h might look like?

slouken, Oct 05 '22 16:10

> Can you give a sample of what a small header like SDL_sensor.h might look like?

Just a heads-up: I'm working on that; it just happens that I'm at a conference right now. I will definitely post results next week.

ikskuh, Oct 10 '22 06:10

GNOME's GObject-Introspection is in the same general space as this, and GNOME-adjacent libraries use it to generate bindings, either at compile-time for compiled languages (Vala, C++, Rust) or at runtime for dynamic languages (Python, JavaScript, Perl).

SDL probably can't usefully use GObject-Introspection directly, because GObject-Introspection is designed for GLib's object model, but it's worth looking at GObject-Introspection and seeing what sort of information they needed in order to autogenerate the bindings. It uses magic comments containing annotations; the most important one is usually transfer, which marks whether ownership is transferred between caller and callee.

Another very useful annotation is whether a char * is UTF-8 (like in GTK widgets), the OS's unspecified string encoding (like in Unix filenames and environment variables), or binary data (like in memmove()).
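As a sketch of what a binding could do with such an annotation, the Python function below converts a value for a char * parameter according to a hypothetical encoding tag with the values "utf8", "os" and "binary". It relies on os.fsencode for OS-encoded strings such as filenames; the tag names and the function are assumptions for illustration, not part of GObject-Introspection or SDL.

```python
import os


def to_c_argument(value, encoding):
    """Convert a Python value into the bytes a C char * parameter expects.

    encoding is a hypothetical per-parameter annotation:
      "utf8"   - text that SDL documents as UTF-8
      "os"     - the OS's native encoding (filenames, environment variables)
      "binary" - arbitrary bytes, may contain NUL, passed through unchanged
    """
    if encoding == "binary":
        return bytes(value)
    if encoding == "utf8":
        data = value.encode("utf-8")
    elif encoding == "os":
        data = os.fsencode(value)
    else:
        raise ValueError(f"unknown encoding annotation: {encoding!r}")
    # Text is passed as a NUL-terminated C string, so an embedded NUL
    # would silently truncate it on the C side.
    if b"\0" in data:
        raise ValueError("embedded NUL in a NUL-terminated string")
    return data + b"\0"
```

Binary data keeps its length information separately (as in memmove()), which is why it is the only case that is neither validated nor NUL-terminated here.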

> One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

In GObject-Introspection, this distinction would be something like:

/**
 * @colors: (array length=n_colors): the palette
 *
 * Set a palette of variable size that is passed as a pointer to (the first element of) an array.
 */
void Example_SetPalette(Picture *self, SDL_Color *colors, size_t n_colors);

/**
 * @colors: (array fixed-size=16): pointer to exactly 16 colors
 *
 * Set a palette of fixed size that is passed as a pointer to (the first element of) an array.
 */
void Example_SetVgaPalette(Picture *self, SDL_Color *colors);

/**
 * @which: an index within the palette
 * @color: (in) (transfer none): the color
 *
 * Change one member of the palette by copying the given color, which is passed by reference.
 */
void Example_SetPaletteEntry(Picture *self, int which, SDL_Color *color);

/**
 * @which: an index within the palette
 * @color: (out caller-allocates): the color
 *
 * Get one member of the palette and store it by overwriting the contents of a struct that is passed by reference.
 */
void Example_GetColorByIndex(Picture *self, int which, SDL_Color *color);
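To show how such comments become machine-usable, here is a simplified Python sketch that extracts the parenthesised annotations from the @param lines. It is a regex written for this example, not the real GObject-Introspection scanner, and it ignores nested parentheses and multi-line parameter descriptions.

```python
import re

# " * @name: (ann) (ann key=value): description" -> name, annotations, description
PARAM_LINE = re.compile(r"^\s*\*\s*@(\w+):\s*((?:\([^)]*\)\s*)*):?\s*(.*)$")
ANNOTATION = re.compile(r"\(([^)]*)\)")


def parse_annotations(comment):
    """Return {param: {annotation: {option: value}}} for every @param line.

    Bare options become True, key=value options keep their string value.
    """
    result = {}
    for line in comment.splitlines():
        match = PARAM_LINE.match(line)
        if not match:
            continue
        name, raw = match.group(1), match.group(2)
        annotations = {}
        for body in ANNOTATION.findall(raw):
            head, *options = body.split()
            parsed = {}
            for option in options:
                key, sep, value = option.partition("=")
                parsed[key] = value if sep else True
            annotations[head] = parsed
        result[name] = annotations
    return result


DOC = """/**
 * @colors: (array length=n_colors): the palette
 * @which: an index within the palette
 * @color: (in) (transfer none): the color
 */"""

print(parse_annotations(DOC))
```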

smcv, Oct 12 '22 13:10

My proposal wouldn't go that far, and in particular it wouldn't use C as the ground truth for the data. I hope to finish my example later today, as the above code doesn't contain nearly enough information to generate nice Zig or C# code. Ownership transfer is a good point, though!

ikskuh, Oct 12 '22 13:10

One insight from GNOME which might be equally useful in SDL is that the most convenient API/ABI for C is not necessarily convenient for bindings. A reasonable number of API entry points in GLib/GTK end up having two versions: one that is convenient for C programmers and marked as not visible to bindings (for example using varargs), and one that is convenient for binding programmers but de-emphasized for C programmers (for example always using an (array,length) pair even if that's not the most natural C representation). Usually one of them calls the other internally, or they both call into a common internal implementation.
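One way a definition file could model such pairs is a visibility flag plus a rule for variadic functions. In this Python sketch the Example_* names and the bindings / binding_alternative fields are hypothetical; the point is only that a generator can skip the C-only entry point and expose its binding-friendly twin.

```python
# Hypothetical definitions: a C-friendly variadic entry point and a
# binding-friendly twin that takes an already formatted string.
DEFINITIONS = [
    {"name": "Example_SetError", "params": ["const char *fmt", "..."],
     "bindings": "hidden", "binding_alternative": "Example_SetErrorString"},
    {"name": "Example_SetErrorString", "params": ["const char *message"]},
]


def binding_visible(definitions):
    """Return the names a language binding should expose.

    Entries tagged "hidden" are skipped, and so is anything variadic,
    since many FFIs cannot call variadic C functions (or not portably).
    """
    visible = []
    for d in definitions:
        if d.get("bindings") == "hidden" or "..." in d["params"]:
            continue
        visible.append(d["name"])
    return visible


print(binding_visible(DEFINITIONS))  # ['Example_SetErrorString']
```

The binding_alternative field is purely informational here; a documentation generator could use it to point binding users from the hidden function to its replacement.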

smcv, Oct 12 '22 13:10

@slouken: I created an example here: https://github.com/MasterQ32/SDL3-Api-Generator-Example

It implements the minimum needed to render the Sensor API to both Zig and C. The generated code is not yet at the level I want, but it's pretty close.

One thing that's still missing is the ability to abstract things like function macros, which are not part of the linked API but of the compiled-in API.

Another cool possibility: the API generator can later parse the documentation comments and translate them into each language's documentation format, which means everyone will get nice code comments in their IDE.

Important note: I chose Lua for the implementation only because it allows a quick-and-dirty prototype. For an official API generator, I'd probably move to C, since that removes dependencies.

@smcv:

> One insight from GNOME which might be equally useful in SDL is that the most convenient API/ABI for C is not necessarily convenient for bindings.

That is true. I think we can model something like that.

> Another very useful annotation is whether a char * is UTF-8 (like in GTK widgets), the OS's unspecified string encoding (like in Unix filenames and environment variables), or binary data (like in memmove()).

This one is actually a pretty cool idea. Your suggestions aren't incorporated into the API generator/data format yet, but that should not be hard. Array lengths are also a very useful annotation: they would allow Zig users to use slices ([]T, a pointer + length type) in the exposed API, while the C API stays hidden from the user.
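A rough Python sketch of the slice idea: given a hypothetical length_of tag on the count parameter, a generator can emit a Zig wrapper that takes a slice and forwards .ptr and .len to the C function. The definition format is invented for illustration, and the emitted @intCast uses the single-argument form of Zig 0.11 and later.

```python
# Hypothetical definition: each parameter carries its raw Zig type, and the
# count parameter is tagged with "length_of" naming the array it describes.
SET_PALETTE_COLORS = {
    "name": "SDL_SetPaletteColors",
    "zig_return": "c_int",
    "params": [
        {"name": "palette", "zig": "*SDL_Palette"},
        {"name": "colors", "zig": "[*]const SDL_Color", "element": "const SDL_Color"},
        {"name": "firstcolor", "zig": "c_int"},
        {"name": "ncolors", "zig": "c_int", "length_of": "colors"},
    ],
}


def zig_slice_wrapper(fn, wrapper_name):
    """Emit a Zig function that takes slices and forwards (ptr, len) to C."""
    sliced = {p["length_of"] for p in fn["params"] if "length_of" in p}
    signature, arguments = [], []
    for p in fn["params"]:
        if "length_of" in p:
            # The count comes from the slice, so it vanishes from the signature.
            arguments.append(f"@intCast({p['length_of']}.len)")
        elif p["name"] in sliced:
            signature.append(f"{p['name']}: []{p['element']}")
            arguments.append(f"{p['name']}.ptr")
        else:
            signature.append(f"{p['name']}: {p['zig']}")
            arguments.append(p["name"])
    return (
        f"pub fn {wrapper_name}({', '.join(signature)}) {fn['zig_return']} {{\n"
        f"    return {fn['name']}({', '.join(arguments)});\n"
        f"}}"
    )


print(zig_slice_wrapper(SET_PALETTE_COLORS, "setPaletteColors"))
```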

ikskuh, Oct 12 '22 19:10

Maybe my binding generators are of interest to the discussion, outlined here:

https://floooh.github.io/2020/08/23/sokol-bindgen.html

TL;DR: I'm running my C headers through clang ast-dump, parse the resulting JSON output into a reduced 'intermediate JSON', and then generate language bindings from this (now automated via GitHub Actions: https://github.com/floooh/sokol/actions/runs/3122773475).

Depending on the target language, I'm injecting special cases (e.g. helper functions like this: https://github.com/floooh/sokol-zig/blob/680d37ebcde09794e66380ff30867ca3dafb9f2f/src/sokol/gfx.zig#L4-L26). I think it's important to allow the final code generator to give special treatment to specific declarations; for instance, printf()-like functions with variable argument lists usually can't be mapped directly to the target language. For such 'complicated cases' I don't attempt to find a generic solution, but simply inject a manually written function (in some cases not even calling the original function, but 'emulating' it in the target language). For instance, here's such a 'formatted print replacement': https://github.com/floooh/sokol-zig/blob/680d37ebcde09794e66380ff30867ca3dafb9f2f/src/sokol/debugtext.zig#L29-L50

Clang ast-dump works ok for my case, because I can control the input C APIs (there's a blurb about "binding friendly APIs" in the blog post). The ast-dump output format isn't guaranteed to remain fixed, but so far (for just parsing declarations) it hasn't changed.

A more robust solution is probably a "proper" tool based on libclang.

In any case, here's all the python for the binding generation:

https://github.com/floooh/sokol/blob/master/bindgen/

Start at gen_all.py, then look at gen_ir.py (takes the verbose output of clang ast-dump and turns it into a much simplified JSON), and then gen_zig.py, gen_nim.py and gen_odin.py which take the intermediate JSON and generate the bindings.
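For readers unfamiliar with the approach, here is a minimal Python sketch of the reduction step: it walks the JSON produced by something like clang -Xclang -ast-dump=json -fsyntax-only and keeps only prefixed FunctionDecl nodes. The sample input is a hand-written, heavily trimmed stand-in for real clang output, and this is not the actual gen_ir.py code.

```python
def reduce_ast(ast, prefix):
    """Collect top-level FunctionDecl nodes whose name starts with prefix."""
    funcs = []
    for node in ast.get("inner", []):
        if node.get("kind") != "FunctionDecl":
            continue
        if not node.get("name", "").startswith(prefix):
            continue  # skip declarations pulled in from system headers
        qual = node["type"]["qualType"]  # e.g. "SDL_Sensor *(int)"
        funcs.append({
            "name": node["name"],
            # Simplification: the return type is the text before the first "(".
            # This breaks for functions that return function pointers.
            "result": qual.split("(", 1)[0].strip(),
            "params": [
                {"name": p.get("name", ""), "type": p["type"]["qualType"]}
                for p in node.get("inner", [])
                if p.get("kind") == "ParmVarDecl"
            ],
        })
    return funcs


# Trimmed, hand-written imitation of clang's JSON AST output.
SAMPLE = {
    "kind": "TranslationUnitDecl",
    "inner": [
        {"kind": "TypedefDecl", "name": "Uint32", "type": {"qualType": "uint32_t"}},
        {"kind": "FunctionDecl", "name": "SDL_SensorOpen",
         "type": {"qualType": "SDL_Sensor *(int)"},
         "inner": [{"kind": "ParmVarDecl", "name": "device_index",
                    "type": {"qualType": "int"}}]},
        {"kind": "FunctionDecl", "name": "memcpy",
         "type": {"qualType": "void *(void *, const void *, unsigned long)"}},
    ],
}

print(reduce_ast(SAMPLE, "SDL_"))
```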

PS: the most 'interesting' problem seems to be "how to deal with strings". The currently supported languages can all consume zero-terminated C strings directly, and all language-specific 'structs' map directly to their C counterparts (i.e. they are 'memory-layout-compatible'). For other languages this will be trickier and may require a proper 'marshalling layer' between the target language and the C APIs.

Hope this makes sense :)

floooh, Oct 14 '22 06:10

On the subject of strings, SDL2# ended up doing its own UTF8 marshaling:

https://github.com/flibitijibibo/SDL2-CS/blob/master/src/SDL2.cs

Aside from that we're pretty faithful to the original API, and it wouldn't be hard to annotate what type of string marshaling is necessary. Having a way to generate this would be nice to have, and after 10 years of maintaining SDL2# by hand I think we have enough information to automate this.

flibitijibibo, Nov 22 '22 04:11

Speaking up as a Rust user of SDL2, and as someone that's made both hand-written and generator-written Rust bindings for SDL2 and GL, all of this is basically a good idea.

I don't have too much to add at the moment from a Rust perspective. The one thing is that I'd like function arguments in the machine-readable definition to always use fixed-size integers, rather than C's default numeric types that vary by platform. However, if this can't be done, it's still basically fine.
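To illustrate the platform dependence, the Python sketch below maps C integer types to fixed-width Rust types under the two common 64-bit data models; the same C long comes out as i64 on LP64 (64-bit Linux and macOS) and i32 on LLP64 (64-bit Windows). The table is an illustration covering only a few types; a definition that records SDL's own Sint32/Uint32-style types would make it unnecessary.

```python
# Bit widths of C's platform-dependent integer types under the two common
# 64-bit data models: LP64 (64-bit Linux/macOS) and LLP64 (64-bit Windows).
C_INT_BITS = {
    "LP64": {"short": 16, "int": 32, "long": 64, "long long": 64},
    "LLP64": {"short": 16, "int": 32, "long": 32, "long long": 64},
}


def rust_fixed_type(c_type, model):
    """Map a C integer type to the fixed-width Rust type for one data model."""
    unsigned = c_type.startswith("unsigned ")
    base = c_type[len("unsigned "):] if unsigned else c_type
    bits = C_INT_BITS[model][base]
    return f"{'u' if unsigned else 'i'}{bits}"


for model in C_INT_BITS:
    print(model, rust_fixed_type("long", model), rust_fixed_type("unsigned int", model))
```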

Lokathor, Nov 22 '22 06:11

While I've not used the Rust bindings much, would it make sense to tweak SDL's API to make it more directly map to Rust?

e.g., the Rust bindings make up the concept of a "Canvas" in SDL_Renderer, in order to have something with the right lifetime. (As well as things like a TextureCreator?)

These of course aren't documented in SDL (other than the Rust bindings docs), and won't appear in any other SDL tutorials, etc. If we can find a closer match between SDL and what Rust needs, so the SDL bindings don't feel so much like a different library in places, I think that'd be much more pleasant to deal with on both sides.

sulix, Nov 22 '22 09:11

I actually have my own separate crates called fermium (raw bindings) and beryllium (rust-friendly wrappers). I've never looked too closely at what the sdl2 crate is doing or what any of their internal logic for stuff is.

Lokathor, Nov 22 '22 16:11

Jumping in here, as I have experimented with this problem from a different angle, in C#, with some major pains and then some minor success. I have crossed friendly paths with @floooh while generating C# bindings for sokol using libclang.

I have documented all my knowledge and findings in the README and other documentation over at https://github.com/bottlenoselabs/c2cs. Any constructive corrections or call-outs are extremely welcome; I am probably on Mount Stupid.

My auto-generated bindings for SDL can be found here: https://github.com/bottlenoselabs/SDL-cs. There are challenges with the SDL API that make automatic bindgen not so "friendly" when it comes to C#. I am happy to discuss this in more detail, which is probably the most value I can bring to this discussion.

I use the c2cs tool I created to automatically generate the C# bindings for the FNA C dependencies in my fork of FNA called Katabasis; I sponsor @flibitijibibo. The purpose of this fork is to pursue my own curiosity about the XNA/MonoGame APIs in a way that is organic and makes sense (I have a strong love-hate relationship with Microsoft).

EDIT: I forgot to mention what's interesting about my solution is that I use libclang to extract a minimal necessary .json Abstract Syntax Tree for purposes of generating C# code. Technically speaking this .json file could also be used to generate code for Python or other languages but I have not experimented down this path due to my limited time.

lithiumtoast, Nov 23 '22 01:11

> I forgot to mention what's interesting about my solution is that I use libclang to extract a minimal necessary .json Abstract Syntax Tree for purposes of generating C# code. Technically speaking this .json file could also be used to generate code for Python or other languages but I have not experimented down this path due to my limited time.

The problem with this approach is that C sadly doesn't convey even remotely enough information to generate good APIs from. That's why I'm proposing a (not yet specified, but extensible) format to document all requirements of an API. For example, char * foo in C doesn't say whether I can pass NULL or not. It also doesn't say whether the pointer is NUL-terminated, or whether it expects only a single char or a fixed number of them. If we can express this information in a file and generate the code from there, we can create much better bindings for most languages (consider C# ref Point vs Point[] in marshalling).
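As an illustration of what the extra information buys, this Python sketch picks a Zig type for a const char * parameter from three hypothetical annotations (nullable, NUL-terminated, fixed count). The annotation names are assumptions; the Zig types are the standard optional, sentinel-terminated, and array-pointer forms.

```python
def zig_string_type(nullable, terminated, count=None):
    """Pick a Zig type for a const char * parameter from hypothetical annotations.

    nullable   - NULL is an accepted value
    terminated - the data is NUL-terminated
    count      - fixed number of chars when not terminated (None = a single char)
    """
    if terminated:
        base = "[*:0]const u8"          # many chars, ending in a 0 sentinel
    elif count is not None:
        base = f"*const [{count}]u8"    # pointer to exactly `count` chars
    else:
        base = "*const u8"              # pointer to exactly one char
    return f"?{base}" if nullable else base


print(zig_string_type(nullable=True, terminated=True))  # ?[*:0]const u8
```

Plain C (and a direct header translation) can only give the catch-all [*c]const u8 for all four cases.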

ikskuh, Nov 23 '22 08:11

> One example would be: SDL_Color* colors has to be translated to colors: [*]SDL_Color (pointer to many), and not colors: *SDL_Color (pointer to one).

I don't know Zig at all, but I found by googling that colors: [*c]SDL_Color can be used for automated translation (although it is as unsafe as C code is).

By the way, my LuaJIT SDL binding is at https://github.com/sonoro1234/LuaJIT-SDL2

sonoro1234, Nov 23 '22 10:11

> I don't know Zig at all, but I found by googling that colors: [*c]SDL_Color can be used for automated translation (although it is as unsafe as C code is).

Yes, that is correct. This conveys basically the following information:

  • This pointer can optionally be NULL
  • This pointer can point to a single item
  • This pointer can point to many items
  • This pointer can point to a sequence terminated by a NUL element

Whereas *SDL_Color conveys this information:

  • The pointer cannot be NULL
  • The pointer points to a single element

and [*]SDL_Color conveys:

  • The pointer cannot be NULL
  • The pointer points to an unknown/externally defined number of elements, ranging from 0 ... (limit-1)

This means we can translate a *SDL_Color to a C# ref SDL_Color or out SDL_Color parameter, whereas [*]SDL_Color can be translated to SDL_Color[], at least in a marshalling context.
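The translation described here could be driven by the same annotation. This Python sketch maps a hypothetical pointer kind ("one" or "many") and direction to a C# P/Invoke parameter declaration; it covers only the ref/out/array cases mentioned above and ignores marshalling attributes.

```python
def csharp_param(kind, type_name, name, direction="in"):
    """Translate a hypothetical pointer annotation into a C# parameter.

    kind "one":  exactly one element, never NULL (Zig *T)   -> ref/out T
    kind "many": a counted array, never NULL    (Zig [*]T)  -> T[]
    direction only matters for "one": "out" means the callee writes it.
    """
    if kind == "one":
        modifier = "out" if direction == "out" else "ref"
        return f"{modifier} {type_name} {name}"
    if kind == "many":
        return f"{type_name}[] {name}"
    raise ValueError(f"unsupported pointer kind: {kind!r}")


print(csharp_param("one", "SDL_Color", "color", direction="out"))  # out SDL_Color color
print(csharp_param("many", "SDL_Color", "colors"))                  # SDL_Color[] colors
```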

ikskuh, Nov 23 '22 10:11

@MasterQ32 I agree with you; I have encountered this problem and so has Silk.NET folks and many others. There appears to be a need for some form of annotations which can be used to direct bindgen more accurately.

Like @smcv mentioned earlier, the use of magic comments is one possible solution. This has advantages and disadvantages.

What I have noticed in experimentation is that libclang exposes any Clang attributes attached to a cursor. Another path forward is to direct bindgen using Clang attributes.

However, the path I'm choosing to go down myself is neither. I decided to just accept that C just does not expose enough information. Instead of trying to add more information to C code (via magic comments or attributes), I'm using auxiliary code to direct bindgen using a plugin mechanism. This works well for my use case because I don't have control over SDL, or sokol, or flecs, etc.

For example, the pattern of SDL_Color* being an array; that can be transformed appropriately to C# via auxiliary code in the form of a plugin. In your other example, of ref Point vs Point[], this pattern would also be handled by auxiliary code in the form of a plugin. Side note: using Point[] would probably not be the best idea and Span<Point> would probably be a better fit; something which I already do for fixed buffers.
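A minimal Python sketch of the auxiliary-code idea: hand-written fix-ups are registered per C function and applied to the extracted declarations before code generation. The decorator, the registry, and the csharp field are hypothetical and do not reflect c2cs's actual plugin API.

```python
# Registry of hand-written fix-ups, keyed by C function name.
PLUGINS = {}


def plugin(function_name):
    """Decorator registering a fix-up for one C function."""
    def register(fix):
        PLUGINS.setdefault(function_name, []).append(fix)
        return fix
    return register


@plugin("SDL_SetPaletteColors")
def colors_is_an_array(decl):
    # C only says "const SDL_Color *"; the SDL docs tell us it is an array.
    for param in decl["params"]:
        if param["name"] == "colors":
            param["csharp"] = "SDL_Color[]"
    return decl


def apply_plugins(decls):
    """Run every registered fix-up over the extracted declarations.

    Fix-ups may modify declarations in place; the returned list holds
    whatever each fix-up returned.
    """
    result = []
    for decl in decls:
        for fix in PLUGINS.get(decl["name"], []):
            decl = fix(decl)
        result.append(decl)
    return result
```

Because the fix-ups live outside the headers, this works for libraries whose sources the binding author does not control.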

lithiumtoast, Nov 23 '22 11:11

Dear imgui apparently just released something like this, probably has a lot of work for C++ wrangling but still might be good for the other aspects of metadata generation: https://github.com/dearimgui/dear_bindings

flibitijibibo, Nov 25 '22 15:11

gendynapi parses all the SDL headers to generate the DYNAPI files. I've tried a rewrite in Python to fix some bugs and make improvements (#6783).

It was also very easy to add a JSON dump of the whole SDL API, which can be useful for generating bindings. Of course, the extra tags you would need for allocation/pointers are missing, but they should be easy to parse once they're added and specified.

I know this is the inverse of using a "unique source" that generates the headers, but at least it can help generate the "unique source" from all the headers, if that approach is chosen.

1bsyl, Dec 07 '22 22:12

Yeah, this seems like a reasonable approach, we generate an API description from the header that can be marked up with more detail by people who are implementing language bindings.

slouken, Dec 09 '22 21:12

At this point also it might be worth adding code to handle APIs that have been removed, or at least add a checklist that someone can check. It won't matter once we've finalized the ABI, but it might be useful now.
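A first step toward such a checklist could be diffing two API dumps by function name. This Python sketch assumes each dump is a list of entries with a name field, like the JSON dump of gendynapi mentioned above; the function names in the usage lines are hypothetical.

```python
def api_diff(old_entries, new_entries):
    """Return (removed, added) function names between two API dumps."""
    old = {entry["name"] for entry in old_entries}
    new = {entry["name"] for entry in new_entries}
    return sorted(old - new), sorted(new - old)


# Hypothetical names, for illustration only.
old_dump = [{"name": "SDL_OldFunction"}, {"name": "SDL_KeptFunction"}]
new_dump = [{"name": "SDL_KeptFunction"}, {"name": "SDL_NewFunction"}]
removed, added = api_diff(old_dump, new_dump)
print("removed:", removed)  # removed: ['SDL_OldFunction']
print("added:", added)      # added: ['SDL_NewFunction']
```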

slouken, Dec 09 '22 21:12

> The ast-dump output format isn't guaranteed to remain fixed, but so far (for just parsing declarations) it hasn't changed.
>
> A more robust solution is probably a "proper" tool based on libclang.

The Common Lisp binding generation relies on c2ffi.

I'm not sure how c2ffi relates to clang ast-dump, or what justifies its existence (because I don't know much about ast-dump).

attila-lendvai, Dec 24 '22 01:12

A very plain XML file might be best, like GL and Vulkan do.
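To give an idea of what that could look like, here is a hypothetical SDL entry loosely modelled on the command/proto/param layout of Khronos' vk.xml (including its len attribute), read with Python's xml.etree.ElementTree. The element layout for SDL is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical SDL registry entry in the style of vk.xml; the len attribute
# says which parameter holds the element count of an array parameter.
REGISTRY = """
<registry>
  <command>
    <proto><type>int</type> <name>SDL_SetPaletteColors</name></proto>
    <param><type>SDL_Palette</type>* <name>palette</name></param>
    <param len="ncolors">const <type>SDL_Color</type>* <name>colors</name></param>
    <param><type>int</type> <name>firstcolor</name></param>
    <param><type>int</type> <name>ncolors</name></param>
  </command>
</registry>
"""


def read_commands(xml_text):
    """Return one dict per <command>, with the C text of each parameter."""
    commands = []
    for cmd in ET.fromstring(xml_text.strip()).iter("command"):
        proto = cmd.find("proto")
        params = [
            {
                "name": p.findtext("name"),
                # Full C declaration text, e.g. "const SDL_Color* colors".
                "c": "".join(p.itertext()).strip(),
                "len": p.get("len"),
            }
            for p in cmd.findall("param")
        ]
        commands.append({
            "name": proto.findtext("name"),
            "result": proto.findtext("type"),
            "params": params,
        })
    return commands


print(read_commands(REGISTRY)[0]["params"][1])
```

Keeping the plain C text inside the markup (as vk.xml does) means a C header generator can simply concatenate it, while other generators read the structured parts.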

Lokathor, Dec 24 '22 01:12

> The problem with this approach is that C sadly doesn't convey even remotely enough information to generate good APIs from.

My strategy is to have the generated API in one package. It only deals with the basics, like string conversions/encoding, error return codes thrown as exceptions, etc.: whatever can be done based on the information formally encoded in the C model.

Then I have another package, built on top of the generated one, that contains hand-written "lispy" constructs that may use the full power of the host language.

attila-lendvai, Dec 24 '22 01:12

FTR, this is a related feature request: https://github.com/libsdl-org/SDL/issues/2059 (typedef for error return codes).

attila-lendvai, Dec 24 '22 01:12

Just to capture the suggestion in https://github.com/libsdl-org/SDL/issues/2059, if we generate separate API binding metadata, we could mark the SDL functions that return int as returning SDL_ReturnCode, which is defined to be 0 on success or < 0 on error, and SDL_GetError() would return useful information about what went wrong.
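On the binding side, the convention is simple to apply. In this Python sketch, check_return_code and SDLError are hypothetical names, and get_error stands in for whatever wraps SDL_GetError() in a real binding; functions tagged SDL_ReturnCode in the metadata would be routed through it.

```python
class SDLError(RuntimeError):
    """Raised when a function marked as returning SDL_ReturnCode fails."""


def check_return_code(result, get_error):
    """Apply the proposed convention: 0 on success, < 0 on error.

    get_error stands in for a binding's wrapper around SDL_GetError().
    """
    if result < 0:
        raise SDLError(get_error())
    return result


# Usage with a fake error source, since no real SDL library is loaded here.
try:
    check_return_code(-1, lambda: "Invalid renderer")
except SDLError as err:
    print("failed:", err)  # failed: Invalid renderer
```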

slouken, Dec 24 '22 16:12

@slouken please keep in mind that the current infrastructure for binding generation works based on the C domain (clang based AST walker).

In Common Lisp, for example, it's trivial to map a return code of a specific C type to an exception thrown when the value is negative. Whatever custom binding machinery SDL introduces will require extra work, i.e. it will probably remain unsupported. I, for one, will not put in the extra work to support something that is unique to SDL.

Therefore, whatever can be encoded cheaply in the C domain is more useful when encoded there, rather than in some machinery that is unique to the SDL library.

attila-lendvai, Dec 25 '22 00:12

I'm looking at creating some nodejs SDL bindings, because all the existing ones I can find on NPM are abandoned, out of date to the point of not even compiling, awful, incomplete, or all of the above. Having a machine-readable API definition maintained by the SDL team so that my binding-generation work is just "run regenerate.sh" would make that a lot easier, and I'd be happy to contribute to making it happen :)

(For context, I'm working on a gameboy emulator in a bunch of different languages, which also happens to be exercising a whole load of different language SDL bindings, if that's any use to anyone - https://github.com/shish/rosettaboy )

shish, Jan 09 '23 19:01

@shish, if you want to work on an API definition, feel free to submit one! It sounds like you have a real-world use case, so if you want to use that as a basis, go for it!

I would suggest enhancing src/dynapi/gendynapi.py to automatically create the basic definition file, and then add comments to that letting people know what additional markup can be added to fine tune the binding generation.

slouken, Jan 10 '23 02:01

Note that you can run ./gendynapi.py --dump and it creates an "sdl.json" file with the whole SDL API inside, for example a list of entries like this:

    {
        "comment": "the full raw comment",
        "header": "SDL_render.h",
        "name": "SDL_CreateRenderer",
        "parameter": [
            "SDL_Window *REWRITE_NAME",
            "const char *REWRITE_NAME",
            "Uint32 REWRITE_NAME"
        ],
        "parameter_name": [
            "window",
            "name",
            "flags"
        ],
        "retval": "SDL_Renderer*"
    },

this one matches the function:

extern DECLSPEC SDL_Renderer *SDLCALL SDL_CreateRenderer(
    SDL_Window *window, 
    const char *name, 
    Uint32 flags);

The output format can be changed/improved if needed.
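A consumer of this dump can rebuild each prototype by substituting the names back into the REWRITE_NAME placeholders. This Python sketch assumes the field layout shown above and that the file is a JSON array of such entries; how zero-parameter functions are encoded is not shown here, so that case is untested.

```python
import json


def prototype(entry):
    """Rebuild a C prototype from one entry of the gendynapi.py JSON dump.

    Each "parameter" string holds a type with a REWRITE_NAME placeholder;
    "parameter_name" lists the matching names in the same order.
    """
    params = [
        decl.replace("REWRITE_NAME", name)
        for decl, name in zip(entry["parameter"], entry["parameter_name"])
    ]
    return f"{entry['retval']} {entry['name']}({', '.join(params)});"


def load_prototypes(path="sdl.json"):
    """Read the whole dump (assumed to be a JSON array) and rebuild every prototype."""
    with open(path, encoding="utf-8") as f:
        return [prototype(entry) for entry in json.load(f)]


ENTRY = {
    "name": "SDL_CreateRenderer",
    "parameter": ["SDL_Window *REWRITE_NAME", "const char *REWRITE_NAME", "Uint32 REWRITE_NAME"],
    "parameter_name": ["window", "name", "flags"],
    "retval": "SDL_Renderer*",
}
print(prototype(ENTRY))
# SDL_Renderer* SDL_CreateRenderer(SDL_Window *window, const char *name, Uint32 flags);
```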

1bsyl, Jan 10 '23 10:01

> @shish, if you want to work on an API definition, feel free to submit one! It sounds like you have a real-world use case, so if you want to use that as a basis, go for it!

@slouken so this means you're open to the idea of having such an "official file" in the SDL repository?

@shish: I'm happy to support you with this task, if you want to tackle it. I'm taking a look at the gendynapi implementation.

ikskuh, Jan 10 '23 10:01