rust-bindgen icon indicating copy to clipboard operation
rust-bindgen copied to clipboard

size_t_is_usize should default to true

Open dkg opened this issue 3 years ago • 7 comments

In #1671, there is a reasonable argument made that a principled mapping between size_t and usize is improper.

However, all platforms that i'm aware of do equate the two.

bindgen's defaults since 0.53 mean that any binding that exposes a size_t results in an incompatible API across platforms. That is, if some package foo exposes a bindgen-derived mapping of a C function size_t bar(), then bar will have a different signature on x86 (where it returns a u32) than on x86_64 (where it returns a u64). All downstream code that relies on that function will find itself dealing with the same divergent API across platforms. That's pretty weird and ugly for a typesafe langauge like rust. The behavior before 0.53 meant that the exposed API was congruent across platforms.

I think bindgen's default should be what the pre-0.53 behavior was, even though it is not in principle "correct". If we do that (so that size_t_is_usize defaults to true), then if bindgen is handling any size_t on a platform where usizesize_t it could fail and abort. A package that wants to can still set size_t_is_usize(false) if they want to be able to build on such a platform (and impose the consequent API type churn on their downstream dependencies).

dkg avatar Oct 16 '20 16:10 dkg

If bindgen does make this change, then it'll cause another API change in downstream projects that were already using bindgen between 0.53 and 0.55 that have not explicitly set size_t_is_usize(true) and exposing a size_t somewhere. but i think that risk of API change is worthwhile for the convergent API across platforms (and for keeping the generated API stable from versions of bindgen before 0.53 as well).

dkg avatar Oct 16 '20 16:10 dkg

I've separated out the idea of doing a check (failing with an error) as a different ticket (#1903), so that this ticket can focus specifically on changing the default.

dkg avatar Oct 16 '20 17:10 dkg

:+1:

To make sure I'm understanding, and to clarify why I think this is important:

The previous behavior allowed me as a crate author to pre-build bindings and include them with my crate. This let's my users use my crate without having to compile the bindgen dependency tree, which necessarily is very large and slow to build. Skipping this is a real improvement for my users, who, like all rust users would love to speed up their builds.

Maybe packaging prebuilt bindings is controversial - but it's far from niche:

  • https://github.com/georust/gdal/tree/master/gdal-sys/prebuilt-bindings
  • https://github.com/gyscos/zstd-rs/blob/master/zstd-safe/zstd-sys/src/bindings.rs
  • https://github.com/rusqlite/rusqlite/tree/master/libsqlite3-sys/bindgen-bindings

Typically (as with these crates) a bindgen feature is exposed to allow the user to opt-in to building bindgen and generating their own bindings at compile time.

It does seem like though usize is technically not the same thing as size_t and that they probably have different values on some unpopular platforms, that on all common platforms it "just works", and this change has caused the popular prebuilt bindings workflow to fail across popular platforms.

One other thing I thought was interesting - seemingly the libc crate makes the same "mistake": https://docs.rs/libc/0.2.30/src/libc/lib.rs.html#134

pub type size_t = usize;

michaelkirk avatar Dec 10 '20 03:12 michaelkirk

I wonder if the ultimate solution is to try to get c_size_t added to std::os::raw 😬

michaelkirk avatar Dec 10 '20 04:12 michaelkirk

The previous behavior allowed me as a crate author to pre-build bindings and include them with my crate.

Our crate, nettle-sys, does not pre-build bindings, and we are also negatively affected by this change. From my point of view, these are the negative effects of this change:

  • Updating bindgen from what we use now (0.51.1) to 0.53 breaks the API exposed by nettle-sys.
  • Our high-level crate, nettle-rs, has a usize and wants to hand it to the low-level crate. The change in bingen 0.53 means that nettle-sys' API now depends on the platform. How should the high-level crate handle this?

teythoon avatar Dec 21 '20 22:12 teythoon

Indeed, the change associated with #1671 breaks builds of my thesis work under newer versions of bindgen. I can work around the issue with --size_t-is-usize, but this would be annoying to incorporate into my build system because old versions of bindgen don't expect that flag and I'd have to do version detection to continue to support them. So for now, I'm stuck putting a wrapper script in my path that always passes the flag to the real bindgen. Ugh.

solb avatar Jun 04 '21 19:06 solb

This behaviour (the type being u32 on some platforms and u64 on others) is what appears to be leading to this issue: https://github.com/rvolosatovs/fvad/issues/1, whereby a crate can't compile on ARMv7 because the wrapper crate for the bindings has been developed on a system with u64s and accidentally built that into its assumptions.

I acknowledge the conflicting requirements but this does feel a lot worse than making the cheeky assumption or assertion that usize is size_t.

I don't think the library author stood much of a chance of noticing this and it's not clear to me that the users of a crate should be having to suspect that the type annotations of their dependencies will change per-platform.

I wonder if the ultimate solution is to try to get c_size_t added to std::os::raw grimacing

This sounds sensible to me, though in the meantime, wonder if this could be done with an external crate which defines c_size_t alternatively. I guess it's not ideal, but it would mean there's a definite type which clearly says 'I will change meanings on different platforms' and it would keep it confined to one place.

reivilibre avatar Jun 08 '22 22:06 reivilibre