uniffi-rs Handle name collisions with internal helpers

All bindings define some internal helper methods on the records and classes they generate (things like lift, lower, liftFrom, lowerInto, lowersIntoSize, toFFIValue, and fromFFIValue) as well as internal types like RustBufferStream. What happens if a type declares a method in its IDL with the same name as one of those internal helpers?

This is broadly related to #2, although more specialized.

Approach one

Blanket ban on defining methods with those names. This is probably the easiest and also most correct approach—we make the programmer think of a better name instead of trying to make it work. If we keep the set of reserved method names consistent across all our language bindings (even if they’re not necessarily consistent with the naming conventions of the target language—this is internal generated code, after all), we can keep the size of the blocklist down. Or, we can introduce language-specific blocklists. So RustBufferStream would be blocked only in Python, liftFrom, lowerInto, and lowersIntoSize only in Kotlin, and toFFIValue and fromFFIValue only in Swift.

Language-specific lists would also help us block accidental use of reserved words. For example, def is reserved in Python but in none of the others; trait can’t be an argument name in Rust but can be used freely elsewhere; func is reserved only in Swift, fun only in Kotlin, and so on. Of course, we could also block these globally, so you couldn’t use def, func, or trait as an argument name in any language.

Drawbacks: authors have to think of new names themselves, and it doesn’t handle extensions and subclasses (do we care?).
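To make the blocklist idea concrete, here is a minimal sketch in Rust of a check with one global list and one list per target language. The `Lang` enum, the `check_name` function, and the exact contents of each list are assumptions for illustration only; nothing like this exists in uniffi-rs as described in this thread.

```rust
/// Target languages the generator knows about (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Lang {
    Kotlin,
    Swift,
    Python,
}

/// Names blocked for every target language (hypothetical list).
const GLOBAL_RESERVED: &[&str] = &["lift", "lower"];

/// Names blocked for a single target language: internal helpers plus
/// that language's own keywords (hypothetical, deliberately incomplete).
fn language_reserved(lang: Lang) -> &'static [&'static str] {
    match lang {
        Lang::Kotlin => &["liftFrom", "lowerInto", "lowersIntoSize", "fun"],
        Lang::Swift => &["toFFIValue", "fromFFIValue", "func"],
        Lang::Python => &["RustBufferStream", "def"],
    }
}

/// Returns an error message if `name` must not be used when generating
/// bindings for `lang`, and `Ok(())` otherwise.
pub fn check_name(lang: Lang, name: &str) -> Result<(), String> {
    let blocked = GLOBAL_RESERVED
        .iter()
        .chain(language_reserved(lang).iter())
        .any(|reserved| *reserved == name);
    if blocked {
        return Err(format!(
            "`{}` is reserved in the {:?} bindings; please pick another name",
            name, lang
        ));
    }
    Ok(())
}
```

With these lists, `check_name(Lang::Kotlin, "lowerInto")` is rejected while the same name passes for Swift. Blocking everything globally would amount to merging the per-language lists into `GLOBAL_RESERVED`.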

Approach two

Try to uniquify internal helper names. This would be more helpful for authors, but feels too magical, and has lots of corner cases and hacks. The idea is, we’d have a list of all user-defined method names, and generate a unique one for the internal helper only if there’s a collision. It keeps the generated code clean in most cases, since we’d only mangle names if we see a collision.

There are drawbacks to this, though. First, for methods, we have to rename the whole method globally, even if there’s only one collision. As soon as a type defines, say, a lower method, we have to call our internal one lower0 everywhere. It means we have to keep track of more state, making the parser more complicated. Also, each language has its own notion of what’s a collision and what’s not—in Swift, labels and return types are part of the name, so it’s perfectly OK to have lower(into writer: Writer), lower(b: Int), and lower() -> Foo; in other languages, it’s not. And it also doesn’t handle extensions and subclasses—what happens if you subclass or extend a generated type, and define a reserved name on it?
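As a rough sketch of what "uniquify only on collision" could look like, the function below returns the helper name unchanged when nothing clashes and otherwise appends the first free numeric suffix. The function name and the suffix scheme are assumptions, and it treats a collision as a plain string match, which ignores the Swift overloading rules mentioned above.

```rust
use std::collections::HashSet;

/// Picks a name for an internal helper that does not clash with any
/// user-defined method name. Returns `base` unchanged when there is no
/// clash, otherwise the first free candidate of `base0`, `base1`, ...
pub fn unique_helper_name(base: &str, user_methods: &HashSet<String>) -> String {
    if !user_methods.contains(base) {
        return base.to_string();
    }
    let mut n = 0usize;
    loop {
        let candidate = format!("{}{}", base, n);
        if !user_methods.contains(candidate.as_str()) {
            return candidate;
        }
        n += 1;
    }
}
```

The generator would have to compute this once per helper across all types and then use the result everywhere, which is the extra global state described above.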

Approach three

Unconditionally append a random, short ID to internal symbols. Essentially, this means doing our own name mangling: things like Lowerable_AZBYCX1234. This makes the generated code uglier and harder to debug, and does it for all bindings, even if they don’t define colliding names. It also introduces churn, because regenerating the bindings would write a new ID (though we can make the scheme predictable, again borrowing from how name mangling works in other languages). On the plus side, it makes collisions virtually impossible.

Approach 3.1: prefix or suffix internal stuff (and reserved words!) with _. Also easy, and probably good enough!

I feel like approach one is the easiest and least magical option, but let’s discuss, and mention other approaches we think of!

┆Issue is synchronized with this Jira Task ┆Issue Number: UNIFFI-2

Jun 26 '20 15:06 linabutler

I guess approach four is, do nothing, and let authors figure it out when they get a compile error. While I think we could be a little more helpful (it’s not immediately obvious to make the connection from an error in codegen to “oh, right, I can’t name this argument func in Swift” or “hmmm, it doesn’t like my Lowerable type, I guess there’s already a helper that defines one”), our target languages do have good compiler errors.

Jun 26 '20 15:06 linabutler

Approach 1 seems totally fine to me. It's not like these names are very likely to cause collisions, but it would be good to give good error messages in the very unlikely case that collisions happen.

I think if it could piggyback off of whatever approach is used to complain about using other reserved words that would be good.

prefix or suffix internal stuff (and reserved words!) with _

Note that if we do this for C it's a problem: all symbols beginning with _ in the global namespace are reserved IIRC (and all symbols that are _ followed by a capital in any namespace, including local vars, are reserved).

That said suffix is probably fine so long as it doesn't have __, which is also reserved in all namespaces.

Jun 26 '20 16:06 thomcc

That said suffix is probably fine so long as it doesn't have __, which is also reserved in all namespaces.

It occurs to me that we should probably error for this too if a user symbol in Rust has it after our transformations. I'd rather not invent our own mangling scheme to solve these (although in principle it would probably work), so I think we'll probably need a function that validates that an identifier is valid to expose.
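A validation function along those lines might look like the sketch below. The name `validate_exposed_identifier` and the exact rule set are assumptions: it is a conservative reading of the concerns raised in this thread (leading `_`, any `__`), not a full implementation of the C reserved-identifier rules.

```rust
/// Checks whether `name` looks safe to expose as a C-level symbol.
/// Deliberately conservative: it rejects more than C strictly requires.
pub fn validate_exposed_identifier(name: &str) -> Result<(), String> {
    let mut chars = name.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return Err("identifier is empty".to_string()),
    };
    if first == '_' {
        return Err(format!("`{}` starts with `_`, which may be reserved in C", name));
    }
    if !first.is_ascii_alphabetic() {
        return Err(format!("`{}` must start with an ASCII letter", name));
    }
    // Remaining characters must be ASCII letters, digits, or underscores.
    if !chars.all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err(format!("`{}` contains characters that are not valid in a C identifier", name));
    }
    if name.contains("__") {
        return Err(format!("`{}` contains `__`, which is treated as reserved", name));
    }
    Ok(())
}
```

Running this on every symbol after the generator's own transformations would catch both user names and generated names, so a single-underscore suffix such as `lower_` passes while `_lower` and `lower__into` do not.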

Jun 26 '20 21:06 thomcc

I think if it could piggyback off of whatever approach is used to complain about using other reserved words that would be good.

Such a thing doesn't really exist yet, but 100% should and definitely sounds like the right place to do these checks..!

Jun 29 '20 08:06 rfk

Approach 3.1: prefix or suffix internal stuff (and reserved words!) with _. Also easy, and probably good enough!

In my head, I think I was more-or-less intending to take this approach, but @thomcc raises a good point about it conflicting with other naming conventions.

Another option I was thinking about, which makes more sense in some target languages than others, was to actually move all of those helpers out of the public classes themselves. Strawman: if we expose a record named Point as a class, then the only methods on that class are the ones defined by the component. Internally, we have a separate helper class named _Uniffi_PointRecord on which we define the helpers like lift, lower, etc. Instead of calling p.lower(), the generated code would call _Uniffi_PointRecord.lower(p).

This would keep the exported namespaces nice and clean, but might make the generated code much messier and harder to debug. It might also show up as visible artifacts when using the public interface, such as confusing lines in your tracebacks.
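The shape of that strawman, sketched in Rust rather than in one of the target languages: the public `Point` type carries no helper methods, and a separate unit type owns `lift` and `lower`. The helper is spelled `UniffiPointRecord` here to follow Rust naming conventions, and the 16-byte big-endian encoding is an assumption made purely for illustration.

```rust
/// The record as the component author sees it: no generated helpers.
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Hypothetical internal helper type that owns the conversion logic,
/// so nothing named `lift` or `lower` appears on `Point` itself.
pub struct UniffiPointRecord;

impl UniffiPointRecord {
    /// Serializes a `Point` as two big-endian f64 values (16 bytes).
    pub fn lower(p: &Point) -> Vec<u8> {
        let mut buf = Vec::with_capacity(16);
        buf.extend_from_slice(&p.x.to_be_bytes());
        buf.extend_from_slice(&p.y.to_be_bytes());
        buf
    }

    /// Reads a `Point` back, or returns `None` if the buffer is not 16 bytes.
    pub fn lift(buf: &[u8]) -> Option<Point> {
        if buf.len() != 16 {
            return None;
        }
        let mut x = [0u8; 8];
        let mut y = [0u8; 8];
        x.copy_from_slice(&buf[0..8]);
        y.copy_from_slice(&buf[8..16]);
        Some(Point {
            x: f64::from_be_bytes(x),
            y: f64::from_be_bytes(y),
        })
    }
}
```

Generated code would call `UniffiPointRecord::lower(&p)` instead of `p.lower()`, so a user-defined `lower` method on `Point` could no longer collide with the helper.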

Starting with Approach One (the blanket ban) seems like the right move I think, since it's easier to loosen restrictions in future than to tighten them.

Unconditionally append a random, short ID to internal symbols. [...] though we can make the scheme predictable—again, borrowing from how name mangling works in other languages).

Unrelated to the public API surface, I was wondering whether we should try something like this as a safety measure for the extern C layer:

Predictably generate a "tag" for every item in the FFI layer, perhaps as a hash of its components in the IDL. It would be important for this value to be a deterministic function of the IDL.
Append this tag to all symbols exported from the dylib, so that e.g. the fxa_new function becomes fxa_new_1234.
Have both sides of the FFI expect to find the exported symbols under their tagged names.

That's a bit of extra complexity in the generated bindings, but it would help prevent accidentally using the Kotlin bindings from version X of a component with the dylib from version Y; it wouldn't even be able to load the dylib successfully because all the symbol names would be different!
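A minimal sketch of the tagging step, assuming the tag is a hash of some canonical signature string derived from the IDL. The `tagged_symbol` function and the idea of a signature string are assumptions; FNV-1a is used only because it is simple and deterministic, whereas the output of Rust's `DefaultHasher` is not guaranteed to stay the same across releases.

```rust
/// 32-bit FNV-1a hash, written out by hand so the result depends only
/// on the input bytes and never on the compiler or standard library version.
fn fnv1a_32(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for byte in data {
        hash ^= u32::from(*byte);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Builds the exported symbol name by appending a tag derived from a
/// (hypothetical) canonical signature string taken from the IDL.
pub fn tagged_symbol(name: &str, idl_signature: &str) -> String {
    format!("{}_{:08x}", name, fnv1a_32(idl_signature.as_bytes()))
}
```

Both the Rust scaffolding and the foreign-language bindings would call the same function at generation time, so a change to the signature in the IDL changes the symbol name on both sides, and mismatched versions fail when the symbol is looked up.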

Jun 29 '20 08:06 rfk
