icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

FFI errors bikeshed discussion

Open Manishearth opened this issue 2 years ago • 3 comments

In https://github.com/unicode-org/icu4x/pull/2045 I added errors for our existing FFI APIs. You can find them all here:

https://github.com/unicode-org/icu4x/blob/5887c4b9d2806db8ae50976a9721d86e81845db0/ffi/diplomat/src/errors.rs#L21-L71

The intent was to lay a framework without making final decisions, here are some bikesheddy decisions that cropped up.

  • I have given the errors some semblance of organization by grouping them in spaces of 256 such that future errors can be added with values near them. The first block is for common errors, and the second is for data errors.
  • The errors are all named with camelcase names, as PluralsOopsieError instead of ERROR_PLURALS_OOPSIE. I decided to stick to idiomatic rust, the names are idiomatic for C++ too, and these are strings in JS that are not quite idiomatic but I don't think it's a huge deal.
  • The errors are all suffixed with Error (PluralsOopsieError instead of PluralsOopsie). I don't think I like this.
  • The errors are all prefixed with some kind of category name (PluralsOopsie, PluralsWhoopsie) and organized accordingly.
  • Currently I have LocaleUndefinedSubtagError for when you try and get a .script() subtag for a locale without one defined (etc). In Rust this API returns None, but returning a null stringy type is fraught in some languages (Javascript), and thus Diplomat does not yet natively support doing so, nor am I convinced that it should. Shoudl we be returning a single NoneResult error type for all APIs like this, or should we make API-specific ones (e.g. LocaleUndefinedSubtagError)

Thoughts?

Manishearth avatar Jun 13 '22 16:06 Manishearth

Furthermore, component owners should review the errors I made up, and make PRs or issues to fix them if they need changes:

  • [ ] @zbraniecki datetime, plurals, locale_canonicalizer, locid
  • [ ] @sffc plurals, decimal (Shane already approved that PR but I had explicitly asked for focus on the general strategy rather than specific error ontologies, giving him another chance to disagree here)
  • [X] @dminor datetime, locale_canonicalizer
  • [ ] @nordzilla datetime
  • [x] @echeran properties
  • [ ] @younies bidi (Bidi does not have its own errors and just returns OutOfBoundsError if you request something out of bounds)
  • There are currently no segmenter errors

Basically, make sure they are at an appropriate level of granularity and that the names make sense. I've mostly just copied over names from our existing error enums but it may not always make sense over FFI.

Manishearth avatar Jun 13 '22 16:06 Manishearth

  • @zbraniecki - The redundancy of "Error" is important over FFI because there is a namespace soup
  • @zbraniecki - For returning Option<Script>, shouldn't the return value be null or a String? Can Diplomat handle that?
  • @Manishearth - Not today, but I think it could be extended to support that, but it is tricky to do it now. The big problem is that it's hard to differentiate empty string from null. And APIs that return both empty strings and null are iffy.
  • @zbraniecki - In JS, the convention should be to return null, and that should work with TypeScript as well.
  • @Manishearth - What should we do for 1.0?
  • @sffc - I'd prefer adding a new function (so that we can add a better function later) and return a result with unit error.
  • @Manishearth - We could name it try_script().
  • @robertbastian - I'm thinking across languages. How about exceptions in Java? I feel like we should distinguish error enums so they can be handled specially.
  • @Manishearth - Exceptions have a cost. It could go into an additional wrapper.
  • @sffc - Where do we draw the line between component-specific errors getting their own type or being folded in with a common error?
  • @Manishearth - I lean more toward common errors.

Proposed error enum:

    pub enum ICU4XError {
        UnknownError = 0x00,
        WriteableError = 0x01,
        OutOfBoundsError = 0x02,
        ParserError = 0x03,
        DataStructValidityError = 0x04,

        // general data errors
        // See DataError
        DataMissingResourceKeyError = 0x1_00,
        DataMissingVariantError = 0x1_01,
        DataMissingLocaleError = 0x1_02,
        DataMissingResourceOptionsError = 0x1_03,
        DataNeedsVariantError = 0x1_04,
        DataNeedsLocaleError = 0x1_05,
        DataExtraneousResourceOptionsError = 0x1_06,
        DataFilteredResourceError = 0x1_07,
        DataMismatchedTypeError = 0x1_08,
        DataMissingPayloadError = 0x1_09,
        DataInvalidStateError = 0x1_0A,
        DataCustomError = 0x1_0B,
        DataIoError = 0x1_0C,
        DataUnavailableBufferFormatError = 0x1_0D,

        // property errors
        PropertyUnknownScriptIdError = 0x2_00,
        PropertyUnknownGeneralCategoryGroupError = 0x2_01,

        // decimal errors
        DecimalLimitError = 0x3_00,
        DecimalSyntaxError = 0x3_01,
    }

sffc avatar Jun 17 '22 17:06 sffc

  • Regarding the grouping, that sgtm. Markus does similar grouping with extra spaces for enums in ICU4C. I think there's a typo in one of them, PTAL at #2102
  • Camel casing instead of uppercase snake-casing sgtm. Also prefixing with category name sgtm.
  • Regarding removing the existing suffixing with "-Error", I would agree so long as no target language would expose the error names to the user, or at least wouldn't do so in a way that would be confusing. In C/C++, we're dealing with glorified integers, but at least in JS, you're dealing with strings that represent the error name, right? (For JS, was there a plan to have UPPER_SNAKE_CASE that changed into PascalCase by the time the PR landed?) Thinking about Java, the Java library has lots of concrete instances of Exception and Error, but I suppose any Java APIs will bypass that exception throwing machinery by virtue of the fact that it's using WASM / code from Diplomat? If Java users wouldn't see any error names (unless it's an enum variant under an enum called ICU4XError), then that's a good sign that it should safe to turn ICU4XError::DataIoError into ICU4XError::DataIo.

echeran avatar Jun 23 '22 22:06 echeran

I'm going to mark this as closed, this has had sufficient time to be discussed and reviewed.

Manishearth avatar Sep 21 '22 19:09 Manishearth