[WIP/proof-of-concept] Convoluted little-endian support
The hvgl and hvpm tables, unlike everything else in OpenType, store their data in little-endian byte order. font-codegen, font-types, and read-fonts currently don't support little-endian at all.
There are a few paths we could take for implementing little-endian support:
- Don't add codegen support for it at all, and parse little-endian tables manually.
- Add second-class support for little-endian types in codegen and font-types. Continue treating big-endian as the default, but add enough little-endian support that we can automatically parse hvgl and hvpm. This is the approach that this PR takes.
- Add full little-endian support in codegen and font-types. Codegen inputs will be treated as big-endian by default, but can be marked as little-endian at either field-level or table-level granularity. Things like the Scalar trait will require implementing conversions to and from both big-endian and little-endian, and all types which currently assume big-endian will have to abstract over endianness.
This definitely shouldn't be merged as-is, but I'm creating it to help spark some discussion around the design details here.
For this PR, I implemented little-endian support with a new ScalarLE trait, a subtrait of Scalar. That lets it piggyback off some of the existing byte-conversion machinery without requiring all Scalar types to implement little-endian conversions that would likely go unused. It's definitely more convoluted this way, though.
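To make the shape of that concrete, here's a minimal sketch of the subtrait idea; the trait and method names below are illustrative stand-ins, not the actual font-types definitions:

```rust
/// Illustrative stand-in for the existing big-endian scalar trait
/// (the real font-types `Scalar` differs in its details).
pub trait Scalar: Sized {
    /// The raw, unaligned byte representation.
    type Raw: Copy + AsRef<[u8]>;
    fn from_raw(raw: Self::Raw) -> Self;
    fn to_raw(self) -> Self::Raw;
}

/// Subtrait implemented only by types that also need little-endian I/O,
/// so big-endian-only scalars don't carry unused conversions.
pub trait ScalarLE: Scalar {
    fn from_raw_le(raw: Self::Raw) -> Self;
    fn to_raw_le(self) -> Self::Raw;
}

impl Scalar for u16 {
    type Raw = [u8; 2];
    fn from_raw(raw: Self::Raw) -> Self {
        u16::from_be_bytes(raw)
    }
    fn to_raw(self) -> Self::Raw {
        self.to_be_bytes()
    }
}

impl ScalarLE for u16 {
    fn from_raw_le(raw: Self::Raw) -> Self {
        u16::from_le_bytes(raw)
    }
    fn to_raw_le(self) -> Self::Raw {
        self.to_le_bytes()
    }
}
```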
Things like ArrayOfOffsets currently use the BigEndian<T> wrapper type. This PR adds LittleEndian<T> as well. To make ArrayOfOffsets work with either, I've introduced the BytesWrapper trait (name very much not final), which represents an unaligned byte slice that can be parsed into its underlying type. This of course means that all methods previously on BigEndian<T> are now trait methods on BytesWrapper, so that trait has to be imported everywhere we want to read a BigEndian<T>. That's very annoying.
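Continuing the hypothetical Scalar / ScalarLE sketch above, the wrapper-plus-trait arrangement might look roughly like this (again, names and signatures are guesses at the shape, not the actual read-fonts / font-types API):

```rust
/// A shared interface over the two unaligned wrapper types, so that
/// containers like ArrayOfOffsets can be generic over endianness.
pub trait BytesWrapper {
    type Inner;
    /// Parse the wrapped bytes into the underlying type.
    fn get(&self) -> Self::Inner;
}

/// Unaligned big-endian bytes for some scalar `T`.
pub struct BigEndian<T: Scalar>(pub T::Raw);

/// Unaligned little-endian bytes for some scalar `T`.
pub struct LittleEndian<T: ScalarLE>(pub T::Raw);

impl<T: Scalar> BytesWrapper for BigEndian<T> {
    type Inner = T;
    fn get(&self) -> T {
        T::from_raw(self.0)
    }
}

impl<T: ScalarLE> BytesWrapper for LittleEndian<T> {
    type Inner = T;
    fn get(&self) -> T {
        T::from_raw_le(self.0)
    }
}
```

A generic container can then accept any W: BytesWrapper<Inner = Offset32> rather than hard-coding BigEndian<Offset32>; the cost, as noted above, is that get moves from an inherent method to a trait method that callers have to import.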
For codegen, I implemented little-endian support at field-level granularity, but that's currently unnecessary since hvgl and hvpm are entirely little-endian and all other tables are entirely big-endian. If I want to implement this for real, I think I'd go back and move the #[little_endian] attribute to tables, but I did want to get others' opinions here.
Again, this is mainly a proof-of-concept, but it highlights some of the details that we'll want to sort out.
/cc @cmyr, since you did a lot of the codegen and font-types work
Thank you for getting this started!
I would like us to take the "full little-endian support in codegen and font-types" path unless there is a strong reason to do otherwise.
I tend to think we should 1.0 with the featureset live in Chrome (https://github.com/googlefonts/fontations/pull/1329) before we start landing this. I will take that up with the team.
> I think I'd go back and move the #[little_endian] attribute to tables, but I did want to get others' opinions here.
+1 to assignment at broad scope, I expect that will simplify things and I don't think we anticipate mixed-endian structures at time of writing.
> I tend to think we should 1.0 with the featureset live in Chrome (#1329) before we start landing this. I will take that up with the team.
There was some discussion in #1329 that the release schedule is expected to slow down after 1.0, but little-endian support will definitely be a breaking change and hence a 2.0 for read-fonts and font-types. How is versioning going to work post-1.0? Will the branch for the next major version be the main one, or will it be the branch for the current one? Or will the commits just stay linear?
I expect that little-endian support may still need some tweaking even after it first lands, once hvgl outlining is actually implemented in skrifa.
I'll block out some time to take a look at this tomorrow!
I've actually played around with that approach in another branch. I also think I prefer it, although it might result in generating little-endian I/O functions for types that only exist in big-endian tables.
Right now I've hit a wall figuring out how to support writing little-endian tables: FontWrite is recursive, and I don't want to have to separately implement FontWriteBE and FontWriteLE traits or something like that. For that purpose, I think an endianness marker type would be beneficial. The FontWrite implementation for a nested structure would just pass the endianness marker down all the way to the "leaves" (at compile time, as a type parameter), which would be the only types that need to care about endianness.
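A rough sketch of how that marker-type approach could look, assuming a single write trait that takes the endianness as a generic parameter (the trait and type names here are made up for illustration, not the real FontWrite API):

```rust
/// Zero-sized marker types selecting byte order at compile time.
pub trait Endianness {
    fn write_u16(value: u16, out: &mut Vec<u8>);
}

pub struct Be;
pub struct Le;

impl Endianness for Be {
    fn write_u16(value: u16, out: &mut Vec<u8>) {
        out.extend_from_slice(&value.to_be_bytes());
    }
}

impl Endianness for Le {
    fn write_u16(value: u16, out: &mut Vec<u8>) {
        out.extend_from_slice(&value.to_le_bytes());
    }
}

/// One write trait, generic over the endianness marker.
pub trait FontWriteEndian {
    fn write_into<E: Endianness>(&self, out: &mut Vec<u8>);
}

impl FontWriteEndian for u16 {
    fn write_into<E: Endianness>(&self, out: &mut Vec<u8>) {
        // Only the scalar "leaves" actually branch on the marker.
        E::write_u16(*self, out);
    }
}

struct Record {
    start: u16,
    end: u16,
}

impl FontWriteEndian for Record {
    fn write_into<E: Endianness>(&self, out: &mut Vec<u8>) {
        // Nested types just forward the marker; they never inspect it.
        self.start.write_into::<E>(out);
        self.end.write_into::<E>(out);
    }
}
```

Since Be and Le are zero-sized and everything is monomorphized, choosing the endianness this way has no runtime cost.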
I'll take another pass over the API with marker types and see if it makes things cleaner when it comes to implementing support for writing little-endian (and also whether it could somehow help reduce boilerplate for newtypes which wrap a Scalar?).
One auxiliary thing I noticed: with the ArrayOfOffsets changes, I think it's possible to remove ArrayOfNullableOffsets. If ResolveOffset has a generic associated type Resolved<T>, it could just be T for Offset, but Option<T> for Nullable<impl Offset>. This would require swapping the resolved type for a nullable offset from Option<Result<T, ReadError>> to Result<Option<T>, ReadError>, which is a rather broad change and I'm not sure if it'd affect semantics at all. Not something we have to decide on now, but an interesting option for the future.
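For illustration, here's a minimal sketch of what that GAT-based ResolveOffset could look like; the types are simplified stand-ins for the read-fonts ones, and the actual parsing is elided:

```rust
#[derive(Debug)]
pub struct ReadError;

/// Stand-in for the font data an offset is resolved against.
pub struct FontData<'a>(&'a [u8]);

pub trait ResolveOffset {
    /// What resolving yields for a target table type `T`:
    /// `T` itself for plain offsets, `Option<T>` for nullable ones.
    type Resolved<T>;
    fn resolve<T>(&self, data: &FontData) -> Result<Self::Resolved<T>, ReadError>;
}

pub struct Offset32(pub u32);
pub struct Nullable<O>(pub O);

impl ResolveOffset for Offset32 {
    type Resolved<T> = T;
    fn resolve<T>(&self, _data: &FontData) -> Result<T, ReadError> {
        // Real code would bounds-check `self.0` and parse a `T` there.
        unimplemented!()
    }
}

impl ResolveOffset for Nullable<Offset32> {
    type Resolved<T> = Option<T>;
    fn resolve<T>(&self, data: &FontData) -> Result<Option<T>, ReadError> {
        let Nullable(offset) = self;
        if offset.0 == 0 {
            // A null offset is absence, not an error.
            Ok(None)
        } else {
            offset.resolve::<T>(data).map(Some)
        }
    }
}
```

The visible change for callers is the one noted above: a nullable offset resolves to Result<Option<T>, ReadError> instead of Option<Result<T, ReadError>>.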
I'll think about this more in the light of day, but generally I think you're on the right path.
One additional idea that behdad mentioned is that it would be nice if the annotation indicating endianness were at the table level instead of the field level; then maybe we set it on the fields after everything is parsed, around the same time we do the sanity_check pass?