icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

IterableDataProvider::supported_locales is a misnomer

Open sffc opened this issue 1 year ago • 13 comments
trafficstars

Now that we have a more well-established notion of a "supported locale" (#58), we should rename this trait function.

I suggest: iter_locales, and maybe make it return an iterator (if not too hard).

sffc avatar May 06 '24 18:05 sffc

I believe this should be

fn supported_requests(&self) -> Result<HashSet<DataRequest>, DataError>

This is because after moving key attributes to the request, supported-locales is not enough to enumerate all supported requests. And if we're using the "requests" phrasing, this will not clash with the notion of supported locales. Also, for performance, I'd like this to return a HashSet instead of an impl Iterator, like we already do internally in datagen.

robertbastian avatar May 21 '24 10:05 robertbastian

I see you implemented this in #4981

I'm still a little worried about calling it "supported", since the set of supported requests is not finite, but it is not as bad as "supported locales". Calling it "iter" takes away any notion of the semantics of the return value.

sffc avatar May 31 '24 18:05 sffc

I made the change from locales to requests, as the PR requires that, and from Vec to HashSet, as that gets rid of a hash-set-to-vec conversion in icu_datagen. I have not made a final decision on the name.

robertbastian avatar May 31 '24 18:05 robertbastian

Yes, those two changes are good. I would say to use BTreeSet since it is in alloc but we can't do that easily because we don't have Ord, and HashSet is probably going to be a bit faster anyway since there could be hundreds or thousands of entries. Just need to bikeshed the name.

sffc avatar May 31 '24 19:05 sffc

As part of fixing this, we should make the value of the HashSet be something that can implement Borrow to a fully borrowed type so that lookups can be cheaper. https://github.com/unicode-org/icu4x/pull/4981#discussion_r1622838917

sffc avatar May 31 '24 23:05 sffc

I suggested up there that icu_provider should export a helper struct called something like LocaleWithAttributes that has all the impls we need.

Such a struct could be a field of DataRequest, but I am not proposing that.

sffc avatar May 31 '24 23:05 sffc

^ If we use this name, the function can be called iter_locales_with_attributes. Problem solved.

sffc avatar May 31 '24 23:05 sffc

Such a struct could be a field of DataRequest, but I am not proposing that.

No, because we use the fact that locale and attributes can be independently borrowed in fallback.

robertbastian avatar May 31 '24 23:05 robertbastian

It would be another public exhaustive struct so things could still be borrowed separately

sffc avatar May 31 '24 23:05 sffc

A struct over two borrowed values cannot be borrowed as a struct over the owned values. The pair we're currently using is basically a struct, so the problem is not solved by having a struct.

Re name, what about allowed_requests?

robertbastian avatar Jun 03 '24 10:06 robertbastian

Maybe just make them both cows?

pub struct LocaleAndAttributes<'a, 'b> {
    pub locale: Cow<'a, DataLocale>, // or LanguageIdentifier
    pub key_attributes: Cow<'b, KeyAttributes>, // or KeyAttributes<'b>
}

I'd rather have the indirection than all the extra cloning.

Re name, what about allowed_requests?

Maybe but I'm not sure "allowed" is much better than "supported". They both make it sound like these are the only requests you can pass in, but you can pass in any request.

sffc avatar Jun 03 '24 21:06 sffc

Yes I'm using Cows in #4995. This API can stay the same.

robertbastian avatar Jun 03 '24 21:06 robertbastian

Preliminary discussion:

struct DataIdentityOwned { pub locale: DataLocale, pub marker_attributes: DataMarkerAttributes }
struct DataIdentityBorrowed<'a> { pub locale: &'a DataLocale, pub marker_attributes: &'a DataMarkerAttributes }

struct DataRequest<'a> { id: DataIdentityBorrowed<'a>, metadata: DataRequestMetadata }

/// A [`DataProvider`] that can iterate over all supported [`DataLocale`] for a certain marker.
///
/// Implementing this trait means that a data provider knows all of the data it can successfully
/// return from a load request.
pub trait IterableDataProvider<M: DataMarker>: DataProvider<M> {
    /// Returns a list of supported `DataRequests`, in owned form.
    fn iter_requests(&self) -> Result<BTreeSet<DataIdentityOwned>, DataError>;
}
pub trait Bikeshed<M: DataMarker>: DataProvider<M> { 
    fn can_load(&self, req: DataRequest) -> Result<bool, DataError>;
}
  • check_req is internal to DatagenProvider, so supports_request does not need to be part of the iter trait. The iter trait's invariant should be that it returns the tight set of requests, which is what check_req internally does

robertbastian avatar Jun 20 '24 12:06 robertbastian

@robertbastian What is left on this issue?

sffc avatar Jul 19 '24 22:07 sffc

Nothing.

robertbastian avatar Jul 23 '24 09:07 robertbastian