axum Add extractor for user language

This is a draft implementation for the feature request #2197 (Adding a UserLanguage extractor).

Before adding tests and documentation, I wanted to make sure that this is the right direction for this feature.

Aug 28 '23 18:08 frenetisch-applaudierend

A few specific questions that popped up:

Should this be feature gated? It adds no additional dependencies, but I'm not sure exactly what the policy is.

Is the module layout ok? I tried to mimic what I saw in other parts, with creating a top-level module and re-export just the extractor from the extractor module.

I was thinking about the type for the user language code. Right now it is a String, since that's what's usually provided from the request and it does not constrain the supported values. However this means a user of the UserLanguage extractor has to care about handling all possible cases of upper/lowercase strings and locales with country ids (en-US vs en).

We could use unic_langid::LanguageIdentifier which would make this easier, but this means that only valid language identifiers would be supported (e.g. lang=German would not be supported anymore). Additionally this adds a new dependency which may not be desirable.

Another approach could be to make the extractor UserLanguage<T>, where T: TryFrom<String> with a default of String. This would allow users to provide their own type like LanguageIdentifier or a custom enum type with the supported languages. Though I'm not sure if it is worth the added complexity.

Aug 28 '23 18:08 frenetisch-applaudierend

Just wanted to check in on this. Do you have any comments on this, or is it fine for now and I should just continue?

Sep 26 '23 10:09 frenetisch-applaudierend

I haven't had time to look at it yet :/ I'd say you're free to continue. You can always ship it in its own crate if we decide it's not appropriate for axum. I think its an extractor that makes sense to have at least in some library somewhere.

Sep 26 '23 11:09 davidpdrsn

Sure, no worries, thanks for the feedback. I'll continue then to try and get it ready 👍

Sep 26 '23 12:09 frenetisch-applaudierend

I added some tests and documentation, as well as tried to be more consistent with other extractors.

Sep 29 '23 09:09 frenetisch-applaudierend

Hi @JonahKr

Thank you for your comments! I'll try to address then below.

First of all the Header Source in my opinion makes sense. You might additionally be able to contribute your work on the header to the headers library (https://github.com/hyperium/headers) so it can be extracted using the TypedHeader as well.

Good point, I'll look into that! That would also make the code here simpler, as some general logic can be extracted.

//Proposed Extractor
async fn posts(lang: Extension<UserLanguage>) {
// validate language and adjust query or smth
}

You might have misread the example a little here. Extension is only needed to configure the extractor, not to use it (and configuration is optional if the defaults fit your use case). So the handler would match the others as well:

// Proposed Extractor
async fn posts(lang: UserLanguage) {
    // Read preferred language and adjust query or smth
}

The current benefit of your proposal is that you can add an Extension which can be added to every handler in the attached router and receive a standardized language struct following predefined sources.

In my opinion, the main benefits of this proposal are readability and flexibility. Standardization in the current form is not (as you have noted).

Readability, because it should be immediately clear when seeing a UserLanguage extractor what the intention is. Comparatively, using a cookie extractor and maybe a header extractor and feeding them into a custom function needs more effort to understand what is going on.

Flexibility, because if you want to change where you read your user language from, it can be done from a single place. When using the existing extractors directly in your handlers you would have to change each handler.

Of course you can use the existing extractors to build your own extractor, which has the same benefits. But the premise here is that this need is universal enough to warrant it being part of the ecosystem "out of the box".

While for the header this makes total sense with a standardized format and the quality value syntax, the added value of the other two sources is only limited since it doesn't alleviate any handling for the end user. The languages still need to be validated for them to be usable anyway, so why not just use what's already there? I get what you are trying to achieve but in my opinion, the current design with the configuration and extension abstraction is either too much for what is currently implemented or might need some further development for the validation of languages following the IETF BCP 47 language tags or similar, at which point it might make sense to move this to an external crate altogether.

I think there are two different tasks here:

Getting a list of preferred languages for the given request
Making sense of this list and decide what to do with this information

This extractor only handles the first task, and the abstractions are built around this task. There is one extractor, one trait for the language source and one configuration to hold sources (plus some provided sources that in my opinion are universal enough to be shipped along). I don't feel this is going overboard, but maybe you could elaborate what you would see as an acceptable amount of abstraction, just for this task?

The second task is subject to the application itself. Even if you know that you get language tags as from the Accept-Language header (like de or de-DE), it's still not trivial to then map this over your supported languages. You would have to handle cases where you have to decide whether de should match de-CH, or de-AT should match de-DE etc. This logic is out of scope of this extractor in my opinion, especially considering that the translation tool of your choice might already implement that for you. You might also decide that for your application only having d and e as supported language ids is sufficient, and I feel that should be a supported use case as well.

I agree that it would be helpful if the language extracted would match some known specification. But as you mention, this would likely mean more complexity than is warranted. In the end I think retrieving the list of languages is enough beneficial to exist on its own.

Thanks again, and I hope I could address at least some of your questions and concerns?

Jan 07 '24 20:01 frenetisch-applaudierend

with the amount of code that this extractor adds, its need for configuration, and the complexity of it in general, IMHO it would be best to keep this extractor in a dedicated project, outside of axum (or axum-extra)

Dec 26 '24 13:12 Turbo87

Hi @Turbo87

Sure, I understand. Would parts of this PR make sense in axum(-extra) in your opinion? If so I‘m willing to strip the PR down to the essential bits.

Jan 09 '25 19:01 frenetisch-applaudierend

I think what would make sense in axum-extra is support for the Accept-Language header. Ideally that would come from corresponding support for that header in https://github.com/hyperium/headers, but it looks like they haven't been able to decide on an API for that for years, so it's probably not worth waiting on that. there is https://github.com/hyperium/headers/blob/headers-v0.4.0/src/disabled/accept_language.rs though, which appears to refer to a previous implementation.

it looks like actix has something similar too: https://docs.rs/actix-web/4.9.0/actix_web/http/header/struct.AcceptLanguage.html

Jan 10 '25 02:01 Turbo87

Honestly, I'm not even sure that an extractor for a specific header like Accept-Language should exist in addition to TypedHeader<ThatHeader>.

As such, I will close this. Let's discuss in an issue or discussion before the next PR.

Feb 23 '25 19:02 jplatte

axum axum copied to clipboard

Add extractor for user language

axum
axum copied to clipboard