mime

Propose Tenets/Values

Open seanmonstar opened this issue 2 years ago • 9 comments

This is my attempt at writing up some design tenets for the mime crate. I've tried a few times to create a good library, and have failed to find time to come up with a design to fit these (see #153). Hopefully writing these up will allow others to discuss and create a proposal and then implement one.

Note that these values aren't necessarily in the right order yet, but I think they largely match what is important.

Parse, don't validate

Constructing a media type should parse it, ensuring that the values contain only legal characters and that the structure is correct. It should not be possible to build one manually without verification, so that other functions and libraries may rely on the value only containing legal characters.

This would imply that the internal fields are all private.
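As a rough illustration of this tenet, here is a minimal sketch in which parsing is the only way to obtain a value. The names `MediaType`, `ParseError`, `is_token` and the accessors are assumptions made for the example, not the crate's API, and parameters are ignored entirely. The character set is the `tchar` set from RFC 9110 §5.6.2.

```rust
use std::str::FromStr;

/// Hypothetical type for illustration. The fields are private, so the only
/// way to obtain a value from outside this module is through parsing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MediaType {
    type_: String,
    subtype: String,
}

/// Hypothetical error type; a real one would say what went wrong.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError;

/// True for the `tchar` set of RFC 9110 §5.6.2.
fn is_token_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b)
}

/// A token is one or more `tchar`.
fn is_token(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(is_token_byte)
}

impl FromStr for MediaType {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split at the first '/'; a second '/' ends up in the subtype and
        // is rejected by the token check below.
        let (type_, subtype) = s.split_once('/').ok_or(ParseError)?;
        if !is_token(type_) || !is_token(subtype) {
            return Err(ParseError);
        }
        Ok(MediaType {
            type_: type_.to_ascii_lowercase(),
            subtype: subtype.to_ascii_lowercase(),
        })
    }
}

impl MediaType {
    pub fn type_(&self) -> &str {
        &self.type_
    }

    pub fn subtype(&self) -> &str {
        &self.subtype
    }
}
```

Because callers can only hold a `MediaType` that went through `from_str`, downstream code can rely on both parts being legal tokens without checking again.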

Easy construction

It would be preferable that constructing a media type were "easy", such that it didn't require a bunch of type imports. Fluid, type-safe construction is a plus, but only if it is convenient.

Specifically, it must be possible to construct a media type as a compile-time constant. We aren't going to have a full registry of all the possible types, and if someone knows their media type beforehand, it should be cheap/free to construct it.

Expressive matching

Besides creating parsed/valid media types, another significant reason to do so is to allow for expressive matching. Rust has powerful pattern matching; it would be a shame if we couldn't do something to the effect of:

match mt.substance() {
    (Text, Plain) => (),
    (Text, _) => (),
    _ => (),
}

Another possible way of being expressive is using traits. Then, someone could restrict a media type further, by saying "only text media types", such as download<M: mime::Text>(m: M). Could be over-engineering, though.
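One way the match-based idea could look is sketched below. Everything here is hypothetical: the `Top` and `Sub` enums, the `Other` escape hatch for unregistered names, and the stand-in `parse` that only splits on `/` without validating.

```rust
/// Hypothetical top-level type names. `Other` covers anything not listed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Top<'a> {
    Text,
    Application,
    Other(&'a str),
}

/// Hypothetical subtype names. `Other` covers anything not listed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sub<'a> {
    Plain,
    Html,
    Json,
    Other(&'a str),
}

pub struct MediaType {
    type_: String,
    subtype: String,
}

impl MediaType {
    /// Minimal stand-in for a real parser: splits on '/', no validation.
    pub fn parse(s: &str) -> Option<MediaType> {
        let (t, sub) = s.split_once('/')?;
        Some(MediaType {
            type_: t.to_ascii_lowercase(),
            subtype: sub.to_ascii_lowercase(),
        })
    }

    /// Returns the (type, subtype) pair in a form that can be matched on.
    pub fn substance(&self) -> (Top<'_>, Sub<'_>) {
        let top = match self.type_.as_str() {
            "text" => Top::Text,
            "application" => Top::Application,
            other => Top::Other(other),
        };
        let sub = match self.subtype.as_str() {
            "plain" => Sub::Plain,
            "html" => Sub::Html,
            "json" => Sub::Json,
            other => Sub::Other(other),
        };
        (top, sub)
    }
}

/// Example consumer that mirrors the match arms from the proposal.
pub fn describe(mt: &MediaType) -> &'static str {
    match mt.substance() {
        (Top::Text, Sub::Plain) => "plain text",
        (Top::Text, _) => "some other text",
        _ => "not text",
    }
}
```

The enums borrow from the media type, so unregistered names remain matchable through `Other` without allocating.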

Types vs Ranges

Some implementations conflate the concepts of media types and media ranges. For instance, text/* is a valid media range for an Accept header, but it is not a valid type to send in a Content-Type header.

Thus, it would make sense either to make */* and similar values illegal, or to require them to be a separate type, such as a MediaRange.
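A minimal sketch of the "separate type" option follows. The types, the `None`-means-wildcard representation, and the `matches` method are all assumptions for illustration. Token validation and parameters are left out, and rejecting `*/html` is a choice made by this sketch rather than something the proposal prescribes.

```rust
/// Hypothetical concrete type: never contains a wildcard.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MediaType {
    type_: String,
    subtype: String,
}

/// Hypothetical range type: `None` stands for `*`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MediaRange {
    type_: Option<String>,
    subtype: Option<String>,
}

impl MediaType {
    /// Rejects anything containing `*`, so `text/*` cannot become a type.
    pub fn parse(s: &str) -> Option<MediaType> {
        let (t, sub) = s.split_once('/')?;
        if t.is_empty() || sub.is_empty() || t.contains('*') || sub.contains('*') {
            return None;
        }
        Some(MediaType {
            type_: t.to_ascii_lowercase(),
            subtype: sub.to_ascii_lowercase(),
        })
    }
}

impl MediaRange {
    /// Accepts `*/*`, `type/*` and `type/subtype`.
    pub fn parse(s: &str) -> Option<MediaRange> {
        let (t, sub) = s.split_once('/')?;
        match (t, sub) {
            ("", _) | (_, "") => None,
            ("*", "*") => Some(MediaRange { type_: None, subtype: None }),
            // A wildcard type with a concrete subtype is rejected here.
            ("*", _) => None,
            (t, "*") => Some(MediaRange {
                type_: Some(t.to_ascii_lowercase()),
                subtype: None,
            }),
            (t, sub) => Some(MediaRange {
                type_: Some(t.to_ascii_lowercase()),
                subtype: Some(sub.to_ascii_lowercase()),
            }),
        }
    }

    /// True if the concrete media type falls inside this range.
    pub fn matches(&self, mt: &MediaType) -> bool {
        let type_ok = self.type_.as_deref().map_or(true, |t| t == mt.type_);
        let sub_ok = self.subtype.as_deref().map_or(true, |s| s == mt.subtype);
        type_ok && sub_ok
    }
}
```

With two types, a function that builds a Content-Type header can take `MediaType` and make it impossible to pass `text/*` by accident.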

However, dealing with q-values is likely out of scope for this crate. (q-values are a more general concept that applies to several kinds of Accept-* headers.)

Performance

It is preferable that parsing media types and rendering mime types be as fast as possible. The easiest thing to do would be to construct a bunch of separated Strings, but we should prefer less allocating and copying. This is another reason it would be wise to keep the fields and internals private; we can do performance optimizations freely.
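To make the allocation point concrete, here is a small sketch that keeps one buffer and an offset instead of separate Strings. The layout and names (`source`, `slash`) are assumptions for illustration, and character validation is omitted to keep the example short.

```rust
/// Hypothetical layout: one owned buffer plus the byte offset of the '/'.
pub struct MediaType {
    source: String,
    slash: usize,
}

impl MediaType {
    /// One allocation for the whole value; the parts are slices into it.
    pub fn parse(s: &str) -> Option<MediaType> {
        let slash = s.find('/')?;
        if slash == 0 || slash + 1 == s.len() {
            return None;
        }
        // ASCII lowercasing keeps the byte length, so `slash` stays valid.
        Some(MediaType {
            source: s.to_ascii_lowercase(),
            slash,
        })
    }

    pub fn type_(&self) -> &str {
        &self.source[..self.slash]
    }

    pub fn subtype(&self) -> &str {
        &self.source[self.slash + 1..]
    }

    /// Rendering is free: the original text is already stored.
    pub fn as_str(&self) -> &str {
        &self.source
    }
}
```

Since the fields are private, this layout could later be swapped for a borrowed or interned representation without breaking callers.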

seanmonstar avatar Aug 21 '23 21:08 seanmonstar

Thank you for that!

Two suggestions:

  • normalize either on the "media type" or the "MIME type" naming. mime::MediaRange looks a bit confusing to me.
  • choose which specification to follow, the web browsers one or just the IETF one. I would tend to suggest following the web browsers one that is more "practical" and reflects what web browsers are actually doing.

Tpt avatar Aug 24 '23 06:08 Tpt

Some ideas:

  • use this web site (or another one) to generate all known mime types.
  • Allow passing a path through ENV variable pointing to a file with mime types defined with the same format, to add some mime types that weren't inside the external source
  • parsing the string using a finite automaton, generated at compile time

wiiznokes avatar Jun 05 '24 10:06 wiiznokes

MediaRange

Does this have a standard (RFC)? I can't find one.

Expressive matching

I have a use case like:

    if content_type.type_() != mime::APPLICATION {
        return Err(make_error().into());
    }
    if content_type.subtype() != mime::JSON && content_type.suffix() != Some(mime::JSON) {
        return Err(make_error().into());
    }

So perhaps pattern matching can hardly work for this kind of scenario. Better methods would help.

tisonkun avatar Aug 22 '25 14:08 tisonkun

I can imagine a structure like:

struct MediaType {
  ty: Cow<'static, str>,
  subty: Cow<'static, str>,
  suffix: Cow<'static, str>,
  params: Vec<Cow<'static, str>>,
}

... that can be constructed at compile time (with params always empty; otherwise it has to be constructed at runtime).
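A sketch of how that structure could be built both ways is shown below. The constructor names `from_static` and `from_owned`, the accessors, and the `SVG` constant are hypothetical, and no validation is performed here.

```rust
use std::borrow::Cow;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MediaType {
    ty: Cow<'static, str>,
    subty: Cow<'static, str>,
    suffix: Cow<'static, str>,
    params: Vec<Cow<'static, str>>,
}

impl MediaType {
    /// Compile-time constructor: borrows static strings, params stay empty.
    /// `Vec::new()` is a `const fn`, so this works in a `const` item.
    pub const fn from_static(
        ty: &'static str,
        subty: &'static str,
        suffix: &'static str,
    ) -> Self {
        MediaType {
            ty: Cow::Borrowed(ty),
            subty: Cow::Borrowed(subty),
            suffix: Cow::Borrowed(suffix),
            params: Vec::new(),
        }
    }

    /// Runtime constructor: owns its strings and may carry parameters.
    pub fn from_owned(ty: String, subty: String, suffix: String, params: Vec<String>) -> Self {
        MediaType {
            ty: Cow::Owned(ty),
            subty: Cow::Owned(subty),
            suffix: Cow::Owned(suffix),
            params: params.into_iter().map(|p| Cow::Owned(p)).collect(),
        }
    }

    pub fn ty(&self) -> &str {
        &self.ty
    }

    pub fn subty(&self) -> &str {
        &self.subty
    }

    pub fn suffix(&self) -> &str {
        &self.suffix
    }

    pub fn params(&self) -> &[Cow<'static, str>] {
        &self.params
    }
}

/// Hypothetical prebuilt entry for `image/svg+xml`.
pub const SVG: MediaType = MediaType::from_static("image", "svg", "xml");
```

The same type serves both cases: constants borrow and cost nothing, while parsed values own their data.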

Automation to prebuild all known registered entries is possible.

I don't know where the media range concept comes from. It seems users can have their own utility to check media types.

@seanmonstar

tisonkun avatar Aug 22 '25 14:08 tisonkun

Media ranges can be found in RFC 9110 §12.5.1, Accept.

I have a use case like:

Could that be solved with patterns like:

match expr {
    (APPLICATION, JSON, _) |
    (APPLICATION, _, Some(JSON)) => {},
    _ => return make_err(),
}
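For reference, a self-contained version of that pattern is sketched below. It assumes the constants are plain `&str` values, which Rust allows in patterns, and that the matched expression is a (type, subtype, suffix) tuple. `require_json` and `NotJson` are hypothetical names.

```rust
// Hypothetical constants; `&str` constants can be used directly in patterns.
pub const APPLICATION: &str = "application";
pub const JSON: &str = "json";

/// Hypothetical error type standing in for `make_err()`.
#[derive(Debug, PartialEq, Eq)]
pub struct NotJson;

/// `parts` stands in for the (type, subtype, suffix) accessors of a parsed
/// media type. Accepts `application/json` and `application/*+json`.
pub fn require_json(parts: (&str, &str, Option<&str>)) -> Result<(), NotJson> {
    match parts {
        (APPLICATION, JSON, _) | (APPLICATION, _, Some(JSON)) => Ok(()),
        _ => Err(NotJson),
    }
}
```

This accepts the same inputs as the two `if` checks in the use case: the type must be `application`, and either the subtype or the suffix must be `json`.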

seanmonstar avatar Aug 22 '25 14:08 seanmonstar

Media ranges can be found in RFC 9110 §12.5.1, Accept.

@seanmonstar Thanks for your information!

Combined with "However, dealing with q-values is likely out of scope for this crate.", I wonder how other hyperium projects currently use this crate.

I can see headers and reqwest depend on mime. This is the reason I'd look into this crate - it is already in the dependency tree.

I'm considering collecting our current use cases and then defining the minimal standards-compliant feature set to implement. Media ranges, especially with q-values, look like a much more full-featured solution, while users like https://github.com/hyperium/mime/issues/153#issuecomment-2401002882 want only constants, and I can understand why parsing and pattern matching can help.

tisonkun avatar Aug 22 '25 15:08 tisonkun

I’m thinking about taking a crack at this, and I wanted to check in about my potential approach. I started an experiment at danielparks/mime_const, but moving forward it makes sense to work on this crate.

Easy (compile-time) construction

I think the cleanest solution is likely to be const compile-time parsing. A macro requires a public method to construct the type at compile time, i.e. in const context, anyway, as @seanmonstar mentions in #153:

Macro required exposing an "__private_dont_use" field that the macro could set. const fn has advanced that we probably could live without a macro.

I propose this (copying jiff):

const MARKDOWN: MimeType = MimeType::constant("text/markdown; charset=utf-8");
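A minimal sketch of what such a `const fn` could look like, assuming Rust 1.57 or later so that `panic!` is available in const context. Only `MimeType::constant` comes from the proposal; the fields, the helper `is_token_byte`, and the accessors are assumptions. For brevity this version validates only `type/subtype` and does not handle parameters such as `charset=utf-8`.

```rust
/// Hypothetical representation: the validated source and the '/' offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MimeType {
    source: &'static str,
    slash: usize,
}

/// `tchar` check written with `matches!` so it can run in const context.
const fn is_token_byte(b: u8) -> bool {
    matches!(
        b,
        b'0'..=b'9' | b'a'..=b'z' | b'A'..=b'Z'
            | b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*' | b'+'
            | b'-' | b'.' | b'^' | b'_' | b'`' | b'|' | b'~'
    )
}

impl MimeType {
    /// Panics on invalid input. When called from a `const` item, the panic
    /// surfaces as a compile error instead of a runtime failure.
    pub const fn constant(source: &'static str) -> MimeType {
        let bytes = source.as_bytes();
        let mut slash = bytes.len(); // sentinel: no '/' seen yet
        let mut i = 0;
        // `for` loops are not usable in const fn, hence the manual loop.
        while i < bytes.len() {
            if bytes[i] == b'/' {
                if slash != bytes.len() {
                    panic!("more than one '/' in media type");
                }
                slash = i;
            } else if !is_token_byte(bytes[i]) {
                panic!("illegal character in media type");
            }
            i += 1;
        }
        if slash == bytes.len() {
            panic!("missing '/' in media type");
        }
        if slash == 0 || slash + 1 == bytes.len() {
            panic!("empty type or subtype");
        }
        MimeType { source, slash }
    }

    pub fn type_(&self) -> &str {
        &self.source[..self.slash]
    }

    pub fn subtype(&self) -> &str {
        &self.source[self.slash + 1..]
    }
}

/// Evaluated at compile time; a typo here would fail the build.
pub const MARKDOWN: MimeType = MimeType::constant("text/markdown");
```

No macro and no public escape-hatch field are needed: the same function validates at compile time in a `const` and at runtime elsewhere.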

We could make a ConstInto trait to allow something like the below, but it barely saves any typing and seems too generic — I’m inclined to wait for something to be added to core.

const MARKDOWN: MimeType = "text/markdown; charset=utf-8".const_into();

Downsides

panic!() in const context requires Rust 1.57; the MSRV of the last release (0.3.17) is 1.16 (!) and the MSRV of the master branch is 1.31.1.

We could use [/* error message in comment */][0] to cause a panic, but it’s ugly, confusing, and it only gets us back to Rust 1.46, when if and match in const context were stabilized. I'm happy to go with that if people think it’s better.

Parse, don't validate

I think this might need to be compromised a bit for ultimate flexibility — we can’t support an arbitrary number of parameters without Vec, and that runs into problems because Drop isn’t allowed in const context. (For example, imagine you try to construct your Vec of parameters, but the last one is invalid and you have to return an error.) It might be possible to work around this, but I had trouble with it even when disallowing arbitrary parameters in const.

I think running the parse code and discarding the results should be fine for validation. There should never be a case where the parse code would have caught an error but validation wouldn’t.

Expressive matching

Easy to do with both constants (matches!(content_type.substance(), (Text, _))) and strs (matches!(content_type.str_substance(), ("text", _))).

Types vs Ranges

I’m inclined to reuse the code in master as much as possible. Seems doable.

Performance

Again, I’m inclined to reuse the code in master.

danielparks avatar Sep 12 '25 16:09 danielparks

I’ve been reading specs a little more, and I wonder if mime should support the extended parameters from RFC5987 §3.2.1:

parameter     = reg-parameter / ext-parameter

reg-parameter = parmname LWSP "=" LWSP value

ext-parameter = parmname "*" LWSP "=" LWSP ext-value

parmname      = 1*attr-char

ext-value     = charset  "'" [ language ] "'" value-chars
              ; like RFC 2231's <extended-initial-value>
              ; (see [RFC2231], Section 7)

charset       = "UTF-8" / "ISO-8859-1" / mime-charset

mime-charset  = 1*mime-charsetc
mime-charsetc = ALPHA / DIGIT
              / "!" / "#" / "$" / "%" / "&"
              / "+" / "-" / "^" / "_" / "`"
              / "{" / "}" / "~"
              ; as <mime-charset> in Section 2.3 of [RFC2978]
              ; except that the single quote is not included
              ; SHOULD be registered in the IANA charset registry

language      = <Language-Tag, defined in [RFC5646], Section 2.1>

value-chars   = *( pct-encoded / attr-char )

pct-encoded   = "%" HEXDIG HEXDIG
              ; see [RFC3986], Section 2.1

attr-char     = ALPHA / DIGIT
              / "!" / "#" / "$" / "&" / "+" / "-" / "."
              / "^" / "_" / "`" / "|" / "~"
              ; token except ( "*" / "'" / "%" )
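To give a feel for what supporting this would involve, here is a rough sketch of a parser for the `ext-value` production above. The `ExtValue` type and function names are assumptions. The sketch does not validate the charset against `mime-charsetc` or the language tag against RFC 5646, and it leaves the decoded bytes in the declared charset rather than converting them.

```rust
/// Parsed `ext-value`. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct ExtValue {
    pub charset: String,
    pub language: Option<String>,
    /// Percent-decoded bytes, still encoded in `charset`.
    pub bytes: Vec<u8>,
}

/// `attr-char` from the grammar: token characters except "*", "'" and "%".
fn is_attr_char(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$&+-.^_`|~".contains(&b)
}

/// Value of one hex digit, or `None` if the byte is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// ext-value = charset "'" [ language ] "'" value-chars
pub fn parse_ext_value(s: &str) -> Option<ExtValue> {
    let mut parts = s.splitn(3, '\'');
    let charset = parts.next()?;
    let language = parts.next()?;
    let value = parts.next()?;
    if charset.is_empty() {
        return None;
    }

    let raw = value.as_bytes();
    let mut bytes = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        match raw[i] {
            // pct-encoded = "%" HEXDIG HEXDIG
            b'%' => {
                let hi = hex_val(*raw.get(i + 1)?)?;
                let lo = hex_val(*raw.get(i + 2)?)?;
                bytes.push((hi << 4) | lo);
                i += 3;
            }
            b if is_attr_char(b) => {
                bytes.push(b);
                i += 1;
            }
            _ => return None,
        }
    }

    Some(ExtValue {
        charset: charset.to_owned(),
        language: if language.is_empty() {
            None
        } else {
            Some(language.to_owned())
        },
        bytes,
    })
}
```

Even this reduced version adds percent-decoding and charset handling on top of ordinary parameter parsing, which is worth weighing against how often `name*=` parameters appear on media types in practice.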

danielparks avatar Sep 15 '25 16:09 danielparks

Regarding types v. ranges — strictly speaking, */*, text/*, and **foo/*bar+++* are all valid RFC7231 (HTTP) media types. The type and subtype are tokens, which are defined in HTTP RFC7230 §3.2.6 to include both + and *.

RFC6838 (Media Type) might make more sense if you want to distinguish between ranges and types.
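The difference between the two grammars can be seen with a pair of small predicates. This is a sketch: the function names are made up for the example, and the character sets follow `token` from RFC7230 §3.2.6 and `restricted-name` from RFC6838 §4.2.

```rust
/// `token` per RFC 7230 §3.2.6: one or more `tchar`, which includes
/// both "*" and "+".
pub fn is_http_token(s: &str) -> bool {
    !s.is_empty()
        && s.bytes()
            .all(|b| b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b))
}

/// `restricted-name` per RFC 6838 §4.2: starts with ALPHA or DIGIT, is at
/// most 127 characters long, and does not allow "*".
pub fn is_restricted_name(s: &str) -> bool {
    match s.as_bytes().split_first() {
        Some((first, rest)) => {
            first.is_ascii_alphanumeric()
                && rest.len() <= 126
                && rest
                    .iter()
                    .all(|b| b.is_ascii_alphanumeric() || b"!#$&-^_.+".contains(b))
        }
        None => false,
    }
}
```

Under the HTTP token grammar `*`, `**foo` and `*bar+++*` all pass, while under the RFC6838 grammar none of them do, so picking the stricter grammar for types would separate them from ranges.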

That said, it’s worth asking what’s gained from being restrictive. Is it worth drawing the line here? What about restricting type just to the official ones? Subtype too? I’m inclined to be less restrictive and represent types and ranges with the same Rust type.

danielparks avatar Nov 08 '25 14:11 danielparks