mime
mime copied to clipboard
1.0 Release Discussion / Wishlist
With Hyper on its way to 1.0, perhaps it's time to start discussing where this crate is at in relation to its own eventual 1.0 release, as well as create a place to document potential blockers for said release. That said, I haven't looked too closely at this crate lately so I don't know what any potential blockers might be.
Thoughts, @seanmonstar?
I just did a large redesign of this crate, resulting in v0.3. I realize reqwest is on the road to 1.0, and it publicly relies on mime, so there does need to be a way forward. I feel that the new design 0.3 is one that can be stabilized, with one possible caveat. I'm uncertain if I'd rather all the mime constants to be on the Mime type, or on the crate.
Should you do ContentType::new(mime::TEXT_CSV), or ContentType::new(Mime::TEXT_CSV). If the latter, then since associated constants won't stabilize until Rust 1.20, might have to wait till then.
Besides that, I think there just needs some usage out of 0.3, which should happen much more so when hyper releases 0.11, which contains the upgrade.
From a high-level perspective, I think I would prefer to see the constants contained within a submodule, perhaps namespaced by top-level type, e.g.:
/// `application/_` mime types
pub mod application {
pub const NAME = Name { "application" }; // maybe?
pub const WILDCARD = Mime { "application/*" }; // or STAR
pub const OCTET_STREAM = Mime { "application/octet_stream" };
}
// ..
/// `multipart/_` mime types
pub mod multipart {
pub const NAME = Name { "multipart" };
pub const FORM_DATA = Mime { "multipart/form-data" };
pub const ATTR_BOUNDARY = Name { "boundary" };
}
// ...
This would help avoid polluting the crate root namespace as more constants are added.
I think as far as desired APIs go, I do miss the ability to construct a Mime in a strongly typed manner, e.g. Mime(TopLevel::Multipart, SubLevel::FormData, vec![Attr::Boundary("boundary".to_string())]. Maybe add a method like Mime::add_param() which returns a new Mime with the given parameter and value appended? It can be done manually, of course, but having it as a method could avoid completely reparsing the MIME type and reduce potential mistakes as well.
A specialization of ToString which skips the formatting framework would be neat as well, but that can be added backwards-compatibly when specialization stabilizes.
-
Using modules to group by top level media type is an interesting idea! Seeing
mime::text::CSSand things is neat.There could be some confusing around which constant is a
Mimeand which is aName. For instance, would this confuse a user:mime::JSONvsmime::application::JSON? Is that a situation where it'd be better to beName::JSON?I'm not sure whether the current situation really "pollutes" the crate root. Using a flat design versus modules saves 1 character of typing per constant
mime::TEXT_PLAINvsmime::text::PLAIN, not much to bother considering. Maybe if you were looking in the docs, you'd want to look for constants that you can use with onlyapplicationtypes, or something? -
Constructing a
Mimewithout strings: I delayed this to get 0.3 finished sooner. Designing methods to handle both optional suffixes and parameters might warrant something like a builder... I imagine with macros 2.0, amime!macro can be used to both construct aMimeby combiningNames, and also by parsing strings at compile time. Examples of this macro:let form_data = mime!(MULTIPART / FORM_DATA; BOUNDARY = my_rando_str);const GITHUB: Mime = mime!(APPLICATION / "vnd.github.api.v3" + JSON);const INVENTED: Mime = mime!("sean/monstar; foo=bar");
-
Mutating a
Mime: I also skipped this to get 0.3 released. The actual mutating isn't too difficult, update the innersourceand any stored indices. However, like setting theboundaryparameter with a string, it would need to be a fallible method. Not all characters are valid in aMime. So eitherfn add_param() -> Result<(), ParseError>, or maybe adding aFromStrforName, and then you callmime.add_param(BOUNDARY, my_rando_str.parse()?);.
I'm not sure whether the current situation really "pollutes" the crate root.
Currently? Not really, I guess. If you plan to add a large number of constants more, maybe. Are you going to keep it about the same? Just constants for the most used types?
Maybe then, just a few submodules would do, though it would make the absolute paths longer:
pub mod attrs {
pub const BOUNDARY = Name { "boundary" };
pub const CHARSET = Name { "charset" };
}
// Is having `Name` constants strictly necessary since they can be converted to and from strings at nearly zero cost?
pub mod names {
pub const APPLICATION = Name { "application" };
// ...
pub const MULTIPART = Name { "multipart" };
// ...
pub const GIF = Name { "gif" };
}
pub mod types {
pub const APPLICATION_JSON = Mime { "application/json" };
pub const APPLICATION_OCTET_STREAM = Mime { "application/octet-stream" };
// ...
pub const IMAGE_WILDCARD = Mime { "image/*" };
pub const IMAGE_GIF = Mime { "image/gif" };
// ...
pub const MULTIPART_FORM_DATA = Mime { "multipart/form-data" };
}
You could use proc-macro-hack to get a compile-time macro now, however I don't see much point in combining Names since their underlying strings will have to be concatted anyway. It's slightly safer but I still miss the old API design; I understand why it was difficult to work with and extend though.
If you plan to add a large number of constants more, maybe. Are you going to keep it about the same? Just constants for the most used types?
Undecided. I'm sure there's a few other fairly common that I have forgotten. I'm not sure I'd want to just copy the IANA registry into this crate. With a macro to make new static Mimes, that feels unnecessary.
Is having
Nameconstants strictly necessary since they can be converted to and from strings at nearly zero cost?
Well, having constants instead of people typing strings does allow the catching of typos. mime.type_() == mime::APPLICATON will error, mime.type_() == "applicaton" will not. As to them being of type Name instead of &str, not all names are treated the same. Specifically, most names are case insensitive, so mime.type_() == "TEXT" and mime.get_param("charset").unwrap() == "UTF-8" should work.
You could use proc-macro-hack to get a compile-time macro now
I did start a macro trying to do just that, but hit 2 problems, 1 probably solvable, another not.
- The hack defines an item,
enum ProcMacroHack { ... }, and so you cannot use it multiple times in the same scope. I would consider trying to find a solution, except for number 2. - To allow the macro to construct a
Mimeoutside of themimecrate, the fields would need to be public. That would mean libraries can no longer rely on the a constructedMimeto only have valid characters. Macros 2.0 already are executed in scope of the defining crate, so they can access private fields, without having to expose them to the world.
I don't see much point in combining Names since their underlying strings will have to be concatted anyway.
With a compile time macro, they can be concatenated into a single static string, so the there's no runtime cost. And by allowing things like mime!(APPLICATION / CSV), you can again get typo checks instead of using a string.
There was some discussion in #55 about splitting Mime into two types, MediaType and MediaRange, such that text/* is a valid MediaRange, but not a valid MediaType. I like the concept. Would be a breaking change...
Makes sense.
You might also want to consider adding some support for RFC2231. Currently you can parse/uses MIME conforming with the RFC, but you have to put extra work on top for using such mimes.
A example from the RFC would be:
Content-Type: application/x-stuff;
title*0*=us-ascii'en'This%20is%20even%20more%20;
title*1*=%2A%2A%2Afun%2A%2A%2A%20;
title*2="isn't it!"
By adding a * to the end of you parameter name (e.g. filename => filename*) you
switch to a slightly different format for the values, which:
- add a
charsetfield followed by a language field separated by'from the rest - instead of quoting you now use percent encoding to encode from the charset specified into us-ascii valid for an parameter value
Additionally they add a mechanism to split long parameter values into multiple
sections by adding a *N (e.g. *0,*1,...) to the end of the parameter name,
but before any possible added * for encoding. Note that only the
first section contains a charset/language field, following sections
do not. Also following sections do not need to use encoding,
through if encoding is used in any section the first section has to
be encoded.
Like I said before this crate does parse such definitions just fine,
but it would be interesting to have e.g. a get_rfc2231_param,
method which iterates over all sections of a given parameter
and preferable does so in the section order (which can
differ from the order they appear in the MIME field).
(e.g. a ~ Iter<Item=(usize, bool, Name)> where the bool indicates,
if a section is encoded i.e. the full parameter name ended
in a * and usize if the number of the section, starting with 0)
Similar nice would be a way to construct such MIMES,
through this would mean pulling in percent_encoding
dependencies.
For now I will implement some of the part on top of Mime for my use case. I can post them here when done or, if they don't contain any application specific logic, make a pull request.
One question that's come up during the crate evaluation so far is whether mime is the best name for the crate. Since we're just dealing with media types then maybe media-types is more descriptive?
Splitting into MediaType and MediaRange seems like a good idea. I'm also thinking that if we did that then maybe it also makes sense to move the constants on to those respective types? That one seems like it's 6-of-one or half-a-dozen-of-the-other to me though.
Here are some other thinks which I think are a good idea:
- do not use
Namefor both names and values. All "names" are can be compared case insensitive but only value of the charset param is compared case insensitive. - do not provide
as_strfor the "new (1.)"Valuetype as quoted-pairs (e.g.\") can not be handled correctly with it instead provideas_str_repr(&self) -> &str(for the underlying representation) andto_content(&self) -> Cow<'a, str>for the actual content (comparasions can be implemented content based without needing any allocation even if quoted-pairs are used. - maybe drop special handling of the
charsetparameter. While charsets are case insensitive, this is a detail on a different abstraction level and case insensitive comparison is not enough at all. With other words if someone want's to compare the person has to use an additional API anyway which knows about alias names for charsets (e.g.utf8==utf-8). Any such API would implement case insensitive comparison by themselves, so no need to implement it here. - consider if it is possible to parse a "text/plain" into the same representation as TEXT_PLAIN, by e.g. using a trie data struct containing the most common type/subtypes (speedier comparisons matching etc. wrt. to parsed mime-types and mime-ranges, similar parsing speed (I guess) but higher binary size)
Would you consider cutting a new pre-1.0 release, so that I can use the serde feature in my create.
Any status on this a year later? Serde support still isn't available.
It'd be nice to have serde support by now.
The ability for applications/consumers to define const Mime instances would be super valuable.
All the APIs used to define the ones shipped in this crate are private. If someone needs to use a single mime-type but it's not shipped here, there doesn't seem to be any obvious workaround except "create a new instance each time you need a reference to one", which completely negates all the performance optimizations.