infer
infer copied to clipboard
How far should Infer support file extensions ?
This issue is following discussion started in #21 code review.
Infer has limited support for file extensions. Other crates address file extension conversion from/to mime type, but I feel frustrated with all of them. Let's clarify wether my concerns could/should be addressed by Infer
or in another crate.
User story
I am building web applications which serves some files on the file system or proxied from other network locations. I want to set the Content-Type
header which drives the web browser behavior in some contexts.
I am also using the mime-type to build URL's with file extensions. I am using mime_guess
for this task.
Video and image mime types can be detected by Infer
but many text types cannot: Javascript, CSS in particular. My algorithm is to detect file content with Infer
first, fallback to mime_guess
when I know the file extension second, and default to "application/octet-stream" when no match found.
Concerns
-
Data source consistency:
Infer
uses its own hard coded data source.mime-db
andmime_guess
share the same data source from jshttp/mime-db, but there is no versioning of the DB: their might be some different behaviors across compilations. Also for a given mime typeInfer
results may be different frommime_guess
results. The official source of media types is IANA -
API consistency:
Infer
andmime_db
works with&'static str
whilemime_guess
works withMIME
from themime
crate as well as a customMimeGuess
type. UsingInfer
withmime_guess
is thus not much convenient: you need boilerplate to convert MIME to &str and back. -
Type aliases: neither
Infer
,mime_guess
ormime_db
support type aliases. For instancetext/javascript
is a type alias ofapplication/javascript
:Infer
results tois_mime_supported
should betrue
in both cases. -
Extension aliases:
Infer
does not support extension aliases, whilemime_db
andmime_guess
do. -
DB Overloading:
Infer
allows to have custom types, and even overloading the default ones (since #13), whilemime-db
andmime_guess
does not. -
no_std, no_alloc:
Infer
allows forno_std
,no_alloc
since #18, whilemime-db
andmime_guess
does not. -
maintenance status:
Infer
is maintained, so ismime-db
, butmime_guess
does not seem to be maintained anymore. -
build system:
mime_db
has a build script which has network dependency, whileInfer
andmime_guess
can be built offline.
All this make it difficult for Infer
to work hand in hand with either mime-db
or mime_guess
.
Options
- Improve
mime_guess
but the maintainer does not answer to pull requests. I am abandoning this option for now. - Improve
mime-db
. I will eventually study this option once we clarified on 1. and 3. - Address all points above within
Infer
, which is what I want to discuss here. - Write another crate wrapping
Infer
andmime-db
or be a complete rewrite of all. I will consider this if 1, 2 or 3 fail.
Option 2 - Improving mime-db
We could improve mime-db
at a point where Infer
would remove its current limited support for file extensions and advert to use mime-db
instead. We could adjust both crates API so that they work hand-in-hand.
Option 3 - Merging mime_guess
and mime_db
into Infer
.
As per my user story above having a single crate to guess a mime type from either content or file extension makes sens. Such ideal crate would have the following features:
-
no_alloc
,no_std
. - A static DB built from IANA and versioned outside of the build script.
- Allow overloading the default DB at runtime (and eventually at compile time too).
- Handle type and extension aliases.
@abonander, @viz-rs you are welcome to join the discussion!
- I think it should be documented what version they are using, and we should decide how updating existing mime types, like in https://github.com/bojand/infer/commit/fcbd3c5e44216ddc07b2f45dd0c980e478855786, or even adding new ones should be handled, since that could change the output of
infer::get
. -
mime_guess
could probably also usestatic
lifetimes? Anyway, with it being an unmaintained crate there's not much we can do to fix it. - We could have
infer::mime_type
be a slice of possible mime types, with the first one being the most recent one and so on? - Same for point 3?
- We could open a PR to implement it for
mime-db
? - mime-db should also be able to be no_std, no_alloc
- .
- At the moment mime-db is using a particular version of jshttp/mime-db, but I feel like this doesn't make sense. Even when they switch back to master, they could just have a script that pings them when a new jshttp/mime-db version is released and have the mime database be embedded in the crate instead of having users witness it change constantly under their feet.
Since the idea is to leave infer and mime-db as separate crates anyway, I think we could have infer::Type
be a separate crate, like mime_guess
does with the mime::Mime
, and have both infer
and mime-db
return it. That way we could have something like
// infer::get and mime_db::get return Type
// Type implements Default with the the default mime type application/octet-stream and extension ""
let kind = infer::get(&buf).or_else(|| mime_db::get(&extension)).unwrap_or_default();
We would remove file extensions limited support and put something like this in all examples so that users could just use it for their crates.
mime_db
has a build script which has network dependency, while Infer and mime_guess can be built offline.
Looking more into it they don't require the network to build, since the build script is disabled. Maybe it's an odd way of doing it, but they have to change that line to make it run. Else the vendored extensions.rs and types.rs are used
I made an initial PR to improve mime-db
and make it no_std
https://github.com/viz-rs/mime-db/pull/3
Tank you for your time and comments here!
3. We could have `infer::mime_type` be a slice of possible mime types, with the first one being the most recent one and so on?
I would let infer
just do one thing and do it well: infer the type from raw data. mime_db
would then handle all types, extensions and their relationship. In the case of type aliases infer
would return the canonical (default) type defined by IANA, leaving users the choice to use mime_db
to query for aliases.
5. We could open a PR to implement it for `mime-db`?
Yes, once we clarified further our intentions with mime-db
.
At the moment mime-db is using a particular version of jshttp/mime-db, but I feel like this doesn't make sense. Even when they switch back to master, they could just have a script that pings them when a new jshttp/mime-db version is released and have the mime database be embedded in the crate instead of having users witness it change constantly under their feet.
I fully agree. Also maybe mime-db
could be built from IANA data directly.
Since the idea is to leave infer and mime-db as separate crates anyway, I think we could have
infer::Type
be a separate crate, likemime_guess
does with themime::Mime
, and have bothinfer
andmime-db
return it. That way we could have something like// infer::get and mime_db::get return Type // Type implements Default with the the default mime type application/octet-stream and extension "" let kind = infer::get(&buf).or_else(|| mime_db::get(&extension)).unwrap_or_default();
We would remove file extensions limited support and put something like this in all examples so that users could just use it for their crates.
It sounds great. Then we would have a Type
crate (leaving the question open to merge with the Mime
crate), a DB
crate which allows to query mime types and return some Type
s and an Infer
crate which infer mime types from raw data, returning static Type
s instance from the DB
crate. This makes sense to me.
I made an initial PR to improve
mime-db
and make itno_std
viz-rs/mime-db#3
Let's see if the maintainer is opened for changes and hopefully move forward on this issue with him. Otherwise we could progress on our own (it would be sad, but this would be a classic open source story).
So, now that mime-db v1 has been out for a month and no more progress has been made on this, I think we should give another look at the current situation.
Now something like this is possible:
let buf = // some file header
let ext = // some file extension
let mime_type = infer::get(&buf).map_or_else(
|| mime_db::lookup(ext).unwrap_or("application/octet-stream"),
|t| t.mime_type(),
); // this returns a &'static str
let extensions = mime_db::extensions(mime_type); // this returns an Option<impl Iterator<Item = &'static str>>
It's not perfect, for example mime_db::extensions
unfortunately returns an Option<impl Iterator<Item = &'static str>>
instead of just impl Iterator<Item = &'static str>
. Also in some cases it might have been useful to have a struct implementing Iterator
, instead of impl Iterator
.
At least now we have &'static str
instead of &str
, but I wouldn't have expected the creator to release v1 so quickly :sweat_smile: . The latest versions of Rust added some really cool things, like expanding what const functions can do, and I would have liked to give a look those too.
What do you think @ririsoft?
Hi @paolobarbolini,
The situation is clearly improving thanks to your commits.
My understanding is that you are favoring mime-db instead of mime_guess. This is totally fine for me, since mime_guess authors do not show much interest in the discussion.
I agree that mime-db released a v1.0 a little bit early. A v2 will be needed if we want to change the Option<impl Iterator<Item = &'static str>>
annoyance. But this is not a big deal.
I believe that Infer
should share its mime database with mime-db
. Is this something you have in mind too ? Shouldn't mime-db
be modified so that Infer
could depend on it ?
Last I believe that we should offer users a single API with combination of mime-db
and Infer
features. Should this be done in an external crate that would depends on Infer
and mime-db
? Should this be done at Infer
level ? What API are we talking about ? I am leaving this as open questions.
For instance, taking your example above, I would personnally like to have something like:
let buf = // some file header
let ext = // some file extension
let mime_type = infer::get_or_ext(&buf, ext); // this returns a &'static str and default to "application/octet-stream" if no match found.
let extensions = infer::extensions(mime_type); // this returns a struct that implements Iterator<Item = &'static str>
mime_guess is in passive maintenance mode. It's a tangent of a tangent of a side-project I haven't touched in years.
mime_guess is in passive maintenance mode. It's a tangent of a tangent of a side-project I haven't touched in years.
Thanks a lot for the clarification, it helps us move forward. Cheers.
My understanding is that you are favoring mime-db instead of mime_guess
I wouldn't mind working with mime_guess instead, on the bright side it would have the advantage of not having both mime-db and mime_guess on most web projects (because of crates like reqwest and warp), but first we would have to change the mime
crate to be able to get &'static str
out of it (or return a Cow
, which would be easier but a bit less elegant)
I believe that Infer should share its mime database with mime-db. Is this something you have in mind too ? Shouldn't mime-db be modified so that Infer could depend on it ?
The more I think about this API that I start to think that having a third crate just for "merging" infer's and mime-db's functionality together wouldn't be too bad. But I haven't really made progress on this, so I'll take some more time to decide.
In the meantime I'm going to work on a PR to fix the Option<impl Iterator<Item = &'static str>>
thing, just so I'm not constantly reminded by the huge mistake of returning an Option<Iterator>
:smile: