file-type
file-type copied to clipboard
Extending file-type
Dear maintainers. First, I really appreciate that file-type has been keep maintaining. Thank you for your efforts & contributions.
Background
Recently, file-type was refactored to utilize @Borewit's awesome strtok3 packages. With this changes, we can navigate binary stream easily & effectively. Now, file-type uses only one Tokenizer interface regardless of source. All public methods creates Tokenizer internally and pass created tokenizer to file-type matcher.
Beside these refactors, developers cannot easily extend file-type. If someone needs to add new file type support, There are only two choices:
-
Fork file-type package and customize Pros: It does not occur additional i/o if chunk was cached internally Cons: Less maintainability
-
Run custom file-type matchers if file-type returns
undefined. Pros: More maintainability Cons: It occurs additional I/O
As you can see, There are no effective way to extend file-type.
Idea
Expose Matcher interface
So, It would be nice if we support additional matchers.
Let's suppose there is a matcher like this:
interface MatchedType {
ext: string;
mime: string;
}
// write custom matcher
async function webpMatcher(tokenizer: ITokenizer): Promise<MatchedType | void> {
await tokenizer.ignore(8);
const buf = Buffer.alloc(4);
await tokenizer.peekBuffer(buf);
if (buffer.compare(Buffer.from([0x57, 0x45, 0x42, 0x50])) === 0) {
return { ext: "webp", mime: "image/webp" };
}
}
Then, Provide custom matchers like these ways:
// quick & dirty - provide custom matchers as options
fileType.fromFile(path, { matchers: [webpMatcher] });
// or provide factroy like axios does
const fileType = FileType.create({ matchers: [webpMatcher] });
// or provide instance methods to add matchers to specific position (head or tail)
const fileType = FileType.create();
fileType.appendMatcher(webpMatcher);
fileType.prependMatcher(webpMatcher);
I'm not sure which way is cleanest. Please leave your feedback!
Future
With this idea, Developers can easily extend file-type.
From file-type's perspective, file-type can have multiple 3rd-party type matchers which can be released to npm independently, or managed via Github organization/Scoped packages (e.g. @file-type/core for
built-it matchers, @file-type/image for image file type matchers)
Also it can be used solving unsupported features like https://github.com/sindresorhus/file-type/issues/264#issuecomment-568439196. For example, Developers can write custom TSV/CSV type matchers and publish to npm even if file-type does not provide built-in TSV/CSV file type matchers.
Interesting idea @mooyoul. The way I read it, you propose 2 things:
- Expand file-type with custom "matchers" (code recognize file signature)
- Split file-type engine from "matchers" (which requires 1)
Although modularity is a beautiful thing, it requires additional maintenance. Not necessary for 3rd party implementation (you could argue that those would save work), but at least splitting the engine with the core.
Although the intend of your example is clear to me, note that you should not read but peek to allow the next matcher to read from position 0.
(I will be very busy coming weeks, please forgive me not being responsive these days.)
We would definitely like to make it possible to provide your own matchers. There are some complications though. For example, sometimes detections are order-dependent. How should that be handled?
I don't think modularizing is a good idea for file-type. I would prefer to keep as much in this package as possible.
I like this version the most:
// or provide factroy like axios does
const fileType = FileType.create({ matchers: [webpMatcher] });