Interest in a ModelRepo type?
I've been using WhisperKit for quite a while now, but I always struggle with figuring out which model I should use, how to download it, and so forth. I.e., all the stuff before you actually start extracting text.
Part of this seems to come from the strong coupling of the text extraction engine with the models. Querying which models are available, which would work on the current device, the relative ability of the models, the size of the models: all of this is exposed as static funcs of WhisperKit. To me this is a code smell which points to a new type that could take over the responsibilities of the static funcs.
I want to propose a new type called ModelRepo, which will manage the models in a directory. It will have funcs for the following:
- Querying the type of the current device
- Querying what models are available in general
- Querying what models are supported for any given device type
- Querying the models that are currently already downloaded
- Querying what models work on the current device
- Querying the recommended models for any device/current device
- Querying the recommended models for a device and given language
- Querying the recommended models that are already downloaded for a device and given language
- Downloading any model atomically, that is, to a temp location and moving into play on completion
- Downloading any model in the background
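To make the list above a bit more concrete, here is a rough sketch of a few of these funcs. Every name in it (`ModelRepo`, `supportedDevices`, `install`, the model and device strings) is hypothetical and not existing WhisperKit API. The catalogue is passed in by the caller rather than fetched, and the actual network download is left to a closure, so this only illustrates the querying and the "download to a temp location, then move into place" idea.

```swift
import Foundation

// Hypothetical sketch only: none of these names exist in WhisperKit today.
struct ModelRepo {
    /// Directory holding one subfolder per downloaded model.
    let directory: URL
    /// Catalogue: model name -> device identifiers that support it.
    /// Supplied by the caller here; a real repo would fetch or bundle it.
    let supportedDevices: [String: Set<String>]

    /// Every model the catalogue knows about, sorted by name.
    var availableModels: [String] { supportedDevices.keys.sorted() }

    /// Models the catalogue lists as supported on the given device.
    func models(supportedOn device: String) -> [String] {
        supportedDevices.filter { $0.value.contains(device) }.keys.sorted()
    }

    /// Catalogue models that already have a folder in `directory`.
    /// Staging folders are ignored because they are not catalogue names.
    func downloadedModels() -> [String] {
        let names = (try? FileManager.default.contentsOfDirectory(atPath: directory.path)) ?? []
        return names.filter { supportedDevices[$0] != nil }.sorted()
    }

    /// Runs `fetch` against a staging folder and only moves the result into
    /// place once `fetch` has returned without throwing.
    func install(_ model: String, fetch: (URL) throws -> Void) throws {
        let fm = FileManager.default
        // Stage inside the repo directory so the final move stays on one volume.
        let staging = directory.appendingPathComponent(".staging-\(UUID().uuidString)")
        try fm.createDirectory(at: staging, withIntermediateDirectories: true)
        // Clean up on failure; after a successful move this is a no-op.
        defer { try? fm.removeItem(at: staging) }

        try fetch(staging)

        let destination = directory.appendingPathComponent(model)
        // Note: replacing an existing model leaves a short window with no copy.
        if fm.fileExists(atPath: destination.path) {
            try fm.removeItem(at: destination)
        }
        try fm.moveItem(at: staging, to: destination)
    }
}
```

With something like this, an app could call `downloadedModels()` on launch and kick off `install` for whatever is missing, without having created a WhisperKit object at all. If `fetch` throws, the staging folder is removed and the model never shows up as downloaded.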
Some of these things are currently exposed via static funcs, but others, such as querying what models are currently downloaded, don't seem to be exposed, and yet are pretty fundamental. (E.g., it is common for an app to check what models it has on launch, and start a download if needed, so that it is ready when the user starts to extract text.)
An advantage of a ModelRepo is that you can set them up independent of a WhisperKit object. This is quite a common use case. Often you will want to be managing these resources independent of the UI state, and a specialized type would help.
Another advantage is that you could have multiple ModelRepos, and just switch them out. They could also operate in parallel. There may not be many use cases for this, but perhaps it is useful for testing or the like.
The way I would envisage the refactor is that WhisperKit would keep supporting the older methods, but they would chain to a default ModelRepo internally to get things done. Additionally, a new init would be added that takes a ModelRepo when creating the WhisperKit object.
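The shape of that refactor could look roughly like the sketch below. It is only an illustration under assumptions: `Transcriber` stands in for WhisperKit, `listModels` is a placeholder for whichever static funcs exist today, and the pared-down `ModelRepo` here returns fixed dummy data.

```swift
import Foundation

// Hypothetical stand-ins; these are not actual WhisperKit declarations.
final class ModelRepo {
    /// Shared repo that the older static funcs would delegate to.
    static let `default` = ModelRepo(
        directory: FileManager.default.temporaryDirectory.appendingPathComponent("models")
    )

    let directory: URL

    init(directory: URL) {
        self.directory = directory
    }

    /// Placeholder data; a real repo would consult a catalogue.
    func listModels() -> [String] {
        ["base", "tiny"]
    }
}

/// Stands in for the WhisperKit type in this sketch.
final class Transcriber {
    let repo: ModelRepo

    /// New init: callers may inject their own repo; the default keeps
    /// existing call sites compiling unchanged.
    init(repo: ModelRepo = .default) {
        self.repo = repo
    }

    /// Older-style static entry point, kept for backwards compatibility.
    /// It simply chains to the default repo.
    static func listModels() -> [String] {
        ModelRepo.default.listModels()
    }
}
```

Existing code that calls the static func keeps working, while new code can write `Transcriber(repo: myRepo)` and manage `myRepo` separately from any UI state.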
Is there any interest in this? I have been wanting something like it from the beginning. If there is some appetite for this idea, I am happy to take it on and put in a Pull Request.
We have some examples of how to do these checks in the Example/WhisperAX app, but I agree it would be great to have in the library itself. Previously we've only had one repo for all the models, but with MLX and any custom fine-tune that someone converts using whisperkittools, we should definitely tighten up the model management and multi-repo support. Your proposal makes sense to me, thanks for considering backwards compatibility as well (and apologies for the delay with this response).