finalfusion-rust
Pretrained embedding fetcher
I think it would be nice to have a small utility data structure to fetch pretrained embeddings. I don't think this needs to be part of the finalfusion crate, since it is not really core functionality. The basic idea is:
- We'd have a repository finalfusion-fetcher with some metadata file (probably JSON), mapping embedding file identifiers to URLs. E.g. fasttext.wiki.nl.fifu could map to http://www.sfs.uni-tuebingen.de/a3-public-data/finalfusion-fasttext/wiki/wiki.nl.fifu
- A small crate (possibly in the same repo) would provide a data structure Fetcher with a constructor that retrieves the metadata and gives a fetcher:
let fetcher = Fetcher::fetch_metadata().unwrap();
A user could then open embeddings:
let dutch_embeddings = fetcher.open("fasttext.wiki.nl.fifu").unwrap();
This method would check whether the embeddings are already available locally. If not, it would fetch them and store them in a standard XDG location. It would then open the embeddings stored in this location.
Similarly, Fetcher::mmap could be used to memory-map an embedding after downloading.
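To make the check-then-fetch behaviour concrete, here is a rough sketch using only the standard library. Everything in it is an assumption for illustration: the field names, the `xdg_data_dir` and `ensure_local` helpers, and the injected `download` closure (std has no HTTP client, so the actual transfer is left to the caller). It only returns the local path; opening or memory-mapping the file with finalfusion would happen afterwards.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical fetcher: maps embedding identifiers to URLs and
/// keeps downloaded files in a local directory.
pub struct Fetcher {
    /// Identifier -> URL, as it might be read from the metadata file.
    metadata: HashMap<String, String>,
    /// Directory in which downloaded embeddings are stored.
    cache_dir: PathBuf,
}

/// Resolve the base data directory in the XDG style: use
/// `$XDG_DATA_HOME` if it is set to an absolute path, otherwise
/// fall back to `$HOME/.local/share`. The values are passed in
/// explicitly so the function stays easy to test.
pub fn xdg_data_dir(xdg_data_home: Option<&str>, home: &str) -> PathBuf {
    match xdg_data_home {
        Some(dir) if Path::new(dir).is_absolute() => PathBuf::from(dir),
        _ => Path::new(home).join(".local").join("share"),
    }
}

impl Fetcher {
    pub fn new(metadata: HashMap<String, String>, cache_dir: PathBuf) -> Self {
        Fetcher { metadata, cache_dir }
    }

    /// Return the local path for `id`. The `download` closure is only
    /// called when the file is not already present in the cache.
    pub fn ensure_local<F>(&self, id: &str, download: F) -> io::Result<PathBuf>
    where
        F: FnOnce(&str) -> io::Result<Vec<u8>>,
    {
        // Unknown identifiers are reported as NotFound.
        let url = self.metadata.get(id).ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("unknown embedding: {}", id))
        })?;
        let path = self.cache_dir.join(id);
        if !path.exists() {
            fs::create_dir_all(&self.cache_dir)?;
            let bytes = download(url)?;
            fs::write(&path, bytes)?;
        }
        Ok(path)
    }
}
```

A real implementation would probably stream the download to a temporary file and rename it into place, so that an interrupted transfer does not leave a truncated file that later looks like a valid cache entry.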
After this is implemented, the functionality could also be exposed in finalfusion-python.
Sounds like a very convenient feature. Some feature to search for embeddings or to get a list of available files would also be nice to have.
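For what it's worth, listing and searching could be fairly thin wrappers over the metadata map. A minimal sketch, assuming the metadata ends up as an identifier-to-URL map; the `Metadata` alias and the `available` and `search` functions are made-up names, not an existing API:

```rust
use std::collections::BTreeMap;

/// Hypothetical metadata index: embedding identifier -> download URL.
/// A BTreeMap keeps the identifiers in sorted order.
pub type Metadata = BTreeMap<String, String>;

/// List all known embedding identifiers in sorted order.
pub fn available(metadata: &Metadata) -> Vec<&str> {
    metadata.keys().map(String::as_str).collect()
}

/// Return the identifiers that contain `query`, ignoring case.
pub fn search<'a>(metadata: &'a Metadata, query: &str) -> Vec<&'a str> {
    let query = query.to_lowercase();
    metadata
        .keys()
        .map(String::as_str)
        .filter(|id| id.to_lowercase().contains(&query))
        .collect()
}
```

With identifiers like fasttext.wiki.nl.fifu, a query such as "wiki.nl" would then narrow the list down by source and language.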