Romain Beaumont
We could even consider adding optional statistics computation, and even training, at the end, making this a full dataset construction pipeline starting from URLs. Important point:
* Having well-packaged and separated...
The result will look like this: https://colab.research.google.com/github/criteo/autofaiss/blob/master/docs/notebooks/autofaiss_multimodal_search.ipynb, but better packaged.
The config is important (see https://github.com/rom1504/img2dataset#api). Figure out a way to expose all options and still do the right thing by default, with as few required arguments as possible.
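A minimal sketch of "expose every option, default sensibly, require almost nothing", assuming a Python dataclass holds the config. The field names and default values below are hypothetical illustrations loosely modeled on img2dataset-style options; they are not the actual parameters of img2dataset or clip-retrieval.

```python
from dataclasses import dataclass, fields


@dataclass
class PipelineConfig:
    """Hypothetical end-to-end config: one required field, everything else defaulted."""
    url_list: str                  # the only argument a user must always pass
    output_folder: str = "output"
    image_size: int = 256
    processes_count: int = 16
    clip_model: str = "ViT-B/32"
    batch_size: int = 256
    enable_metadata: bool = True


def make_config(url_list: str, **overrides) -> PipelineConfig:
    """Build a config from the minimal input plus any explicit overrides."""
    known = {f.name for f in fields(PipelineConfig)}
    unknown = sorted(set(overrides) - known)
    if unknown:
        # Report typos with a clear message listing the unknown names.
        raise ValueError(f"unknown options: {unknown}")
    return PipelineConfig(url_list=url_list, **overrides)
```

With this shape, `make_config("urls.txt")` covers the common case and `make_config("urls.txt", image_size=384)` overrides a single option, while every option stays visible in one place.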
It should be possible to go from a URL list to an index + metadata store in one step, with almost no memory usage, when:
* img2dataset provides an iterator (or a...
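A minimal sketch of the one-step, low-memory idea, assuming the source yields URLs lazily as an iterator. `embed`, `write_embeddings` and `write_metadata` are hypothetical callables standing in for download + CLIP encoding and for the shard writers; the pipeline itself only ever holds one batch.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def stream_pipeline(
    urls: Iterable[str],
    embed: Callable[[List[str]], Sequence[Sequence[float]]],
    write_embeddings: Callable[[int, Sequence[Sequence[float]]], None],
    write_metadata: Callable[[int, List[str]], None],
    batch_size: int = 1024,
) -> int:
    """Single pass from URLs to embedding shards plus metadata shards.

    Only one batch is held at a time, so the pipeline's own memory use
    stays roughly constant no matter how long the URL list is.
    """
    total = 0
    for shard_id, batch in enumerate(batched(urls, batch_size)):
        vectors = embed(batch)               # hypothetical: download + encode this batch
        write_embeddings(shard_id, vectors)  # e.g. one embedding file per shard
        write_metadata(shard_id, batch)      # e.g. one metadata file per shard, same row order
        total += len(batch)
    return total
```

Writing one embedding shard and one metadata shard per batch, with matching shard ids and row order, keeps the two stores aligned, so an index can be built from the embedding shards later while the metadata stays on disk.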
The basics are now done. Next:
* [ ] configuration system
* [ ] incremental
* [ ] scheduling
Config modes:
* everything defined by the user
* lots of defaults
* guess everything

Each mode will be useful, depending on whether the user wants maximum convenience or maximum configurability.
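One way the three modes could coexist, sketched under assumptions: options live in a plain dict, precedence is user value over guessed value over default, and "guessing" is a hypothetical file-extension heuristic. The option names (`input_format`, `url_col`, `caption_col`, `image_size`) are illustrative only.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; real option names and values would come from the pipeline.
DEFAULTS: Dict[str, Any] = {
    "input_format": "txt",
    "url_col": "url",
    "caption_col": None,
    "image_size": 256,
}


def guess_options(url_list: str) -> Dict[str, Any]:
    """Infer input-dependent options from the file name (illustrative heuristic)."""
    if url_list.endswith(".parquet"):
        return {"input_format": "parquet", "caption_col": "caption"}
    if url_list.endswith(".csv"):
        return {"input_format": "csv"}
    return {}


def resolve_config(url_list: str, mode: str = "defaults",
                   user: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Resolve options in one of three modes: explicit, defaults or guess."""
    user = dict(user or {})
    if mode == "explicit":
        # Everything defined by the user: refuse to fill anything in.
        missing = sorted(set(DEFAULTS) - set(user))
        if missing:
            raise ValueError(f"explicit mode needs: {missing}")
        return user
    if mode == "defaults":
        # Lots of defaults: user values override the built-in ones.
        return {**DEFAULTS, **user}
    if mode == "guess":
        # Guess everything: defaults < guesses from the input < user values.
        return {**DEFAULTS, **guess_options(url_list), **user}
    raise ValueError(f"unknown mode: {mode}")
```

Keeping one precedence order across modes means a user can start in "guess" mode and pin individual options later without switching modes.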
Also consider making end2end an example and letting people write their preferred config in Python.
An example in https://github.com/rom1504/img2dataset/tree/main/examples should be easy now that end2end is available; do it.
Do that, then put it in the exploration demo.
Need to add CORS (like https://github.com/rom1504/clip-retrieval/blob/main/clip_retrieval/clip_back.py#L352) to the metrics endpoint: https://github.com/rom1504/clip-retrieval/blob/main/clip_retrieval/clip_back.py#L342
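If the metrics endpoint is served by a separate WSGI app mounted next to the Flask app (for example one created with `prometheus_client.make_wsgi_app()`), a CORS setup applied to the Flask app would not reach it. Assuming that setup, a stdlib-only sketch is a small WSGI middleware that appends CORS headers and answers preflight requests; `with_cors` is a hypothetical helper, not part of clip-retrieval.

```python
from typing import Callable, Iterable, List, Tuple


def with_cors(app: Callable, allow_origin: str = "*") -> Callable:
    """Wrap a WSGI app so its responses carry CORS headers (hypothetical helper)."""
    cors_headers: List[Tuple[str, str]] = [
        ("Access-Control-Allow-Origin", allow_origin),
        ("Access-Control-Allow-Methods", "GET, OPTIONS"),
        ("Access-Control-Allow-Headers", "Content-Type"),
    ]

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Answer CORS preflight requests directly instead of calling the inner app.
        if environ.get("REQUEST_METHOD") == "OPTIONS":
            start_response("204 No Content", list(cors_headers))
            return [b""]

        def cors_start_response(status: str, headers: List[Tuple[str, str]], exc_info=None):
            # Drop any CORS headers set by the inner app, then append ours.
            kept = [(k, v) for k, v in headers if not k.lower().startswith("access-control-")]
            return start_response(status, kept + cors_headers, exc_info)

        return app(environ, cors_start_response)

    return wrapped
```

Under that assumption, the metrics app would be wrapped before mounting, e.g. `{"/metrics": with_cors(make_wsgi_app())}` in a werkzeug `DispatcherMiddleware`. Allowing `*` is the simplest choice for a read-only metrics page; a specific origin can be passed instead.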