Romain Beaumont
awesome-semantic-search
Semantic search with embeddings: index anything
clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
image_embeddings
Using EfficientNet to provide embeddings for retrieval
embedding-reader
Efficiently read embeddings as a stream from any filesystem
laion-prepro
Get hundreds of millions of image+URL pairs from the Crawling at Home dataset and preprocess them
cc2dataset
Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...