CLIP
Question: How to produce a correct embedding for a dataset?
Hi, I have been looking for a solution in various papers but have not found the answer anywhere. I have a CSV dataset, and I would like to produce an embedding for each row of it, given that I'm passing a string and an array representing an image as the inputs to the model. In the current state of the art, what is the best way to produce the embedding for each row? Do I need to extract it from the model by passing the whole dataset as input, or is it better to give it a single row of the dataset at a time?
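Assuming each row of your CSV has the image array and the corresponding text description, it makes no difference whether you do it in one go or one by one. With the model in inference mode, each row is embedded independently of the others, so the results will be the same (up to tiny floating-point differences). I think what you mean is running in batches vs. line by line. Running in batches lets the GPU process multiple rows simultaneously, so you get the results quickly. If you do it line by line, you can still use the GPU, but it will take longer. If the whole dataset does not fit in memory at once, split it into batches of a size that does.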
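To illustrate the point, here is a minimal sketch of batching in plain Python. It is not CLIP code: `toy_encode_batch` is a hypothetical stand-in for your model's forward pass, and `embed_in_batches`, `Row`, and the sample rows are names made up for this example. The only thing it is meant to show is that when each row is encoded independently, the batch size changes the speed but not the embeddings.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, List[float]]  # (text, flattened image array); assumed layout
Embedding = List[float]


def toy_encode_batch(batch: Sequence[Row]) -> List[Embedding]:
    """Hypothetical stand-in for a model forward pass over one batch.

    Every row is encoded on its own, without looking at the other rows
    in the batch. Replace this with a call to your real model.
    """
    embeddings = []
    for text, image in batch:
        text_feature = float(sum(ord(ch) for ch in text) % 97)
        image_feature = sum(image) / len(image) if image else 0.0
        embeddings.append([text_feature, image_feature])
    return embeddings


def embed_in_batches(
    rows: Sequence[Row],
    encode_batch: Callable[[Sequence[Row]], List[Embedding]],
    batch_size: int = 32,
) -> List[Embedding]:
    """Encode rows in chunks of batch_size and return one embedding per row."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[Embedding] = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        embeddings.extend(encode_batch(chunk))
    return embeddings


# Made-up sample data, only for demonstration.
rows = [
    ("a cat", [0.1, 0.2, 0.3]),
    ("a dog", [0.5, 0.5, 0.5]),
    ("a car", [0.9, 0.0, 0.3]),
]

batched = embed_in_batches(rows, toy_encode_batch, batch_size=2)
one_by_one = embed_in_batches(rows, toy_encode_batch, batch_size=1)
print(batched == one_by_one)
```

With a real model you would keep the same loop and swap in your own encoding function. The batch size is then just a tuning knob: pick the largest value that fits in GPU memory.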