open_flamingo
About the random selection during pre-training
Hi, I have noticed that for each sample (i.e. document) used in pre-training, the code first reads all of the images within the sample and then caps the number of images at the configured maximum. As a result, only the first several images are ever used and the rest are never seen. Is this a desirable property?
No, you're right, this isn't a perfect way of doing it. Ideally you should create multiple samples from a document if it is too long. Do you plan on adding this in a PR? :)
@anas-awadalla I could help out here if it's still relevant, though I'm not sure I understand the issue properly.
Is the suggestion to only read images such that there are up to `max_num_images` `valid_images`? Or that we should create extra samples with the surplus images? If the latter, how should we deal with the accompanying text in the new samples?
Hello! So currently what is going on is that we are "read[ing] images such that there are up to `max_num_images` `valid_images`". I think a better way to go about this would be to accumulate images/text until you surpass the `max_num_images` limit and then create a separate sample with the remaining images/text.
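Roughly like this sketch, where the function name and the parallel `images`/`texts` layout are just illustrative assumptions rather than the actual preprocessing code:

```python
def chunk_interleaved_sample(images, texts, max_num_images):
    """Split one interleaved document into several training samples,
    each with at most `max_num_images` images, instead of silently
    dropping everything past the limit.

    Assumes `images` and `texts` are parallel lists where `texts[i]`
    accompanies `images[i]`; the real sample layout differs.
    """
    samples = []
    for start in range(0, len(images), max_num_images):
        end = start + max_num_images
        samples.append({"images": images[start:end], "texts": texts[start:end]})
    return samples
```

With this, a document with 12 images and `max_num_images=5` would become three samples (5 + 5 + 2) instead of one truncated sample.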
Got it, thanks. Not sure how many images the samples usually contain. This is effectively chunking/yielding a sample by `max_num_images`, and it looks like we can put a new method (similar to `get_patches` in the example link) directly into the pipeline to support expanding samples.
[Update]: Or maybe we can just make `preprocess_interleaved` yield samples.
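For example, as a rough sketch (the `(image, text)` pair layout for a document is an assumption here, not the actual sample format):

```python
from itertools import chain, islice

def preprocess_interleaved(doc, max_num_images):
    """Sketch of a generator variant: yield one sample per chunk of at
    most `max_num_images` images instead of truncating the document.
    `doc` is assumed to be an iterable of (image, text) pairs."""
    it = iter(doc)
    while True:
        chunk = list(islice(it, max_num_images))
        if not chunk:
            break
        images, texts = zip(*chunk)
        yield {"images": list(images), "texts": list(texts)}

# The pipeline stage that consumes this would then flatten the generators, e.g.:
# samples = chain.from_iterable(preprocess_interleaved(d, 5) for d in docs)
```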
Looking through the code, `preprocess_gpt_interleaved` and `preprocess_interleaved` share quite a bit of code as well, so I could clean that up too in the same PR.
Sweet! That would be awesome
Got a few questions in the PR before I make more changes 🙏