ammesatyajit
So for the dataset, I used the HowTo100M dataset and filtered it down to just the cooking videos. The ids for the cooking-only subset are listed in VideoBERT/data/ids.txt. Here are the steps...
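A minimal sketch of that filtering step, assuming ids.txt holds one video id per line (the helper name and toy ids here are illustrative, not from the repo):

```python
# Filter a list of HowTo100M video ids down to the cooking subset
# listed in VideoBERT/data/ids.txt (one id per line is assumed).

def filter_cooking_ids(all_ids, ids_txt_lines):
    cooking = {line.strip() for line in ids_txt_lines if line.strip()}
    return [vid for vid in all_ids if vid in cooking]

# Toy stand-ins for the real dataset manifest and ids.txt contents.
all_ids = ["abc123", "def456", "ghi789"]
ids_txt = ["abc123\n", "ghi789\n"]
print(filter_cooking_ids(all_ids, ids_txt))  # ['abc123', 'ghi789']
```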
It was about 400-500 GB
Hi, so what I used was raw_caption_superclean.json for the captions file, which I believe you can download as part of the raw caption zip file. I did run the other repo...
@FormerAutumn Sure! I'm happy to answer any questions you have.
So for the text next-token prediction, there is no video involved; I am just using the model for next-word prediction in a sentence (similar to GPT). This...
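To illustrate what text-only next-word prediction means here, a toy bigram sketch (this is not the repo's code, just the idea of predicting the most likely next token from the previous one):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-token frequencies for each token in a token corpus."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for prev, nxt in zip(sent, sent[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent next token, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Toy cooking-caption corpus standing in for the real training data.
corpus = [["mix", "the", "flour"], ["mix", "the", "eggs"], ["whisk", "the", "eggs"]]
counts = train_bigram(corpus)
print(predict_next(counts, "the"))  # 'eggs' ("eggs" follows "the" twice, "flour" once)
```

A real model replaces the counts with a learned distribution over the vocabulary, but the prediction step is the same: pick the highest-probability next token.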
> @ammesatyajit Thanks for your kindness!
>
> Do you know where to get the 'data/newest-data-max-len-20.npy' in https://github.com/MDSKUL/MasterProject/blob/master/stap5/globals.py ?
> (I scan all the urls the author mentioned and...
So video_next_tok_pred takes in the tokens from the validation set. It doesn't take in video clips. Hope that answers your question.
Hi, sorry if the readme was slightly confusing. The 20736 centroids were stored in separate files because of the hierarchical k-means. The only purpose of concatenating them was so I...
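A minimal sketch of that concatenation step, assuming each file holds the centroid vectors for one branch of the hierarchy (the function name and toy vectors are illustrative; the repo stores these as .npy arrays):

```python
# Stack per-file centroid chunks into one flat list so every centroid
# gets a single global index into the visual-token vocabulary.

def concat_centroids(centroid_chunks):
    merged = []
    for chunk in centroid_chunks:  # each chunk: list of centroid vectors
        merged.extend(chunk)
    return merged

chunk_a = [[0.0, 1.0], [2.0, 3.0]]  # centroids from one k-means leaf
chunk_b = [[4.0, 5.0]]              # centroids from another leaf
merged = concat_centroids([chunk_a, chunk_b])
print(len(merged))  # 3
```

With numpy this is just `np.concatenate` over the loaded arrays along axis 0; the point is only that the separate files become one centroid table.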
@joaanna Sorry for not replying earlier. I am not going to be able to provide a detailed response because I am a little busy at the moment due to personal...
@FormerAutumn no problem. Vision transformer is really interesting, hope you find what you are looking for :)