STEVE-1
Questions about the details of Prior training
In Appendix D.2 of the paper (Prior Training), I understood how STEVE-1 collects text-video pairs for training the Prior. I am particularly interested in two points 😄:
- How can I obtain the 2000 hand-labeled text examples and the 10000 augmented text examples? I would like to have the STEVE-1 agent perform tasks that it was trained on but that are not among those 11 tasks.
- How can I use MineCLIP to retrieve videos? Is there a script for this? I am also curious about how the `offset` operation mentioned in the paper is implemented.
Looking forward to your reply ❤️ @Shalev-Lifshitz