optimum-habana
VideoMAE Model Enabling and Examples
What does this PR do?
This PR adds examples and demonstrates that VideoMAE runs on Gaudi 2 in graph mode with the model cast to BF16. The included tests check compatibility with both settings and add a latency regression test for the graph mode + BF16 model.
No core code changes were made to enable the model.
Before submitting
- [x] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?
@pi314ever Daniel, please list the performance benchmarks comparing Gaudi 2 and A100.
| Precision | A100 latency (s) | Gaudi2 latency (s) |
|---|---|---|
| BF16 | 0.02548 | 0.01313 |
| FP32 | 0.05736 | 0.01962 |
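For readers who want the relative numbers, the ratios implied by the table can be worked out directly. The snippet below is only arithmetic on the four reported latencies; the dictionary layout and the `speedup` helper are illustrative names, not part of the PR.

```python
# Mean seconds per forward pass, copied from the table above.
latency_s = {
    "BF16": {"A100": 0.02548, "Gaudi2": 0.01313},
    "FP32": {"A100": 0.05736, "Gaudi2": 0.01962},
}


def speedup(precision: str) -> float:
    """Return A100 latency divided by Gaudi2 latency for one precision."""
    row = latency_s[precision]
    return row["A100"] / row["Gaudi2"]


for precision in latency_s:
    print(f"{precision}: Gaudi2 is {speedup(precision):.2f}x faster than A100")
# BF16: Gaudi2 is 1.94x faster than A100
# FP32: Gaudi2 is 2.92x faster than A100
```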
Testing setup:
- 100 sequential model passthroughs of a single video buffer of 16 frames
- Recorded performance is average time per forward pass
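The measurement described above (sequential forward passes, averaged) can be sketched roughly as follows. This is a minimal illustration rather than the benchmark code used for the table: `average_latency`, its `n_warmup` default, and the stand-in `forward` callable are all assumptions, and the source does not say whether warm-up passes were excluded.

```python
import time
from typing import Callable


def average_latency(
    forward: Callable[[], object],
    n_iters: int = 100,
    n_warmup: int = 3,  # assumed value; the source does not state a warm-up count
) -> float:
    """Return the mean wall-clock seconds per call of `forward`."""
    # Warm-up calls run first and are not timed.
    for _ in range(n_warmup):
        forward()
    start = time.perf_counter()
    for _ in range(n_iters):
        forward()
    return (time.perf_counter() - start) / n_iters


# Usage with a stand-in workload; a real run would call the model on the
# 16-frame video buffer instead.
mean_s = average_latency(lambda: sum(range(10_000)))
print(f"{mean_s:.3e} seconds per pass")
```

On an accelerator, the `forward` callable would also need to wait for the device to finish (for example by copying the output back to the host) before returning, otherwise the timer may only measure the time taken to queue the work.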
can you rebase this?
Looks good to me
I added the patch. I ran the script with multiple video inputs from TempoFunk/webvid-10M:
python run_example.py -bg -w 3 \
--video_paths https://ak.picdn.net/shutterstock/videos/5629184/preview/stock-footage-senior-couple-looking-through-binoculars-on-sailboat-together-shot-on-red-epic-for-high-quality-k.mp4 \
https://ak.picdn.net/shutterstock/videos/21179416/preview/stock-footage-aerial-shot-winter-forest.mp4 \
https://ak.picdn.net/shutterstock/videos/1063125190/preview/stock-footage-a-beautiful-cookie-with-oranges-lies-on-a-green-tablecloth.mp4 \
https://ak.picdn.net/shutterstock/videos/1039695998/preview/stock-footage-japanese-highrise-office-skyscrapers-tokyo-square.mp4 \
https://ak.picdn.net/shutterstock/videos/9607838/preview/stock-footage-zrenjanin-serbia-march-fans-watching-live-concert-bokeh-blur-urban-background-x.mp4
which gave the following outputs:
Predicted class for stock-footage-senior-couple-looking-through-binoculars-on-sailboat-together-shot-on-red-epic-for-high-quality-k.mp4 is sailing and took 3.372e-01 seconds
Predicted class for stock-footage-aerial-shot-winter-forest.mp4 is sled dog racing and took 3.360e-01 seconds
Predicted class for stock-footage-a-beautiful-cookie-with-oranges-lies-on-a-green-tablecloth.mp4 is cooking sausages and took 3.349e-01 seconds
Predicted class for stock-footage-japanese-highrise-office-skyscrapers-tokyo-square.mp4 is marching and took 3.362e-01 seconds
Predicted class for stock-footage-zrenjanin-serbia-march-fans-watching-live-concert-bokeh-blur-urban-background-x.mp4 is slacklining and took 3.358e-01 seconds
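Each output line names the video by what looks like the last path component of its URL and prints the time in scientific notation with three decimals. A sketch of how such a line could be produced is shown below; `video_name` and `report_line` are hypothetical helpers written for illustration and are not taken from `run_example.py`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def video_name(url: str) -> str:
    """Return the last path component of a URL, e.g. 'clip.mp4'."""
    return PurePosixPath(urlparse(url).path).name


def report_line(url: str, label: str, seconds: float) -> str:
    """Format one result line in the same shape as the output above."""
    # '.3e' yields three decimals in scientific notation, e.g. '3.360e-01'.
    return (
        f"Predicted class for {video_name(url)} is {label} "
        f"and took {seconds:.3e} seconds"
    )


url = (
    "https://ak.picdn.net/shutterstock/videos/21179416/preview/"
    "stock-footage-aerial-shot-winter-forest.mp4"
)
print(report_line(url, "sled dog racing", 0.3360))
```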
The script was loosely adapted from the example in the original model card and from #783.
Thank you @pi314ever. I suggest we add the multiple-video example to the README.md file, along with a note on the adoption strategy.
@regisss Could you kindly provide more review/comments for this?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.