Savant [Draft] Visual Language Model Sample + Replay

Open bwsw opened this issue 1 year ago • 0 comments

We deploy a visual language model (VLM) and formulate questions for it. Depending on its answers, we run certain classic models until the next answer arrives from the VLM.

We query the VLM once per second with the following questions:

cars in the viewport?
people in the viewport?
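
A minimal sketch of the once-per-second sampling, assuming frames arrive with timestamps in seconds. The function name `frames_to_query` and the `interval` parameter are hypothetical and not part of Savant; the actual VLM call is left out.

```python
from typing import Iterable, Iterator

# The two questions from this issue, asked for every sampled frame.
QUESTIONS = ("cars in the viewport?", "people in the viewport?")


def frames_to_query(timestamps: Iterable[float], interval: float = 1.0) -> Iterator[float]:
    """Yield timestamps of frames that should be sent to the VLM.

    A frame is selected when at least `interval` seconds have passed
    since the previously selected frame (hypothetical helper).
    """
    next_due = None
    for ts in timestamps:
        if next_due is None or ts >= next_due:
            yield ts
            next_due = ts + interval


# Example: a 4 FPS stream, 10 frames long.
stream = [i * 0.25 for i in range(10)]
selected = list(frames_to_query(stream))
```

Each selected frame would then be sent to the VLM together with every entry of `QUESTIONS`.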

We use Replay to store results and start a replay every time the model answers positively. While a replay is running and the model keeps answering positively, we prolong the replay; when the answer switches to negative, we stop it.
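
The start/prolong/stop behavior described above can be sketched as a small state machine. This is only an illustration under assumptions: `ReplayController` and the action names are hypothetical, and the sketch does not call the Replay API; it only decides which action would be taken for each VLM answer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ReplayController:
    """Decides the replay action for each VLM answer (hypothetical helper)."""

    active: bool = False
    events: List[Tuple[float, str]] = field(default_factory=list)

    def on_answer(self, timestamp: float, positive: bool) -> str:
        if positive and not self.active:
            self.active = True
            action = "start"      # first positive answer: start a replay
        elif positive:
            action = "prolong"    # still positive: extend the running replay
        elif self.active:
            self.active = False
            action = "stop"       # switched to negative: stop the replay
        else:
            action = "idle"       # negative and nothing is running
        self.events.append((timestamp, action))
        return action


# Example: one answer per second.
controller = ReplayController()
answers = [False, True, True, False, False]
actions = [controller.on_answer(float(t), a) for t, a in enumerate(answers)]
```

In a real pipeline one controller per question (or per source) would probably be needed, so that "cars" and "people" replays are tracked independently.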

The replayed streams are sent to a secondary pipeline where YOLOv8-KeyPoint and YOLOv8 are deployed; only the ROIs corresponding to VLM decisions are used, and they determine whether one model, the other, or both are launched.
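
A minimal sketch of the model selection in the secondary pipeline. The mapping from questions to models is an assumption (the issue does not state it): here "people" is assumed to trigger YOLOv8-KeyPoint and "cars" to trigger YOLOv8. The names `QUESTION_TO_MODEL` and `select_models` are hypothetical, and ROI handling is not modeled.

```python
from typing import Dict, List

# Assumed mapping, not stated in the issue.
QUESTION_TO_MODEL: Dict[str, str] = {
    "cars": "yolov8",
    "people": "yolov8-keypoint",
}


def select_models(vlm_answers: Dict[str, bool]) -> List[str]:
    """Return the models to launch for the given VLM decisions.

    A missing answer is treated as negative.
    """
    return [
        model
        for question, model in QUESTION_TO_MODEL.items()
        if vlm_answers.get(question, False)
    ]


# Example: the VLM sees cars but no people.
to_launch = select_models({"cars": True, "people": False})
```

When both answers are positive, both models are returned; when both are negative, the list is empty and nothing is launched.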

We can use the VLM inference service from NVIDIA Jetson Platform Services to run the model: https://docs.nvidia.com/jetson/jps/inference-services/vlm.html

Nov 18 '24 11:11 bwsw
