Savant
[Draft] Visual Language Model Sample + Replay
We deploy a visual language model (VLM) and formulate questions for it. Depending on the answers, we run certain classic models for a period of time, until the next answer arrives from the VLM.
We run the VLM once per second with the questions:
- cars in the viewport?
- people in the viewport?
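The once-per-second questioning step can be sketched as follows. This is a minimal sketch, not the actual implementation: `query_vlm` is a hypothetical stand-in for the call to the VLM service, and the yes/no parsing is an assumption about how free-form answers are reduced to decisions.

```python
from typing import Callable, Dict

# The two questions posed to the VLM each second.
QUESTIONS = {
    "cars": "Are there cars in the viewport?",
    "people": "Are there people in the viewport?",
}

def poll_vlm(query_vlm: Callable[[str], str]) -> Dict[str, bool]:
    """Ask each question and reduce the free-form answer to a boolean.

    `query_vlm` is a hypothetical callable wrapping the real VLM service.
    """
    return {
        key: query_vlm(text).strip().lower().startswith("yes")
        for key, text in QUESTIONS.items()
    }

# Example with a canned VLM that sees cars but no people:
answers = poll_vlm(lambda q: "Yes." if "cars" in q else "No.")
# answers == {"cars": True, "people": False}
```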
We use Replay to store results. Replay processing is initialized every time the model answers positively; while the model continues to answer positively, we prolong the replay processing; when the answer switches to negative, we stop it.
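The start/prolong/stop logic above amounts to a small state machine driven by the latest VLM answer. Below is a sketch of that logic only; the `start`/`extend`/`stop` actions are recorded in a list for illustration, whereas a real pipeline would invoke the Replay service at those points.

```python
from typing import List

class ReplayController:
    """Tracks whether a replay session should be running.

    Start on the first positive VLM answer, extend while answers stay
    positive, stop on the first negative answer after a positive run.
    """

    def __init__(self) -> None:
        self.active = False
        self.actions: List[str] = []  # recorded for illustration only

    def on_answer(self, positive: bool) -> None:
        if positive and not self.active:
            self.active = True
            self.actions.append("start")    # initialize replay
        elif positive and self.active:
            self.actions.append("extend")   # prolong replay processing
        elif not positive and self.active:
            self.active = False
            self.actions.append("stop")     # answer went negative

ctrl = ReplayController()
for answer in [False, True, True, False]:
    ctrl.on_answer(answer)
# ctrl.actions == ["start", "extend", "stop"]
```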
The replayed streams are sent to a secondary pipeline where YOLOv8-KeyPoint and YOLOv8 are deployed; only the ROIs corresponding to the VLM decisions are used to launch one or the other (or both) models.
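One plausible mapping from VLM decisions to the models launched in the secondary pipeline is sketched below. The specific pairing (YOLOv8 for cars, YOLOv8-KeyPoint for people) is an assumption for illustration; the draft does not fix which answer triggers which model.

```python
from typing import Dict, Set

def models_to_run(answers: Dict[str, bool]) -> Set[str]:
    """Select which secondary-pipeline models to launch for a frame's ROIs.

    Assumed mapping: YOLOv8 detection for vehicles, YOLOv8-KeyPoint
    pose estimation for people.
    """
    models: Set[str] = set()
    if answers.get("cars"):
        models.add("yolov8")           # object detection for vehicles
    if answers.get("people"):
        models.add("yolov8-keypoint")  # keypoint/pose model for people
    return models

# Both models run when both answers are positive:
selected = models_to_run({"cars": True, "people": True})
# selected == {"yolov8", "yolov8-keypoint"}
```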
The NVIDIA Jetson Platform Services VLM service can be used to run the model: https://docs.nvidia.com/jetson/jps/inference-services/vlm.html