
Key frames node for camera

Open adamdbrw opened this issue 5 months ago • 2 comments

Is your feature request related to a problem? Please describe.
Vision Language Models (VLMs) are useful for understanding scenes from images. In robotics, the environment is dynamic and images from camera sensor(s) arrive at a high frequency. However, VLMs have high response latency and cannot process every frame, which leaves gaps in what the robot perceives of its environment. Key-frame extraction should be configurable and run in real time; a latest key frame that is up to one second old is acceptable. Each key frame should be matched with the robot pose (and possibly other data) at its timestamp.
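One possible shape for the configurable extraction and pose matching is sketched below. It is a standalone sketch, not an rclcpp implementation: it assumes frames are already converted to 8-bit grayscale, uses the mean absolute pixel difference against the previous key frame as the change metric, and bounds the rate with a minimum and a maximum interval. Pose matching assumes poses are buffered in a `std::map` keyed by timestamp and picks the nearest one within a tolerance. `Frame`, `Pose`, `KeyFrameParams`, `KeyFrameSelector` and `MatchPose` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iterator>
#include <map>
#include <optional>
#include <vector>

// Hypothetical grayscale frame; a real node would convert sensor_msgs/msg/Image into this.
struct Frame {
  double stamp_s = 0.0;              // capture time in seconds
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;  // row-major, width * height bytes
};

struct Pose { double x, y, yaw; };   // hypothetical 2D robot pose

// Configurable key-frame policy (parameter names are illustrative assumptions).
struct KeyFrameParams {
  double diff_threshold = 12.0;  // mean absolute pixel difference (0..255) that counts as a change
  double min_interval_s = 0.2;   // never emit key frames faster than this
  double max_interval_s = 1.0;   // always emit at least this often (matches "one second old")
};

class KeyFrameSelector {
 public:
  explicit KeyFrameSelector(KeyFrameParams p) : params_(p) {
    assert(p.min_interval_s <= p.max_interval_s);
  }

  // Returns true if `f` should become the new key frame.
  bool Offer(const Frame& f) {
    if (!last_) { last_ = f; return true; }  // the first frame is always a key frame
    const double dt = f.stamp_s - last_->stamp_s;
    if (dt < params_.min_interval_s) return false;
    if (dt >= params_.max_interval_s || MeanAbsDiff(f, *last_) > params_.diff_threshold) {
      last_ = f;
      return true;
    }
    return false;
  }

 private:
  static double MeanAbsDiff(const Frame& a, const Frame& b) {
    // A resolution change (or empty frame) is treated as a full scene change.
    if (a.pixels.size() != b.pixels.size() || a.pixels.empty()) return 255.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < a.pixels.size(); ++i)
      sum += std::abs(int(a.pixels[i]) - int(b.pixels[i]));
    return sum / double(a.pixels.size());
  }

  KeyFrameParams params_;
  std::optional<Frame> last_;
};

// Returns the buffered pose nearest to `stamp_s`, or nothing if none lies within `tolerance_s`.
std::optional<Pose> MatchPose(const std::map<double, Pose>& poses,
                              double stamp_s, double tolerance_s) {
  auto after = poses.lower_bound(stamp_s);  // first pose with time >= stamp_s
  std::optional<Pose> best;
  double best_dt = tolerance_s;
  if (after != poses.end() && after->first - stamp_s <= best_dt) {
    best_dt = after->first - stamp_s;
    best = after->second;
  }
  if (after != poses.begin()) {
    auto before = std::prev(after);  // last pose with time < stamp_s
    if (stamp_s - before->first <= best_dt) best = before->second;
  }
  return best;
}
```

Inside an rclcpp node, `Offer` would be called from the image subscription callback and `MatchPose` against a pose buffer filled by a separate subscription; the thresholds would naturally become node parameters. A cheaper metric (e.g. on a downscaled image) or an embedding-based one could replace the pixel difference without changing the interface.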

Describe the solution you'd like
A node that continuously processes visual data from an Image topic (we can start with one) and extracts key frames for VLMs. The key frames can be presented as an image mosaic, provided the VLM can understand mosaics.
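A mosaic can be built by tiling equally sized key frames into a near-square grid. The sketch below assumes 8-bit grayscale frames of identical size and fills unused cells with a constant value; `GrayImage` and `MakeMosaic` are hypothetical names, and a real node would more likely work on `sensor_msgs/msg/Image` data (possibly via OpenCV) and add per-tile timestamps.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical 8-bit grayscale image.
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;  // row-major, width * height bytes
};

// Tiles equally sized key frames left to right, top to bottom, into a grid with
// cols = ceil(sqrt(n)) columns. Cells without a frame are filled with `fill`.
GrayImage MakeMosaic(const std::vector<GrayImage>& tiles, std::uint8_t fill = 0) {
  if (tiles.empty()) return {};
  const int w = tiles[0].width;
  const int h = tiles[0].height;
  for (const auto& t : tiles) {
    if (t.width != w || t.height != h || t.pixels.size() != std::size_t(w) * h)
      throw std::invalid_argument("all tiles must have the same size");
  }
  const int n = static_cast<int>(tiles.size());
  const int cols = static_cast<int>(std::ceil(std::sqrt(static_cast<double>(n))));
  const int rows = (n + cols - 1) / cols;  // integer ceil(n / cols)
  assert(cols * rows >= n);

  GrayImage out{cols * w, rows * h,
                std::vector<std::uint8_t>(std::size_t(cols * w) * rows * h, fill)};
  for (int i = 0; i < n; ++i) {
    const int ox = (i % cols) * w;  // tile origin in the mosaic
    const int oy = (i / cols) * h;
    for (int y = 0; y < h; ++y)
      for (int x = 0; x < w; ++x)
        out.pixels[std::size_t(oy + y) * out.width + ox + x] =
            tiles[i].pixels[std::size_t(y) * w + x];
  }
  return out;
}
```

For example, three key frames produce a 2×2 grid with one empty cell. Keeping the grid near-square keeps the mosaic's aspect ratio close to that of a single frame, which may matter if the VLM downscales its input.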

This node should be multi-purpose and also output the entire visual history of a task or runtime as a series of key frames, for memory and reporting purposes. Such features need not be in the first implementation, but they should be kept in mind for the design. A service that returns all recent images (those captured since the last call of the service) should be part of the node interface.
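The "since the last call" semantics can be kept independent of ROS by storing key frames with strictly increasing sequence numbers and remembering the first sequence number the caller has not yet received. The sketch below assumes a bounded history (frames evicted before being fetched are lost) and a mutex, because in rclcpp the subscription and service callbacks may run concurrently depending on the executor. `KeyFrame`, `KeyFrameHistory`, `TakeSinceLastCall` and `All` are hypothetical names; the string `label` stands in for image data and matched pose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stored key frame; `label` is a placeholder for image data and metadata.
struct KeyFrame {
  std::uint64_t seq = 0;  // assigned by the history, strictly increasing
  double stamp_s = 0.0;
  std::string label;
};

// Thread-safe key-frame history with "everything since the last call" semantics.
class KeyFrameHistory {
 public:
  explicit KeyFrameHistory(std::size_t max_history) : max_history_(max_history) {
    assert(max_history > 0);
  }

  void Add(double stamp_s, std::string label) {
    std::lock_guard<std::mutex> lock(mutex_);
    frames_.push_back({next_seq_++, stamp_s, std::move(label)});
    if (frames_.size() > max_history_) frames_.pop_front();  // bound memory use
  }

  // Would back the "get recent images" service: frames added since the previous call.
  std::vector<KeyFrame> TakeSinceLastCall() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<KeyFrame> out;
    for (const auto& f : frames_)
      if (f.seq >= next_unread_) out.push_back(f);
    next_unread_ = next_seq_;
    return out;
  }

  // Full retained history, e.g. for memory or end-of-task reporting.
  std::vector<KeyFrame> All() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::vector<KeyFrame>(frames_.begin(), frames_.end());
  }

 private:
  std::size_t max_history_;
  mutable std::mutex mutex_;
  std::deque<KeyFrame> frames_;
  std::uint64_t next_seq_ = 0;     // sequence number for the next added frame
  std::uint64_t next_unread_ = 0;  // first sequence number not yet returned to the caller
};
```

A single cursor assumes one consumer of the service; supporting several clients would need a cursor per client, for example passed in the request as "last sequence number seen".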

Describe alternatives you've considered
Capturing an image right at the time of a status update. However, this can miss things that happened between updates.

Additional context
This is well suited for an rclcpp node.

adamdbrw · Sep 09 '24 14:09