captum icon indicating copy to clipboard operation
captum copied to clipboard

Integrated Gradients for Frame-Level Classification Tasks

Open david-gimeno opened this issue 1 year ago • 0 comments

🚀 Feature

Integrated Gradients not only for single classification task, but also for those that are based on frame-level classification, such as speech recognition, active speaker detection, etc.

Additional context

In the case of automatic lipreading, we have a tensor of shape (T, H, W, C), where T, H, W, and C refer to the time (no. of frames), height of the image, width of the image and channels of the image, respectively. I would like to apply Integrated Gradients so the targets of the model can be a sequence, i.e., an integer for each frame of the sequence. Each integer would represent the corresponding text token composing the final transcription.

david-gimeno avatar May 16 '24 10:05 david-gimeno