
Grad-CAM extended for ConvNet-RNN structures (optionally)

Open evaldsurtans opened this issue 8 years ago • 3 comments

evaldsurtans avatar Jun 27 '17 14:06 evaldsurtans

Can you explain the use-case for this? I can't glean much info from the commit.

raghakot avatar Jun 28 '17 09:06 raghakot

For example, a ConvNet-RNN looks like this:

from keras.models import Sequential
from keras.layers import Conv2D, Dense, LSTM, Reshape, TimeDistributed
from keras.initializers import glorot_uniform, orthogonal

model_target = Sequential()
# Per-frame convolutional feature extraction, applied across the time axis
model_target.add(TimeDistributed(
    Conv2D(32, (8, 8), strides=(4, 4), kernel_initializer=glorot_uniform(seed=init_seed), padding='same', activation='relu'),
    input_shape=(params['frames_back'], high_dimensions_width, high_dimensions_height, high_dimensions_channels)))
model_target.add(TimeDistributed(Conv2D(64, (4, 4), strides=(2, 2), kernel_initializer=glorot_uniform(seed=init_seed), activation='relu')))
model_target.add(TimeDistributed(Conv2D(64, (3, 3), kernel_initializer=glorot_uniform(seed=init_seed), padding='same', activation='relu')))
# Flatten each timestep's feature map before the recurrent layers
model_target.add(TimeDistributed(Reshape((-1,))))
model_target.add(LSTM(512, kernel_initializer=glorot_uniform(seed=init_seed), recurrent_initializer=orthogonal(seed=init_seed), return_sequences=True, dropout=params['dropout'], recurrent_dropout=params['dropout']))
model_target.add(LSTM(512))
model_target.add(Dense(dimensions_actions, kernel_initializer=glorot_uniform(seed=init_seed), name='DenseLinear'))
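
A quick shape sanity check, assuming 48x48 RGB frames and frames_back=5 (matching the (1, 5, 48, 48, 3) x_input shape noted below):

model_target.summary()
# Expected per-layer output shapes under those assumptions:
# TimeDistributed(Conv2D 32, 8x8 /4, same)   -> (None, 5, 12, 12, 32)
# TimeDistributed(Conv2D 64, 4x4 /2, valid)  -> (None, 5, 5, 5, 64)
# TimeDistributed(Conv2D 64, 3x3 /1, same)   -> (None, 5, 5, 5, 64)
# TimeDistributed(Reshape)                   -> (None, 5, 1600)
# LSTM(512, return_sequences=True)           -> (None, 5, 512)
# LSTM(512)                                  -> (None, 512)
# Dense                                      -> (None, dimensions_actions)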

This is how it can be used:

import time
import numpy as np
from PIL import Image
from vis.visualization import visualize_cam  # fork with the input_data_rnn extension

# Grayscale the current screen, then duplicate it back to 3 channels
seed_img = Image.fromarray(env.getScreenRGB())
seed_img = seed_img.convert('L').convert('RGB')
seed_img_arr = np.asarray(seed_img).astype('uint8')

# Explain the action the network actually picked
action_idx = np.argmax(raw_q_values)

# x_input.shape = (1, 5, 48, 48, 3) (batch_size, time_steps, pixels_width, pixels_height, pixel_channels)
heatmap = visualize_cam(model_target, layer_idx, [action_idx], seed_img_arr, alpha=0.3, input_data_rnn=x_input)
heatmap_img = Image.fromarray(np.transpose(np.array(heatmap), axes=[1, 0, 2]))
timestamp = time.time()
seed_img = Image.fromarray(np.transpose(np.array(env.getScreenRGB()), axes=[1, 0, 2]))

# Put heatmap and raw screen side by side for inspection
composite_img = Image.new("RGB", (seed_img.size[0] * 2, seed_img.size[1]))
composite_img.paste(heatmap_img, (0, 0))
composite_img.paste(seed_img, (seed_img.size[0], 0))
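
The snippet assumes x_input already holds the last frames_back preprocessed frames. A minimal sketch of one way to assemble it (frame_buffer and preprocess are hypothetical helpers, not part of the code above):

import numpy as np
from collections import deque
from PIL import Image

frames_back = 5
frame_buffer = deque(maxlen=frames_back)  # hypothetical rolling window of recent frames

def preprocess(rgb_array):
    # Same grayscale-then-RGB treatment as seed_img above
    img = Image.fromarray(rgb_array).convert('L').convert('RGB')
    return np.asarray(img).astype('uint8')

frame_buffer.append(preprocess(env.getScreenRGB()))
while len(frame_buffer) < frames_back:  # pad at episode start by repeating the first frame
    frame_buffer.append(frame_buffer[-1])

x_input = np.stack(frame_buffer)[np.newaxis, ...]  # -> (1, frames_back, 48, 48, 3)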

This is how the output looks in a 3D maze where the agent focuses on red doors: [image]

evaldsurtans avatar Jun 29 '17 08:06 evaldsurtans

Nice. Looks like the model is trained using reinforcement learning. It would be really cool to have an example for this in examples/ if your code is not confidential or proprietary.

So what's the difference between model.input and input_data_rnn? From the code, it appears that you are using it to do this:

model_input = input_data_rnn[-1]
heatmap = heatmap[-1]

I don't quite understand what that does. Also, there was an API change; you should rebase. The code no longer tries to overlay the heatmap, since folks can use this to find heatmaps on non-image inputs such as video frames as well.

With the new code, the heatmap will have the same shape as x_input and the overlaying part can be done outside.
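
A minimal sketch of doing that overlay outside, assuming the returned heatmap is a uint8 RGB array with the same spatial shape as the frame (the blend is plain numpy, not a keras-vis helper):

import numpy as np
from PIL import Image

alpha = 0.3  # heatmap weight, matching the alpha used above
frame = np.asarray(seed_img, dtype='float32')  # original RGB screen
cam = np.asarray(heatmap, dtype='float32')     # same shape as the frame
blended = (alpha * cam + (1.0 - alpha) * frame).astype('uint8')
Image.fromarray(blended).save('cam_overlay.png')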

raghakot avatar Jun 29 '17 23:06 raghakot