keras-vis
Grad-CAM extended for ConvNet-RNN structures (optionally)
Can you explain the use-case for this? I can't glean much info from the commit.
For example, a ConvNet-RNN looks like this:
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, Reshape, LSTM, Dense
from keras.initializers import glorot_uniform, orthogonal

model_target = Sequential()
model_target.add(TimeDistributed(
    Conv2D(32, (8, 8), strides=(4, 4), kernel_initializer=glorot_uniform(seed=init_seed),
           padding='same', activation='relu'),
    input_shape=(params['frames_back'], high_dimensions_width, high_dimensions_height, high_dimensions_channels)))
model_target.add(TimeDistributed(Conv2D(64, (4, 4), strides=(2, 2), kernel_initializer=glorot_uniform(seed=init_seed), activation='relu')))
model_target.add(TimeDistributed(Conv2D(64, (3, 3), kernel_initializer=glorot_uniform(seed=init_seed), padding='same', activation='relu')))
model_target.add(TimeDistributed(Reshape((-1,))))  # flatten each time step before the recurrent layers
model_target.add(LSTM(512, kernel_initializer=glorot_uniform(seed=init_seed), recurrent_initializer=orthogonal(seed=init_seed),
                      return_sequences=True, dropout=params['dropout'], recurrent_dropout=params['dropout']))
model_target.add(LSTM(512))
model_target.add(Dense(dimensions_actions, kernel_initializer=glorot_uniform(seed=init_seed), name='DenseLinear'))
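For context, the x_input used below is presumably a rolling buffer of the last params['frames_back'] preprocessed frames. A minimal sketch of how such a buffer could be built (the deque and the preprocess helper are assumptions for illustration, not part of the original code):

from collections import deque

import numpy as np
from PIL import Image

frames = deque(maxlen=params['frames_back'])  # keeps only the most recent time steps

def preprocess(rgb):
    # hypothetical helper: grayscale, then back to 3 channels, mirroring the seed_img handling below
    return np.asarray(Image.fromarray(rgb).convert('L').convert('RGB')).astype('uint8')

# on every environment step
frames.append(preprocess(env.getScreenRGB()))
x_input = np.expand_dims(np.stack(frames), axis=0)  # -> (1, frames_back, width, height, channels)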
This is how it can be used:
import time

import numpy as np
from PIL import Image
from vis.visualization import visualize_cam

# grayscale the current frame but keep 3 channels, matching how the network was trained
seed_img = Image.fromarray(env.getScreenRGB())
seed_img = seed_img.convert('L').convert('RGB')
seed_img_arr = np.asarray(seed_img).astype('uint8')
action_idx = np.argmax(raw_q_values)
# x_input.shape = (1, 5, 48, 48, 3) (batch_size, time_steps, pixels_width, pixels_height, pixel_channels)
heatmap = visualize_cam(model_target, layer_idx, [action_idx], seed_img_arr, alpha=0.3, input_data_rnn=x_input)
heatmap_img = Image.fromarray(np.transpose(np.array(heatmap), axes=[1, 0, 2]))
timestamp = time.time()
seed_img = Image.fromarray(np.transpose(np.array(env.getScreenRGB()), axes=[1, 0, 2]))
# place heatmap and original frame side by side
composite_img = Image.new("RGB", (seed_img.size[0] * 2, seed_img.size[1]))
composite_img.paste(heatmap_img, (0, 0))
composite_img.paste(seed_img, (seed_img.size[0], 0))
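Presumably the composite is then written out once per step; a one-line sketch using the timestamp above (the filename pattern is hypothetical):

composite_img.save('cam_%d.png' % int(timestamp * 1000))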
This is how the output looks in a 3D maze where the agent focuses on red doors:

Nice. Looks like the model is trained using reinforcement learning. It would be really cool to have an example for this in examples/ if your code is not confidential or proprietary.
So what's the difference between model.input and input_data_rnn? From the code, it appears that you are using it to do this:
model_input = input_data_rnn[-1]
heatmap = heatmap[-1]
I don't quite understand what that does. Also, there was an API change; you should rebase. The code no longer tries to overlay the heatmap, since folks can use this to compute heatmaps on non-image inputs or video frames as well.
With the new code, the heatmap will have the same shape as x_input, and the overlay can be done outside.
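For anyone landing here later, the outside overlay could look roughly like this. It's a sketch under the assumption that the returned heatmap is a float array matching x_input, so the last time step is selected and jet-colorized before blending (the normalization and alpha are arbitrary choices):

import numpy as np
from matplotlib import cm
from PIL import Image

hm = heatmap[0, -1].astype('float32')                  # last time step, assuming shape (1, time_steps, H, W, C)
hm = (hm - hm.min()) / (hm.max() - hm.min() + 1e-8)    # normalize to [0, 1]
hm_rgb = (cm.jet(hm.mean(axis=-1))[..., :3] * 255).astype('uint8')  # colorize a single-channel map
frame = Image.fromarray(seed_img_arr)                  # the original frame from the usage snippet above
overlaid = Image.blend(frame, Image.fromarray(hm_rgb), alpha=0.3)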