
Help with get_activations

Open · opened by timwfburton 2 years ago · 3 comments

I've been having a lot of success with your Keras attention library, and was hoping to use get_activations to understand the network behavior in more detail.

In this toy example (which I created to reduce things down to the smallest snippet), I have a dataset of size 100, each sample with 10 time steps, and each time step having 5 features. An initial dense model is encapsulated in a TimeDistributed layer, which feeds into an LSTM that returns its sequence. The sequence is then input to your Attention layer with 32 units.

My goal is to understand which of the 10 time steps for each of the 100 samples in the dataset is most important (i.e., receiving the most attention), so I used the get_activations function. I was expecting a matrix of size 100x10, but I got 100x32. It seems like what I'm trying to do should be possible based on your examples - I'd really appreciate any advice that you can give!

Labels: attention, question

timwfburton · May 25 '22 21:05

@timwfburton thanks for the feedback! Appreciated! Can you paste the full code so that I can run it?

philipperemy · May 31 '22 02:05

Of course, thanks in advance - here it is!

import keras as k
import keract
from attention import Attention
import numpy as np

num_samples = 100
timeSteps = 10
featurePerStep = 5

# Inner dense model, applied to each time step independently.
coreModel = k.models.Sequential()
coreModel.add(k.layers.Input(shape=(featurePerStep,), name='input'))  # shape must be a tuple
coreModel.add(k.layers.Dense(3))

coreModel.summary()

timeDistributedModel = k.models.Sequential()
timeDistributedModel.add(k.layers.TimeDistributed(coreModel))
timeDistributedModel.add(k.layers.LSTM(64, return_sequences=True, name='lstm'))
timeDistributedModel.add(Attention(units=32, name='attention'))
timeDistributedModel.add(k.layers.Dense(1, name='dense'))

timeDistributedModel.compile(optimizer=k.optimizers.Adam(), loss='binary_crossentropy')

mockTrainX = np.random.rand(num_samples, timeSteps, featurePerStep)
mockTrainY = np.round(np.random.uniform(0.0, 1.0, size=(num_samples, 1)), 0)

timeDistributedModel.fit(x=mockTrainX, y=mockTrainY, batch_size=32, epochs=1)
timeDistributedModel.summary()

activations = keract.get_activations(timeDistributedModel, mockTrainX, layer_names=['attention'], nested=True)['attention']
print('activations has a length of ' + str(len(activations)))
print('activations[0] has a length of ' + str(len(activations[0])))  # 32, not the expected 10
print('activations[99] has a length of ' + str(len(activations[99])))

timwfburton · May 31 '22 13:05

https://github.com/philipperemy/keras-attention-mechanism/blob/482b0c937b3888da5967b47478701838a4222269/examples/add_two_numbers.py#L95

What you want are the attention weights. The Attention layer's final output is the attention vector of size units (hence your 100x32); the per-time-step softmax weights are computed by an inner layer. But maybe it does not work well with the Sequential API. Try with the functional API.
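
If it helps, here is a minimal sketch with the functional API. It assumes the inner softmax layer is named attention_weight (as in the linked example) and that keract reports the sub-layers of the Attention block when nested=True; if your version names things differently, just inspect the keys of the returned dict.

import numpy as np
import keras as k
import keract
from attention import Attention

num_samples, time_steps, features_per_step = 100, 10, 5

# Same architecture as your snippet, rebuilt with the functional API.
inputs = k.layers.Input(shape=(time_steps, features_per_step), name='input')
x = k.layers.TimeDistributed(k.layers.Dense(3))(inputs)
x = k.layers.LSTM(64, return_sequences=True, name='lstm')(x)
x = Attention(units=32, name='attention')(x)
outputs = k.layers.Dense(1, name='dense')(x)
model = k.models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')

mock_x = np.random.rand(num_samples, time_steps, features_per_step)
mock_y = np.round(np.random.uniform(0.0, 1.0, size=(num_samples, 1)), 0)
model.fit(x=mock_x, y=mock_y, batch_size=32, epochs=1)

# nested=True exposes the layers inside the Attention block. The softmax
# scores come from the sub-layer whose name contains 'attention_weight'.
activations = keract.get_activations(model, mock_x, nested=True)
weights_key = next(name for name in activations if 'attention_weight' in name)
attention_weights = activations[weights_key]
print(attention_weights.shape)  # expected: (100, 10), one weight per time step

Each row of attention_weights is a softmax over the 10 time steps (it sums to 1), so it tells you how much each time step contributed for that sample.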

philipperemy · Jun 08 '22 09:06

I'll close this issue (cf. answer above). If it's not clear, feel free to comment on it!

philipperemy · Sep 25 '22 15:09