Research: Feature Visualization Objectives
🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.
Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.
⚙️ This is a bit of a more low-level and technical research issue than many of the others. It might feel a bit in the weeds, but making progress on it would give us lots of powerful traction on basically everything else.
Description
Feature Visualization studies neural network behavior by optimizing an input to trigger a particular behavior.
For example, to visualize a neuron, we create an input which strongly causes the neuron to fire. We can also visualize a combination of neurons by maximizing the sum of their activations. These visualizations have a nice geometric interpretation: we are visualizing a direction in a vector space of activations, where each neuron is a basis vector.
We normally do this by maximizing that direction, that is, maximizing the dot product of our activation vector with the desired direction vector. However...
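To make the geometry concrete, here is a minimal NumPy sketch of the dot-product objective (the names and toy numbers are mine, purely for illustration): visualizing a single neuron is just the special case where the direction is a one-hot vector.
import numpy as np

def dot_objective(acts, direction):
    """Score an activation vector by how far it extends along `direction`."""
    return np.dot(acts, direction)

acts = np.array([1.2, 0.3, 2.5])      # hypothetical activation vector (3 "neurons")
neuron_1 = np.array([0.0, 1.0, 0.0])  # visualizing neuron 1 = one-hot direction
combo = np.array([0.0, 1.0, 1.0])     # a combination of neurons 1 and 2

print(dot_objective(acts, neuron_1))  # 0.3  (that neuron's activation)
print(dot_objective(acts, combo))     # 2.8  (sum of the two activations)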
Maximizing dot product may not be the right objective
There are a number of reasons why just maximizing a direction in this way may not actually be the thing we want, at least in some cases:
- "Entanglement" - If the activation space is skewed, two different "meaningful directions" could have positive dot product. Intuitively, this creates a kind of entanglement between the features. If you visualize one by maximizing the dot product, you're implicitly slightly maximizing the other. (A toy numeric example appears after this list.)
If it were just the skew case, we could recover by doing a linear transformation of the space or using a different inner product, but the problem can get worse. It could be that there are more meaningful vectors embedded in the space than the number of dimensions (eg. see Gabriel Goh's Decoding the Thought Vector), in which case some positive dot product between them is inevitable (although it could be very small).
- General Exciting Inputs - It could be that there are some "exciting" inputs that just cause all neurons to fire more. In this case, it might be most interesting to find inputs that differentially excite a given neuron. Apparently that's a thing that neuroscientists do?
- Visualizing Activation Vectors - Sometimes we want to visualize an activation vector that we got from the model (eg. Activation Grids in Building Blocks). Suppose you visualize it by maximizing dot product, and get an image that has a very high dot product but also a significant orthogonal component. That doesn't seem right: you know the original activation vector didn't have that orthogonal component! The new vector has a high dot product, but it has a significantly different angle.
You might have different intuitions for visualizing an activation vector and a neuron, because the activation vector is really a vector in the activation space, while the neuron is more of a co-vector that you're intended to take a dot product with. Or, framed another way, the activation vector is complete, while the neuron kind of makes no claims about orthogonal directions.
There are a number of cases that kind of blur this line. How should we treat an activation vector decomposed by some matrix factorization into components? Or the average of many similar activation vectors?
(An additional reason we might want to do something different is that, even when normal feature visualization works perfectly, it doesn't differentiate between things that strongly help activate the direction and things that only slightly do.)
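As a toy numeric illustration of the "entanglement" point above (plain NumPy, with made-up vectors): when two "meaningful directions" have positive dot product, pushing activations along one of them also scores points for the other.
import numpy as np

# Two hypothetical "meaningful directions" that are not orthogonal.
d1 = np.array([1.0, 0.2]) / np.linalg.norm([1.0, 0.2])
d2 = np.array([0.2, 1.0]) / np.linalg.norm([0.2, 1.0])
print(np.dot(d1, d2))    # ~0.38 > 0: the features are "entangled"

# Maximizing the dot product with d1 pushes activations along d1 ...
acts = 5.0 * d1
# ... but this also increases the d2 objective, even though we never asked for d2.
print(np.dot(acts, d1))  # ~5.0
print(np.dot(acts, d2))  # ~1.9 -- an implicit, unwanted contribution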
Alternate Visualization Objectives
There are many other visualization objectives we could try. (Note, there might not be a single correct one -- they may all show us different things.)
- L2 distance - This is a very natural choice for visualizing an activation vector if your goal is to invert it. In practice, it seems to work well at inverting activations in early layers, but not later layers (optimization issues??).
Even when inversion with L2 works, sometimes it seems like dot product was more interesting. The L2 inversions tend to be either very faithful to the original or to not work at all. In contrast, dot product creates a kind of exaggerated "caricature" that seems to give insight into the network's abstractions. What are we really learning from strictly inverting a visualization -- whether enough information was preserved to reconstruct the input? (Strict inversion is perhaps more interesting when the activation vector is synthetic and we don't have a corresponding input.)
Finally, it's not clear how one might apply an L2 objective to visualizing something like a neuron, where you don't know how much the neuron should fire in the end, or what should happen to other neurons.
- Cosine Similarity - If we care about direction vectors, one natural answer is to use cosine similarity, which focuses on angle. Unfortunately, cosine similarity doesn't seem to work very well. I assume the issue is that it's quite easy to create an input that produces some tiny activation vector in the right direction without substantially activating any neurons. It could also partly be an optimization issue, though.
- Dot x Cosine Similarity - Multiplying dot product by cosine similarity (possibly raised to a power) can be a useful way to get a dot-product-like objective that cares more about angle, but still maximizes how far it can get in a certain direction. We've had quite a bit of success with this. (A rough sketch of these objectives appears after this list.)
One important implementation detail: you want to use something like
dot(x, y) * max(cossim(x, y), 0.1)^n
to avoid multiplying dot product by a zero or negative cosine similarity. Otherwise, you could end up in a situation where you maximize the opposite direction (because dot and cossim are both negative, so their product is positive) or get stuck because both are zero.
- Dot + Orthogonal L2 penalty - Similar to the previous one, in this objective we still maximize dot product, but penalize activation vector components that are orthogonal to the direction we're maximizing. The penalty's strength can be tuned with a hyper-parameter. We haven't really explored this.
- Penalize previous layer activations - A "cheap" way to get a neuron to fire more is just to get the features that feed into it to fire more. In some cases, this may mean strongly activating neurons in the previous layer that only have a trivial effect, or activating neurons that generally excite neurons in the next layer. If so, penalizing the activations in previous layers may be natural. This could be implemented using an L1 penalty, an L2 penalty, or something else.
We can get some intuition about possible effects from looking at our Neuron Mechanics research.
- Linear Transform / Alternate Inner Product - Consider the skew situation, where two "meaningful directions" have positive dot product but wouldn't if we linearly transformed the space by some transformation A. This suggests that linearly transforming the space before applying an objective can be a powerful technique. (In the case of dot product, it can also be seen as using the alternative inner product ⟨x, y⟩ = xᵀAᵀAy.)
- Decorrelated Version - A particularly natural transformation to use is the one that makes the activations decorrelated. It seems like the space of activations is sometimes very stretched (as measured by condition number). Amongst other benefits, decorrelating will unstretch it, making the geometry more natural. (A whitening sketch appears after this list.)
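For concreteness, here is a rough NumPy sketch of a few of these objectives as scoring functions on a single activation vector. The function names, the 0.1 clamp, and the penalty weight alpha are illustrative choices, not settled recommendations; in lucid these would be expressed as objectives over a layer's activations rather than a raw vector, but the arithmetic is the same.
import numpy as np

def cossim(x, y, eps=1e-6):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + eps)

def dot_cossim_objective(acts, direction, n=2):
    # Clamp cossim at 0.1 so we never multiply dot product by zero or a negative number.
    return np.dot(acts, direction) * max(cossim(acts, direction), 0.1) ** n

def dot_orthogonal_penalty_objective(acts, direction, alpha=0.1):
    # Maximize dot product, but penalize the component of acts orthogonal
    # to the (unit-normalized) direction.
    unit = direction / np.linalg.norm(direction)
    parallel = np.dot(acts, unit) * unit
    orthogonal = acts - parallel
    return np.dot(acts, unit) - alpha * np.sum(orthogonal ** 2)

acts = np.array([1.0, 2.0, 0.5])
direction = np.array([0.0, 1.0, 0.0])
print(dot_cossim_objective(acts, direction))              # ~1.52
print(dot_orthogonal_penalty_objective(acts, direction))  # ~1.88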
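Similarly, the decorrelated version can be sketched given a sample of activation vectors from the layer to estimate a covariance from. ZCA-style whitening is used here as one reasonable decorrelating transform; it isn't necessarily the exact transform used in the experiments.
import numpy as np

def whitening_matrix(activation_samples, eps=1e-5):
    # activation_samples: [num_samples, num_channels] activations collected
    # from the layer we want to visualize.
    cov = np.cov(activation_samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA whitening: decorrelates (and unstretches) the activation space.
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

def decorrelated_dot_objective(acts, direction, W):
    # Equivalent to using the alternative inner product x^T W^T W y.
    return np.dot(W @ acts, W @ direction)

# Usage: collect activations over the dataset, estimate W once, reuse it.
samples = np.random.randn(1000, 8) * np.array([5.0, 3.0, 1, 1, 1, 1, 1, 0.2])  # a stretched toy space
W = whitening_matrix(samples)
print(np.linalg.cond(np.cov(samples, rowvar=False)))        # large condition number before
print(np.linalg.cond(np.cov(samples @ W.T, rowvar=False)))  # ~1 after whitening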
Are we sure there's a problem?
The main things pointing towards there being an issue are:
- Caricatures of activations (ie. visualizations of an activation vector) at later layers behave weirdly if you use plain dot product, but more as expected if you multiply by cosine similarity and/or use a decorrelated reparameterization.
- Poly-semantic neurons in later layers seem to suggest that the directions we're visualizing either aren't the right semantic units, or have the "entanglement" problem, or have the mutual excitement problem...
These could be explained in different ways, but generally suggest we should think hard both about the directions we're visualizing and the objectives we're using to visualize them.
(A final, more fatal error would be if directions aren't the right thing to try to understand at all. None of these observations really point to that at this point.)
Resources
Dot x Cosine Similarity
See, for example, this notebook on caricatures.
Penalizing activations at previous layer
from lucid.modelzoo.vision_models import InceptionV1
from lucid.optvis import objectives, param, render

model = InceptionV1()  # mixed4a/mixed4d below are InceptionV1 layer names
model.load_graphdef()
obj = objectives.neuron("mixed4d", 504)
obj += -1e-4 * objectives.L1("mixed4a")  # L1 penalty on the earlier layer's activations
param_f = lambda: param.image(160)
_ = render.render_vis(model, obj, param_f)
Comment from Yasaman Bahri (@yasamanb): maybe the reason we see poly-semantic neurons is that the task isn't hard enough to get neurons in later layers to learn the "right" abstractions. In early layers, when you're closer to the data, perhaps it is easier. (comment paraphrase by Chris, may not be a super accurate interpretation of Yasaman's remark.)
Hey there, we are looking at these objectives from a new perspective: tying them to uncertainty estimation within a deep neural network. If an activation vector is far from all seen activation vectors, then it's an outlier. If an activation vector is equally similar to the centroids of two classes, then it's a point close to the boundary between the two classes. Early results show that this method differs from the prediction probability at the end of a softmax and is better for some of the deeper/more complex networks I have experimented with.
This sounds super neat! What's the status on this project (considering it's been over half a year since this issue was explicitly talked over)? If it's still in the works, is the main work to be done with respect to looking at different objectives for visualization or something else?
Sounds like a dual vector space method might be useful if a transformation can be used to "unstretch" the space.
@colah Maybe I'm oversimplifying, but with each abstraction, i.e. each layer away from the input, we would desire a more generalized representation of the data, i.e. a many-to-one correspondence between input configurations and abstract neuron activations. If we're classifying objects, we're actually stipulating this intentionally.
The real question is how "quickly" learning algorithms can separate classes. There's an obvious linear algebra angle here which will almost certainly relate to the rank and condition numbers of successive weight matrices (because there are biases too, I guess these would be affine transformations rather than "linear").