‘Lighter’ Can Still Be Dark: Modeling Comparative Color Descriptions
Metadata
- Authors: Olivia Winn and Smaranda Muresan
- Organization: Columbia University
- Conference: ACL 2018 (best short paper)
- Paper: https://aclweb.org/anthology/P18-2125
- Video: https://vimeo.com/288152700
- Dataset: https://bitbucket.org/o_winn/comparative_colors
Summary
This paper proposes a new paradigm of learning to ground comparative adjectives within the realm of color description: given a reference RGB color and a comparative term (e.g., 'lighter', 'paler'), their deep learning model learns to ground the comparative as a direction in RGB space such that the colors along the vector, rooted at the reference color, satisfy the comparison.
Motivation
- In fine-grained object recognition, multi-modal approaches achieve a degree of success by grounding adjectives and nouns from descriptive text in image features.
- One limitation of this approach arises when objects are differentiated not by having unique sets of attributes but by a difference in the strengths of their attributes.
- Comparatives have been shown to be frequently used to distinguish similar colors in a color selection task (Monroe et al. 2017), so fine-grained object recognition could benefit from comparative adjectives.
- No prior approach has focused explicitly on learning to ground comparative adjectives.
- This work focuses on comparative color descriptions.

Challenges
The task is not easy, since:
- It requires a reference (root) color point.
- The comparative describes how the reference color changes, which means it could refer to a whole range of potential target colors. For example, 'darker' refers to a different direction in RGB space depending on the reference color, and thus we need a reference-dependent approach.

Data
- The data contains 821 color labels, averaging 600 RGB data points per label.
600 seems to be wrong? Too few.
- These labels do not contain comparative adjectives, but many start with adjectives in the positive (base) form, e.g., light blue instead of lighter blue.
- However, light blue can be interpreted as lighter than the referential blue.
- Note that not all color labels containing modifiers can be utilized in this manner, e.g., cobalt blue cannot be considered to be more cobalt.
- After preprocessing, the data results in 415 triplets of the form (reference color label, comparative adjective, target color label), such as (blue, lighter, light blue), covering 79 unique reference color labels and 81 unique comparatives. Each color label is a set of RGB data points.
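As a rough illustration of this preprocessing idea, here is a minimal Python sketch: labels whose first word is a gradable modifier in base form yield a triplet. The adjective-to-comparative mapping and function name are hypothetical, not taken from the released dataset or the authors' code.

```python
# Hypothetical sketch of triplet construction: labels whose first word is a
# gradable adjective in base form (e.g., "light blue") yield a
# (reference, comparative, target) triplet. The mapping below is illustrative.
COMPARATIVE_FORM = {"light": "lighter", "dark": "darker", "pale": "paler"}

def triplets_from_labels(labels):
    label_set = set(labels)
    triplets = []
    for label in labels:
        first, _, rest = label.partition(" ")
        # Only usable if the modifier is gradable and the bare color label exists
        # as a reference (e.g., "cobalt blue" is skipped: there is no "more cobalt").
        if first in COMPARATIVE_FORM and rest in label_set:
            triplets.append((rest, COMPARATIVE_FORM[first], label))
    return triplets

# triplets_from_labels(["blue", "light blue", "cobalt blue"])
# -> [("blue", "lighter", "light blue")]
```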
Method

- Input: Reference color r_c & comparative adjective w.
- Output: A vector w_g pointing from r_c in the direction of change in RGB, which in training is measured against the direction towards a target color t_c.
- Model: 2-layer fully-connected network.
- The comparative is represented as a bi-gram to account for comparatives which necessitate using more (e.g., more electric); single-word comparatives are preceded by the zero vector (<PAD>).
- Use pretrained word2vec with d=300.
- Each layer reduces the dimension of the output by an order of magnitude.
- As the dimension of the word embeddings is two orders of magnitude larger than that of the reference RGB color (d=3), the reference RGB color is fed into both layers of the network, helping to mitigate the loss of color information. (Inputting the color into only one layer is insufficient.)
- Loss functions (see the sketch after this list):
- Cosine similarity between w_g and the vector from r_c to t_c (to be maximized).
- Euclidean distance between r_c + w_g and t_c, to restrain the length of w_g.
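Below is a minimal PyTorch sketch of the model and loss described above, assuming the layer sizes from these notes (600-d bigram embedding, 30-d hidden layer from the setup section, 3-d output). The nonlinearity, variable names, and equal weighting of the two loss terms are assumptions, not the authors' released implementation.

```python
# Minimal sketch of the grounding model. Layer sizes follow the notes above
# (600-d bigram embedding -> 30-d hidden -> 3-d direction vector); the ReLU and
# the equal weighting of the two loss terms are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComparativeGrounder(nn.Module):
    def __init__(self, emb_dim=300, hidden_dim=30):
        super().__init__()
        # First layer sees the bigram embedding (<PAD>/"more" + comparative)
        # concatenated with the 3-d reference RGB color.
        self.fc1 = nn.Linear(2 * emb_dim + 3, hidden_dim)
        # The reference color is fed into the second layer as well, so the
        # low-dimensional color signal is not washed out.
        self.fc2 = nn.Linear(hidden_dim + 3, 3)

    def forward(self, bigram_emb, ref_rgb):
        h = torch.relu(self.fc1(torch.cat([bigram_emb, ref_rgb], dim=-1)))
        w_g = self.fc2(torch.cat([h, ref_rgb], dim=-1))  # direction in RGB space
        return w_g

def grounding_loss(w_g, ref_rgb, target_rgb):
    true_dir = target_rgb - ref_rgb
    # Encourage w_g to point toward the target color ...
    cos_term = 1.0 - F.cosine_similarity(w_g, true_dir, dim=-1)
    # ... and keep ref + w_g close to the target to restrain the length of w_g.
    dist_term = torch.norm(ref_rgb + w_g - target_rgb, dim=-1)
    return (cos_term + dist_term).mean()
```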
Experimental Setup

- Seen Pairings: The reference color label, the comparative adjective and their pairing have been seen in the training data.
- Unseen Pairings: The reference color label and the comparative adjective have been seen in the training data, but not their pairing.
- Unseen Ref. Color: The reference color label, and thus all the corresponding RGB color data points, have not been seen in training, while the comparative has been seen in the training data.
- Unseen Comparative: The comparative adjective has not been seen in training, but the reference color label has been seen.
- Fully Unseen: Neither the comparative adjective nor the reference color have been seen in the training.
- 15% of the data points from each training reference color label were set aside for testing, providing RGB values close but not equivalent to those seen in training.
- 10% of the reference color labels were set aside for testing, as were 10% of the comparative words.
- The network was trained with a learning rate of 0.001 for 800 epochs, with the output of the first layer having dimension d=30.
- Evaluation metrics: Cosine similarity (the same measure as in the training objective) and Delta-E (closer to human perception).
- Delta-E perception:
- <= 1.0: Imperceptible
- 1 - 2: Requires close observation
- 2 - 10: Perceivable
- 11 - 49: More similar than opposite
- 100: Exact opposites
Question: Why not also use Delta-E as the training objective?
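To make the Delta-E scale above concrete, here is a small sketch that computes Delta-E between two RGB colors by converting sRGB to CIELAB and taking the Euclidean distance (the CIE76 formula). The paper does not say which Delta-E variant it uses, so treat this as illustrative only.

```python
# Illustrative Delta-E (CIE76) between two sRGB colors: convert to CIELAB and
# take the Euclidean distance. The paper may use a different Delta-E variant
# (e.g., CIEDE2000); this is only meant to make the scale above concrete.
import math

def srgb_to_lab(rgb):
    # sRGB (0-255) -> linear RGB
    def lin(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(c) for c in rgb)
    # linear RGB -> XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    # XYZ -> Lab (D65 white point)
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e_cie76(rgb1, rgb2):
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))

# e.g., delta_e_cie76((0, 0, 255), (0, 0, 250)) is small: requires close observation
```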
Results


As seen in Figure 3, grounding comparatives as directional vectors in RGB space allows them to capture a full range of modification of the reference color. Even in some of the error cases, the resulting outputs tend to capture directions which are reasonable illustrations of the color the comparative describes. Though the darker grounding example from unseen pairings incorrectly de-saturates the reference color, it does in fact make the color darker. Most impressive is the paler example at the bottom, which captures the direction of the comparative almost perfectly. Regarding failures, they tend to involve comparative words that relate to a different color, such as more greenish and bluer, rather than comparatives such as lighter.
They also examined whether the model could generate plausible comparative terms given r_c and t_c: all of the comparatives in the model's vocabulary were applied to r_c, and the corresponding w_g vectors were sorted by cosine similarity to the given reference-target direction. Note that this does not require modifying the input, output, or model.
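A sketch of that ranking procedure, reusing the hypothetical ComparativeGrounder from the Method section; embed_bigram (comparative string to 600-d embedding) is an assumed helper.

```python
# Rank all known comparatives by how well their predicted direction w_g matches
# the observed reference->target direction. Reuses the sketched model above;
# `embed_bigram` (comparative -> 600-d vector) is a hypothetical helper.
import torch
import torch.nn.functional as F

def rank_comparatives(model, embed_bigram, vocab, ref_rgb, target_rgb):
    true_dir = target_rgb - ref_rgb
    scored = []
    with torch.no_grad():
        for comparative in vocab:
            w_g = model(embed_bigram(comparative), ref_rgb)
            sim = F.cosine_similarity(w_g, true_dir, dim=-1).item()
            scored.append((comparative, sim))
    # Highest cosine similarity first: the top entries are plausible descriptions
    # of how ref_rgb would change into target_rgb.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```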

Future Work
This work is the first step towards fine-grained object recognition through comparative descriptions, providing a way to utilize relational descriptive text. This approach could be extended to other properties such as size, texture, or curvature. It could also be used to aid in zero-shot learning from text sources, generating human-understandable explanations for categorization of similar objects, or providing descriptions of new, unknown objects with respect to known ones.
Reference
- Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding by Will Monroe et al. TACL 2017