introduce an `array` mark utilizing the heatmap transform for array data
This feature request proposes the addition of a new array mark to Vega-Lite.
This mark aims to improve support for visualizing various types of 2D data, including heatmaps, image data, and other matrix-based representations, with built-in support for color scales, axis labels, and faceting. I see this as an initial step towards https://github.com/vega/vega-lite/issues/6043, as it focuses on just a single transform in Vega, but many of the issues discussed there also apply here.
The following variants explore how the heatmap transform within Vega behaves and how data can be prepared for ingestion within the specification. This is an initial attempt that can hopefully serve as a starting point for exploring this field further, and perhaps someone is brave enough to turn it into a PR.
variants explored so far
- heatmap transform single array only
- heatmap transform with color scale
- heatmap transform with color scale and axis
- heatmap transform double array faceted with color scale and axis
- heatmap transform single array with non-zero x and y scale
- heatmap transform double array with non-zero x and y scale
Note: in the specs below, I've truncated the grid `values` arrays. The accompanying Vega-Editor links include all grid values.
heatmap transform values only
A basic implementation: NumPy (with scikit-image) prepares a single array, which Vega then displays with the heatmap transform. The image is rendered with opacity levels only.
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.transform import rescale
import pyperclip
array = data.camera()
array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = (array_small * 255).astype(np.uint8)
plt.imshow(array_round, cmap='gray')
print('shape', array_round.shape)
array_as_flatlist = array_round.flatten(order='C').tolist() # row-major
print('head', array_as_flatlist[0:5])
pyperclip.copy(str(array_as_flatlist))
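The `# row-major` comment matters because the grid's `values` are passed as one flat list. A minimal stdlib sketch of that indexing convention, assuming the cell at a given row and column sits at `values[row * width + col]`, which is what NumPy's `flatten(order='C')` produces (the helpers `flatten_row_major` and `cell` are hypothetical):

```python
def flatten_row_major(rows):
    """Flatten equally long rows into one list, row by row (C order)."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")
    return [v for row in rows for v in row]


def cell(values, width, row, col):
    """Look up the value at (row, col) in a row-major flat list."""
    return values[row * width + col]


grid = [[1, 2, 3],
        [4, 5, 6]]
flat = flatten_row_major(grid)
print(flat)                 # [1, 2, 3, 4, 5, 6]
print(cell(flat, 3, 1, 0))  # 4
```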
We can make it work with the heatmap transform in Vega, using the following specification (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"width": 125,
"height": 125,
"values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [{"type": "heatmap"}]
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.width"},
"height": {"signal": "datum.height"}
}
}
}
]
}
The result looks like this:
It seems this is the image drawn with opacity levels only.
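Without a `color` parameter the transform falls back to gray ("#888", per the heatmap docs), so only the opacity varies. A minimal emulation of what each pixel then looks like, assuming the opacity is proportional to `$value / $max` (an assumption based on the rendered result, not on the Vega source; `default_pixel` is hypothetical):

```python
GRAY = (0x88, 0x88, 0x88)  # "#888", the documented default heatmap color


def default_pixel(value, vmax):
    """Hypothetical emulation: fixed gray RGB, alpha proportional to value / max."""
    alpha = value / vmax if vmax else 0.0
    alpha = min(max(alpha, 0.0), 1.0)  # clamp to the valid opacity range
    return (*GRAY, round(alpha * 255))


values = [199, 200, 50, 0]
vmax = max(values)
for v in values:
    print(v, default_pixel(v, vmax))
```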
heatmap transform with color scale
Adding a color scale to the heatmap to enhance visual differentiation of values. This example replicates a grayscale image using Vega's color scale functionality.
Let's add a color scale (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"width": 125,
"height": 125,
"values": [199, 200, 200, 198, 198, 130, 118, 135, 161, 161, 140]
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.width"},
"height": {"signal": "datum.height"}
}
}
}
]
}
The result will look like this:
Using this approach, I can also reproduce the grayscale image from Python's plt.imshow() by modifying the color scale as follows (Vega-Editor):
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "greys"},
"reverse": true
}
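Why `"reverse": true` matters: the "greys" scheme runs from light to dark, while `cmap='gray'` maps low values to black and high values to white. A simplified sketch of the normalize-then-map step, assuming the scheme can be approximated by a linear white-to-black ramp (the real Vega "greys" scheme is not exactly linear; `linear_ramp` and `gray_for` are hypothetical):

```python
WHITE, BLACK = (255, 255, 255), (0, 0, 0)  # assumed endpoints of a "greys"-like ramp


def linear_ramp(t, start, end, reverse=False):
    """Interpolate linearly between two RGB colors; reverse swaps the endpoints."""
    if reverse:
        start, end = end, start
    t = min(max(t, 0.0), 1.0)
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def gray_for(value, vmax):
    """Normalize like datum.$value / datum.$max, then map through the reversed ramp."""
    return linear_ramp(value / vmax, WHITE, BLACK, reverse=True)


print(gray_for(0, 200))    # (0, 0, 0): low values are dark, as in plt.imshow(cmap='gray')
print(gray_for(200, 200))  # (255, 255, 255): the maximum is white
```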
heatmap transform with color scale and axis
Enhancing the previous example by including axis labels, providing context to the grid values. This facilitates interpretation of the data.
The next step is to add axes to the image. The Vega specification now looks like this (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 250,
"height": 250,
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"width": 125,
"height": 125,
"values": [199, 200, 200, 198, 198, 118, 135, 161, 161, 140]
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "height"
}
],
"axes": [
{
"scale": "X_SCALE",
"domain": false,
"orient": "bottom",
"tickCount": 5,
"labelFlush": true
},
{
"scale": "Y_SCALE",
"domain": false,
"orient": "left",
"titlePadding": 5,
"offset": 2
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "width"},
"height": {"signal": "height"}
}
}
}
]
}
So far so good.
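The axes work because the linear scales map the 125 grid cells onto the 250 pixel canvas. A minimal Python stand-in for a linear scale (not Vega's implementation; `linear_scale` is hypothetical) shows that each cell spans 2 pixels, and that a `"range": "height"` scale, which Vega defines as `[height, 0]`, places domain value 0 at the bottom while the image's row 0 is drawn at the top:

```python
def linear_scale(domain, rng):
    """Return a function mapping a value linearly from domain to range."""
    (d0, d1), (r0, r1) = domain, rng

    def scale(v):
        return r0 + (v - d0) * (r1 - r0) / (d1 - d0)

    return scale


x_scale = linear_scale((0, 125), (0, 250))  # "range": "width"  -> [0, width]
y_scale = linear_scale((0, 125), (250, 0))  # "range": "height" -> [height, 0]

print(x_scale(1) - x_scale(0))  # 2.0 pixels per grid cell
print(y_scale(0), y_scale(125))  # 250.0 0.0: the top pixel row is labeled 125
```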
heatmap transform double array faceted with color scale and axis
Faceting multiple grids within a single visualization. This example demonstrates handling two separate arrays with a common color scale, a shared y axis, and an x axis per facet.
Are we able to facet grids if we have, for example, two grids as input?
I've adapted my Python code to prepare the data arrays:
import numpy as np
from skimage import data
from skimage import color
from skimage.transform import rescale
import pyperclip
import json
def array2vega(array):
    grid = {
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist()  # row-major
    }
    return grid
array = data.camera()
array_small = rescale(array, 0.245, anti_aliasing=False)
array_round = np.round(array_small, 2)
grid0 = array2vega(array_round)
grid1 = array2vega(1 - array_round)
arrays = [{'grid':grid0, 'variant': 'A'}, {'grid':grid1, 'variant': 'B'}]
pyperclip.copy(json.dumps(arrays))
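The values for variant B in the spec below contain floating-point noise such as 0.21999999999999997, because `1 - array_round` is computed after the rounding step. A small stdlib sketch that rounds after subtracting instead (the helper `complement` is hypothetical):

```python
def complement(values, ndigits=2):
    """Return 1 - v for each value, rounded after the subtraction to drop float noise."""
    return [round(1 - v, ndigits) for v in values]


print(1 - 0.78)                         # 0.21999999999999997
print(complement([0.78, 0.46, 0.55]))   # [0.22, 0.54, 0.45]
```

With NumPy the equivalent is to round after the subtraction, e.g. `np.round(1 - array_small, 2)`.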
and modified the Vega specification, which now looks like this (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 250,
"height": 250,
"data": [
{
"name": "GRID_ARRAY",
"values": [{"grid": {"width": 125, "height": 125, "values": [0.78, 0.78, 0.78, 0.78, 0.78, 0.46, 0.53, 0.63, 0.63, 0.55]}, "variant": "A"}, {"grid": {"width": 125, "height": 125, "values": [0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.21999999999999997, 0.54, 0.47, 0.37, 0.37, 0.44999999999999996]}, "variant": "B"}]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"field": "grid",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 125],
"range": "height"
}
],
"axes": [
{
"scale": "Y_SCALE",
"domain": false,
"orient": "left",
"offset": 2
}
],
"layout": {
"columns": 2
},
"marks": [
{
"type": "group",
"from": {
"facet": {
"name": "facet",
"data": "GRID_IMAGE",
"groupby": "variant"
}
},
"title": {
"text": {"signal": "parent.variant"}
},
"encode": {
"update": {
"width": {"signal": "width"},
"height": {"signal": "height"}
}
},
"axes": [
{
"scale": "X_SCALE",
"domain": false,
"orient": "bottom"
}
],
"marks": [
{
"type": "image",
"from": {"data": "facet"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "width"},
"height": {"signal": "height"}
}
}
}
]
}
]
}
Not bad!
heatmap transform single array with non-zero x and y scale
Handling grids with custom scales, such as geographical data. This example showcases the challenges of aligning non-zero axes with grid dimensions and values.
This variant is still a bit difficult. The array is in degrees and spans longitude -180 to 180 on the x-axis and latitude -81 to 87 on the y-axis. The step size is 1 degree in both directions.
See Vega-Editor:
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 360,
"height": 168,
"data": [
{
"name": "GRID_ARRAY",
"values": [{
"year":2016,
"grid":{
"x1_":-180,
"x2_":180,
"y1_":-81,
"y2_":87,
"height":168,
"width":360,
"values":[392,392,392,392,393,166,163,165,168,169]
}
}]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"field": "grid",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": false,
"domain": [-180, 180],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": false,
"domain": [-81, 87],
"range": "height"
}
],
"axes": [
{
"scale": "X_SCALE",
"domain": false,
"orient": "bottom"
},
{
"scale": "Y_SCALE",
"domain": false,
"orient": "left",
"titlePadding": 5,
"offset": 2
}
],
"marks": [
{
"type": "image",
"from": {"data": "GRID_IMAGE"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.grid.width"},
"height": {"signal": "datum.grid.height"}
}
}
}
]
}
This results in:
Basically, the heatmap transform only uses the grid's height and width to allocate the canvas size and then iterates over the 1D array to colorize each pixel.
For X_SCALE and Y_SCALE we use the x1/x2 and y1/y2 information (still set manually). In the image mark encoding, we use "datum.grid.width" and "datum.grid.height" as signals. Since the scales also need a width and height, the global width/height are currently still set to the same width and height as the grid.
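Setting the scales manually could be avoided by deriving everything from the extent. A sketch, assuming the extent is stored as `[xmin, xmax, ymin, ymax]` (as in the Python preparation of the next variant) and a uniform step; `grid_shape_from_extent` and `scales_from_extent` are hypothetical helpers that emit the same X_SCALE/Y_SCALE definitions used in the spec above:

```python
def grid_shape_from_extent(extent, step=1.0):
    """Number of columns and rows implied by an [xmin, xmax, ymin, ymax] extent."""
    xmin, xmax, ymin, ymax = extent
    return round((xmax - xmin) / step), round((ymax - ymin) / step)


def scales_from_extent(extent):
    """Build the X_SCALE / Y_SCALE definitions from the extent instead of by hand."""
    xmin, xmax, ymin, ymax = extent
    return [
        {"name": "X_SCALE", "type": "linear", "zero": False,
         "domain": [xmin, xmax], "range": "width"},
        {"name": "Y_SCALE", "type": "linear", "zero": False,
         "domain": [ymin, ymax], "range": "height"},
    ]


extent = [-180, 180, -81, 87]
print(grid_shape_from_extent(extent))  # (360, 168), matching width and height
print(scales_from_extent(extent)[1]["domain"])  # [-81, 87]
```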
But if I change the grid input object to:
"x1":-180,
"x2":180,
"y1":-81,
"y2":87,
"height":168,
"width":360,
(removing the appended _ from x1/x2/y1/y2)
The result is this:
I have the feeling that the negative values of our scales malfunction in the iterator within heatmap.js (here). It also seems that the drawn y-axis is reversed relative to the canvas iterator. If I add "reverse": true to the Y_SCALE scale, it becomes clearer that only positive values are colorized in the canvas:
But then the latitude values on the y-axis do not match the input array.
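One hypothesis that would fit these observations (an assumption, not verified against the Vega source here): heatmap.js may treat optional `x1`, `x2`, `y1`, `y2` fields on the grid object as integer pixel bounds, looping rows `j` over `[y1, y2)` and columns `i` over `[x1, x2)` and reading `values[i + j * width]`. That would also explain why the underscore-suffixed names work. The sketch below emulates that assumed loop; `colored_pixel_count` is hypothetical and counts how many pixels would receive a value:

```python
def colored_pixel_count(width, height, x1, x2, y1, y2):
    """Count pixels whose flat index i + j * width falls inside the values array.

    Assumption: the transform loops j over [y1, y2) and i over [x1, x2) and reads
    values[i + j * width]; an out-of-range index yields no value (undefined in JS).
    """
    n_values = width * height
    count = 0
    for j in range(y1, y2):
        for i in range(x1, x2):
            if 0 <= i + j * width < n_values:
                count += 1
    return count


print(colored_pixel_count(360, 168, 0, 360, 0, 168))       # 60480: every pixel
print(colored_pixel_count(360, 168, -180, 180, -81, 87))   # 31140: about half
```

Under this assumption, only rows with `j >= 0` get values, and they land in canvas rows 81 to 167 (the lower part of the canvas), which is consistent with only part of the image being colorized.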
heatmap transform double array with non-zero x and y scale
A more complex scenario with faceted charts using custom scales. This variant highlights the issues with global versus array-specific dimensions and independent color scales.
Let's make it a bit more complex: a faceted chart with non-zero x and y scales. Let's start with data preparation in Python:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import pyperclip
import urllib.request
import json
# define data
source = 'https://raw.githubusercontent.com/vega/vega-datasets/main/data/annual-precip.json'
with urllib.request.urlopen(source) as url:
    data = json.load(url)
values = data['values']
width = data['width'] # 360
height = data['height'] # 168
extent = [-180, 180, -81, 87] # xmin, xmax, ymin, ymax
# prepare array and plot
array = np.array(values).reshape(height, width)
plt.imshow(array, extent=extent)
def array2vega(array, extent):
    grid = {
        'extent': extent,
        'height': array.shape[0],
        'width': array.shape[1],
        'values': array.flatten(order='C').tolist()  # row-major
    }
    return grid
grid0 = array2vega(array, extent)
grid1 = array2vega(1 - array, extent)
arrays = [{'grid': grid0, 'variant': 'A'}, {'grid': grid1, 'variant': 'B'}]
df = pd.DataFrame.from_dict(arrays)
# copy and display
pyperclip.copy(df.to_json(orient='records'))
df
Preparing a Vega chart for this data as follows (Vega-Editor):
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 250,
"height": 250,
"data": [
{
"name": "GRID_ARRAY",
"values": [
{
"grid": {
"extent": [-180, 180, -81, 87],
"height": 168,
"width": 360,
"values": [392, 392, 392, 169, 187, 196]
},
"variant": "A"
},
{
"grid": {
"extent": [-180, 180, -81, 87],
"height": 168,
"width": 360,
"values": [-391, -391, -391, -164, -167, -168]
},
"variant": "B"
}
]
},
{
"name": "GRID_IMAGE",
"source": "GRID_ARRAY",
"transform": [
{
"type": "heatmap",
"field": "grid",
"color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
"opacity": 1
}
]
}
],
"scales": [
{
"name": "COLOR_SCALE",
"type": "linear",
"zero": true,
"domain": [0, 1],
"range": {"scheme": "viridis"}
},
{
"name": "X_SCALE",
"type": "linear",
"zero": true,
"domain": [-180, 180],
"range": "width"
},
{
"name": "Y_SCALE",
"type": "linear",
"zero": true,
"domain": [-81, 87],
"range": "height"
}
],
"axes": [
{"scale": "Y_SCALE", "domain": false, "orient": "left", "offset": 2}
],
"layout": {"columns": 2},
"marks": [
{
"type": "group",
"from": {
"facet": {"name": "facet", "data": "GRID_IMAGE", "groupby": "variant"}
},
"title": {"text": {"signal": "parent.variant"}},
"encode": {
"update": {"width": {"signal": "width"}, "height": {"signal": "height"}}
},
"axes": [{"scale": "X_SCALE", "domain": false, "orient": "bottom"}],
"marks": [
{
"type": "image",
"from": {"data": "facet"},
"encode": {
"update": {
"x": {"value": 0},
"y": {"value": 0},
"image": {"field": "image"},
"width": {"signal": "datum.grid.width"},
"height": {"signal": "datum.grid.height"}
}
}
}
]
}
]
}
Two issues become clear from this:
- We see the interference of a globally defined `width` and `height` and the array-defined `grid.width` and `grid.height`.
- Another issue that becomes apparent is that the color scale is currently not applied independently per grid.
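The first issue comes down to the image mark being sized from the grid (360 by 168 pixels) while the facet groups are 250 by 250. One option, sketched with a hypothetical `fit_image` helper, is to derive a single size from the group size while preserving the grid's aspect ratio, so that the same numbers can be used for the global `width`/`height` (and thus the scale ranges) and for the image mark:

```python
def fit_image(grid_width, grid_height, box_width, box_height):
    """Largest (width, height) with the grid's aspect ratio that fits inside the box."""
    scale = min(box_width / grid_width, box_height / grid_height)
    return grid_width * scale, grid_height * scale


w, h = fit_image(360, 168, 250, 250)
print(round(w, 2), round(h, 2))  # 250.0 116.67
```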
Proposed Specification
This is discussed in more depth in https://github.com/vega/vega-lite/issues/6043, but something like the following should be sufficient for many use cases (note that there is no need for x and y encoding channels, as the 2D array data comes prepared):
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": {
"grid": {
"extent": [-180, 180, -81, 87],
"height": 168,
"width": 360,
"values": [392, 392, 392, 169, 187, 196]
},
"variant": "A"
}
},
"mark": "array",
"encoding": {
"color": {"scale": {"scheme": "viridis"}},
"row": {},
"column": {}
}
}
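To make the proposal more concrete, here is a sketch of how such an `array` mark could be expanded into the Vega pattern used in the variants above. `compile_array_mark` is entirely hypothetical (no such function exists in Vega-Lite); it assumes all records share the extent of the first one and ignores faceting:

```python
import json


def compile_array_mark(records, field="grid", scheme="viridis", width=250, height=250):
    """Hypothetical: expand the proposed array mark into heatmap transform + image mark."""
    xmin, xmax, ymin, ymax = records[0][field]["extent"]  # assumes a shared extent
    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": width,
        "height": height,
        "data": [
            {"name": "GRID_ARRAY", "values": records},
            {
                "name": "GRID_IMAGE",
                "source": "GRID_ARRAY",
                "transform": [{
                    "type": "heatmap",
                    "field": field,
                    "color": {"expr": "scale('COLOR_SCALE', datum.$value / datum.$max)"},
                    "opacity": 1,
                }],
            },
        ],
        "scales": [
            {"name": "COLOR_SCALE", "type": "linear", "zero": True,
             "domain": [0, 1], "range": {"scheme": scheme}},
            {"name": "X_SCALE", "type": "linear", "zero": False,
             "domain": [xmin, xmax], "range": "width"},
            {"name": "Y_SCALE", "type": "linear", "zero": False,
             "domain": [ymin, ymax], "range": "height"},
        ],
        "axes": [
            {"scale": "X_SCALE", "domain": False, "orient": "bottom"},
            {"scale": "Y_SCALE", "domain": False, "orient": "left", "offset": 2},
        ],
        "marks": [{
            "type": "image",
            "from": {"data": "GRID_IMAGE"},
            "encode": {"update": {
                "x": {"value": 0},
                "y": {"value": 0},
                "image": {"field": "image"},
                "width": {"signal": "width"},
                "height": {"signal": "height"},
            }},
        }],
    }


record = {"grid": {"extent": [-180, 180, -81, 87], "height": 168, "width": 360,
                   "values": [392, 392, 392, 169, 187, 196]}, "variant": "A"}
print(json.dumps(compile_array_mark([record])["scales"], indent=2))
```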
With a new array mark, we can hopefully simplify the syntax for specifying array data while still supporting color schemes and customization options, including integration with Vega-Lite's axis and scale system and support for both zero and non-zero scales.
Moreover, the variants above show that faceting multiple arrays is a real possibility, even though maintaining independent scales and axes needs to be explored more deeply.
Performance optimization has not been part of this exploration, but it would be great if the result of a heatmap transform, a canvas image, could be included within the JSON specification, so that the heatmap transform can be applied server-side. Currently it is unclear whether this is possible within the JSON specification; since JSON has no binary type, the image would have to be embedded as a string, for example a base64-encoded data URL.
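As an illustration of the server-side idea, a rendered grid can be turned into a string that fits in JSON. The stdlib sketch below (helper names hypothetical) encodes row-major 0-255 values as an 8-bit grayscale PNG and wraps it in a base64 data URL. Whether Vega would accept such a string, for example through the image mark's `url` channel instead of the heatmap output, is an assumption that has not been tested here:

```python
import base64
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def gray_png(values, width, height):
    """Encode row-major 0-255 integers as an 8-bit grayscale PNG."""
    if len(values) != width * height:
        raise ValueError("values must contain width * height entries")
    raw = b"".join(
        b"\x00" + bytes(values[r * width:(r + 1) * width])  # filter type 0 per row
        for r in range(height)
    )
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # 8-bit, grayscale
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def to_data_url(png: bytes) -> str:
    """Wrap PNG bytes in a data URL string that can live inside a JSON document."""
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


url = to_data_url(gray_png([0, 64, 128, 255], 2, 2))
print(url[:40])
```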
This issue is one of the results of a spontaneous attempt to bring https://github.com/vega/altair/issues/891 further. Thanks for brainstorming on this topic @kanitw, @timtreis, @melonora and @joelostblom!
Thanks @mattijn, I'm still reading through in detail, but I was taken aback by the expression scale('COLOR_SCALE', datum.$value / datum.$max). I've never seen this datum.$foo syntax before. What does it mean, and where did you learn about it?
If I retrace my steps, I think I found it in the heatmap transform docs: https://vega.github.io/vega/docs/transforms/heatmap/
A color value or expression for setting each individual pixel’s color. If an expression is provided, it will be invoked with an input datum that includes `$x`, `$y`, `$value`, and `$max` fields for the grid. If unspecified, the color defaults to gray ("#888").
So it is basically a normalizer for all values in the grid, where datum.$value represents each individual value and datum.$max the maximum in the grid. By normalizing the values this way, I can combine them with the "domain": [0, 1] of the color scale.
I don't think this approach will hold if I have negative values in my grid.
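A quick check of that concern: dividing by the maximum only maps into [0, 1] when all values are non-negative. A min-max normalization stays in [0, 1] for any values, but according to the docs quoted above the expression only receives `$max`, so the minimum would have to come from elsewhere, for example precomputed in Python and stored alongside the grid (whether the color expression can read such an extra field is an assumption to verify). Both helpers below are hypothetical:

```python
def normalize_by_max(values):
    """Mirror of datum.$value / datum.$max."""
    vmax = max(values)
    return [v / vmax for v in values]


def normalize_min_max(values):
    """(v - min) / (max - min), which stays in [0, 1] even for negative values."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [(v - vmin) / span if span else 0.0 for v in values]


variant_b = [-391, -164, -168]  # values as in variant B above
print(normalize_by_max(variant_b))   # first value is about 2.38, outside [0, 1]
print(normalize_min_max(variant_b))  # [0.0, 1.0, 0.98...]
```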
As mentioned in this post: https://github.com/vega/altair/issues/891#issuecomment-1458304756, native support within Vega for the Zarr file format, including multiscale (pyramid) data, would be great too.
We should think about pluggable data loaders. We already have an arrow loader but zarr could be interesting as well once we support this feature here. Bundle size is always a concern so not sure what we want to include by default.