Bug with density transform when providing csv data as a url

Open dallascard opened this issue 1 year ago • 2 comments

I'm having an issue with density transforms that seems to occur when the data is provided as a link to a .csv file. So far, this does not seem to be an issue when the data is passed in as a link to a .json file.

As background, I'm coming to this from Altair. Here is the original density plot example from the Altair documentation:

import altair as alt
from vega_datasets import data

alt.Chart(data.movies.url).transform_density(
    'IMDB_Rating',
    as_=['IMDB_Rating', 'density'],
).mark_area().encode(
    x="IMDB_Rating:Q",
    y='density:Q',
)

which produces the following vega-lite specification (which works fine). Note that the url links to a json file:

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/movies.json"
  },
  "mark": {"type": "area"},
  "encoding": {
    "x": {"field": "IMDB_Rating", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  },
  "transform": [{"density": "IMDB_Rating", "as": ["IMDB_Rating", "density"]}],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json"
}

(or in the vega-lite editor)

However, if I change the data source to the seattle_weather dataset, and plot the temp_max variable in that dataset, the chart it produces is empty.

Here is the altair code:

import altair as alt
from vega_datasets import data

alt.Chart(data.seattle_weather.url).transform_density(
    'temp_max',
    as_=['temp_max', 'density'],
).mark_area().encode(
    x="temp_max:Q",
    y='density:Q',
)

which produces the following vega-lite specification (which links to a .csv file):

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/seattle-weather.csv"
  },
  "mark": {"type": "area"},
  "encoding": {
    "x": {"field": "temp_max", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  },
  "transform": [{"density": "temp_max", "as": ["temp_max", "density"]}],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json"
}

Example in Editor, which shows a blank plot.

Interestingly, this works fine if I use vega_datasets to load the url into a dataframe, which embeds the entire dataset in the vega-lite specification. Here is the working Altair code, which is the same except for how the data source is specified:

import altair as alt
from vega_datasets import data

alt.Chart(data.seattle_weather()).transform_density(
    'temp_max',
    as_=['temp_max', 'density'],
).mark_area().encode(
    x="temp_max:Q",
    y='density:Q',
)

And the example in the editor, which similarly works.
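One possible explanation for the difference, offered as a hypothesis rather than a confirmed cause: a CSV file carries no type information, so every cell is text until something converts it, whereas JSON (and data embedded from a dataframe) keeps numbers as numbers. The sketch below illustrates this with Python's standard library only. It does not run Vega's loader, and the rows are made up for illustration, not taken from the real dataset.

```python
import csv
import io
import json

# Illustrative rows only (not the real seattle_weather data).
CSV_TEXT = "date,temp_max\n2012-01-01,11.5\n2012-01-02,9.0\n"
JSON_TEXT = (
    '[{"date": "2012-01-01", "temp_max": 11.5},'
    ' {"date": "2012-01-02", "temp_max": 9.0}]'
)

# CSV cells come back as strings unless a parser converts them.
csv_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# JSON numbers are already typed when the file is decoded.
json_rows = json.loads(JSON_TEXT)

print(type(csv_rows[0]["temp_max"]).__name__)   # str
print(type(json_rows[0]["temp_max"]).__name__)  # float
```

If the density transform receives the field before it has been converted to a number, an empty or incorrect curve would be a plausible result, but I have not verified that this is what happens inside vega-lite.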

Other types of plots also work with the seattle_weather data when using the url. For example, here is a scatter plot using the same variable in Altair:

alt.Chart(data.seattle_weather.url).mark_point().encode(
    x="temp_max:Q",
    y='temp_min:Q',
)

which works fine and produces the following vega-lite code in the editor.

I'm not 100% sure, but I think the problem arises when passing in data in the form of a URL that points to a .csv file. Here is another example using the disasters dataset. In this case, the plot is not blank, but it is not correct (as can be verified by changing the Altair code to embed the data in the vega-lite specification, as above). Here is the Altair code:

alt.Chart(data.disasters.url).transform_density(
    'Deaths',
    as_=['Deaths', 'density'],
).mark_line(width=2).encode(
    x="Deaths:Q",
    y='density:Q',
)

and the vega-lite specification:

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/[email protected]/data/disasters.csv"
  },
  "mark": {"type": "line", "width": 2},
  "encoding": {
    "x": {"field": "Deaths", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  },
  "transform": [{"density": "Deaths", "as": ["Deaths", "density"]}],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json"
}

and the live example in the editor.
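A diagnostic that might help narrow this down, untested here: declare the CSV format and the field type explicitly through the `format.parse` property of the data source, then check whether the density plot changes. The sketch below builds such a specification as a plain Python dict. The helper name `density_spec` and the `example.com` URL are hypothetical placeholders, and whether an explicit parse actually resolves the problem is an assumption.

```python
import json


def density_spec(url, field, csv_parse=None):
    """Build a vega-lite density spec; optionally force CSV field types.

    Hypothetical helper for illustration, not part of Altair or vega-lite.
    """
    data = {"url": url}
    if csv_parse is not None:
        # Explicit parse directives, e.g. {"temp_max": "number"}.
        data["format"] = {"type": "csv", "parse": csv_parse}
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json",
        "data": data,
        "transform": [{"density": field, "as": [field, "density"]}],
        "mark": {"type": "area"},
        "encoding": {
            "x": {"field": field, "type": "quantitative"},
            "y": {"field": "density", "type": "quantitative"},
        },
    }


# Placeholder URL: substitute the real location of the .csv file.
spec = density_spec(
    "https://example.com/seattle-weather.csv",
    "temp_max",
    csv_parse={"temp_max": "number"},
)
print(json.dumps(spec, indent=2))
```

Pasting the printed JSON into the editor with and without the `format` block would show whether the field type is the deciding factor.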

dallascard avatar Nov 07 '23 03:11 dallascard

Also, it seems like Issue #7603 could be related.

dallascard avatar Nov 07 '23 05:11 dallascard

Still working on this, but I think the issue arises while processing files with a .csv extension. In the example below, using a .json file works fine: Open the Chart in the Vega Editor

ChiaLingWeng avatar Jan 30 '24 12:01 ChiaLingWeng