
Faceting on an "array value" breaks boxplots

Open gs0-pix4d opened this issue 2 years ago • 4 comments

Please:

  • [x] Check for duplicate issues. Please file separate requests as separate issues.
  • [x] Describe how to reproduce the bug. When using a facet/row on a field that has type array, the boxplots are broken in weird ways. Use the provided specification below and vl2svg to reproduce the bug.
  • [x] Use the latest versions of Vega and Vega-Lite.
    ➜  ~ vl2svg --version
    5.2.0
    ➜  ~ vg2svg --version
    5.22.1
    
  • [x] Provide a minimal, reproducible example spec in JSON:

In the following example, the rendering is broken for Study ["A", "B"], Project 2, with either facet or row. To switch between the two, remove the leading _ from the _row entry and disable the facet entry accordingly.

{
  "data": {
    "values": [
      {"err": 8, "study": ["A", "B"], "project": "1"},
      {"err": 3, "study": ["A", "B"], "project": "1"},
      {"err": 6, "study": ["A", "B"], "project": "2"},
      {"err": 0, "study": ["A", "B"], "project": "2"}
    ]
  },
  "mark": {"type": "boxplot"},
  "encoding": {
    "facet": {"field": "project"},
    "_row": {"field": "project"},
    "y": {"field": "study"},
    "x": {"field": "err", "type": "quantitative"}
  }
}

Adding the following transform works around the problem: "transform": [{"calculate": "join(datum.study)", "as": "study"}]. In other words, when the field is a string rather than an array, the plot renders correctly.
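For reference, here is a sketch of the full spec with that transform applied. It is the example above with only the "transform" entry added and the disabled "_row" entry dropped. Since join is called without a separator, the label should come out as a single comma-separated string rather than a multi-line one.

```json
{
  "data": {
    "values": [
      {"err": 8, "study": ["A", "B"], "project": "1"},
      {"err": 3, "study": ["A", "B"], "project": "1"},
      {"err": 6, "study": ["A", "B"], "project": "2"},
      {"err": 0, "study": ["A", "B"], "project": "2"}
    ]
  },
  "transform": [{"calculate": "join(datum.study)", "as": "study"}],
  "mark": {"type": "boxplot"},
  "encoding": {
    "facet": {"field": "project"},
    "y": {"field": "study"},
    "x": {"field": "err", "type": "quantitative"}
  }
}
```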

  • [x] If applicable, include error messages and screenshots, GIF videos (e.g. using https://www.cockos.com/licecap/), or working examples (e.g. by clicking share in the Vega-Editor or https://bit.ly/vega-lite-blocks)

My tests are run using vl2svg cli tool.

Broken rendering: image

Cast to string: image

A more complex example (using real-world data, please excuse the crushed boxes of the plot) looks like this: image

You can see that Case 12 seems to work well, and Case 11 has proper data in front of PREFIX-variant4, while all the others are stuck to the top of the plot area.

gs0-pix4d avatar May 31 '22 17:05 gs0-pix4d

Can you look at the Vega code to see what needs to be fixed and send a pull request?

domoritz avatar May 31 '22 18:05 domoritz

I would be happy to do it, but I have zero knowledge of the codebase, and I'm a very new Vega user (about two weeks, on and off). I wouldn't know where to start digging or how to debug the behavior.

First of all, is converting to a string definitely the right thing to do? I was wondering whether faceting could have a special meaning for fields that are arrays (such as faceting per array element, for example). In other words, what is the expected outcome (or outcomes, if it's configurable) for a dataset like this one, leaving boxplots aside?

[
  {"val": 3, "groups": ["A", "B"]},
  {"val": 5, "groups": ["A"]}
]

gs0-pix4d avatar Jun 01 '22 07:06 gs0-pix4d

I don't think we facet per value. If you wanted that, you could explicitly flatten the field first. Using an array should result in a multi-line label.
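A minimal sketch of the explicit flattening, assuming the two-record dataset from the previous comment. The point mark and the row/x encodings are illustrative choices, not taken from this thread. The flatten transform emits one record per array element, so the data becomes three records (val 3 with groups "A", val 3 with groups "B", val 5 with groups "A") before faceting.

```json
{
  "data": {
    "values": [
      {"val": 3, "groups": ["A", "B"]},
      {"val": 5, "groups": ["A"]}
    ]
  },
  "transform": [{"flatten": ["groups"]}],
  "mark": "point",
  "encoding": {
    "row": {"field": "groups", "type": "nominal"},
    "x": {"field": "val", "type": "quantitative"}
  }
}
```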

domoritz avatar Jun 01 '22 20:06 domoritz

Reading your first comment again, I think you mean digging through the converted-to-Vega specification code, rather than the Vega source code itself.

Here is the most minimal Vega code I could get to by only removing things (and updating scales → domain → fields → data to use the source data). At this point, changing anything else makes the visualization go wrong in some way, and I could not find a clean way to change the grammar to fix the problem. Casting the array to a string works, of course, but that's not the intended fix, since we want to keep the line break in the label when using an array of strings.

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "source_0",
      "values": [
        {"err": 8, "study": ["A", "B"], "project": "1"},
        {"err": 3, "study": ["A", "B"], "project": "1"},
        {"err": 6, "study": ["A", "B"], "project": "2"},
        {"err": 0, "study": ["A", "B"], "project": "2"}
      ]
    }
  ],
  "signals": [
    {"name": "child_width", "value": 200},
    {"name": "y_step", "value": 20},
    {"name": "child_height", "update": "bandspace(domain('y').length, 0, 0) * y_step"}
  ],
  "layout": {"padding": 20, "bounds": "full", "align": "all"},
  "marks": [
    {
      "name": "cell",
      "type": "group",
      "style": "cell",
      "from": {
        "facet": {"name": "facet", "data": "source_0", "groupby": ["project"]}
      },
      "data": [
        {
          "source": "facet",
          "name": "source_0",
          "transform": [
            {
              "type": "joinaggregate",
              "as": ["lower_box_err", "upper_box_err"],
              "ops": ["q1", "q3"],
              "fields": ["err", "err"],
              "groupby": ["project", "study"]
            }
          ]
        }
      ],
      "encode": {
        "update": {
          "width": {"signal": "child_width"},
          "height": {"signal": "child_height"}
        }
      },
      "marks": [
        {
          "name": "child_layer_0_layer_0_marks",
          "type": "symbol",
          "from": {"data": "source_0"},
          "encode": {
            "update": {
              "x": {"scale": "x", "field": "err"},
              "y": {"scale": "y", "field": "study", "band": 0.5}
            }
          }
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": {
        "fields": [
          {"data": "source_0", "field": "err"}
        ]
      },
      "range": [0, {"signal": "child_width"}]
    },
    {
      "name": "y",
      "type": "band",
      "domain": {
        "fields": [{"data": "source_0", "field": "study"}],
        "sort": true
      },
      "range": {"step": {"signal": "y_step"}}
    }
  ]
}

gs0-pix4d avatar Jun 13 '22 14:06 gs0-pix4d