
[Q] How to create normalized frequency histogram with W&B custom chart

Open gil2rok opened this issue 1 year ago • 2 comments

Goal: I am having trouble creating a histogram of normalized frequencies in Weights and Biases custom charts. I would love some community help 😄 .

I modified the default Vega-Lite code from the W&B custom chart histogram to produce this plot:

[Screenshot: histogram produced by the current code]

I want to normalize the histograms per-group such that the bin heights add up to one. (Note that because the bin-width is set to one, this is both a valid PDF and PMF.)
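Stated as a formula, the normalization asked for here can be sketched as follows. The symbols are introduced only for illustration and do not appear in the Vega-Lite spec: $c_{g,b}$ is the raw count of group $g$ in bin $b$, $h_{g,b}$ is the normalized bar height, $w$ is the bin width, and $p_{g,b}$ is the corresponding density.

```latex
h_{g,b} = \frac{c_{g,b}}{\sum_{b'} c_{g,b'}},
\qquad
\sum_{b} h_{g,b} = 1,
\qquad
p_{g,b} = \frac{h_{g,b}}{w}
```

With $w = 1$ the density $p_{g,b}$ equals the height $h_{g,b}$, which is why the same bars can be read as either a PMF or a PDF.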

I am surprised that this is so difficult to do -- normalized histograms are so common! -- and would immensely appreciate any help to get this to work.

Current Approach: I am following this example from the Vega-Lite documentation that creates a normalized frequency histogram. In the transform block, they aggregate by count, use joinaggregate to sum the counts into a total, and then calculate datum.Count / datum.TotalCount to get the normalized frequencies.

When I try adding this functionality to my Vega-Lite code, no plot appears in the editor, indicating some sort of error 🐛 .

Code I Used: More specifically, I got an error when adding the following Vega-Lite code to the bottom of my transform block:

{
  "joinaggregate": [
    {"op": "sum", "field": "Count", "as": "TotalCount"}
  ],
  "groupby": ["newGroupKeys", "color", "grouped"]
},
{
  "calculate": "datum.Count / datum.TotalCount",
  "as": "RelativeFrequency"
}
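One possible reason this snippet renders nothing is that no earlier transform creates a `Count` field: in the spec below, the count is computed inside each layer's `y` encoding, which is evaluated after the top-level `transform` block. A minimal sketch of the full sequence from the Vega-Lite relative-frequency example, adapted to this chart's group fields, might look like the following. It is untested in the W&B editor, and `binStart` and `binEnd` are hypothetical field names chosen for illustration.

```json
{
  "description": "Sketch only, untested in the W&B editor. binStart and binEnd are hypothetical field names.",
  "transform": [
    {
      "bin": {"step": 1},
      "field": "${field:value}",
      "as": ["binStart", "binEnd"]
    },
    {
      "aggregate": [{"op": "count", "as": "Count"}],
      "groupby": ["newGroupKeys", "color", "grouped", "binStart", "binEnd"]
    },
    {
      "joinaggregate": [{"op": "sum", "field": "Count", "as": "TotalCount"}],
      "groupby": ["newGroupKeys", "color", "grouped"]
    },
    {
      "calculate": "datum.Count / datum.TotalCount",
      "as": "RelativeFrequency"
    }
  ],
  "encoding": {
    "x": {"field": "binStart", "bin": {"binned": true}, "type": "quantitative"},
    "x2": {"field": "binEnd"},
    "y": {"field": "RelativeFrequency", "type": "quantitative", "stack": null}
  }
}
```

The four transforms would go at the end of the top-level `transform` array, and the `x`, `x2`, and `y` entries would replace the `"bin": {"binned": false, ...}` and `"aggregate": "count"` encodings in each layer, since the binning and counting are then already done. Grouping the `joinaggregate` by the group fields only, without the bin fields, is what makes each group's bars sum to one.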

Here is my working Vega-Lite code used to produce the plot above. Once I add the changes to normalize by frequency, this code no longer renders.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "A simple histogram",
  "data": {
    "name": "wandb"
  },
  "transform": [
    {
      "calculate": "if('${field:groupKeys}' === ''  || datum['${field:groupKeys}'] === '', false, true)",
      "as": "grouped"
    },
    {
      "calculate": "if('${field:groupKeys}' === ''  || datum['${field:groupKeys}'] === '', datum.name, datum['${field:groupKeys}'])",
      "as": "newGroupKeys"
    },
    {
      "calculate": "if('${field:groupKeys}' === ''  || datum['${field:groupKeys}'] === '', datum.color, datum['${field:groupKeys}'])",
      "as": "color"
    },
    {
      "aggregate": [
        {
          "op": "average",
          "field": "${field:value}",
          "as": "${field:value}"
        }
      ],
      "groupby": ["newGroupKeys", "color", "grouped", "${field:value}"]
    }
  ],
  "selection": {
    "grid": {
      "type": "interval", "bind": "scales"
    }
  },
  "title": "${string:title}",
  "layer": [
    {
      "transform": [
        {"filter": "datum.grouped == false"}
      ],
      "mark": {"type": "bar", "tooltip": {"content": "data"}},
      "encoding": {
        "x": {
          "bin": {"binned" : false, "step" : 1},
          "type": "quantitative",
          "field": "${field:value}"
        },
        "y": {
          "aggregate": "count",
          "stack": null
        },
        "opacity": {"value": 0.6},
        "detail": [{"field": "newGroupKeys"}, {"field": "color"}],
        "color": {
          "type": "nominal",
          "field": "newGroupKeys",
          "scale": {"range": {"field": "color"}},
          "legend": {"title": null}
        }
      }
    },
    {
      "transform": [
        {"filter": "datum.grouped == true"}
      ],
      "mark": {"type": "bar", "binSpacing": 0, "tooltip": {"content": "data"}, "clip": true},
      "encoding": {
        "x": {
          "bin": {"binned": false, "step": 1},
          "type": "quantitative",
          "scale": {"domain": [0, 30]},
          "field": "${field:value}"
        },
        "y": {
          "aggregate": "count",
          "stack": null
        },
        "opacity": {"value": 0.6},
        "detail": [{"field": "newGroupKeys"}, {"field": "color"}],
        "color": {
          "field": "newGroupKeys",
          "type": "nominal",
          "scale": {"range": "category"},
          "legend": {"title": null}
        }
      }
    }
  ],
  "resolve": {"scale": {"color": "independent"}}
}

Any help to diagnose this issue would be immensely appreciated. Thanks.

gil2rok · Jan 22 '24 00:01

Alternatively, can I make a normalized histogram by logging to a table and then using Weave? I need the histograms to work when I group certain runs together and want to ensure Weave plots will do this.

gil2rok · Jan 22 '24 03:01

Any help would be immensely appreciated, especially from the W&B team.

gil2rok · Jan 29 '24 16:01

Hello! Sorry that this slipped through! I am looking into this to see if there is a way to do it in Weave.

rsanandres-wandb · Feb 12 '24 17:02