plot icon indicating copy to clipboard operation
plot copied to clipboard

Error when normalizing a BigInt variable

Open juba opened this issue 1 year ago • 4 comments

When data contains a BigInt variable, trying to apply normalize on this variable throws an error: TypeError: Cannot convert a BigInt value to a number.

I've created a small notebook to reproduce the issue:

https://observablehq.com/@juba/plot-normalize-bigint-error

It seems that the error may come from this line, but I'm far from an expert on this domain so I may be mistaken:

https://github.com/observablehq/plot/blob/0032c11160e6340ce44307675d93bed382d10446/src/transforms/normalize.js#L46C1-L47C1

juba avatar Aug 28 '24 08:08 juba

Yes you can fix this example by doing

 function normalizeBasis(basis) {
   return {
     mapIndex(I, S, T) {
-      const b = +basis(I, S);
+      const b = Number(basis(I, S));
       for (const i of I) {
-        T[i] = S[i] === null ? NaN : S[i] / b;
+        T[i] = S[i] === null ? NaN : Number(S[i]) / b;
       }
     }
   };

to try, just run yarn prepublishOnly in your repo, then upload the file to your notebook and add a cell that says:

Plot = require(await FileAttachment("plot.umd.min.js").url())

However I don't think this is enough, because we need to fix all the different bases.

In that case the patch is simpler:

 function normalizeBasis(basis) {
   return {
     mapIndex(I, S, T) {
+      S = S.map(Number);
       const b = +basis(I, S);
       for (const i of I) {
         T[i] = S[i] === null ? NaN : S[i] / b;

but we should avoid doing this if the values are already numbers (not big).

So maybe instead we do:

--- a/src/options.js
+++ b/src/options.js
@@ -54,7 +54,7 @@ function maybeTypedMap(data, f, type) {
   return map(data, isNumberType(type) ? (d, i) => coerceNumber(f(d, i)) : f, type); // allow conversion from BigInt
 }

-function maybeTypedArrayify(data, type) {
+export function maybeTypedArrayify(data, type) {
   return type === undefined
     ? arrayify(data) // preserve undefined type
     : isArrowVector(data)
diff --git a/src/transforms/normalize.js b/src/transforms/normalize.js
index f4d225cd..2f426020 100644
--- a/src/transforms/normalize.js
+++ b/src/transforms/normalize.js
@@ -1,6 +1,6 @@
 import {extent, deviation, max, mean, median, min, sum} from "d3";
 import {defined} from "../defined.js";
-import {percentile, taker} from "../options.js";
+import {maybeTypedArrayify, percentile, taker} from "../options.js";
 import {mapX, mapY} from "./map.js";

 export function normalizeX(basis, options) {
@@ -43,6 +43,7 @@ export function normalize(basis) {
 function normalizeBasis(basis) {
   return {
     mapIndex(I, S, T) {
+      S = maybeTypedArrayify(S, Float64Array);
       const b = +basis(I, S);
       for (const i of I) {
         T[i] = S[i] === null ? NaN : S[i] / b;

Fil avatar Aug 28 '24 12:08 Fil

Also let's create a more meaningful example / test plot. We often get BigInt as the result of a sql count, so maybe simulate this like so:

classes = d3
  .bin()(olympians.map((d) => d.height))
  .map((bin) => ({ weightclass: bin.x0, count: BigInt(bin.length) }))

// default basis
Plot.barY(
  classes,
  Plot.normalizeY({ x: "weightclass", y: "count" })
).plot({ x: { interval: 0.05 } })

// test a different basis
Plot.barY(
  classes,
  Plot.normalizeY("median", { x: "weightclass", y: "count" })
).plot({ x: { interval: 0.05 } })

Fil avatar Aug 28 '24 12:08 Fil

I can confirm that the issue seems fixed with your PR at https://github.com/observablehq/plot/pull/2155.

Just as a side note, I get these BigInt values because I'm passing arrow data coming from pandas and polars, which themselves seem to now import CSV files with 64 bits integers by default...

Thanks!

juba avatar Aug 28 '24 13:08 juba

thanks. I think I made a mistake in my patch, though. The coercion should happen earlier. Will fix.

Fil avatar Aug 28 '24 13:08 Fil