Values mapped to incorrect columns

(Open) febilyt opened this issue 2 years ago · 3 comments

Describe the bug: When creating a DataFrame from an array of objects, if the keys of different objects are not in the same order, the values are mapped to the wrong columns.

To reproduce, run the following sample code:

const dfd = require("danfojs-node");

let data = [
  {
      Id: 1,
      Name: 'Apple'
  },
  {
      Name: 'Orange',
      Id: 2
  },
];
let df = new dfd.DataFrame(data);
df.print();

Output:

╔════════════╤═══════════════════╤═══════════════════╗
║            │ Id                │ Name              ║
╟────────────┼───────────────────┼───────────────────╢
║ 0          │ 1                 │ Apple             ║
╟────────────┼───────────────────┼───────────────────╢
║ 1          │ Orange            │ 2                 ║
╚════════════╧═══════════════════╧═══════════════════╝

Expected behavior: Each value should be mapped to the correct column according to its object key.
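To make "mapped by key" concrete, here is a minimal sketch in plain JavaScript (no danfojs calls) that collects the union of keys across all rows and fills one array per column by looking each value up by name, so the order of keys inside each object no longer matters. `recordsToColumns` is a hypothetical helper, not part of danfojs. Passing its result as `new dfd.DataFrame(recordsToColumns(data))` assumes the constructor accepts an object whose values are column arrays; how danfojs treats any `undefined` placeholders is not checked here.

```javascript
// Hypothetical helper: convert an array of row objects into a column-oriented
// object ({ Id: [...], Name: [...] }), looking values up by key name rather
// than by position. Keys are gathered from every row in first-seen order; a
// row that lacks a key contributes `missing` (undefined by default).
function recordsToColumns(records, missing = undefined) {
  // Union of keys across all rows, in the order they are first encountered.
  const columns = [];
  const seen = new Set();
  for (const row of records) {
    for (const key of Object.keys(row)) {
      if (!seen.has(key)) {
        seen.add(key);
        columns.push(key);
      }
    }
  }

  // One array per column, filled by key lookup.
  const result = {};
  for (const col of columns) {
    result[col] = records.map((row) =>
      Object.prototype.hasOwnProperty.call(row, col) ? row[col] : missing
    );
  }
  return result;
}

const data = [
  { Id: 1, Name: "Apple" },
  { Name: "Orange", Id: 2 },
];

console.log(recordsToColumns(data)); // → { Id: [ 1, 2 ], Name: [ 'Apple', 'Orange' ] }
```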

febilyt, Feb 01 '23 11:02

N.B. If fields are not defined in the first object:

  let data = [
    { Id: 1, Name: "Apple" },
    { Name: "Orange", Id: 2 },
    { Name: "Grape", Id: 3, Type: "Red" },
  ];

  let df = new dfd.DataFrame(data);
  console.log(dfd.toJSON(df));

They will be dropped:

  [
    { Id: 1, Name: "Apple" },
    { Id: "Orange", Name: 2 },
    { Id: "Grape", Name: 3 },
  ];

Including all fields in the first object:

  let data = [
    { Id: 1, Name: "Apple", Type: "Green" },
    { Name: "Orange", Id: 2 },
    { Name: "Grape", Id: 3, Type: "Red" },
  ];

  let df = new dfd.DataFrame(data);
  console.log(dfd.toJSON(df));

does result in all keys appearing in the output data, but the values are still mapped incorrectly:

  [
    { Id: 1, Name: "Apple", Type: "Green" },
    { Id: "Orange", Name: 2, Type: undefined },
    { Id: "Grape", Name: 3, Type: "Red" },
  ];

Both of the above fail when .print() is called on the DataFrame, with the error "Table must have a consistent number of cells."
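One way to sidestep both problems (fields missing from the first object being dropped, and rows ending up with different cell counts) is to normalise the rows before constructing the DataFrame. This sketch assumes, based only on the behaviour reported in this thread, that danfojs takes the column names from the first object and reads later rows by position. `normalizeRecords` and its `null` fill value are hypothetical choices, not part of danfojs.

```javascript
// Hypothetical helper: rewrite every row so that all rows share the same keys
// in the same order. Keys are the union over all rows, in first-seen order, so
// a field missing from the first row is not dropped and every row ends up with
// the same number of cells. `fill` (null by default) is an assumed placeholder
// for missing fields.
function normalizeRecords(records, fill = null) {
  // A Set preserves insertion order, which gives first-seen key order.
  const keys = [...new Set(records.flatMap((row) => Object.keys(row)))];
  return records.map((row) => {
    const out = {};
    for (const key of keys) {
      out[key] = Object.prototype.hasOwnProperty.call(row, key) ? row[key] : fill;
    }
    return out;
  });
}

const data = [
  { Id: 1, Name: "Apple", Type: "Green" },
  { Name: "Orange", Id: 2 },
  { Name: "Grape", Id: 3, Type: "Red" },
];

// Every row now lists Id, Name, Type in that order; the Orange row gets Type: null.
const rows = normalizeRecords(data);
rows.forEach((row) => console.log(Object.values(row)));
```

With danfojs this would then be `new dfd.DataFrame(normalizeRecords(data))`; whether `null` or `undefined` is the better placeholder depends on how danfojs handles missing values, which is untested here.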

S-L-Moore, Feb 20 '23 23:02

I am facing the same issue as @febilyt. Are there any interim workarounds other than fixing the ordering of the data array up front?

erasromani, Feb 23 '23 16:02

I wondered whether a new DataFrame could be created and then have rows and columns added dynamically, but I ran into a few issues with addColumn/append:

  //let dft = new dfd.DataFrame();
  //dft.addColumn("ID", [0]);     // Error: column length mismatch
  //dft.append([[0, 0, 0]], [0]); // Error: values must match #columns
  //dft.append([[1, 2, 3]], [0], { inplace: true });
  //                              // Fails if the row doesn't exist (~overwrite)

I cobbled together a quick solution based on collecting the unique keys:

  // Get the unique column names from the data (a Set keeps insertion order)
  const column_names: Set<string> = new Set();
  data.forEach((o) => {
    Object.keys(o).forEach((key) => column_names.add(key));
  });

  // Initialize empty DataFrame
  let df = new dfd.DataFrame([[...column_names].map((x) => undefined)], {
    columns: [...column_names],
  }).drop({ index: [0] });

  // Append each row
  data.forEach((o, i) => {
    df = df.append([[...column_names].map((name) => o[name])], [i]);
  });
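Because the loop above reassigns `df` on every iteration, each row produces a new DataFrame. A possibly simpler variant of the same idea is to build the full 2D array of values once and pass it to the constructor together with the `columns` option, which the snippet above already uses. `recordsToMatrix` is a hypothetical helper, and the danfojs call is shown only as a comment since it has not been run here.

```javascript
// Hypothetical helper: collect the column names once (union of keys in
// first-seen order) and build a 2D array of row values in that column order.
function recordsToMatrix(records) {
  const columns = [...new Set(records.flatMap((row) => Object.keys(row)))];
  const values = records.map((row) =>
    // A row that lacks a key gets undefined, as in the append-based version.
    columns.map((name) =>
      Object.prototype.hasOwnProperty.call(row, name) ? row[name] : undefined
    )
  );
  return { columns, values };
}

const data = [
  { Id: 1, Name: "Apple", Type: "Green" },
  { Name: "Orange", Id: 2 },
  { Name: "Grape", Id: 3, Type: "Red" },
];

const { columns, values } = recordsToMatrix(data);
console.log(columns); // → [ 'Id', 'Name', 'Type' ]

// With danfojs (untested here), the DataFrame could then be built in one call:
// const df = new dfd.DataFrame(values, { columns });
```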

I've not used danfojs yet, so I'm not quite sure how well the undefined values will be handled; however, the above does load the data as expected:

0 : {Id: 1, Name: 'Apple', Type: 'Green'}
1 : {Id: 2, Name: 'Orange', Type: undefined}
2 : {Id: 3, Name: 'Grape', Type: 'Red'}

S-L-Moore, Feb 25 '23 21:02