Add a feature for inserting a column

Open goodPointP opened this issue 4 years ago • 6 comments

It would be really useful to have a method that inserts a column into an existing DataFrame between two existing columns. I know about .addColumn, but that seems to place the new column at the end of the DataFrame.

For example:

df.print()

A | B 
======
7 | 5
3 | 6

df.insert({ "afterColumn": "A", "newColumnName": "C", "data": [4,1], inplace: true })
df.print()

A | C | B 
==========
7 | 4 | 5
3 | 1 | 6

goodPointP avatar Nov 22 '21 16:11 goodPointP

Thanks for the suggestion. We will consider it. If you're interested in working on this, let us know.

risenW avatar Nov 28 '21 07:11 risenW

@risenW I'm trying to implement this, since it's needed in the groupby implementation.

But looking at the code that adds a new column, it looks expensive: we have to loop through every row to add a single column.

const newData = [];
const oldValues = this.$data;
for (let i = 0; i < oldValues.length; i++) {
  const innerArr = [...oldValues[i]]; 
  innerArr.push(colunmValuesToAdd[i]);
  newData.push(innerArr);
}

This can be expensive for a large dataset, so I'm wondering whether we could represent the data like this:

data = {
 A: [1,2,3,4],
 B: [5,6,7,8]
}

Adding a new column would then just be:

data['C'] = [7,8,9,10]

With this we don't need to loop through every row.
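To make the cost difference concrete, here is a minimal sketch comparing the two layouts. The helper names `appendColumnRowMajor` and `appendColumnKeyed` are made up for illustration and are not part of danfojs.

```javascript
// Row-major layout: one inner array per row, so appending a column touches every row.
function appendColumnRowMajor(rows, values) {
  return rows.map((row, i) => [...row, values[i]]);
}

// Column-keyed layout: one array per column, so appending is a single assignment
// (plus a shallow copy of the outer object so the input is not mutated).
function appendColumnKeyed(data, name, values) {
  return { ...data, [name]: values };
}

const rows = [[1, 5], [2, 6], [3, 7], [4, 8]];
const keyed = { A: [1, 2, 3, 4], B: [5, 6, 7, 8] };

const rowsWithC = appendColumnRowMajor(rows, [7, 8, 9, 10]);
const keyedWithC = appendColumnKeyed(keyed, "C", [7, 8, 9, 10]);

console.log(rowsWithC[0]);            // [ 1, 5, 7 ]
console.log(Object.keys(keyedWithC)); // [ 'A', 'B', 'C' ]
```

The row-major version does work proportional to the number of rows, while the keyed version only copies the outer object, whose size is the number of columns.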

Also, inserting a column at any index (or after a given column) can be done like this with our current addColumn method:

const newData = [];
const oldValues = this.$data;

// Resolve a column name to its position; `in` on an array would test indices, not names
if (typeof cIndex === "string" && this.columns.includes(cIndex)) {
    cIndex = this.columns.indexOf(cIndex)
}
for (let i = 0; i < oldValues.length; i++) {
  const innerArr = [...oldValues[i]]; 
  innerArr.splice(cIndex, 0, colunmValuesToAdd[i]); // this will add the new column data to the respective index point
  newData.push(innerArr);
}

This looks easier to implement with the current code, but we are still looping through a lot of data points. Using the other method instead:

let data = {
 A: [1,2,3,4],
 B: [5,6,7,8]
}
// Say we want to add column C after A, which is index 0 in columns [A, B].
// We can split the data object at A if a name is given, or resolve an index to a name.
let newData = {}
if (typeof cIndex === "number" && cIndex >= 0 && cIndex < this.columns.length) {
    cIndex = this.columns[cIndex]
}
for (let [key, values] of Object.entries(data)) {
    newData[key] = values
    if (key === cIndex) {
        // the new column goes right after cIndex ('A')
        newData[newColumn] = newValues
    }
}

let df = new DataFrame(newData) // for inplace we can just update the internals with the newData

So I think this new implementation will be quite fast. What do you think?
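For anyone who wants to try the idea outside the class, here is a self-contained sketch of the object-based insert. `insertColumnAfter` is a hypothetical helper, not a danfojs method, and it assumes the column names are ordinary strings (not integer-like), so that key order follows insertion order.

```javascript
// Hypothetical helper: returns a new column-keyed object with `newName`
// placed right after `afterName`. The input object is left untouched.
function insertColumnAfter(data, afterName, newName, newValues) {
  const has = (key) => Object.prototype.hasOwnProperty.call(data, key);
  if (!has(afterName)) {
    throw new Error(`Column "${afterName}" does not exist`);
  }
  if (has(newName)) {
    throw new Error(`Column "${newName}" already exists`);
  }
  const result = {};
  for (const [key, values] of Object.entries(data)) {
    result[key] = values;
    if (key === afterName) {
      result[newName] = newValues; // slot the new column in right here
    }
  }
  return result;
}

const data = { A: [7, 3], B: [5, 6] };
const inserted = insertColumnAfter(data, "A", "C", [4, 1]);

console.log(Object.keys(inserted)); // [ 'A', 'C', 'B' ]
console.log(inserted.C);            // [ 4, 1 ]
```

The loop runs once per column rather than once per row, which is where the speedup over the row-wise version would come from.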

steveoni avatar Dec 27 '21 07:12 steveoni

@steveoni Representing the internal data structure with an object is not optimal because of other operations we do on DataFrames and Series. One major example is indexing by position: objects are not as efficient as arrays for positional indexing, and JS does not preserve insertion order for every kind of key (integer-like keys get reordered). There are a myriad of other reasons I considered as well.
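A quick illustration of the ordering and positional-indexing concerns, as plain JavaScript run in a current engine such as Node.js. The variable names are only for this example.

```javascript
// String keys that are not integer-like keep their insertion order.
const named = { B: [5, 6], A: [7, 3] };
const namedKeys = Object.keys(named);
console.log(namedKeys); // [ 'B', 'A' ]

// Integer-like keys are moved to the front and sorted ascending,
// regardless of the order in which they were written.
const numeric = { b: 1, 2: 1, a: 1, 1: 1 };
const numericKeys = Object.keys(numeric);
console.log(numericKeys); // [ '1', '2', 'b', 'a' ]

// Positional access on an object needs the key list first;
// with an array of columns it is a direct index.
const secondByObject = named[Object.keys(named)[1]];
const secondByArray = [[5, 6], [7, 3]][1];
console.log(secondByObject, secondByArray); // [ 7, 3 ] [ 7, 3 ]
```

So a DataFrame whose columns are named "1", "2" and so on could come back in a different order than it was built in if the columns were stored as object keys.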

Regarding this issue, at most, it will take O(n), and that's fine for an insert operation.

Also, a side note: the column names array has a 1-1 mapping with the data array, so once you have the index/position of the column to insert after, you can simply insert the new column name and its data into their corresponding arrays at that position.

Using the example DataFrame above, we have something like:



let new_cols = [...this.columns] // A, B
let new_col_position = 1 //After column A, you will use a loop with condition to get this index
new_cols.splice(new_col_position, 0, "C"); // A, C, B

//Using the position, you can directly splice the data array as well. 
let new_data = [...this.data] //only do this copy if the operation is happening in place, else use splice directly
new_data.splice(new_col_position, 0, [4, 1]); //Add the new data in that position. 
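Putting the pieces together, here is a rough standalone sketch. It assumes the data is stored row-major (one inner array per row, as in the addColumn loop quoted earlier in the thread), so the splice is applied to each row. `insertColumn` and its `inplace` parameter are hypothetical and not the danfojs API.

```javascript
// Hypothetical helper. `columns` is an array of names, `rows` is row-major data.
function insertColumn(columns, rows, afterName, newName, newValues, inplace = false) {
  const position = columns.indexOf(afterName) + 1;
  if (position === 0) {
    throw new Error(`Column "${afterName}" does not exist`);
  }
  if (newValues.length !== rows.length) {
    throw new Error("newValues must have one entry per row");
  }

  // splice mutates, so work on copies unless an in-place update was requested.
  const newColumns = inplace ? columns : [...columns];
  const newRows = inplace ? rows : rows.map((row) => [...row]);

  newColumns.splice(position, 0, newName);
  newRows.forEach((row, i) => row.splice(position, 0, newValues[i]));
  return { columns: newColumns, rows: newRows };
}

const columns = ["A", "B"];
const rows = [[7, 5], [3, 6]];
const result = insertColumn(columns, rows, "A", "C", [4, 1]);

console.log(result.columns); // [ 'A', 'C', 'B' ]
console.log(result.rows);    // [ [ 7, 4, 5 ], [ 3, 1, 6 ] ]
console.log(columns);        // [ 'A', 'B' ]  (original untouched)
```

The same position is used for both arrays, which is what the 1-1 mapping between column names and data buys you. The cost is one splice per row, so linear in the number of rows.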

Hope this helps

risenW avatar Dec 27 '21 07:12 risenW

@risenW Okay, I get it. That's cool.

But I don't get what you are trying to do here:

//Using the position, you can directly splice the data array as well. 
let new_data = [...this.data] //only do this copy if the operation is happening in place, else use splice directly
new_data.splice(new_col_position, 0, [4, 1]); //Add the new data in that position.

But I will go with the example I had before, which is similar to the one you gave.

steveoni avatar Dec 27 '21 11:12 steveoni

Splice by default will mutate the array, so I'm saying that you should make a copy first.

risenW avatar Dec 27 '21 11:12 risenW

Yeah sure. I’m aware of that

steveoni avatar Dec 27 '21 11:12 steveoni