Can't create a dataframe from a horizontal array?

Open joshuakoh1 opened this issue 3 years ago • 8 comments

const arr = [200, 400, 600, 800, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2150, 2200, 2250, 2300, 2350, 2400, 2450, 2500, 2550, 2575, 2600, 2625, 2650, 2675, 2700, 2725, 2750, 2775, 2800, 2825, 2850, 2875, 2900, 2925, 2950, 2975, 3000, 3025, 3050, 3075, 3100, 3125, 3150, 3175, 3200, 3225, 3250, 3275, 3300, 3325, 3340, 3350, 3360, 3370, 3375, 3380, 3390, 3400, 3410, 3420, 3425, 3430, 3440, 3450, 3460, 3470, 3475, 3480, 3490, 3500, 3510, 3520, 3525, 3530, 3540, 3550, 3560, 3570, 3575, 3580, 3590, 3600, 3610, 3620, 3625, 3630, 3640, 3650, 3660, …]

const df = new dfd.DataFrame(arr, {columns: ['A']})

Uncaught Error: ParamError: Column names length mismatch. You provided a column of length 1 but Ndframe columns has lenght of undefined

joshuakoh1 avatar Mar 13 '22 19:03 joshuakoh1

A DataFrame needs a 2D array, so wrap your arr in square brackets. For example:

const df = new dfd.DataFrame([arr], {columns: ['A']})

risenW avatar Mar 13 '22 19:03 risenW

@risenW that still doesn't solve the problem of it being a horizontal array for a single column

Uncaught Error: ParamError: Column names length mismatch. You provided a column of length 1 but Ndframe columns has lenght of 498

Edit: I managed to use dfd.tensorflow.transpose to flip it
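For reference, the same flip can be sketched in plain JavaScript, without a tensor transpose, by mapping each value to its own single-element row. The helper name `toColumn` is hypothetical and not part of danfojs; the sketch assumes the DataFrame constructor treats each inner array as one row.

```javascript
// Hypothetical helper (not part of danfojs): turn a flat array into
// an array of single-element rows, i.e. n rows by 1 column.
function toColumn(values) {
  return values.map((value) => [value]);
}

const sample = [200, 400, 600];
const rows = toColumn(sample); // three rows, one value each
console.log(rows);

// Assumption: with danfojs loaded as `dfd`, the rows could then be passed as
// const df = new dfd.DataFrame(rows, { columns: ['A'] });
```

This keeps the single column name `['A']` consistent with the data, since every row now has exactly one value.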

joshuakoh1 avatar Mar 13 '22 19:03 joshuakoh1

This is definitely a bug; your initial code is supposed to work. I will fix this ASAP.

risenW avatar Mar 13 '22 19:03 risenW

Can I handle this??

Tunjii10 avatar May 19 '22 16:05 Tunjii10

Sure. Give it a go, let me know if you need any support.

risenW avatar May 19 '22 20:05 risenW

From what I found, this bug happens because the dfd.DataFrame class, when it calls its parent's constructor (NDframe), always sets the isSeries parameter to false: super({ data, index, columns, dtypes, config, isSeries: false }); The horizontal array here is basically a series. The get shape function in the generic.ts file leads to the received error because it calculates rows and columns as if the data were multi-row and multi-column, when it is actually one column with multiple rows.

get shape(): Array<number> {
    if (this.$data.length === 0) {
        if (this.$columns.length === 0) return [0, 0];
        else return [0, this.$columns.length];
    }
    if (this.$isSeries) {
        return [this.$data.length, 1];
    } else {
        const rowLen = (this.$data).length
        const colLen = (this.$data[0] as []).length
        return [rowLen, colLen]
    }
}
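To make the failure concrete, here is a simplified stand-in for the getter quoted above, written as a plain function over arrays. The name `shapeOf` is hypothetical and the empty-data branch is reduced to `[0, 0]`; it is only meant to show why the column count comes out as undefined for a flat array, and as the full array length when the array is wrapped once.

```javascript
// Simplified stand-in for the quoted `get shape()` logic; not the danfojs
// source, only the same arithmetic applied to plain arrays.
function shapeOf(data, isSeries) {
  if (data.length === 0) return [0, 0];
  if (isSeries) return [data.length, 1];
  // For a flat array, data[0] is a number, and numbers have no .length.
  return [data.length, data[0].length];
}

const flat = [200, 400, 600];
const flatShape = shapeOf(flat, false);                       // column count is undefined
const wrappedShape = shapeOf([flat], false);                  // 1 row, 3 columns
const columnShape = shapeOf(flat.map((v) => [v]), false);     // 3 rows, 1 column
const seriesShape = shapeOf(flat, true);                      // 3 rows, 1 column

console.log(flatShape, wrappedShape, columnShape, seriesShape);
```

The first case corresponds to the "lenght of undefined" message, and the second to the "lenght of 498" message seen after wrapping the array in square brackets.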

I can fix this by checking the DataFrame data to determine whether it is a series. I would also have to make sure that what is passed to the generic class for a horizontal array is 1D; otherwise the number of rows won't be calculated properly. Is this fine by you, @risenW?
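A rough sketch of what such a check might look like, assuming the data is either a flat array of primitives or an array of arrays. Both function names are hypothetical, and this is not the actual danfojs patch.

```javascript
// Hypothetical check: treat the input as series-like when it is a
// non-empty array and none of its elements is an array or an object.
function looksLikeSeries(data) {
  return Array.isArray(data) &&
    data.length > 0 &&
    data.every((item) => item === null || typeof item !== 'object');
}

// Hypothetical shape calculation that uses the check instead of a
// hard-coded isSeries flag. Only flat arrays and arrays of arrays are handled.
function inferShape(data) {
  if (data.length === 0) return [0, 0];
  if (looksLikeSeries(data)) return [data.length, 1];
  return [data.length, data[0].length];
}

console.log(inferShape([200, 400, 600]));       // flat input: 3 rows, 1 column
console.log(inferShape([[200], [400], [600]])); // already 2D: 3 rows, 1 column
```

With a check like this, a flat array and its explicit single-column form would report the same shape, so a single column name would match in both cases.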

Tunjii10 avatar May 20 '22 14:05 Tunjii10

Facing the same issue. Any update on this?

MikeSpock avatar Oct 08 '22 17:10 MikeSpock