
Feature request: strongly typed dataframes

Open itsyoboieltr opened this issue 1 year ago • 3 comments

The big advantages of using this over Python would be speed and the fact that it could be made completely type-safe. If the user knows the rows/columns ahead of time (which is most likely the case), I would suggest adding the option to pass a type parameter:

import Breadroll, { Dataframe } from 'breadroll';

interface Row {
  name: string;
  age: number;
  city: string;
}

const csv: Breadroll = new Breadroll({ header: true, delimiter: ',' });

const df: Dataframe<Row> = await csv.open.local<Row>('./data/input.csv');

This would ideally make all functions completely type-safe: you should not be able to access columns that do not exist, you have autocomplete for filtering, etc.
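To illustrate the kind of safety meant here, below is a minimal sketch of a generic frame whose column names are constrained with `keyof`. `TypedFrame`, `column`, `filter`, and `count` are hypothetical names invented for this example; they are not breadroll's actual `Dataframe` API, and the real implementation may look quite different.

```typescript
// Hypothetical sketch: TypedFrame is NOT breadroll's Dataframe class.
interface Row {
  name: string;
  age: number;
  city: string;
}

class TypedFrame<T extends object> {
  private readonly rows: T[];

  constructor(rows: T[]) {
    this.rows = rows;
  }

  // Only keys of T are accepted; the element type follows the column's type.
  column<K extends keyof T>(key: K): T[K][] {
    return this.rows.map((row) => row[key]);
  }

  // The predicate receives a value typed as T[K], so comparisons are type-checked.
  filter<K extends keyof T>(key: K, predicate: (value: T[K]) => boolean): TypedFrame<T> {
    return new TypedFrame<T>(this.rows.filter((row) => predicate(row[key])));
  }

  get count(): number {
    return this.rows.length;
  }
}

const frame = new TypedFrame<Row>([
  { name: "Ada", age: 36, city: "London" },
  { name: "Linus", age: 28, city: "Helsinki" },
]);

const ages: number[] = frame.column("age");
const over30 = frame.filter("age", (age) => age > 30);
// frame.column("salary"); // would be a compile-time error: "salary" is not a key of Row
```

Because the key parameter is typed as `K extends keyof T`, an editor can autocomplete `"name"`, `"age"`, and `"city"`, and a misspelled column fails at compile time rather than at runtime.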

itsyoboieltr avatar Mar 03 '24 16:03 itsyoboieltr

Big thanks, @itsyoboieltr, this is a fantastic feature suggestion! ✨ it's already rocketed its way onto the roadmap, as we think about methods of implementation. We'll see how we can ship this :shipit: as soon as we can.

devsgnr avatar Mar 04 '24 11:03 devsgnr

In addition to the type-level safety, maybe there should also be an option for runtime validation, similar to how tRPC does it with its Input & Output Validators. This would probably be too costly for very large datasets, but for small-to-medium sized ones it could prove useful for making sure the data is sane. Interestingly, most of these validators also support some kind of data transformation, encoding/decoding, etc., so allowing their use in some way would immediately increase the array of features this package provides :D
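A rough sketch of what such an option could look like, using a hand-written validator instead of a real validation library so that it stays self-contained. `Validator`, `PersonRow`, `parsePerson`, and `validateRows` are hypothetical names made up for this example; nothing here exists in breadroll.

```typescript
// Hypothetical sketch: none of these names exist in breadroll.
type Validator<T> = (value: unknown) => T;

interface PersonRow {
  name: string;
  age: number;
  city: string;
}

// Validates one raw record and transforms "age" from a CSV string into a number.
const parsePerson: Validator<PersonRow> = (value) => {
  if (typeof value !== "object" || value === null) {
    throw new Error("row is not an object");
  }
  const raw = value as Record<string, unknown>;
  const name = raw["name"];
  const city = raw["city"];
  const rawAge = raw["age"];
  if (typeof name !== "string" || typeof city !== "string") {
    throw new Error("name and city must be strings");
  }
  // Non-empty strings are converted; anything else is checked as-is.
  const age = typeof rawAge === "string" && rawAge.trim() !== "" ? Number(rawAge) : rawAge;
  if (typeof age !== "number" || !Number.isFinite(age)) {
    throw new Error(`age is not a finite number: ${String(rawAge)}`);
  }
  return { name, age, city };
};

// Runs the validator over every row and reports the index of the first bad row.
function validateRows<T>(rows: unknown[], validate: Validator<T>): T[] {
  return rows.map((row, index) => {
    try {
      return validate(row);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      throw new Error(`row ${index}: ${reason}`);
    }
  });
}

const people = validateRows(
  [{ name: "Ada", age: "36", city: "London" }],
  parsePerson,
);
```

Since the validator is just a function from `unknown` to `T`, a schema library's parse function could presumably be plugged in the same way, and the per-row cost would only be paid when the user opts in.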

itsyoboieltr avatar Mar 04 '24 12:03 itsyoboieltr

These are some great insights. Data transformation, like encoding and decoding, is one of the next things we are looking at, since we already have the .apply method. 🚀 Thanks

devsgnr avatar Mar 04 '24 13:03 devsgnr