
DataFrameInternal - Using OrderedCollection over Array2D

Open • AtharvaKhare opened this issue 6 years ago • 1 comment

DataFrameInternal currently uses Array2D (previously it used Matrix, see https://github.com/PolyMathOrg/DataFrame/issues/44).

Is there a specific reason, such as speed or functionality, for choosing Array2D?

Currently, adding or removing a row re-creates the entire DataFrame. This becomes a problem for large data: for example, reading a CSV file with thousands of rows calls addRow once for every row in the file, and DataFrameInternal is re-created on each of those calls.
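To make the cost concrete, here is a minimal Python sketch (not Pharo, and not the DataFrame API) of a table that must be rebuilt in full on every append. The names `append_by_recreating` and `load_rows_recreating` are hypothetical. Instead of timing, it counts how many existing cells get copied, which grows quadratically with the number of rows.

```python
def append_by_recreating(grid, row):
    """Return a new grid with `row` appended, plus the number of existing cells copied.
    Models a fixed-size 2D store that cannot grow in place."""
    copied = sum(len(r) for r in grid)
    new_grid = [list(r) for r in grid]  # every existing cell is copied
    new_grid.append(list(row))
    return new_grid, copied


def load_rows_recreating(rows):
    """Load rows one at a time, rebuilding the grid on every call (like one addRow per CSV line)."""
    grid, total_copies = [], 0
    for row in rows:
        grid, copied = append_by_recreating(grid, row)
        total_copies += copied
    return grid, total_copies


rows = [[i, 2 * i, 3 * i] for i in range(1000)]
grid, copies = load_rows_recreating(rows)
print(copies)  # 1498500, i.e. 3 * 1000 * 999 / 2
```

For n rows of width w this copies w·n(n−1)/2 cells, so 1,000 three-column rows already cost about 1.5 million copies. A growable collection such as a Python list appends in amortized constant time instead; I am assuming OrderedCollection's addLast: behaves similarly, which the profiling mentioned below should confirm.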

I think using OrderedCollection would be better, since it lets us add elements at arbitrary indices. Are there any downsides to using OrderedCollection?

AtharvaKhare • Jun 20 '19 16:06

I was thinking of implementing this with column-oriented OrderedCollections, where contents is also an OrderedCollection that holds these columns.
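A rough Python sketch of that layout, assuming a plain list stands in for each OrderedCollection; `ColumnStore`, `add_row`, `remove_row` and `num_rows` are hypothetical names, not DataFrame methods. Adding or removing a row touches one element per column and never rebuilds the whole table.

```python
class ColumnStore:
    """Hypothetical column-oriented table: `contents` is a list of columns,
    each column a growable list (standing in for an OrderedCollection)."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.contents = [[] for _ in self.column_names]

    def add_row(self, row, index=None):
        """Append `row`, or insert it at `index` if given, one value per column."""
        if len(row) != len(self.contents):
            raise ValueError("row length does not match number of columns")
        for column, value in zip(self.contents, row):
            if index is None:
                column.append(value)         # amortized O(1) per column
            else:
                column.insert(index, value)  # shifts later elements of this column only

    def remove_row(self, index):
        """Delete the index-th element from every column."""
        for column in self.contents:
            del column[index]

    def num_rows(self):
        return len(self.contents[0]) if self.contents else 0


store = ColumnStore(["id", "name"])
store.add_row([1, "x"])
store.add_row([2, "y"])
store.add_row([0, "w"], index=0)
print(store.contents)  # [[0, 1, 2], ['w', 'x', 'y']]
```

Note that inserting in the middle still shifts the later elements of each column, as with any array-backed collection, so the big win is for appends such as CSV loading.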

To access a row, take its index and iterate through contents, fetching the index-th element of every column.
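The row lookup described above amounts to one indexed read per column. A minimal Python sketch, with `row_at` as a hypothetical helper (Python indices are 0-based, whereas Pharo collections are 1-based):

```python
def row_at(contents, index):
    """Build row `index` by taking the index-th element of every column.
    `contents` is a list of columns (hypothetical column-oriented layout)."""
    return [column[index] for column in contents]


contents = [
    [1, 2, 3],        # column "id"
    ["a", "b", "c"],  # column "name"
    [0.5, 1.5, 2.5],  # column "score"
]
print(row_at(contents, 1))  # [2, 'b', 1.5]
```

Fetching a row therefore costs time proportional to the number of columns and allocates a new row collection, whereas a row-major layout can hand back a stored row directly. That is the possible slowdown worth measuring.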

I will have to do detailed performance profiling to measure any slowdown in fetching rows and the speed-up in adding rows.
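As an outline of what such a comparison could look like, here is a small Python harness using `time.perf_counter`. It only models the two layouts with plain lists; all function names are hypothetical, and Python timings say nothing definitive about Pharo, so the real measurement has to be done in Pharo against DataFrame itself.

```python
import time


def load_by_recreating(rows):
    """Model of the current behaviour: rebuild the whole row-major table on every append."""
    table = []
    for row in rows:
        table = [list(r) for r in table] + [list(row)]
    return table


def load_by_appending(rows, num_columns):
    """Model of the proposal: append each value to its own column list."""
    contents = [[] for _ in range(num_columns)]
    for row in rows:
        for column, value in zip(contents, row):
            column.append(value)
    return contents


def row_from_columns(contents, index):
    """Gather row `index` from the column-oriented layout."""
    return [column[index] for column in contents]


def benchmark(n_rows=1000, n_cols=5):
    """Return wall-clock seconds for loading and for fetching every row, per layout."""
    rows = [[i * n_cols + j for j in range(n_cols)] for i in range(n_rows)]

    start = time.perf_counter()
    table = load_by_recreating(rows)
    load_recreate = time.perf_counter() - start

    start = time.perf_counter()
    contents = load_by_appending(rows, n_cols)
    load_append = time.perf_counter() - start

    start = time.perf_counter()
    fetched_rows = [table[i] for i in range(n_rows)]
    fetch_row_major = time.perf_counter() - start

    start = time.perf_counter()
    fetched_cols = [row_from_columns(contents, i) for i in range(n_rows)]
    fetch_column_major = time.perf_counter() - start

    # Both layouts must hold the same data, or the timings are meaningless.
    assert fetched_rows == fetched_cols

    return {
        "load_recreate": load_recreate,
        "load_append": load_append,
        "fetch_row_major": fetch_row_major,
        "fetch_column_major": fetch_column_major,
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f} s")
```

The harness times loading and row fetching separately, so the trade-off raised in this thread (faster appends versus possibly slower row access) shows up as two distinct numbers rather than one combined figure.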

AtharvaKhare • Jun 21 '19 02:06