Sameer Deshmukh

235 comments by Sameer Deshmukh

Alternatively, we can create a C extension over libcsv as an nmatrix plugin and use it to load data into dataframes. https://github.com/SciRuby/nmatrix/issues/407

No, we'll keep it MRI-specific. JRuby should have a separate library for CSV importing (I think jCSV from Rodrigo Botafogo can do the job: https://github.com/rbotafogo/jCSV).

Broadcasting would basically involve changing the internal data structures so that they are more efficient and avoid copying data whenever possible. For example, pandas uses numpy...
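To illustrate the copy-avoiding idea (this is not daru's or pandas' actual code), here is a minimal Python sketch of the stride-0 trick NumPy uses for broadcasting: a length-1 axis is "stretched" by giving it a stride of 0, so every index along that axis reads the same stored element and nothing is duplicated. `StridedView`, `broadcast_to` and `add` are hypothetical names chosen for this sketch, and it only handles 2-D data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """A read-only 2-D view over a flat buffer, addressed by shape, strides and offset.

    Several views can share one buffer, so creating a view never copies data.
    """
    buffer: List[float]
    shape: Tuple[int, int]
    strides: Tuple[int, int]
    offset: int = 0

    def __getitem__(self, idx: Tuple[int, int]) -> float:
        i, j = idx
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError(idx)
        # Map the 2-D index to a position in the shared flat buffer.
        return self.buffer[self.offset + i * self.strides[0] + j * self.strides[1]]

    def broadcast_to(self, shape: Tuple[int, int]) -> "StridedView":
        """Stretch length-1 axes to `shape` by giving them stride 0 (no copy)."""
        new_strides = []
        for axis in range(2):
            if self.shape[axis] == shape[axis]:
                new_strides.append(self.strides[axis])
            elif self.shape[axis] == 1:
                # Every index along this axis now hits the same stored element.
                new_strides.append(0)
            else:
                raise ValueError(f"cannot broadcast {self.shape} to {shape}")
        return StridedView(self.buffer, shape, (new_strides[0], new_strides[1]), self.offset)


def add(a: StridedView, b: StridedView) -> List[List[float]]:
    """Elementwise a + b after broadcasting both operands to a common shape."""
    shape = (max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1]))
    a, b = a.broadcast_to(shape), b.broadcast_to(shape)
    return [[a[i, j] + b[i, j] for j in range(shape[1])] for i in range(shape[0])]


matrix = StridedView([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=(2, 3), strides=(3, 1))
row = StridedView([10.0, 20.0, 30.0], shape=(1, 3), strides=(3, 1))
print(add(matrix, row))
```

Only `add` allocates a new result; the broadcast operands are views over their original buffers, which is the kind of copy reduction the comment refers to.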

@kou, do you think we should leapfrog to using Apache Arrow Tensor directly for internal storage? I am seriously considering an overhaul of the daru storage infrastructure given the speed...

@mrkn, if you have experience with Arrow, can you please shed some light on this?

It's too much overhead for daru to store everything as a string or symbol and then access it. It's simpler and more straightforward to have the user take care of...

@Shekharrajak, can you link this issue into your repo?

I think we should shift to a C-based CSV parser like paratext. See https://github.com/SciRuby/daru/issues/170

@athityakumar, can you explore whether it is possible to optimize from_csv using @info-rchitect's method?