`from_pandas` should be more flexible than requiring a full row on ingestion
I am converting my project from sparse arrays to dense arrays, and I ran into a number of problems trying to use the same methods I had been using on sparse arrays, specifically `from_pandas`.
Is it correct that TileDB requires an entire row of data to be consumed at the same time in order to use `from_pandas`?
The data I work with is represented in a DataFrame with a MultiIndex and varies greatly in size (state-sized LiDAR point-cloud data), so there is a high likelihood that one full row of data is too large to consume at once while also running all the pre-TileDB processing I need to run over it.
To me, it should be possible to call `from_pandas` on a DataFrame that matches your TileDB array and have the data inserted into the array at the indices found there. When I followed `from_pandas` through its flow, I noticed that much of the logic required for this is already present, but it is skipped or unused in favor of a row-index slice.
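To make the requested behavior concrete, here is a minimal sketch in plain NumPy/pandas, not the TileDB-Py API: `write_at_indices` and the toy frame are hypothetical names I'm introducing for illustration, but they show how a partial-row DataFrame's MultiIndex already carries every coordinate needed for the write, with no full-row slice required.

```python
import numpy as np
import pandas as pd

def write_at_indices(dense: np.ndarray, df: pd.DataFrame) -> None:
    """Write a partial DataFrame into a dense 2-D array at the
    coordinates given by its MultiIndex.

    Illustrative only -- this mimics the ingestion behavior being
    proposed, not TileDB-Py internals.
    """
    rows = df.index.get_level_values(0).to_numpy()
    cols = df.index.get_level_values(1).to_numpy()
    # Fancy-index assignment: one cell per (row, col) pair.
    dense[rows, cols] = df["value"].to_numpy()

# A "partial row" of data: only three cells of a 4x4 array,
# two of them from the same outer row.
idx = pd.MultiIndex.from_tuples([(0, 1), (0, 3), (2, 2)], names=["x", "y"])
df = pd.DataFrame({"value": [10.0, 30.0, 22.0]}, index=idx)

arr = np.zeros((4, 4))
write_at_indices(arr, df)
```

The point is that nothing in this path needs the rest of row 0 to be materialized; the MultiIndex alone determines where each value lands, which is what I'd like `from_pandas` to take advantage of for dense arrays.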
I have created a branch with a preliminary implementation of the feature (and a test) that doesn't disrupt current usage, and I can open a PR if you're interested: https://github.com/kylemann16/TileDB-Py/commit/60defc0b92057323855aa7e479e9a64e65c9e0a2
It's a pretty rudimentary implementation, and I'm certain I don't know all the implications it would have, but it passes tests and works when I use it for my project.
If I'm missing something and this is redundant, or if it's not in line with how you'd like TileDB-Py to work, I'd love to get some feedback/discussion going. As it currently stands, `from_pandas` is only useful to me in a sparse array scenario.
Thank you!