
[MISC] Check that `pandas` can be used to connect to multiple tables

Open blythed opened this issue 10 months ago • 2 comments

Some additional detail:

With superduper we should connect like this:

db = superduper('parent_directory/*.csv')

This means that if we need output tables, these should be saved as 'parent_directory/<name-of-output-table>.csv'.
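
To make the directory convention concrete, here is a minimal sketch in plain `pandas`, not superduper's actual implementation. The helper names `load_tables` and `save_output_table` are hypothetical, and it assumes each table's name is simply the CSV file name without its extension.

```python
import glob
import os

import pandas as pd


def load_tables(pattern):
    """Map table name (file stem) -> DataFrame for every CSV matching the glob."""
    tables = {}
    for path in sorted(glob.glob(pattern)):
        # 'parent_directory/orders.csv' -> table name 'orders' (assumed convention)
        name = os.path.splitext(os.path.basename(path))[0]
        tables[name] = pd.read_csv(path)
    return tables


def save_output_table(parent_directory, name, df):
    """Write an output table next to the inputs as '<parent_directory>/<name>.csv'."""
    path = os.path.join(parent_directory, f"{name}.csv")
    df.to_csv(path, index=False)
    return path
```

Because outputs land in the same directory as the inputs, a later `load_tables('parent_directory/*.csv')` call would pick them up as ordinary tables.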

We will need BytesEncoding.base64 everywhere, and we should somehow save the output table after every computation.
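
As a rough illustration of why base64 is needed: CSV is a text format, so raw bytes have to be turned into text before they are written and decoded again after reading. The sketch below uses only the standard `base64` module and `pandas`; it does not use superduper's `BytesEncoding`, and the column name `embedding` and the helpers `encode_bytes` / `decode_bytes` are made up for the example.

```python
import base64
import io

import pandas as pd


def encode_bytes(value: bytes) -> str:
    """Turn raw bytes into ASCII text that is safe to store in a CSV cell."""
    return base64.b64encode(value).decode("ascii")


def decode_bytes(text: str) -> bytes:
    """Inverse of encode_bytes."""
    return base64.b64decode(text)


# Hypothetical output table with a bytes-valued column.
outputs = pd.DataFrame({"id": [1, 2], "embedding": [b"\x00\x01\xff", b"hello"]})
outputs["embedding"] = outputs["embedding"].map(encode_bytes)

# Round-trip through CSV text (an in-memory buffer stands in for the file).
buffer = io.StringIO()
outputs.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
restored["embedding"] = restored["embedding"].map(decode_bytes)
```

Saving "after every computation" would then amount to repeating the encode-and-write step each time an output table changes.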

We should restrict this so that it does not work in cluster mode.

blythed avatar Apr 03 '24 07:04 blythed

For example:


db = superduper(['customers.csv', 'orders.csv'])

table = Table('orders')

db.execute(table.filter(table.brand == 'Nike'))
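
For comparison, the filter above would presumably reduce to a boolean mask on the underlying `DataFrame`. This is a sketch of the equivalent plain `pandas` operation on invented sample data, not a description of how `db.execute` is implemented.

```python
import pandas as pd

# Invented stand-in for the contents of 'orders.csv'.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "brand": ["Nike", "Adidas", "Nike"],
    }
)

# Equivalent of table.filter(table.brand == 'Nike'): keep rows where the mask is True.
nike_orders = orders[orders["brand"] == "Nike"]
```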

blythed avatar Apr 03 '24 07:04 blythed

As part of [TEST-USE] Transfer learning #1967, I am trying this:

from superduperdb import superduper
db = superduper(['sample.xlsx'], metadata_store='mongomock://meta')

and I am getting this error: ValueError: Couldn't auto-identify ['sample.xlsx'], please wrap explicitly using ``superduperdb.components.*``. Any input would be appreciated. Thank you.

Lalith-Sagar-Devagudi avatar Apr 17 '24 08:04 Lalith-Sagar-Devagudi