superduper [MISC] Check that `pandas` can be used to connect to multiple tables

Open blythed opened this issue 10 months ago • 2 comments

Some extra definition:

With superduper we should connect like this:

db = superduper('parent_directory/*.csv')

This means that if we need output tables, these should be saved as 'parent_directory/<name-of-output-table>.csv'.
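A minimal sketch of how the glob could be resolved into table names and output paths, using only the standard library. The helper names `discover_tables` and `output_path` are hypothetical and are not part of superduper; taking the table name from the file stem is an assumption.

```python
import glob
import os
import tempfile


def discover_tables(pattern):
    """Map each CSV matched by the glob pattern to a table name (the file stem)."""
    return {
        os.path.splitext(os.path.basename(path))[0]: path
        for path in sorted(glob.glob(pattern))
    }


def output_path(pattern, name):
    """Place an output table next to the input tables, as <name>.csv."""
    return os.path.join(os.path.dirname(pattern), f"{name}.csv")


# Demo with a throwaway directory standing in for 'parent_directory'.
with tempfile.TemporaryDirectory() as parent:
    for name in ("customers", "orders"):
        with open(os.path.join(parent, f"{name}.csv"), "w") as f:
            f.write("id\n1\n")
    pattern = os.path.join(parent, "*.csv")
    tables = discover_tables(pattern)
    out = output_path(pattern, "predictions")
```

Deriving the output location from the same pattern keeps inputs and outputs in one directory, so a later connection with the same glob would also pick up the output tables.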

We will need BytesEncoding.base64 everywhere, and we should somehow save the output table after every computation.
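Presumably base64 is needed because CSV cells are text and cannot hold raw bytes. A rough sketch of the round trip with the standard library; `encode_cell` and `decode_cell` are hypothetical helpers, not the `BytesEncoding` implementation itself.

```python
import base64
import csv
import io


def encode_cell(raw: bytes) -> str:
    """Turn raw bytes into ASCII text that is safe to store in a CSV cell."""
    return base64.b64encode(raw).decode("ascii")


def decode_cell(cell: str) -> bytes:
    """Recover the original bytes from a base64-encoded CSV cell."""
    return base64.b64decode(cell.encode("ascii"))


# Bytes containing a NUL, a comma and a newline, which would break a plain CSV cell.
payload = b"\x00\xffbinary,\ndata"

# Write an output table with one encoded cell.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "output"])
writer.writerow(["1", encode_cell(payload)])

# Read the table back and decode the cell.
buffer.seek(0)
rows = list(csv.reader(buffer))
restored = decode_cell(rows[1][1])
```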

We should restrict this so that it does not work in cluster mode.

Apr 03 '24 07:04 blythed

For example:


db = superduper(['customers.csv', 'orders.csv'])

table = Table('orders')

db.execute(table.filter(table.brand == 'Nike'))

Apr 03 '24 07:04 blythed

As part of [TEST-USE] Transfer learning #1967, I am trying this:

from superduperdb import superduper
db = superduper(['sample.xlsx'], metadata_store=f'mongomock://meta')

and I am getting this error:

ValueError: Couldn't auto-identify ['sample.xlsx'], please wrap explicitly using ``superduperdb.components.*``

Any input would be appreciated, thank you.

Apr 17 '24 08:04 Lalith-Sagar-Devagudi
