[good first issue - intermediate/advanced] Moar data adapters (i.e. materializers)!

Open elijahbenizzy opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe. We have a few default data adapters, and could use a ton more as extensions.

Describe the solution you'd like Moar data adapters!

Describe alternatives you've considered These should be small, lightweight extensions, similar to the rest of the extension framework.

Some ideas:

  1. postgresql (needs some SQL stuff, might be complicated. Also connection management...)
  2. duckdb (ditto)
  3. arrow integrations
  4. s3-specific ones
  5. BigQuery
  6. image
  7. image groups
  8. GeoPandas shapefile
  9. Pandas on Spark for ^^^ + the defaults
  10. Polars for ^^^ + the defaults
  11. OpenML datasets -- making it easy to load them
  12. Hugging Face -- making it easy to pull things from that ecosystem

Additional context Will need extension help -- see, for example, https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/plugins/pandas_extensions.py to get started. The TL;DR is that each extension is responsible for registering its own adapters, and raises a NotImplementedError if its required dependency isn't present (which then gets caught downstream). They're all loaded here: https://github.com/DAGWorks-Inc/hamilton/blob/bfc2300bf06cd83ab28b514474732c7a27698dd3/hamilton/function_modifiers/base.py#L26. That way you don't need an extra setup.py, etc. for each one.
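
To make the register-or-raise pattern concrete, here is a minimal, self-contained sketch. It does not use Hamilton's actual API: `ADAPTER_REGISTRY`, `register_adapter`, `load_extensions`, and both example extensions are hypothetical names invented for illustration. It only shows the general idea of an extension registering itself, or raising `NotImplementedError` that the loader catches when a dependency is missing.

```python
import importlib
from typing import Callable, Dict, List

# Hypothetical registry (not Hamilton's): adapter name -> loader callable.
ADAPTER_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_adapter(name: str, loader: Callable[..., object]) -> None:
    """Record a loader under a name so it can be looked up later."""
    ADAPTER_REGISTRY[name] = loader


def register_json_adapter() -> None:
    """Example extension whose dependency (stdlib json) is always present."""
    import json

    register_adapter("json", json.loads)


def register_missing_adapter() -> None:
    """Example extension whose dependency is deliberately absent."""
    try:
        # Assumed not to be installed; stands in for an optional library.
        importlib.import_module("hypothetical_missing_dependency_xyz")
    except ImportError as e:
        raise NotImplementedError(
            "hypothetical_missing_dependency_xyz is not installed"
        ) from e
    register_adapter("missing", lambda x: x)


def load_extensions(extensions: List[Callable[[], None]]) -> List[str]:
    """Run each extension; skip the ones that signal a missing dependency."""
    skipped = []
    for extension in extensions:
        try:
            extension()
        except NotImplementedError:
            # Caught "downstream": the adapter is simply unavailable.
            skipped.append(extension.__name__)
    return skipped


skipped = load_extensions([register_json_adapter, register_missing_adapter])
```

With this shape, adding a new adapter means adding one extension function, and users without the optional dependency just do not see that adapter.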

elijahbenizzy • Apr 25 '23 03:04

postgresql (needs some SQL stuff, might be complicated. Also connection management...)

How about tackling SQLite first to side-step the connection management complexity?

Scorpil • Oct 01 '23 10:10

@Scorpil Yep, that would simplify things, but to narrow the issue down, it's also a question of which object type is being serialized/deserialized. We have Pandas SQL (#355) that covers this for Pandas dataframes, but not for other object types. Did you have an object type in mind to serialize to/from SQLite?
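
As a rough illustration of why the object type matters, here is a sketch that round-trips one possible non-Pandas type, a list of dicts, through SQLite using only the standard library `sqlite3` module. The `save_rows` and `load_rows` helpers are hypothetical and are not part of Hamilton. The sketch assumes every row has the same keys and that table and column names are trusted, since they are interpolated into the SQL.

```python
import sqlite3
from typing import Any, Dict, List


def save_rows(conn: sqlite3.Connection, table: str, rows: List[Dict[str, Any]]) -> None:
    """Write a non-empty list of same-shaped dicts to a table (hypothetical saver)."""
    columns = list(rows[0].keys())
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Untyped columns: SQLite stores each value with its own storage class.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


def load_rows(conn: sqlite3.Connection, table: str) -> List[Dict[str, Any]]:
    """Read a table back into a list of dicts (hypothetical loader)."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, values)) for values in cursor.fetchall()]


# An in-memory database needs no server, credentials, or connection pool.
conn = sqlite3.connect(":memory:")
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
save_rows(conn, "items", rows)
loaded = load_rows(conn, "items")
conn.close()
```

A different object type (a dataclass, an Arrow table, a Polars dataframe) would need its own mapping to and from rows, which is the narrowing question above.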

skrawcz • Oct 02 '23 03:10

Closing this issue in favor of creating more specific ones.

skrawcz • Jul 18 '24 19:07