blythed
For example, in Hugging Face the following are available:
Encoders detailed in saved `Document` instances should be versioned:

```python
{'_content': {'bytes': b'...', 'encoder': 'bla/'}}
```
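One way to read a versioned encoder reference is sketched below. The issue does not say what should follow the slash, so the `identifier/version` convention, the integer version, and the helper name `parse_encoder_ref` are all assumptions made for illustration only.

```python
from typing import Optional, Tuple


def parse_encoder_ref(ref: str) -> Tuple[str, Optional[int]]:
    """Split a hypothetical 'identifier/version' reference.

    Assumption: the version, when present, is an integer after the slash.
    A missing or empty version is returned as None.
    """
    identifier, _, version = ref.partition("/")
    return identifier, int(version) if version else None


# An unversioned reference, as it appears in the issue text
print(parse_encoder_ref("bla/"))
# A versioned reference under the assumed convention
print(parse_encoder_ref("bla/3"))
```

Keeping the version in the saved reference would let a reader of the `Document` pick the exact encoder that produced the bytes.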
Currently, if something goes wrong during `Component` creation (for example, in `db.add(VectorIndex(...))`), we don't abort creation gracefully. We should handle deleting the contained components that were already created.
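A minimal sketch of the intended behaviour, using a toy in-memory store rather than the real datalayer. `ToyDatalayer`, this `Component` class, and the `broken` flag are hypothetical stand-ins and not part of `superduper`; the point is only to show rollback of nested components in reverse creation order.

```python
class Component:
    """Toy component with nested children (hypothetical, not the real class)."""

    def __init__(self, identifier, children=(), broken=False):
        self.identifier = identifier
        self.children = list(children)
        self.broken = broken  # simulates a failure during creation


class ToyDatalayer:
    """Hypothetical in-memory stand-in for a datalayer."""

    def __init__(self):
        self.store = {}

    def _create(self, component, created):
        # Components that already exist are left alone and never rolled back.
        if component.identifier in self.store:
            return
        # Contained components are created before their parent.
        for child in component.children:
            self._create(child, created)
        if component.broken:
            raise RuntimeError(f"failed to create {component.identifier!r}")
        self.store[component.identifier] = component
        created.append(component.identifier)

    def add(self, component):
        created = []
        try:
            self._create(component, created)
        except Exception:
            # Delete everything created by this call, newest first, then re-raise.
            for identifier in reversed(created):
                del self.store[identifier]
            raise


db = ToyDatalayer()
index = Component(
    "my-index",
    children=[Component("my-listener", children=[Component("my-model")])],
    broken=True,
)
try:
    db.add(index)
except RuntimeError as err:
    print(f"aborted: {err}")
print(sorted(db.store))  # nothing from the failed call remains
```

Tracking only the components created by the current call avoids deleting anything that existed before the failed `add`.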
We currently rely on custom docstring-formatting statements of the form `__doc__ = __doc__.format(...)`. This should be handled in a simpler, uniform way.
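One possible simplification is a small decorator that fills the placeholders in one place. This is only a sketch: `format_docstring`, `COMMON_ARGS`, and `Example` are hypothetical names, not existing helpers in the code base.

```python
def format_docstring(**params):
    """Return a decorator that fills `{placeholders}` in the target's docstring."""

    def decorator(obj):
        # Docstrings can be absent (e.g. when running with -OO), so guard first.
        if obj.__doc__:
            obj.__doc__ = obj.__doc__.format(**params)
        return obj

    return decorator


# Shared text that would otherwise be repeated in many docstrings
COMMON_ARGS = "identifier: unique name of the component"


@format_docstring(common_args=COMMON_ARGS)
class Example:
    """Example component.

    :param {common_args}
    """


print(Example.__doc__)
```

The decorator works on both functions and classes, so the per-module `__doc__ = __doc__.format(...)` statements could be replaced by a single reusable call.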
- Old `encode`
- `_register_class`
- `handle_integration`
- `unique_id`
- `build`
- `find_leaf_cls`
Add an optional parameter `bytes_encoding` to `Document.encode` so that we can send encoded `Document` instances via REST interfaces. `bytes_encoding` takes one of two values, `"bytes"` or `"base64"` (i.e. `config.BytesEncoding`).

**Tasks**...
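The effect of the two `bytes_encoding` values can be sketched with the standard library alone. The helpers `encode_bytes` and `decode_bytes` are hypothetical and do not reflect the actual `Document.encode` signature; the sketch only shows why `"base64"` is needed for a JSON body, since raw `bytes` are not JSON-serializable.

```python
import base64
import json


def encode_bytes(raw: bytes, bytes_encoding: str = "bytes"):
    """Return raw bytes unchanged, or a base64 text form that is JSON-safe."""
    if bytes_encoding == "bytes":
        return raw
    if bytes_encoding == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unknown bytes_encoding: {bytes_encoding!r}")


def decode_bytes(value, bytes_encoding: str = "bytes") -> bytes:
    """Invert encode_bytes for the given encoding."""
    if bytes_encoding == "bytes":
        return value
    if bytes_encoding == "base64":
        return base64.b64decode(value)
    raise ValueError(f"unknown bytes_encoding: {bytes_encoding!r}")


original = b"\x00\x01binary"

# With "base64" the payload survives a JSON round trip, as a REST body would.
payload = {"_content": {"bytes": encode_bytes(original, "base64")}}
body = json.dumps(payload)
restored = decode_bytes(json.loads(body)["_content"]["bytes"], "base64")
print(restored == original)
```

With `"bytes"` the value is passed through untouched, which suits backends that store binary data natively.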
@kartik4949 to add information, discussion points, diagrams, links.
Each backend should be supported with `superduper`:

**Document store**

- [MongoDB](https://www.mongodb.com/)

**Embedded**

- [SQLite](https://www.sqlite.org/index.html)
- [DuckDB](https://duckdb.org/)

**Classical SQL**

- [PostgreSQL](https://www.postgresql.org/)
- [MSSQL](https://www.microsoft.com/en-us/sql-server/)
- [MySQL](https://www.mysql.com/)
- [Oracle](https://www.oracle.com/database/)

**Data-lake**

- [BigQuery](https://cloud.google.com/bigquery)
- [ClickHouse](https://clickhouse.com/)
- [Snowflake](https://www.snowflake.com/en/)

**Tables**

- [pandas](https://pandas.pydata.org/)
- [Polars](https://www.pola.rs/)

**Data thingy**

- [PySpark](https://spark.apache.org/docs/3.3.1/api/python/index.html)
- [Trino](https://trino.io/)

...
1. Compute features
2. Train PCA on finished computed features
3. Compute dimension reduced features

```python
class DimReduceModel(Model):
    trainer: ...
    upstream: ...

listener1 = Listener(features_model, ...)
dim_reduce_model = DimReduceModel(trainer=PCATrainer(), upstream=[listener1])
...
```