blythed issues




[FEATURE] Expose more generative parameters to the developer/user in LLMs

For example, in Hugging Face the following are available:

[BUG] encoders in data are not versioned

Encoders referenced in saved `Document` instances should be versioned:

```python
{'_content': {'bytes': b'...', 'encoder': 'bla/'}}
```
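One way to picture a versioned reference is a minimal sketch, assuming the `'encoder'` value is meant to carry a version after the slash (`"<identifier>/<version>"`). The helpers `encoder_reference`, `parse_encoder_reference` and the `EncoderRef` tuple are hypothetical and not part of `superduper`.

```python
from typing import NamedTuple, Optional


class EncoderRef(NamedTuple):
    """Hypothetical parsed form of the ``'encoder'`` field."""
    identifier: str
    version: Optional[int]


def encoder_reference(identifier: str, version: int) -> str:
    # Assumed format: "<identifier>/<version>"
    return f"{identifier}/{version}"


def parse_encoder_reference(ref: str) -> EncoderRef:
    # rpartition splits on the last "/", so identifiers may themselves contain "/"
    identifier, sep, version = ref.rpartition("/")
    if not sep:
        # No "/" at all: treat as an unversioned legacy reference
        return EncoderRef(ref, None)
    # An empty suffix (e.g. 'bla/') means the version is missing
    return EncoderRef(identifier, int(version) if version else None)


doc = {'_content': {'bytes': b'...', 'encoder': encoder_reference('bla', 0)}}
print(doc['_content']['encoder'])
print(parse_encoder_reference('bla/'))
```

Storing the version inside the reference string keeps the saved dictionary shape unchanged, so older data without a version can still be read as `version=None`.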

[DistEnv] Fail gracefully during creation with `try ... finally`

Currently, if something goes wrong during `Component` creation, for example in `db.add(VectorIndex(...))`, creation is not aborted gracefully. We should delete any contained components that were already created.
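The `try ... finally` pattern from the title could look roughly like this minimal sketch. `FakeDB` and `add_with_rollback` are hypothetical stand-ins, not the `superduper` datalayer API; the point is that a success flag checked in `finally` rolls back only the components that were actually created, while the original exception still propagates.

```python
from typing import Iterable, List


class FakeDB:
    """Hypothetical stand-in for the datalayer; not the superduper API."""

    def __init__(self, broken: Iterable[str] = ()) -> None:
        self.components: List[str] = []
        self.broken = set(broken)  # names whose creation should fail, for the demo

    def create(self, name: str) -> None:
        if name in self.broken:
            raise RuntimeError(f"creating {name!r} failed")
        self.components.append(name)

    def delete(self, name: str) -> None:
        self.components.remove(name)


def add_with_rollback(db: FakeDB, children: List[str], parent: str) -> None:
    """Create ``children`` then ``parent``; on any failure, delete what was created."""
    created: List[str] = []
    succeeded = False
    try:
        for name in [*children, parent]:
            db.create(name)
            created.append(name)
        succeeded = True
    finally:
        if not succeeded:
            # Undo in reverse order so the parent is removed before its children
            for name in reversed(created):
                db.delete(name)
        # Any exception raised in the try block propagates after this point


db = FakeDB(broken={"vector_index"})
try:
    add_with_rollback(db, ["model", "listener"], "vector_index")
except RuntimeError as err:
    print(err)
print(db.components)
```

Using `finally` with a flag, rather than `except`, means the cleanup also runs for exceptions the caller did not anticipate, without swallowing them.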

Handle `super()` doc-string parameters in a clever/simple way.

We currently rely on custom formatting statements such as `__doc__ = __doc__.format(...)` to pull in parameter descriptions from parent classes. This should be handled more simply.
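As an alternative to per-class `__doc__ = __doc__.format(...)` calls, a minimal sketch, assuming doc-strings use Sphinx-style `:param name:` lines, is a mixin whose `__init_subclass__` appends any parent parameters the subclass does not document itself. `InheritParamDocs` and `_params` are hypothetical names, not existing `superduper` code.

```python
import inspect
from typing import Dict


def _params(doc) -> Dict[str, str]:
    """Map parameter name -> full ':param name: ...' line."""
    out: Dict[str, str] = {}
    for line in inspect.cleandoc(doc or "").splitlines():
        line = line.strip()
        if line.startswith(":param "):
            name = line[len(":param "):].split(":", 1)[0].strip()
            out[name] = line
    return out


class InheritParamDocs:
    """Hypothetical mixin: subclasses inherit parent ':param' lines in __doc__."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        own_doc = inspect.cleandoc(cls.__dict__.get("__doc__") or "")
        own = _params(own_doc)
        inherited: Dict[str, str] = {}
        # Walk parents from most-derived to least; the first definition wins
        for base in cls.__mro__[1:]:
            for name, line in _params(base.__dict__.get("__doc__")).items():
                inherited.setdefault(name, line)
        extra = [line for name, line in inherited.items() if name not in own]
        if extra:
            cls.__doc__ = own_doc + "\n" + "\n".join(extra)


class Component(InheritParamDocs):
    """A component.

    :param identifier: Unique identifier.
    """


class Model(Component):
    """A model.

    :param predict_kwargs: Extra keyword arguments for prediction.
    """


print(Model.__doc__)
```

Because a subclass's own `:param` line takes precedence, overriding a parameter description needs no special syntax.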

[SERIALIZE] Make `Document` wrapping optional on insert
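Optional wrapping could mean normalising input inside the insert path, as in this minimal sketch. `Document` here is a local `dict` subclass and `wrap_records` is a hypothetical helper; neither is the real `superduper` implementation.

```python
from typing import Any, Iterable, List, Mapping


class Document(dict):
    """Hypothetical minimal stand-in for ``Document`` (a dict subclass here)."""


def wrap_records(records: Iterable[Mapping[str, Any]]) -> List[Document]:
    """Accept plain dicts or ``Document`` instances; wrap only the former."""
    return [r if isinstance(r, Document) else Document(r) for r in records]


already_wrapped = Document({"x": 2})
docs = wrap_records([{"x": 1}, already_wrapped])
print([type(d).__name__ for d in docs])
print(docs[1] is already_wrapped)
```

Existing `Document` instances pass through untouched, so callers who already wrap their data see no change in behaviour.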

[SERIALIZE] Cleanup extraneous serialization methods

- Old `encode`
- `_register_class`
- `handle_integration`
- `unique_id`
- `build`
- `find_leaf_cls`

Support `base64` encoding of `bytes` as `str` in `Document`

Add an optional parameter `bytes_encoding` to `Document.encode` so that encoded `Document` instances can be sent via REST interfaces. `bytes_encoding` takes one of two values, `"bytes"` or `"base64"`, i.e. the members of `config.BytesEncoding`. **Tasks**...
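The effect of `bytes_encoding="base64"` can be illustrated with a minimal sketch that walks an encoded dictionary and converts every `bytes` leaf to a base64 `str`. The local `BytesEncoding` enum only mirrors the two values named above (the real `config.BytesEncoding` may be defined differently), and `encode_bytes` is a hypothetical helper, not `Document.encode` itself.

```python
import base64
import json
from enum import Enum
from typing import Any


class BytesEncoding(str, Enum):
    """Local stand-in for ``config.BytesEncoding``; the real definition may differ."""
    BYTES = "bytes"
    BASE64 = "base64"


def encode_bytes(value: Any, bytes_encoding: BytesEncoding = BytesEncoding.BYTES) -> Any:
    """Recursively convert ``bytes`` leaves so the result can be sent as JSON."""
    if isinstance(value, bytes):
        if bytes_encoding == BytesEncoding.BASE64:
            return base64.b64encode(value).decode("ascii")
        return value  # "bytes": leave raw bytes untouched
    if isinstance(value, dict):
        return {k: encode_bytes(v, bytes_encoding) for k, v in value.items()}
    if isinstance(value, list):
        return [encode_bytes(v, bytes_encoding) for v in value]
    return value


def decode_base64(text: str) -> bytes:
    """Inverse of the base64 branch, for the receiving side of the REST call."""
    return base64.b64decode(text)


encoded = encode_bytes({'_content': {'bytes': b'\x00\x01', 'encoder': 'bla/'}}, BytesEncoding.BASE64)
print(json.dumps(encoded))
```

Because the enum subclasses `str`, passing the plain string `"base64"` works as well as `BytesEncoding.BASE64`.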

[DistEnv] Strategy for `ray` on multiple nodes, with sharding as option

@kartik4949 to add information, discussion points, diagrams, links.

Create Notebook Examples for Less Tested Databases in Documentation

Each backend should be supported with `superduper`:

- **Document store**: [MongoDB](https://www.mongodb.com/)
- **Embedded**: [SQLite](https://www.sqlite.org/index.html), [DuckDB](https://duckdb.org/)
- **Classical SQL**: [PostgreSQL](https://www.postgresql.org/), [MSSQL](https://www.microsoft.com/en-us/sql-server/), [MySQL](https://www.mysql.com/), [Oracle](https://www.oracle.com/database/)
- **Data-lake**: [BigQuery](https://cloud.google.com/bigquery), [ClickHouse](https://clickhouse.com/), [Snowflake](https://www.snowflake.com/en/)
- **Tables**: [pandas](https://pandas.pydata.org/), [Polars](https://www.pola.rs/)
- **Data thingy**: [PySpark](https://spark.apache.org/docs/3.3.1/api/python/index.html), [Trino](https://trino.io/)

...


[QUEUES] Allow jobs created on `Component` creation to be passed onto downstream components

1. Compute features
2. Train PCA on the finished computed features
3. Compute dimension-reduced features

```python
class DimReduceModel(Model):
    trainer: ...
    upstream: ...

listener1 = Listener(features_model, ...)
dim_reduce_model = DimReduceModel(trainer=PCATrainer(), upstream=[listener1])
...
```
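The three steps above amount to a chain of job dependencies: each job created for a downstream component waits on the jobs of its upstream components. A minimal sketch, assuming jobs can be modelled as ids with dependency lists, uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive an execution order. `Job`, `JobQueue` and the job names (including `listener2`) are hypothetical, not the `superduper` queue implementation.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence


@dataclass
class Job:
    """Hypothetical job record; ``dependencies`` are ids of jobs that must finish first."""
    job_id: str
    dependencies: List[str] = field(default_factory=list)


class JobQueue:
    """Toy in-memory queue, not the superduper queue implementation."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def submit(self, job_id: str, dependencies: Sequence[str] = ()) -> str:
        self.jobs[job_id] = Job(job_id, list(dependencies))
        return job_id  # returned so downstream components can depend on it

    def execution_order(self) -> List[str]:
        # TopologicalSorter expects a mapping of node -> set of predecessors
        graph = {j.job_id: set(j.dependencies) for j in self.jobs.values()}
        return list(TopologicalSorter(graph).static_order())


queue = JobQueue()
# 1. Compute features (upstream listener)
features = queue.submit("listener1.compute_features")
# 2. Train PCA only after the feature job has finished
fit = queue.submit("dim_reduce_model.fit", dependencies=[features])
# 3. Compute dimension-reduced features with the fitted model
queue.submit("listener2.compute_reduced_features", dependencies=[fit])
print(queue.execution_order())
```

Returning job ids from `submit` is what lets a component hand its jobs to the next component in the chain, which is the behaviour this issue asks for.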