Artjoms Iskovs
When uploading a Parquet file to an existing table, the schemas must match one-to-one, even if the uploaded file can be coerced into the table type without information loss. For...
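To make the "coerced without information loss" idea concrete, here is a minimal sketch of a relaxed schema check. It is not Seafowl's implementation: `ColumnType`, `is_lossless` and `can_coerce_schema` are hypothetical names, and the set of widenings is an assumption covering only a few numeric types.

```rust
/// Hypothetical, simplified column types (not Arrow's or Seafowl's real type set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Int32,
    Int64,
    Float32,
    Float64,
}

/// Returns true if every value of `from` can be represented exactly in `to`.
/// Int64 -> Float64 and Int32 -> Float32 are excluded because the float
/// mantissa cannot hold every integer of that width.
pub fn is_lossless(from: ColumnType, to: ColumnType) -> bool {
    use ColumnType::*;
    from == to
        || matches!(
            (from, to),
            (Int32, Int64) | (Int32, Float64) | (Float32, Float64)
        )
}

/// Relaxed check: same column count, same names in the same order, and each
/// file column type is either identical to or losslessly widenable into the
/// corresponding table column type.
pub fn can_coerce_schema(file: &[(&str, ColumnType)], table: &[(&str, ColumnType)]) -> bool {
    file.len() == table.len()
        && file
            .iter()
            .zip(table)
            .all(|((f_name, f_type), (t_name, t_type))| {
                f_name == t_name && is_lossless(*f_type, *t_type)
            })
}
```

Under these assumptions, a file with an `Int32` column `id` would be accepted into a table whose `id` column is `Int64`, while the reverse would still be rejected.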
Follow-up to https://github.com/splitgraph/seafowl/issues/20

Currently, we compute the ETag based on all versions of Seafowl tables in a query. This disregards:

- Contents changing when the version doesn't (e.g. using https://www.splitgraph.com/docs/seafowl/guides/baking-dataset-docker-image...
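For illustration, a version-based ETag of the kind described above could look roughly like the sketch below. This is an assumption about the general shape, not Seafowl's actual code: `etag_from_versions` is a hypothetical helper, and `DefaultHasher` stands in for whatever hash is really used.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: derive an ETag from the (table name, version) pairs
/// that a query touches. Only versions feed the hash, so a change in table
/// contents without a version bump yields the same ETag.
pub fn etag_from_versions(versions: &[(&str, u64)]) -> String {
    // Sort so that the order in which tables appear in the query is irrelevant.
    let mut sorted: Vec<(&str, u64)> = versions.to_vec();
    sorted.sort();

    // DefaultHasher::new() uses fixed keys, so the result is stable within one
    // build, but its algorithm is not guaranteed stable across Rust releases.
    let mut hasher = DefaultHasher::new();
    for (name, version) in &sorted {
        name.hash(&mut hasher);
        version.hash(&mut hasher);
    }

    // ETag values are quoted strings; emit 16 hex digits inside the quotes.
    format!("\"{:016x}\"", hasher.finish())
}
```

With this scheme, bumping any table's version changes the ETag, which is exactly why content changes that leave the version untouched go unnoticed.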
Currently we only support the JSON Lines output format and (IIRC) load the whole response into memory to serialize it before forwarding it to the client. It would be nice...
Alluded to in https://github.com/splitgraph/seafowl/issues/48. Start a transaction before planning a batch of Seafowl statements, roll it back on error and commit on success (before returning a result): https://docs.rs/sqlx/latest/sqlx/struct.Transaction.html . Useful...
Currently, our WASM functions only support passing basic types like ints and floats. In order to be able to pass something more complex like strings or datetimes, we want to...
We currently do not support UDAFs (user defined aggregation functions), even though DataFusion does (https://docs.rs/datafusion/latest/datafusion/physical_plan/udaf/struct.AggregateUDF.html). The most basic implementation would be expecting the WASM function to be an "accumulator" (which...
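As a rough picture of the "accumulator" shape mentioned above, the following plain-Rust sketch shows an update/merge/evaluate interface and a mean aggregate built on it. The `Accumulator` trait and `MeanAccumulator` here are hypothetical and simplified to `f64` inputs; they are not DataFusion's trait and say nothing about how the WASM boundary would be crossed.

```rust
/// Hypothetical accumulator interface: state is updated one value at a time,
/// partial states can be merged, and a final result is read out at the end.
pub trait Accumulator {
    /// Fold one input value into the running state.
    fn update(&mut self, value: f64);
    /// Combine the state of another partial accumulator into this one.
    fn merge(&mut self, other: &Self);
    /// Produce the aggregate, or None if no values were seen.
    fn evaluate(&self) -> Option<f64>;
}

/// Example aggregate: arithmetic mean, stored as a running sum and count.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MeanAccumulator {
    sum: f64,
    count: u64,
}

impl Accumulator for MeanAccumulator {
    fn update(&mut self, value: f64) {
        self.sum += value;
        self.count += 1;
    }

    fn merge(&mut self, other: &Self) {
        self.sum += other.sum;
        self.count += other.count;
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Keeping a separate `merge` step is what lets partial results computed on different partitions be combined before the final `evaluate`.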
Done in https://github.com/splitgraph/seafowl/pull/71: ~~- `MemoryManagerConfig`: max runtime memory usage for plan execution (rough, since it doesn't track basic process data structures): https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/execution/memory_manager.rs#L35-L55 / https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/execution/runtime_env.rs#L141-L145~~ ~~- `DiskManagerConfig`: using the OS temp...
Add ability to cache query results in the same object storage that we use for actual Parquet files. This might not be crucial if we implement https://github.com/splitgraph/seafowl/issues/20 (in which case...
(followup to https://github.com/splitgraph/seafowl/issues/20) The current default is not sending any cache-control headers at all, which means that the browser/CDNs will lean towards more caching. (from my basic testing, it seems...