Weston Pace
We have some basic merge insert functionality in Lance today. Other databases have more sophisticated capabilities. Perhaps the most extensive and well documented is the "[merge into](https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver16)" from SQL Server....
The current batch size is a max_batch_size only. It will slice large batches but will not merge smaller batches. There are cases where it is important for users to have...
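To make the distinction concrete, here is a minimal sketch in plain Python, with lists standing in for record batches. The function names `slice_only` and `rebatch` are hypothetical and are not part of the Lance API; the first mimics a max-only batch size, the second shows what also merging small batches could look like.

```python
from typing import Iterable, Iterator, List


def slice_only(batches: Iterable[List[int]], max_batch_size: int) -> Iterator[List[int]]:
    """Max-only behavior: large batches are sliced, small ones pass through untouched."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    for batch in batches:
        for start in range(0, len(batch), max_batch_size):
            yield batch[start:start + max_batch_size]


def rebatch(batches: Iterable[List[int]], batch_size: int) -> Iterator[List[int]]:
    """Slice large batches AND merge small ones, so every output batch
    (except possibly the last) holds exactly batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[int] = []
    for batch in batches:
        buffer.extend(batch)
        # Emit full batches as soon as enough rows have accumulated.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        # Trailing partial batch.
        yield buffer
```

With input batches of 5, 1, and 2 rows and a size of 4, `slice_only` produces batches of 4, 1, 1, and 2 rows, while `rebatch` produces two batches of 4 rows each.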
We should return the number of inserted rows, updated rows, and deleted rows (and maybe also the number of matched/not matched but unmodified rows?).
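As a rough illustration of what such a return value could look like, the sketch below runs a toy merge insert over Python dictionaries and reports the three counts. `MergeStats`, `merge_insert_with_stats`, and the `delete_not_matched_by_source` flag are all hypothetical names chosen for this example; none of this is the Lance API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MergeStats:
    """Hypothetical result type; the field names are assumptions."""
    num_inserted_rows: int = 0
    num_updated_rows: int = 0
    num_deleted_rows: int = 0


def merge_insert_with_stats(
    target: Dict[int, str],
    source: Dict[int, str],
    delete_not_matched_by_source: bool = False,
) -> MergeStats:
    """Toy merge insert keyed on the dict key; mutates target in place."""
    stats = MergeStats()
    if delete_not_matched_by_source:
        # Collect keys first so the dict is not mutated while iterating.
        stale_keys = [key for key in target if key not in source]
        for key in stale_keys:
            del target[key]
        stats.num_deleted_rows = len(stale_keys)
    for key, value in source.items():
        if key in target:
            stats.num_updated_rows += 1
        else:
            stats.num_inserted_rows += 1
        target[key] = value
    return stats
```

Merging a source with keys 2 and 4 into a target with keys 1, 2, and 3, with deletion enabled, would report one insert, one update, and two deletes.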
As reported from Discord:

```
thread 'lance_background_thread' panicked at /home/runner/work/lance/lance/rust/lance/src/utils/tokio.rs:34:24:
called Result::unwrap() on an Err value: RecvError(())
thread 'lance-cpu' panicked at /home/runner/work/lance/lance/rust/lance-index/src/vector/pq/utils.rs:63:5:
sub_vector idx: 96, num_sub_vectors: 96
Traceback (most recent...
```
We've set up a Lance dataset on Google [Filestore](https://cloud.google.com/filestore#key-features). When we scan it we expect to get fairly high bandwidth, but we are not seeing that in practice.
`write_dataset` panics if the input is a record batch reader and it receives an empty record batch
If the column is marked non-null and the column is capable of representing nulls, then this will trigger a panic because the statistics collector will record the min/max as null...
`merge_insert` uses a DataFusion join internally. Since we do not provide a RAM limit to DataFusion, it uses a lot of RAM by default to run the join. We should...
The I/O scheduler receives batches of IOPS. For example, a take operation, on a value encoded page, will yield a single batch with one IOP per index. We should coalesce...
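Coalescing within a single batch can be sketched as merging adjacent or nearby byte ranges before issuing reads. The example below is a minimal illustration in plain Python, not the scheduler's actual code: `coalesce_ranges` and its `max_gap` parameter are hypothetical names, and each request is modeled as a half-open `(start, end)` byte range.

```python
from typing import List, Tuple

ByteRange = Tuple[int, int]  # half-open [start, end)


def coalesce_ranges(ranges: List[ByteRange], max_gap: int = 0) -> List[ByteRange]:
    """Merge ranges that overlap, touch, or are separated by at most
    max_gap bytes. Returns the merged ranges sorted by start offset."""
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    merged: List[ByteRange] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate request.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

A larger `max_gap` trades some wasted bytes for fewer requests, which tends to matter most on storage where per-request overhead dominates.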
#1959 describes coalescing in general and should add it for IOPS within a batch. However, there are cases where it makes sense to coalesce requests between batches, especially on filesystems...
When we are reading a Lance file, the scheduler will usually submit requests in the order that they are needed. The exception to this is any field that needs indirect...