Weston Pace
An error is thrown: ``` thread 'lance_background_thread' panicked at /home/pace/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-52.2.0/src/buffer/immutable.rs:222:9: the offset of the new Buffer cannot exceed the existing length note: run with `RUST_BACKTRACE=1` environment variable to display a...
Python APIs that process large binary values often expect a file-like object. We should create an API for scanning a large binary column that produces file-like objects.
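To make the idea concrete, here is a minimal sketch of what a file-like wrapper around one large binary value could look like. It is not an existing Lance API: the class name `RangeReader` and the `read_range(offset, length)` callback are hypothetical, standing in for whatever mechanism would fetch bytes from storage on demand.

```python
import io
from typing import Callable


class RangeReader(io.RawIOBase):
    """Read-only, seekable file-like view over a single large binary value.

    `read_range(offset, length)` is a hypothetical callback that returns
    up to `length` bytes starting at `offset`; it is an assumption made
    for this sketch, not part of any real library.
    """

    def __init__(self, read_range: Callable[[int, int], bytes], size: int):
        super().__init__()
        self._read_range = read_range
        self._size = size
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer) -> int:
        # Only fetch the bytes the caller asked for, never the whole value.
        remaining = max(self._size - self._pos, 0)
        length = min(len(buffer), remaining)
        if length == 0:
            return 0
        data = self._read_range(self._pos, length)
        n = len(data)
        buffer[:n] = data
        self._pos += n
        return n
```

Because the class derives from `io.RawIOBase`, `read()` and `readall()` come for free, and the object can be wrapped in `io.BufferedReader` before being handed to a library that expects a file-like object.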
Large lance files are often created by compacting smaller lance files. E.g. one thousand 1GB lance files compacted into a single 1TB file. Creating these large files is very difficult...
The current encoding creates many small pages (small as in few rows per page). A better design would be to store the position/length of each individual value and save a...
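As a rough illustration of the position/length idea (not Lance's actual encoding, whose details are not given here), the sketch below concatenates values into one data buffer and records a `(position, length)` pair per value. The function names `encode_values` and `read_value` are made up for this example.

```python
from typing import List, Tuple


def encode_values(values: List[bytes]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Concatenate values and record where each one starts and how long it is."""
    data = bytearray()
    index: List[Tuple[int, int]] = []
    for value in values:
        index.append((len(data), len(value)))  # (position, length)
        data += value
    return bytes(data), index


def read_value(data: bytes, index: List[Tuple[int, int]], row: int) -> bytes:
    """Fetch one value using only its index entry, without scanning the others."""
    position, length = index[row]
    return data[position:position + length]
```

With an index like this, the number of rows described by one block of metadata no longer depends on how large the individual values are.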
When searching an IVF/PQ index we need to load partitions. Currently we are using the CPU count to determine how many partitions to load in parallel. However, this is primarily...
When using `binary` or `string` columns, a single batch of data cannot contain more than 2GiB of data. Users will either need to use `large_binary` and `large_string` or make sure...
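The 2GiB ceiling comes from the signed 32-bit offsets that Arrow's `binary` and `string` layouts use. One way to stay under it is to cap the number of value bytes per batch. The following is a sketch only: the helper `split_into_batches` is hypothetical, works on plain Python `bytes`, and would need adapting to real record batches.

```python
from typing import Iterable, Iterator, List

# Largest offset a signed 32-bit integer can hold.
MAX_BATCH_BYTES = 2**31 - 1


def split_into_batches(
    values: Iterable[bytes], max_bytes: int = MAX_BATCH_BYTES
) -> Iterator[List[bytes]]:
    """Group values so the total byte size of each batch stays within max_bytes."""
    batch: List[bytes] = []
    batch_bytes = 0
    for value in values:
        if len(value) > max_bytes:
            # No batching can help here; a large_* type would be required.
            raise ValueError("a single value exceeds the per-batch limit")
        if batch and batch_bytes + len(value) > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(value)
        batch_bytes += len(value)
    if batch:
        yield batch
```

For example, `list(split_into_batches([b"aa", b"bbb", b"c", b"dddd"], max_bytes=4))` groups the values as `[[b"aa"], [b"bbb", b"c"], [b"dddd"]]`.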
A breaking change was made to the binary encoding in 0.14.0. v2 was still labeled as experimental at the time. However, a significant number of people have run into this...
A single page of list offsets might (will usually) map to many pages of list items. This is especially true if the list items are large. Currently, we schedule all...
Users can use the fragment API to create fragments and then commit a dataset using the python `commit` method (this is an advanced use case). However, it is not possible...
For example, sample is ok (the `kwargs` are passed through to `take`): ``` def sample( self, num_rows: int, columns: Optional[Union[List[str], Dict[str, str]]] = None, randomize_order: bool = True, **kwargs, )...