Tom Augspurger
One question about the codec pipeline: what's the benefit of batching? Typically, batching is done to increase throughput: at the cost of some latency, you can sometimes process more things...
Thanks. https://github.com/zarr-developers/zarr-python/pull/2863 is adding the wrappers for a GPU-based Zstd codec. That does indeed benefit from the batched codec approach, since the underlying library (nvcomp) accepts a batch of buffers...
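To make the throughput point concrete, here is a minimal sketch of the difference between a per-chunk and a batched decode interface. The function names are hypothetical and plain `zlib` stands in for a GPU library like nvcomp; the point is only the shape of the API, where a batched backend sees every chunk up front and can schedule them together:

```python
import zlib

def decode_one(buf: bytes) -> bytes:
    # One call per chunk: each decode is dispatched independently.
    return zlib.decompress(buf)

def decode_batch(bufs: list) -> list:
    # A batched codec receives the whole batch of buffers at once, so a
    # backend like nvcomp could launch one kernel over all of them
    # instead of one call per chunk. Here a loop stands in for that.
    return [zlib.decompress(b) for b in bufs]

chunks = [zlib.compress(bytes([i]) * 64) for i in range(4)]
assert decode_batch(chunks) == [decode_one(b) for b in chunks]
```

The results are identical either way; the batched signature just gives the backend the chance to amortize per-call overhead across the batch.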
Note: I think that all the pieces should be in place thanks to dask-glm. This should be a matter of translating the scikit-learn API to a linear regression with dask-glm's...
I think that

```python
from dask_ml.datasets import make_regression
from dask_glm.regularizers import L1
from dask_glm.estimators import LinearRegression

X, y = make_regression(n_samples=1000, chunks=100)
lr = LinearRegression(regularizer=L1())
lr.fit(X, y)
```

is basically correct....
It should certainly be possible, but I'm not sure offhand how much work it'll be. On Wed, Jun 6, 2018 at 11:27 AM, jakirkham wrote: > Hmm...so when scikit-learn implements...
> It does seem rather unfortunate to have to list all of the ids defined in the core spec redundantly in order to exclude them as valid extension names. My...
> If something is missing from the list of exclusions then it will just also validate as an extension, meaning the configuration doesn't get checked. Mmm here's what I had...
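A minimal sketch of the failure mode described above, with hypothetical ids and a toy validator (not the actual spec machinery): core ids have to be excluded from the extension namespace, because anything not excluded falls through to the permissive extension branch and its configuration is never checked.

```python
# Hypothetical ids and configs, for illustration only.
CORE_IDS = {"gzip", "blosc"}
KNOWN_CORE_CONFIGS = {"gzip": {"level"}}

def validate(name: str, config: dict) -> bool:
    if name in CORE_IDS:
        # Core ids get their configuration checked strictly.
        allowed = KNOWN_CORE_CONFIGS.get(name, set())
        return set(config) <= allowed
    # Anything else is treated as an extension and accepted as-is.
    return True

# A typo'd core config is rejected...
assert not validate("gzip", {"levvel": 9})
# ...but only because "gzip" is on the exclusion list. An id missing
# from that list validates as an extension, bad config and all:
assert validate("gzipp", {"levvel": 9})
```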
This came up again in https://github.com/zarr-developers/zarr-python/pull/2876#pullrequestreview-2725267688. I'm going to reopen this and look into how we could move everything to `zarr._core` and deprecate the module with a warning.
xref https://github.com/zarr-developers/zarr-python/pull/2876. Once that's released we can (conditionally) update the imports in numcodecs to use the public API (and IIRC Icechunk had some references to `zarr.core.buffer` too) Then once numcodecs...
This snippet is probably failing for the same reason:

```python
engine = pl.GPUEngine(executor="streaming", executor_options={"max_rows_per_partition": 2})
df = pl.LazyFrame({"a": ["1", "2", None, "4"]})
q = df.select(pl.col("a").cast(pl.Int8))
q.collect(engine=engine)
```

```pytb
---------------------------------------------------------------------------
ComputeError...
```