Tom Augspurger
One question about the codec pipeline: what's the benefit of batching? Typically, batching is done to increase throughput: at the cost of some latency, you can sometimes process more things...
Thanks. https://github.com/zarr-developers/zarr-python/pull/2863 is adding the wrappers for a GPU-based Zstd codec. That does indeed benefit from the batched codec approach, since the underlying library (nvcomp) accepts a batch of buffers...
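To make the throughput point concrete, here is a minimal sketch of the difference between a per-chunk and a batched decode interface. The function names are hypothetical and plain `zlib` stands in for a GPU library like nvcomp; the point is only the shape of the API, where a batched backend sees every chunk up front and can schedule them together:

```python
import zlib

def decode_one(buf: bytes) -> bytes:
    # One call per chunk: each decode is dispatched independently.
    return zlib.decompress(buf)

def decode_batch(bufs: list) -> list:
    # A batched codec receives the whole batch of buffers at once, so a
    # backend like nvcomp could launch one kernel over all of them
    # instead of one call per chunk. Here a loop stands in for that.
    return [zlib.decompress(b) for b in bufs]

chunks = [zlib.compress(bytes([i]) * 64) for i in range(4)]
assert decode_batch(chunks) == [decode_one(b) for b in chunks]
```

The results are identical either way; the batched signature just gives the backend the chance to amortize per-call overhead across the batch.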
Note: I think that all the pieces should be in place thanks to dask-glm. This should be a matter of translating the scikit-learn API to a linear regression with dask-glm's...
I think that

```python
from dask_ml.datasets import make_regression
from dask_glm.regularizers import L1
from dask_glm.estimators import LinearRegression

X, y = make_regression(n_samples=1000, chunks=100)
lr = LinearRegression(regularizer=L1())
lr.fit(X, y)
```

is basically correct....
It should certainly be possible, but I'm not sure offhand how much work it'll be. On Wed, Jun 6, 2018 at 11:27 AM, jakirkham wrote: > Hmm...so when scikit-learn implements...
> It does seem rather unfortunate to have to list all of the ids defined in the core spec redundantly in order to exclude them as valid extension names. My...
> If something is missing from the list of exclusions then it will just also validate as an extension, meaning the configuration doesn't get checked. Mmm here's what I had...
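A minimal sketch of the failure mode described above, with hypothetical ids and a toy validator (not the actual spec machinery): core ids have to be excluded from the extension namespace, because anything not excluded falls through to the permissive extension branch and its configuration is never checked.

```python
# Hypothetical ids and configs, for illustration only.
CORE_IDS = {"gzip", "blosc"}
KNOWN_CORE_CONFIGS = {"gzip": {"level"}}

def validate(name: str, config: dict) -> bool:
    if name in CORE_IDS:
        # Core ids get their configuration checked strictly.
        allowed = KNOWN_CORE_CONFIGS.get(name, set())
        return set(config) <= allowed
    # Anything else is treated as an extension and accepted as-is.
    return True

# A typo'd core config is rejected...
assert not validate("gzip", {"levvel": 9})
# ...but only because "gzip" is on the exclusion list. An id missing
# from that list validates as an extension, bad config and all:
assert validate("gzipp", {"levvel": 9})
```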
This came up again in https://github.com/zarr-developers/zarr-python/pull/2876#pullrequestreview-2725267688. I'm going to reopen this and look into how we could move everything to `zarr._core` and deprecate the module with a warning.
xref https://github.com/zarr-developers/zarr-python/pull/2876. Once that's released we can (conditionally) update the imports in numcodecs to use the public API (and IIRC Icechunk had some references to `zarr.core.buffer` too) Then once numcodecs...
This snippet is probably failing for the same reason:

```python
engine = pl.GPUEngine(executor="streaming", executor_options={"max_rows_per_partition": 2})
df = pl.LazyFrame({"a": ["1", "2", None, "4"]})
q = df.select(pl.col("a").cast(pl.Int8))
q.collect(engine=engine)
```

```pytb
---------------------------------------------------------------------------
ComputeError...
```