Julia Signell

Results 268 comments of Julia Signell

I think there is value in wrapping this up and getting it in! @jrbourbeau should we just commit my last two suggestions and merge?

Yeah, I think the existing section of the API docs is best for now.

In principle this approach seems fine to me. I think masked arrays are kind of under-supported in Dask in general. So this kind of work is definitely appreciated!

With the recent changes to try to make more informative errors, this has actually gotten a little harder to interpret. I'm wondering if period needs a special case like datetime...

I am taking a look at this now and can only reproduce with the distributed case. I think the issue is happening within the `to_zarr` method actually. If you add...

I was able to get the desired output with this diff:

```diff
diff --git a/dask/array/core.py b/dask/array/core.py
index 9aa10950..f45a3619 100644
--- a/dask/array/core.py
+++ b/dask/array/core.py
@@ -3318,10 +3318,9 @@ def to_zarr(
 #...
```

I'm not sure what the implications are for performance, but that seems like a reasonable solution to me. I'll open a PR to carry on the conversation.

> As the current PR shows, Dask's implementation is pretty close already, so I'd like to explore closing the gap via incremental changes to `dask.array`, rather than introducing a new...

It looks like pandas allows columns with duplicate column names, but read methods generally coerce (csv) or fail (parquet) if duplicate columns exist. It kind of feels like duplicate column...
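
A minimal sketch of the pandas behavior mentioned in the comment above, assuming a recent pandas with default `read_csv` settings. The tiny frame and CSV text are made up for illustration. The parquet case is left out because it depends on which engine is installed.

```python
import io

import pandas as pd

# Building a DataFrame with duplicate column names is allowed.
df = pd.DataFrame([[1, 2]], columns=["a", "a"])
dup_columns = list(df.columns)

# read_csv does not keep a repeated header name as-is; with default
# settings it renames the second occurrence by appending a suffix.
csv_df = pd.read_csv(io.StringIO("a,a\n1,2\n"))
csv_columns = list(csv_df.columns)

print(dup_columns)
print(csv_columns)
```

So the in-memory frame keeps both `a` columns, while the frame read back from CSV ends up with distinct names.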

> If there's a backwards compatibility problem, wouldn't that be caught by some test?

Not necessarily. There could just not be any test of this behavior. I am proposing explicitly...