marberi

Results 13 comments of marberi

@phobson The original goal was simply to add a test, since I thought it was fixed and @jakirkham seemed to confirm that in https://github.com/dask/dask/issues/4845. Now, running the test manually (pytest...

Moved this over to test_groupby.py and removed the "np+pd" dependencies. When I originally submitted this, I was not aware of this subdirectory. Also, when originally submitting this request, the problem did not...

@TomAugspurger Could you outline how I could make such a test? I am quite familiar with Pandas as a user, but not with how one would debug this issue.

I will have a try. Tried some simple debugging this morning, inserting some print statements in the Pandas code to see if I could find some hints. Quite tricky, since...

Thanks. It looks very relevant. The Pandas code called by dask crashes in multiple places, but testing with "is_unique" seems to always be involved.

I tried adding a lock in "is_unique" in the Pandas source code, as shown below. After rerunning, I still have the same issue. Does this look like a correct usage...

Ok, I got one step further concerning this. The "is_unique" method in the Index class found in: pandas/pandas/core/indexes/base.py has a "@cache_readonly" decorator. When commenting out this cache, this problem disappears....

Would this be an acceptable fix? I tested acquiring locks inside the ```__get__``` method itself and it did not work. https://github.com/marberi/pandas/commit/ad33858298961d63ab0bc70caf1d04b3ca02b5fc
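
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of a cache-on-first-access property that takes a lock inside `__get__`. It is not the Pandas `cache_readonly` implementation and not the code in the linked commit; `locked_cached_property`, `Labels`, and the `_cache` attribute are hypothetical names used only for illustration. It shows the general approach, and as noted above, locking inside `__get__` did not resolve the issue in my Pandas tests.

```python
import threading


class locked_cached_property:
    """Hypothetical stand-in for a cache-on-first-access decorator."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        # One lock per decorated property, shared by all instances.
        self.lock = threading.Lock()

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class rather than on an instance.
            return self
        with self.lock:
            # Per-instance cache dict, created on first use (assumed layout).
            cache = obj.__dict__.setdefault("_cache", {})
            if self.name not in cache:
                cache[self.name] = self.func(obj)
            return cache[self.name]


class Labels:
    """Toy container, used only to exercise the decorator."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = 0  # counts how often the property body actually runs

    @locked_cached_property
    def is_unique(self):
        self.calls += 1
        return len(set(self.values)) == len(self.values)
```

Because the check and the store both happen under the lock, the property body runs at most once per instance even when several threads read it at the same time. The cost is that all instances share one lock per property, which serializes first-time access.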

Ok, also posted there. Let's see what they say.