Vyas Ramasubramani
Vyas Ramasubramani
This works now, but with the caveat that categorical column must be explicitly marked as ordered ``` In [1]: import cudf ...: import dask_cudf ...: df = cudf.DataFrame({"a": list("caba"), "b":...
The chain has gotten a bit long here, let me summarize to make sure I have everything right. #15780 will fix #15778. Once that is merged, will this issue also...
@galipremsagar @mroeschke WDYT of the status of this now? The last comment on the associated pandas issue was from Matt, so deferring to you two to decide if there's any...
Hmm is that contradicting your last statement on [the associated pandas issue](https://github.com/pandas-dev/pandas/issues/37100#issuecomment-898187809)? Is there a behavior change needed on the pandas end, then? IIUC pandas is allowing this input through....
CC @shwina @galipremsagar
The alternative resolution to this issue would be if we changed the ColumnAccessor to use a list of columns and a list of names instead of maintaining a dictionary of...
@mroeschke do you know if there are any plans to change the duplicate names support in pandas? There are a lot of ways in which it's kind of broken to...
Sorry, I should clarify. I wasn't only thinking about MultiIndex objects, but also DataFrame objects. For example, pandas lets you do this: ``` In [1]: import pandas as pd In...
OK got it. That is very helpful context, thanks! If that is the case and there is real interest in this in pandas, then we may have to rethink cudf's...
I still see a meaningful performance difference here, but not as drastic of one. Some of this behavior could be largely filesystem-dependent, so my numbers may not be directly comparable...