
hl.Table.to_pandas() generates string columns with dtype=string, which is still experimental in pandas

mkanai opened this issue.

I encountered an error in an external package (pyranges) when I passed it a pandas DataFrame generated by Hail. The error occurs because the package does not support the pandas.StringDtype dtype: https://github.com/biocore-ntnu/pyranges/pull/264

Given that this dtype is still experimental in pandas, could we have an option to generate a DataFrame whose string columns have dtype=object? Or maybe we should make dtype=object the default. https://github.com/hail-is/hail/blob/c4b09953f62cea090c8ab2026bc81851b9f4d64a/hail/python/hail/table.py#L3345-L3346

mkanai, Apr 06 '22 22:04
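A workaround that does not depend on any Hail option is to convert the string columns after calling to_pandas. The following is a minimal sketch in plain pandas. The helper name strings_to_object is hypothetical, and the toy DataFrame only stands in for Hail output. It casts every column whose dtype is pandas.StringDtype to object dtype and leaves all other columns unchanged.

```python
import pandas as pd


def strings_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every pandas StringDtype column cast to object dtype.

    Hypothetical helper for illustration; not part of Hail or pandas.
    """
    out = df.copy()
    for col in out.columns:
        # pd.StringDtype is the (experimental) "string" extension dtype
        if isinstance(out[col].dtype, pd.StringDtype):
            out[col] = out[col].astype(object)
    return out


# Toy frame standing in for a Hail-generated DataFrame with a string column
df = pd.DataFrame({
    "contig": pd.array(["chr1", "chr2"], dtype="string"),
    "start": [100, 200],
})
converted = strings_to_object(df)
print(converted.dtypes)
```

Note that missing values in a converted column remain pd.NA objects, which some downstream packages may also need to handle.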

Great suggestion, Masa! We could add a types argument to to_pandas that lets the user override the type of a subset of columns. I've marked this issue "help wanted". If someone on the team has some spare cycles, they might pick it up. We also welcome PRs that make this change!

danking, Jun 02 '22 18:06
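The per-column override idea can be illustrated in plain pandas. This is a minimal sketch applied after the fact to an existing DataFrame. The helper apply_dtype_overrides is hypothetical and is not Hail's implementation; the types argument that Hail actually ships may accept different values, so check the to_pandas documentation for your version. The sketch casts only the columns named in the mapping and leaves every other column's dtype alone.

```python
import pandas as pd


def apply_dtype_overrides(df: pd.DataFrame, types: dict) -> pd.DataFrame:
    """Cast only the columns named in `types`; all other columns keep their dtype.

    `types` maps column name -> any dtype accepted by DataFrame.astype
    (e.g. "object", "string", "int32"). Hypothetical helper for illustration.
    """
    unknown = set(types) - set(df.columns)
    if unknown:
        # Fail loudly instead of silently ignoring a misspelled column name
        raise KeyError(f"no such columns: {sorted(unknown)}")
    # DataFrame.astype accepts a dict of per-column dtypes
    return df.astype(types)


# Toy frame standing in for a Hail-generated DataFrame
df = pd.DataFrame({
    "gene": pd.array(["BRCA1", "TP53"], dtype="string"),
    "chrom": pd.array(["chr17", "chr17"], dtype="string"),
    "pval": [1e-8, 0.03],
})

# Override only "gene"; "chrom" keeps the default string dtype
result = apply_dtype_overrides(df, {"gene": "object"})
print(result.dtypes)
```

Overriding a subset rather than flipping a global default keeps existing callers' output unchanged while letting users work around packages that reject StringDtype.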

Closing the loop: this was released in Hail 0.2.110!

danking, Mar 13 '23 21:03