
Outer Join and other Joins can hang execution when they contain class objects as columns

Open 3twalker3 opened this issue 3 years ago • 3 comments

The following code makes Polars hang:

from polars import DataFrame


class silly():
    def __init__(self, junk: str):
        self.junk = junk


# Both frames carry a column of arbitrary Python objects.
data1 = {
    'Key': [1, 2, 3],
    'Value': ['a', 'b', 'c'],
    'classValue': [silly('a'), silly('b'), silly('c')]
}

data2 = {
    'Key': [1, 2],
    'Value': ['x', 'y'],
    'classValue': [silly('x'), silly('y')]
}

dd1 = DataFrame(data1)
dd2 = DataFrame(data2)

# The join below is the call that never returns.
dd3 = dd1.join(dd2, on="Key", how="outer")

3twalker3 avatar Jun 20 '22 05:06 3twalker3

Hmm... Object support is still not optimal in polars. Could you tell me a bit about what you want to do? Big chance we can do it without objects.

ritchie46 avatar Jun 22 '22 05:06 ritchie46

Hey there!

Thanks for the mail.

I’m actually trying to store a dictionary mapping ints to floats in a column. Support for dictionaries would be amazing!


3twalker3 avatar Jun 22 '22 12:06 3twalker3

Maybe a struct would be useful, if you want to store a dict per cell in the dataframe? https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.struct.html
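
As a rough sketch of that idea: struct fields have string names, so a dict keyed by ints does not map onto a struct directly. One possible shape is a list of key/value records per cell. The helper names `dict_to_records` and `records_to_dict` below are hypothetical, not part of Polars, and the snippet is plain Python so it runs without Polars installed.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[int, float]]


def dict_to_records(mapping: Dict[int, float]) -> List[Record]:
    """Turn {int: float} into [{"key": int, "value": float}, ...], sorted by key."""
    return [{"key": k, "value": float(v)} for k, v in sorted(mapping.items())]


def records_to_dict(records: List[Record]) -> Dict[int, float]:
    """Inverse of dict_to_records: rebuild the original {int: float} mapping."""
    return {int(r["key"]): float(r["value"]) for r in records}


# One dict per row; rows may have different keys, or none at all.
weights = [{1: 0.5, 2: 1.5}, {7: 2.0}, {}]

# One list of key/value records per row, with no custom Python objects involved.
column = [dict_to_records(w) for w in weights]
```

Passing `column` as an ordinary column, for example `pl.DataFrame({"Key": [1, 2, 3], "weights": column})`, should let Polars infer a list-of-structs dtype instead of falling back to objects, though that is an assumption worth checking on your Polars version. `records_to_dict` recovers the dict for a cell after reading it back.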

zundertj avatar Jul 18 '22 21:07 zundertj

This will work in the next Polars release.

Support for arbitrary Python objects is improved in: https://github.com/pola-rs/polars/pull/5275

In [3]: 
   ...: 
   ...: class silly():
   ...:     def __init__(self,junk: str):
   ...:         self.junk = junk
   ...: 
   ...: data1={
   ...:     'Key':[1,2,3],
   ...:     'Value':['a','b','c'],
   ...:     'classValue':[silly('a'),silly('b'),silly('c')]
   ...: }
   ...: 
   ...: data2={
   ...:     'Key':[1,2],
   ...:     'Value':['x','y'],
   ...:     'classValue':[silly('x'),silly('y')]
   ...: }
   ...: 
   ...: dd1 = pl.DataFrame(data1)
   ...: dd2 = pl.DataFrame(data2)
   ...: 
   ...: 
   ...: dd3=dd1.join(dd2,on="Key",how="outer")
   ...: 

In [4]: dd3
Out[4]: 
shape: (3, 5)
┌─────┬───────┬─────────────────────────┬─────────────┬────────────────────────┐
│ Key ┆ Value ┆ classValue              ┆ Value_right ┆ classValue_right       │
│ --- ┆ ---   ┆ ---                     ┆ ---         ┆ ---                    │
│ i64 ┆ str   ┆ object                  ┆ str         ┆ object                 │
╞═════╪═══════╪═════════════════════════╪═════════════╪════════════════════════╡
│ 1   ┆ a     ┆ <__main__.silly object  ┆ x           ┆ <__main__.silly object │
│     ┆       ┆ at 0x7f86...            ┆             ┆ at 0x7f86...           │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ b     ┆ <__main__.silly object  ┆ y           ┆ <__main__.silly object │
│     ┆       ┆ at 0x7f86...            ┆             ┆ at 0x7f86...           │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ c     ┆ <__main__.silly object  ┆ null        ┆ null                   │
│     ┆       ┆ at 0x7f86...            ┆             ┆                        │
└─────┴───────┴─────────────────────────┴─────────────┴────────────────────────┘

ghuls avatar Oct 21 '22 22:10 ghuls