buckaroo
Figure out better autocleaning comparison
Checks
- [X] I have checked that this enhancement has not already been requested
How would you categorize this request. You can select multiple if not sure
Auto Cleaning, Performance
Enhancement Description
Polars makes some autocleaning functionality very difficult, particularly comparing original values to modified values across different dtypes. This makes it much harder to color the resulting dataframe and add tooltips to it based on the modifications.
import polars as pl
df = pl.DataFrame({'a_raw': ["not_parseable", "30"], 'a_cleaned': [None, 30]})
df.select(pl.col("a_raw").eq(pl.col("a_cleaned")))
These shouldn't compare equal to each other because they're different types... but you can't do this either:
df = pl.DataFrame({'a_raw': pl.Series(["not_parseable", 30], dtype=pl.Object), 'a_cleaned': [None, 30]})
df.select(pl.col("a_raw").eq(pl.col("a_cleaned")))
You can't even do this:
df = pl.DataFrame({'a_raw': pl.Series(["not_parseable", 30], dtype=pl.Object), 'a_cleaned': [None, 30]})
df.select(pl.struct(["a_raw", "a_cleaned"]).map_elements(lambda x: x["a_raw"] == x["a_cleaned"]))
This fails because you can't put an Object column into a struct.
Pseudo Code Implementation
This might require writing some custom expressions, particularly a version of cast that returns a struct containing the original value alongside the cast result.
Prior Art
N/A