tidypandas
[feature] Implement `tidyselect`
Column name arguments should accept selector functions as well as strings:
df.select(['a', 'b', 'c']) # regular
df.select(['a', 'b', starts_with('c'), ends_with('d'), contains('some_regex')])
tidyselect should power every method that takes column_names as input.
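To make the request concrete, here is a minimal sketch of how mixed string/selector lists could be resolved against a frame's column names. It is not tidypandas code: `resolve_columns` and the `Selector` alias are hypothetical names, the helpers are modelled as plain predicates on a column name, and `contains` is treated as a regex search only because the example above passes it a regex.

```python
import re
from typing import Callable, Iterable, List, Union

# Hypothetical type: a selector is either a literal column name
# or a predicate that is called with each column name.
Selector = Union[str, Callable[[str], bool]]


def starts_with(prefix: str) -> Callable[[str], bool]:
    return lambda name: name.startswith(prefix)


def ends_with(suffix: str) -> Callable[[str], bool]:
    return lambda name: name.endswith(suffix)


def contains(pattern: str) -> Callable[[str], bool]:
    # Assumption: the argument is a regular expression, as in the issue text.
    return lambda name: re.search(pattern, name) is not None


def resolve_columns(selectors: Iterable[Selector], columns: Iterable[str]) -> List[str]:
    """Expand strings and selector functions into a de-duplicated column list."""
    columns = list(columns)
    selected: List[str] = []
    for sel in selectors:
        if callable(sel):
            matches = [c for c in columns if sel(c)]
        else:
            if sel not in columns:
                raise KeyError(f"column not found: {sel!r}")
            matches = [sel]
        for c in matches:
            if c not in selected:  # keep first occurrence, preserve order
                selected.append(c)
    return selected


cols = ["a", "b", "c1", "c2", "end_d"]
print(resolve_columns(["a", starts_with("c"), ends_with("d")], cols))
```

If every method that takes column_names called something like `resolve_columns` first, the selectors would behave the same way everywhere.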
Also, replace_na should be changed to accept a list of columns and/or a column selector as the key of the dictionary being passed.
Also, is value an appropriate argument name in replace_na?
IMHO, value should be a single value for simplicity. If we allow a list of values to be passed, then we are implicitly assuming that we already know which columns get selected, right?
My suggestion:
df.replace_na({ends_with("width"): 0})
and not df.replace_na({ends_with("width"): [0, 1]})
because in the latter case we usually do not know in advance how many columns get selected.
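A rough sketch of the single-value rule, assuming selectors are plain predicates on a column name. `expand_na_spec` is a hypothetical helper, not an existing tidypandas function. One practical detail: Python dict keys must be hashable, so a group of columns would have to be given as a tuple rather than a list.

```python
from typing import Any, Callable, Dict, Iterable


def ends_with(suffix: str) -> Callable[[str], bool]:
    return lambda name: name.endswith(suffix)


def expand_na_spec(spec: Dict[Any, Any], columns: Iterable[str]) -> Dict[str, Any]:
    """Turn {selector_or_name(s): value} into {column_name: value}."""
    columns = list(columns)
    fill: Dict[str, Any] = {}
    for key, value in spec.items():
        # Enforce the proposal: one scalar fill value per key.
        if isinstance(value, (list, tuple)):
            raise TypeError("fill value must be a single value, not a sequence")
        # A tuple key groups several names and/or selectors.
        parts = key if isinstance(key, tuple) else (key,)
        for part in parts:
            if callable(part):
                names = [c for c in columns if part(c)]
            else:
                names = [part]
            for name in names:
                fill[name] = value
    return fill


cols = ["sepal_width", "petal_width", "species"]
print(expand_na_spec({ends_with("width"): 0, "species": "unknown"}, cols))
```

The resulting plain `{column: value}` mapping is the shape that pandas `DataFrame.fillna` accepts for its `value` argument, so the expansion could sit directly in front of it.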
tidyselect should be extended to the following methods:
- count
- add_count
- nest_by
- expand
- complete
- summarise (in the by argument)
- mutate (in the by argument)
Is there an easy way to filter using tidyselect? Like:
df.filter(starts_with("x") < 10)
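One possible way to get that syntax, sketched on plain dict rows rather than a real data frame: let the selector overload comparison operators and return a row-level test. Everything here is hypothetical (`ColumnSelector`, `filter_rows`), and requiring the condition to hold for all matched columns is an assumption; an "any column" variant would be just as easy.

```python
import operator
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ColumnSelector:
    """Wraps a column-name predicate and supports comparisons like `sel < 10`."""

    def __init__(self, predicate: Callable[[str], bool]) -> None:
        self.predicate = predicate

    def _compare(self, op: Callable[[Any, Any], bool], other: Any) -> Callable[[Row], bool]:
        def row_test(row: Row) -> bool:
            names = [n for n in row if self.predicate(n)]
            # Assumption: the condition must hold for ALL selected columns.
            # Rows with no matching columns pass, because all([]) is True.
            return all(op(row[n], other) for n in names)
        return row_test

    def __lt__(self, other: Any) -> Callable[[Row], bool]:
        return self._compare(operator.lt, other)

    def __gt__(self, other: Any) -> Callable[[Row], bool]:
        return self._compare(operator.gt, other)


def starts_with(prefix: str) -> ColumnSelector:
    return ColumnSelector(lambda name: name.startswith(prefix))


def filter_rows(rows: List[Row], row_test: Callable[[Row], bool]) -> List[Row]:
    return [row for row in rows if row_test(row)]


rows = [
    {"x1": 1, "x2": 5, "y": 100},
    {"x1": 20, "x2": 3, "y": 0},
]
print(filter_rows(rows, starts_with("x") < 10))
```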