polars
`split_exact` suggestions
I'm looking for a feature similar to the `maxsplit` option of Python's `str.split`. `split_exact` appears to address a similar need, but it seems to split on every occurrence of the separator and return only the first n+1 segments, discarding the rest:
>>> "a_b_c_d".split("_", maxsplit=1)
['a', 'b_c_d']
>>> pl.Series(["a_b_c_d"]).str.split_exact("_", 1)
shape: (1,)
Series: '' [struct[2]]
[
{"a","b"}
]
I'm not sure if this was the intended behavior of `split_exact`, but I would find it more useful if the result matched Python's `maxsplit` behavior, where the last segment keeps the remainder of the string. Either way, I would love for this to be either the default or a configurable option.
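To make the difference concrete, here is a minimal pure-Python sketch of the two semantics. The helper names `split_then_truncate` and `split_with_remainder` are hypothetical and only model the behavior observed above; they are not part of polars.

```python
from __future__ import annotations


def split_then_truncate(text: str, sep: str, n: int) -> list[str]:
    """Split on every separator, then keep only the first n + 1 pieces.

    This models what split_exact appears to do in the example above.
    """
    return text.split(sep)[: n + 1]


def split_with_remainder(text: str, sep: str, n: int) -> list[str]:
    """Split at most n times; the last piece keeps the rest of the string.

    This is Python's maxsplit behavior.
    """
    return text.split(sep, maxsplit=n)


text = "a_b_c_d"
truncated = split_then_truncate(text, "_", 1)
remainder = split_with_remainder(text, "_", 1)
print(truncated)  # ['a', 'b']
print(remainder)  # ['a', 'b_c_d']
```

With the first variant, "c_d" is lost; with the second, it stays attached to the final segment.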
The struct-type return is also a little awkward, and I'd prefer a list instead. Is that a reasonable change to make?
> The struct-type return is also a little awkward, and I'd prefer a list instead. Is that a reasonable change to make?
A list type has a more expensive memory format, as it is designed to deal with elements of varying length. A struct can be decomposed into its fields at zero cost.
Regarding `maxsplit`: that functionality seems different from `split_exact`, so maybe we can add it as well.
Addressed via #4373