feat: Add `str.head` and `str.tail`

Open mcrumiller opened this issue 1 year ago • 8 comments

Resolves #10337.

mcrumiller avatar Feb 11 '24 21:02 mcrumiller

nice addition! 😃

I would recommend a small change in the test

the check for "foobar" with

  • head(-3) == "foo"
  • tail(-3) == "bar"

is a little confusing, because it would also pass if the function simply took the absolute value of the argument.

"abcde" with

  • head(2) == "ab" & head(-2) == "abc"
  • tail(2) == "de" & tail(-2) == "cde"

would be clearer and unambiguous.
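
To illustrate the point about ambiguity, here is a minimal pure-Python sketch. The helpers `head`, `tail`, and `head_abs` are hypothetical models of the semantics discussed in this thread (a negative `n` means "all but the last/first `|n|` characters"). They are not the Polars implementation.

```python
def head(s: str, n: int) -> str:
    """First n characters; for negative n, everything except the last |n|."""
    return s[:n]


def tail(s: str, n: int) -> str:
    """Last n characters; for negative n, everything except the first |n|."""
    if n == 0:
        # s[-0:] would return the whole string, so handle zero explicitly.
        return ""
    return s[-n:]


def head_abs(s: str, n: int) -> str:
    """Deliberately wrong variant that just takes the absolute value of n."""
    return s[:abs(n)]


# With a 6-character string and n = -3, both variants agree, so the test
# cannot catch the bug.
print(head("foobar", -3), head_abs("foobar", -3))  # foo foo

# With a 5-character string and n = -2, the results differ.
print(head("abcde", -2), head_abs("abcde", -2))  # abc ab
```

Any input whose length is not exactly `2 * |n|` would separate the two behaviours; "abcde" with `n = ±2` is simply a short, readable choice.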

Julian-J-S avatar Feb 12 '24 07:02 Julian-J-S

@stinodego I've updated the docstrings with more detail and more examples.

I cannot for the life of me figure out why the CI doctest is failing with an "unexpected indentation" error. My doctests pass fine locally and I can't determine which part is causing the error.

I do note that when I run code locally, Series show 8 spaces of indentation:

>>> import polars as pl
>>> s = pl.Series(["pear", None, "papaya", "dragonfruit"])
>>> s.str.head(-3)
shape: (4,)
Series: '' [str]
[
        "p"
        null
        "pap"
        "dragonfr"
]

And the examples in string.py are a hodgepodge of 4 or 8 spaces. str.explode, for example, has 8 spaces in its docstring examples, but those do not seem to cause an error, whereas str.contains has only 4 spaces in its examples, and also does not cause an error. Could this be the issue?

Edit: I suspect this is the case, as my local doctest does not complain. I've reduced to 4 and we'll see how that fares. Edit2: nope, still failing.
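
For reference, a small sketch of how `doctest` treats indentation in expected output. By default the comparison is exact, whitespace included. With the `NORMALIZE_WHITESPACE` option, runs of whitespace compare as equal, which would explain why a mix of 4 and 8 spaces in existing docstrings causes no failures. Whether the project's doctest runner enables that option is an assumption here, and the `SOURCE` snippet and `count_failures` helper are made up for illustration.

```python
import doctest

# Hypothetical example: the code prints 8 spaces of indentation, while the
# expected output underneath is written with only 4.
SOURCE = '''
>>> print("[\\n" + " " * 8 + '"p"' + "\\n]")
[
    "p"
]
'''


def count_failures(flags: int) -> int:
    """Run the example above with the given doctest option flags."""
    test = doctest.DocTestParser().get_doctest(SOURCE, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    # Swallow the failure report; only the counts matter here.
    return runner.run(test, out=lambda _: None).failed


print(count_failures(0))                             # 1: exact match required
print(count_failures(doctest.NORMALIZE_WHITESPACE))  # 0: whitespace is folded
```

If the option is enabled, an indentation difference in the example output would not make the doctest fail, which is consistent with the result of the second edit above.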

mcrumiller avatar Feb 12 '24 19:02 mcrumiller

It's probably an unclosed backtick. I can take a look. There are some issues with the docstring formatting anyway that I can see won't render.

stinodego avatar Feb 12 '24 19:02 stinodego

When can it be used in Rust?

coolstudio1678 avatar Mar 16 '24 06:03 coolstudio1678

It needs to be approved first. @ritchie46 would you mind taking a look?

mcrumiller avatar Mar 16 '24 16:03 mcrumiller

I hope to get to this today. I am a bit worried about slices that are within char boundaries.

ritchie46 avatar Mar 18 '24 08:03 ritchie46

I am a bit worried about slices that are within char boundaries.

Do we have reason not to trust str.len(), or do you mean that we must be very careful?
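
For context, a short sketch of what the char-boundary concern presumably refers to: in UTF-8, a character can span several bytes, so a slice computed from byte offsets can end in the middle of a character, while a slice computed from character counts cannot. This is shown in Python for brevity and is an interpretation of the comment above, not a description of the Rust code in this PR.

```python
text = "h\u00e9llo"  # "héllo"; the "é" takes two bytes in UTF-8
raw = text.encode("utf-8")
print(len(text), len(raw))  # 5 6

# Slicing by bytes: the cut after 2 bytes lands inside the two-byte "é".
try:
    raw[:2].decode("utf-8")
    bytes_split_cleanly = True
except UnicodeDecodeError:
    bytes_split_cleanly = False
print(bytes_split_cleanly)  # False

# Slicing by characters (code points) always lands on a boundary.
by_chars = text[:2]
print(by_chars)  # hé
```

In Rust, slicing a `str` at a byte index that is not a char boundary panics, so any head/tail implementation has to translate character counts into valid byte offsets first.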

mcrumiller avatar Mar 18 '24 14:03 mcrumiller

Codecov Report

Attention: Patch coverage is 94.20290%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 81.15%. Comparing base (dcee934) to head (0b9f353). Report is 4 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| ...rates/polars-plan/src/dsl/function_expr/strings.rs | 80.76% | 5 Missing :warning: |
| py-polars/src/expr/general.rs | 50.00% | 2 Missing :warning: |
| .../polars-ops/src/chunked_array/strings/substring.rs | 98.52% | 1 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14425      +/-   ##
==========================================
+ Coverage   81.14%   81.15%   +0.01%     
==========================================
  Files        1363     1363              
  Lines      175282   175408     +126     
  Branches     2527     2527              
==========================================
+ Hits       142236   142360     +124     
- Misses      32568    32571       +3     
+ Partials      478      477       -1     


codecov[bot] avatar Mar 18 '24 20:03 codecov[bot]