stringr

Add new `str_dedent` function

Open chrimaho opened this issue 2 years ago • 8 comments

One incredibly helpful function in Python is textwrap.dedent. Under the hood, it uses regex to strip the leading whitespace that is common to every line, while preserving any relative indentation within a chunk of code.

This addition re-implements the same functionality in native R code.

I've included 4 unit tests for it.
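
For reference, the Python behaviour being ported can be checked with a minimal sketch. The sample string below is a hypothetical input chosen for illustration and does not come from the PR:

```python
import textwrap

# Hypothetical sample (not from the PR): both lines share a four-space
# margin, and the second line is indented four spaces further.
code = "    def f():\n        return 1\n"

# dedent() removes only the margin common to all non-blank lines,
# so the relative indentation of the second line survives.
dedented = textwrap.dedent(code)
print(repr(dedented))  # 'def f():\n    return 1\n'
```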

chrimaho avatar Jun 10 '23 07:06 chrimaho

I'd also like to have a dedent() function in R. Two comments:

  • Wouldn't it make sense to trim leading and trailing whitespace in the output, or to add an argument to do so? That would be very useful when making character strings from multi-line text in R:

    str_dedent("
      This is a long sentence that starts on the first line,
      continues on the second,
      and ends on the third one.
    ")
    

    For readability purposes, it's better to start the text on its own line here. We wouldn't want to keep that blank line at the beginning though. Passing the output to str_trim() every time in that situation would be cumbersome.

  • The function currently doesn't support the following situation (whereas Python's textwrap.dedent() does):

    str_dedent("
        foo
      bar
    ")
    

    It's probably uncommon enough not to support it, but that could be detailed in the function documentation.
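
Both points can be compared against Python's textwrap.dedent() with a short sketch. The strings mirror the two R examples above, written with explicit `\n` escapes; the variable names are only illustrative:

```python
import textwrap

# First point: text that starts on its own line. dedent() removes the
# common margin but keeps the leading blank line and the final newline,
# so an extra strip() is still needed in Python too.
text = (
    "\n"
    "      This is a long sentence that starts on the first line,\n"
    "      continues on the second,\n"
    "      and ends on the third one.\n"
    "    "
)
sentence = textwrap.dedent(text)
print(sentence.startswith("\n"), sentence.endswith("\n"))  # True True
print(repr(sentence.strip()))

# Second point: the first content line is indented more than the second.
# dedent() uses the smallest indentation among non-blank lines as the margin.
uneven_result = textwrap.dedent("\n        foo\n      bar\n    ")
print(repr(uneven_result))  # '\n  foo\nbar\n'
```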

arnaudgallou avatar Sep 05 '24 09:09 arnaudgallou

Like @lionel-, I think I am also surprised by the trailing \n here

library(stringr)
library(glue)

glue_chr <- function(...) { 
  unclass(glue(...))
}

str_dedent("
  Line 1
  Line 2
  Line 3
")
#> [1] "Line 1\nLine 2\nLine 3\n"
glue_chr("
  Line 1
  Line 2
  Line 3
")
#> [1] "Line 1\nLine 2\nLine 3"

I would have expected the above to give this output

str_dedent("
  Line 1
  Line 2
  Line 3")
#> [1] "Line 1\nLine 2\nLine 3"
glue_chr("
  Line 1
  Line 2
  Line 3")
#> [1] "Line 1\nLine 2\nLine 3"

I think an invariant of this function could be:

Strips all leading and trailing whitespace from the output

which provides a nice symmetry and nice user experience

DavisVaughan avatar Sep 23 '25 13:09 DavisVaughan

@DavisVaughan you mean "Strips all leading and trailing whitespace lines from the output" right?

And you both really think we don't want a trailing newline? If you were going to cat() this you would want a trailing \n?
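
The difference between the two wordings shows up when the first content line is indented relative to the rest. A minimal Python sketch, where `strip_blank_lines` is a hypothetical helper written for this illustration (it is not part of stringr, glue, or textwrap):

```python
import textwrap


def strip_blank_lines(text):
    """Drop leading and trailing whitespace-only lines, nothing else."""
    lines = text.split("\n")
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


dedented = textwrap.dedent("\n    indented first\n  second\n")

# Stripping all leading whitespace also removes the first line's
# relative indentation.
stripped_all = dedented.strip()
print(repr(stripped_all))  # 'indented first\nsecond'

# Stripping only blank lines keeps that indentation.
stripped_lines = strip_blank_lines(dedented)
print(repr(stripped_lines))  # '  indented first\nsecond'
```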

hadley avatar Sep 24 '25 17:09 hadley

First, I'll just add some bits from glue's documentation for reference:

Empty first and last lines are automatically trimmed, as is leading whitespace that is common across all lines. ... If you want an explicit newline at the start or end, include an extra empty line. ... Leading and trailing whitespace from the first and last lines is removed.

A uniform amount of indentation is stripped from the second line on, equal to the minimum indentation of all non-blank lines after the first.

As for this:

And you both really think we don't want a trailing new line? If you were going to cat() this you would want a trailing \n?

If you're just cat()ing the result of 1 str_dedent() call, I think it doesn't matter because the R console basically adds the trailing newline. And in more complicated situations, this is when I'd probably use cli::cat_line() anyway.

jennybc avatar Sep 24 '25 18:09 jennybc

@jennybc that's only true if it's at the top-level (i.e. if you cat multiple times before returning, you need newlines in between them).

I'm not sure why I'm so far apart from the rest of you on trailing newlines. I thought this was a situation where preserving them was "obviously correct".

hadley avatar Sep 24 '25 19:09 hadley

if you cat multiple times before returning, you need newlines in between them

I guess that's when I would use cli::cat_line().

We're talking a lot about glue, which stringr imports. Which makes me wonder ... why isn't str_dedent() just glue::trim()? 🤔

jennybc avatar Sep 24 '25 19:09 jennybc

That is a good question. If I replace the existing implementation with a direct call to glue::trim() then I get the following failures:

── Failure (test-remove.R:12:3): strips common ws ──────────────────────────────
str_dedent("  Hello\n    World") (`actual`) not equal to "Hello\n  World" (`expected`).

`lines(actual)`:   "Hello" "World"  
`lines(expected)`: "Hello" "  World"

── Failure (test-remove.R:13:3): strips common ws ──────────────────────────────
str_dedent("    Hello\n  World") (`actual`) not equal to "  Hello\nWorld" (`expected`).

`lines(actual)`:   "Hello"   "World"
`lines(expected)`: "  Hello" "World"

── Failure (test-remove.R:25:3): preserves final newline ───────────────────────
str_dedent("  Hello\n  World\n") (`actual`) not equal to "Hello\nWorld\n" (`expected`).

`lines(actual)`:   "Hello" "World"   
`lines(expected)`: "Hello" "World" ""

── Failure (test-remove.R:35:3): preserves final newline ───────────────────────
str_dedent("\n      Hello\n      World\n    ") (`actual`) not equal to "Hello\nWorld\n" (`expected`).

`lines(actual)`:   "  Hello" "  World"   
`lines(expected)`: "Hello"   "World"   ""

I can make most of them go away by adding a leading \n to the strings (which better reflects real use), but the remaining weirdness is this:

cat(
  glue::trim("
    Hello
      World
  ")
)
#>   Hello
#>     World

Created on 2025-09-24 with reprex v2.1.1

I find the extra indent here surprising. But maybe we could fix that?
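
As a cross-check against the Python function that motivated this issue, the inputs from the four failing expectations quoted above can be fed to textwrap.dedent(). This is only a comparison sketch; it makes no claim about how glue::trim() computes its result:

```python
import textwrap

# (input, value expected by the str_dedent() tests quoted above)
cases = [
    ("  Hello\n    World", "Hello\n  World"),
    ("    Hello\n  World", "  Hello\nWorld"),
    ("  Hello\n  World\n", "Hello\nWorld\n"),
    ("\n      Hello\n      World\n    ", "Hello\nWorld\n"),
]

matches = [textwrap.dedent(text) == expected for text, expected in cases]
print(matches)  # [True, True, True, False]

# The last case differs only because textwrap.dedent() keeps the
# leading blank line, which the str_dedent() test expects to be dropped.
print(repr(textwrap.dedent(cases[3][0])))  # '\nHello\nWorld\n'
```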

hadley avatar Sep 24 '25 22:09 hadley

If you're just cat()ing the result of 1 str_dedent() call, I think it doesn't matter because the R console basically adds the trailing newline

The Positron and RStudio consoles do (at top-level as Hadley mentions), but not the R console:

Screenshot 2025-09-25 at 09 57 51

If you were going to cat() this you would want a trailing \n?

I would expect the caller of cat() to add the trailing \n. But I think that's a case where I'd pipe the output to writeLines() (or use cat_line() as Jenny suggests).

lionel- avatar Sep 25 '25 07:09 lionel-