Humanizer icon indicating copy to clipboard operation
Humanizer copied to clipboard

Investigate using Span

Open vcsjones opened this issue 7 years ago • 17 comments

Humanizer is a very string-heavy library. Since strings are immutable, this often ends up with a lot of strings on the heap for string transformations and parsing. Now that Span and family are in .NET Core 2.1, I would propose the following.

  1. Add netcoreapp2.1 as a target to the project.
  2. Use [ReadOnly]Span<char> in the library itself where possible if targeting netcoreapp2.1.

An example might be the ToTitleCase implementation. This will allocate a string on the heap for every "word" in the string. It's feasible instead to just find all the indexes of the spaces and slice on the string (allocation free) and push the result on to a string builder, and finally return the builder ToString.

Assuming part 1 is OK, part 2 could be broken down in to many smaller pull requests. A later step might be to offer public overloads of APIs to accept ReadOnlySpan<char> where it makes sense.

Do we want to take a dependency on the System.Memory package to offer Span functionality elsewhere, or is netcoreapp enough?

Thoughts?

vcsjones avatar Aug 20 '18 17:08 vcsjones

A PR for this would be very welcome!

clairernovotny avatar Aug 31 '18 14:08 clairernovotny

Just did a quick test here:

https://github.com/Rbn3D/Humanizer/commit/f3fb8714f05f7f37dc7b3056daafbef473afb0df

It's a very small example, not optimal since most usages of the method converted still uses the overload that requires a string, so we need to copy it to a span first (in order to modify it), but for large strings should perform better.

All tests are passing.

The only problem is that, in order to use System.Memory, I had to update the target frameworks of the projects like this:

  • Humanizer.csproj => netstandard1.0 to netstandard 1.6
  • Humanizer.Tests.csproj => net46 to net461

I can make a PR of this when I have some time to convert more methods and think on how public overloads taking Span should look like, so they are efficient but still familiar, without differ too much to the current library design / code style.

Rbn3D avatar Sep 01 '18 11:09 Rbn3D

I'm not sure why netstandard 1.6 is required? When I look at the System.Memory NuGet package, it supports down to netstandard1.1?

clairernovotny avatar Sep 01 '18 12:09 clairernovotny

@onovotny Well, we need at least 1.1, but in my tests I was not able to set it up properly. Not sure if it's related to my development environment setup (using visual studio 2017 v15.8.0), will look at that again.

Rbn3D avatar Sep 01 '18 12:09 Rbn3D

Updated to target netStandard 1.1 (Now working, still unsure why didn't build before) Test pass.

See https://github.com/Humanizr/Humanizer/compare/master...Rbn3D:span-tests

Rbn3D avatar Sep 01 '18 12:09 Rbn3D

Cool...I would also suggest a BenchmarkDotNet project that shows the before/after of these changes for perf and allocations. The benchmark project can reference humanizer as a nuget for the current behavior, and source for the latest. I think it'd help show the gains.

For the purposes of adding Span, I'd be okay with breaking API changes if they're source compatible (implicit casts, etc). That could allow removing the string overloads and taking ReadOnlySpan<char> instead. We can bump the major to 3.0 in the version.json for that.

clairernovotny avatar Sep 01 '18 13:09 clairernovotny

I'm having trouble referencing two versions of Humanizer project for the benchmarkDotNet project.

Extern alias can be used on the local source project, but not for the nuget one, and the compiler doesn't seem to be able to reference both projects in the code (currently fails to recognise the nuget version)

https://github.com/Humanizr/Humanizer/compare/master...Rbn3D:span-tests

Any ideas?

If anyone can checkout this and try it on their side I would appreciate it.

Rbn3D avatar Sep 02 '18 18:09 Rbn3D

I'd suggest using a project configuration and a condition in the csproj based on that

clairernovotny avatar Sep 02 '18 19:09 clairernovotny

After some research, I came to the conclusion that there is no way (AFAIK) to make the benchmark work with a nuget reference and a source one for the same project.

The closest thing I found is this: https://github.com/dotnet/BenchmarkDotNet/pull/805 But it's still unmerged.

Probably using a source reference and a dll reference (and use 'extern alias' on them) would be enough, maybe I can create a PowerShell script that downloads the dll from nuget and runs the benchmark... not ideal however.

Thoughts?

Rbn3D avatar Sep 05 '18 18:09 Rbn3D

I took a stab at converting the four capitalization-transforms to Span-based versions, and setting up BenchmarkDotNet tests for them. Pretty good results so far, even with all tests using the string-input, string-output overloads so that they can run on both platforms.

https://github.com/Humanizr/Humanizer/compare/master...jtmueller:span-optimized

Apologies for the table width...

UPDATE: I improved the benchmarks and made them a little more realistic. The performance improvement isn't quite so dramatic now (around 50% in some cases) but the GC pauses and memory allocated is still a fair bit lower.

        Method |  Job | Runtime | StringLen |         Mean |         Error |        StdDev | Ratio | RatioSD | Gen 0/1k Op | Allocated Memory/Op |
-------------- |----- |-------- |---------- |-------------:|--------------:|--------------:|------:|--------:|------------:|--------------------:|
 AllTransforms |  Clr |     Clr |        10 |   2,734.6 ns |    35.3011 ns |    33.0207 ns |  1.00 |    0.00 |      0.6676 |              2104 B |
 AllTransforms | Core |    Core |        10 |   1,934.3 ns |    18.9671 ns |    17.7419 ns |  0.71 |    0.01 |      0.3071 |               968 B |
               |      |         |           |              |               |               |       |         |             |                     |
     LowerCase |  Clr |     Clr |        10 |     163.9 ns |     0.7806 ns |     0.6519 ns |  1.00 |    0.00 |      0.0355 |               112 B |
     LowerCase | Core |    Core |        10 |     152.8 ns |     0.7475 ns |     0.6626 ns |  0.93 |    0.01 |      0.0253 |                80 B |
               |      |         |           |              |               |               |       |         |             |                     |
     UpperCase |  Clr |     Clr |        10 |     164.9 ns |     0.8530 ns |     0.7979 ns |  1.00 |    0.00 |      0.0355 |               112 B |
     UpperCase | Core |    Core |        10 |     160.0 ns |     0.8488 ns |     0.7940 ns |  0.97 |    0.01 |      0.0253 |                80 B |
               |      |         |           |              |               |               |       |         |             |                     |
  SentenceCase |  Clr |     Clr |        10 |     188.7 ns |     1.2433 ns |     1.1022 ns |  1.00 |    0.00 |      0.0710 |               224 B |
  SentenceCase | Core |    Core |        10 |     126.4 ns |     1.5431 ns |     1.4435 ns |  0.67 |    0.01 |      0.0253 |                80 B |
               |      |         |           |              |               |               |       |         |             |                     |
     TitleCase |  Clr |     Clr |        10 |   2,537.8 ns |    12.4278 ns |    11.0170 ns |  1.00 |    0.00 |      0.6676 |              2104 B |
     TitleCase | Core |    Core |        10 |   1,868.6 ns |    13.0149 ns |    12.1742 ns |  0.74 |    0.01 |      0.2995 |               944 B |
               |      |         |           |              |               |               |       |         |             |                     |
 AllTransforms |  Clr |     Clr |       100 |  21,895.1 ns |   333.0608 ns |   311.5453 ns |  1.00 |    0.00 |      6.3477 |             20009 B |
 AllTransforms | Core |    Core |       100 |  16,309.9 ns |   139.0030 ns |   130.0235 ns |  0.75 |    0.01 |      2.2888 |              7264 B |
               |      |         |           |              |               |               |       |         |             |                     |
     LowerCase |  Clr |     Clr |       100 |     339.2 ns |     1.6029 ns |     1.4209 ns |  1.00 |    0.00 |      0.0939 |               296 B |
     LowerCase | Core |    Core |       100 |     265.0 ns |     3.6287 ns |     3.3942 ns |  0.78 |    0.01 |      0.0834 |               264 B |
               |      |         |           |              |               |               |       |         |             |                     |
     UpperCase |  Clr |     Clr |       100 |     331.2 ns |     3.4866 ns |     3.2614 ns |  1.00 |    0.00 |      0.0939 |               296 B |
     UpperCase | Core |    Core |       100 |     255.6 ns |     1.5230 ns |     1.4246 ns |  0.77 |    0.01 |      0.0834 |               264 B |
               |      |         |           |              |               |               |       |         |             |                     |
  SentenceCase |  Clr |     Clr |       100 |     227.9 ns |     1.0630 ns |     0.9943 ns |  1.00 |    0.00 |      0.1855 |               584 B |
  SentenceCase | Core |    Core |       100 |     145.5 ns |     2.8433 ns |     3.3848 ns |  0.64 |    0.02 |      0.0837 |               264 B |
               |      |         |           |              |               |               |       |         |             |                     |
     TitleCase |  Clr |     Clr |       100 |  24,930.2 ns |   466.1155 ns |   436.0047 ns |  1.00 |    0.00 |      8.7891 |             27712 B |
     TitleCase | Core |    Core |       100 |  16,922.7 ns |    59.8173 ns |    53.0265 ns |  0.68 |    0.01 |      2.2888 |              7240 B |
               |      |         |           |              |               |               |       |         |             |                     |
 AllTransforms |  Clr |     Clr |      1000 | 270,043.7 ns | 5,793.6248 ns | 5,419.3602 ns |  1.00 |    0.00 |    166.5039 |            524124 B |
 AllTransforms | Core |    Core |      1000 | 159,703.0 ns | 1,165.8451 ns | 1,090.5323 ns |  0.59 |    0.01 |     22.7051 |             71680 B |
               |      |         |           |              |               |               |       |         |             |                     |
     LowerCase |  Clr |     Clr |      1000 |   1,938.1 ns |     8.4199 ns |     7.4640 ns |  1.00 |    0.00 |      0.6638 |              2097 B |
     LowerCase | Core |    Core |      1000 |   1,409.0 ns |    11.3904 ns |    10.6546 ns |  0.73 |    0.01 |      0.6542 |              2064 B |
               |      |         |           |              |               |               |       |         |             |                     |
     UpperCase |  Clr |     Clr |      1000 |   2,004.2 ns |    15.4363 ns |    14.4392 ns |  1.00 |    0.00 |      0.6638 |              2097 B |
     UpperCase | Core |    Core |      1000 |   1,304.0 ns |    17.2735 ns |    16.1576 ns |  0.65 |    0.01 |      0.6542 |              2064 B |
               |      |         |           |              |               |               |       |         |             |                     |
  SentenceCase |  Clr |     Clr |      1000 |     593.4 ns |     2.2363 ns |     2.0918 ns |  1.00 |    0.00 |      1.3294 |              4186 B |
  SentenceCase | Core |    Core |      1000 |     389.2 ns |     7.5996 ns |     8.4470 ns |  0.66 |    0.01 |      0.6557 |              2064 B |
               |      |         |           |              |               |               |       |         |             |                     |
     TitleCase |  Clr |     Clr |      1000 | 363,197.3 ns | 2,177.6280 ns | 1,930.4105 ns |  1.00 |    0.00 |    332.5195 |           1046476 B |
     TitleCase | Core |    Core |      1000 | 166,807.6 ns | 1,263.8589 ns | 1,182.2144 ns |  0.46 |    0.00 |     22.7051 |             71656 B |

jtmueller avatar Nov 18 '18 10:11 jtmueller

I couldn't figure out how to get BenchmarkDotNet to run the netstandard2.0 version of Humanizer on the Core runtime, so it's currently using the CLR runtime to run the code without Span optimizations. It's not a perfect comparison of course since it's a different runtime, but it gives a general idea.

However, I think a decent chunk of the benefits seen so far can be had by taking a dependency on System.Memory, on any platform that supports that package. The main places .NET Core 2.1 has an advantage over System.Memory on other platforms are...

  • Parsing numbers etc from fragments of a string (Int32.Parse on other platforms doesn't take Span and requires a string allocation here).
  • If using a StringBuilder, it only accepts Span as input on Core 2.1 - other platforms require a string allocation just to add a fragment to a StringBuilder. This might be a good argument for using List<char> instead of StringBuilder for cases where the output size isn't known in advance.

jtmueller avatar Nov 18 '18 19:11 jtmueller

String.Humanize, on the other hand, gets roughly the same performance for short strings, slightly better for long strings, after conversion. Garbage collections and memory allocated are lower, though. Maybe there's more that can be done to optimize these methods.

                 Method |  Job | Runtime | StringLen |         Mean |        Error |       StdDev | Ratio | RatioSD | Gen 0/1k Op | Allocated Memory/Op |
----------------------- |----- |-------- |---------- |-------------:|-------------:|-------------:|------:|--------:|------------:|--------------------:|
               Humanize |  Clr |     Clr |        10 |     736.4 ns |     3.006 ns |     2.811 ns |  1.00 |    0.00 |      0.1192 |               376 B |
               Humanize | Core |    Core |        10 |     553.2 ns |     3.510 ns |     3.283 ns |  0.75 |    0.01 |      0.0448 |               144 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToLowerCase |  Clr |     Clr |        10 |     917.4 ns |    10.331 ns |     8.627 ns |  1.00 |    0.00 |      0.1545 |               488 B |
    HumanizeToLowerCase | Core |    Core |        10 |     749.0 ns |     4.711 ns |     4.176 ns |  0.82 |    0.01 |      0.0553 |               176 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToUpperCase |  Clr |     Clr |        10 |     927.3 ns |     6.082 ns |     5.689 ns |  1.00 |    0.00 |      0.1545 |               488 B |
    HumanizeToUpperCase | Core |    Core |        10 |     748.9 ns |     3.086 ns |     2.887 ns |  0.81 |    0.01 |      0.0553 |               176 B |
                        |      |         |           |              |              |              |       |         |             |                     |
 HumanizeToSentenceCase |  Clr |     Clr |        10 |     964.4 ns |     8.520 ns |     7.970 ns |  1.00 |    0.00 |      0.1888 |               600 B |
 HumanizeToSentenceCase | Core |    Core |        10 |     732.5 ns |     2.698 ns |     2.523 ns |  0.76 |    0.01 |      0.0553 |               176 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToTitleCase |  Clr |     Clr |        10 |   2,222.1 ns |     9.162 ns |     8.571 ns |  1.00 |    0.00 |      0.4234 |              1344 B |
    HumanizeToTitleCase | Core |    Core |        10 |   1,709.4 ns |    31.030 ns |    27.508 ns |  0.77 |    0.01 |      0.1965 |               624 B |
                        |      |         |           |              |              |              |       |         |             |                     |
               Humanize |  Clr |     Clr |       100 |   6,326.0 ns |    38.432 ns |    35.949 ns |  1.00 |    0.00 |      0.6180 |              1968 B |
               Humanize | Core |    Core |       100 |   6,358.2 ns |   127.189 ns |   205.386 ns |  1.02 |    0.03 |      0.1526 |               504 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToLowerCase |  Clr |     Clr |       100 |   6,757.7 ns |    48.783 ns |    45.632 ns |  1.00 |    0.00 |      0.7172 |              2264 B |
    HumanizeToLowerCase | Core |    Core |       100 |   6,539.5 ns |    93.779 ns |    73.217 ns |  0.97 |    0.01 |      0.1678 |               536 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToUpperCase |  Clr |     Clr |       100 |   6,718.8 ns |    42.142 ns |    37.358 ns |  1.00 |    0.00 |      0.7172 |              2264 B |
    HumanizeToUpperCase | Core |    Core |       100 |   6,463.6 ns |    74.971 ns |    70.128 ns |  0.96 |    0.01 |      0.1678 |               536 B |
                        |      |         |           |              |              |              |       |         |             |                     |
 HumanizeToSentenceCase |  Clr |     Clr |       100 |   6,534.5 ns |    59.087 ns |    55.270 ns |  1.00 |    0.00 |      0.8087 |              2552 B |
 HumanizeToSentenceCase | Core |    Core |       100 |   6,475.8 ns |    47.803 ns |    44.715 ns |  0.99 |    0.01 |      0.1678 |               536 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToTitleCase |  Clr |     Clr |       100 |  28,878.0 ns |    56.094 ns |    52.470 ns |  1.00 |    0.00 |      7.8125 |             24673 B |
    HumanizeToTitleCase | Core |    Core |       100 |  21,951.9 ns |   240.848 ns |   225.289 ns |  0.76 |    0.01 |      2.2278 |              7096 B |
                        |      |         |           |              |              |              |       |         |             |                     |
               Humanize |  Clr |     Clr |      1000 |  60,199.1 ns |   213.496 ns |   199.705 ns |  1.00 |    0.00 |      5.0049 |             15937 B |
               Humanize | Core |    Core |      1000 |  60,210.3 ns | 1,006.623 ns |   840.576 ns |  1.00 |    0.01 |      1.2207 |              4096 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToLowerCase |  Clr |     Clr |      1000 |  61,820.1 ns |   353.120 ns |   330.309 ns |  1.00 |    0.00 |      5.6152 |             18034 B |
    HumanizeToLowerCase | Core |    Core |      1000 |  61,963.3 ns |   375.361 ns |   313.444 ns |  1.00 |    0.01 |      1.2207 |              4128 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToUpperCase |  Clr |     Clr |      1000 |  61,766.2 ns |   516.212 ns |   457.609 ns |  1.00 |    0.00 |      5.6152 |             18034 B |
    HumanizeToUpperCase | Core |    Core |      1000 |  62,059.3 ns |   367.471 ns |   343.733 ns |  1.01 |    0.01 |      1.2207 |              4128 B |
                        |      |         |           |              |              |              |       |         |             |                     |
 HumanizeToSentenceCase |  Clr |     Clr |      1000 |  61,045.7 ns |   429.279 ns |   358.468 ns |  1.00 |    0.00 |      6.3477 |             20122 B |
 HumanizeToSentenceCase | Core |    Core |      1000 |  60,265.7 ns |   352.692 ns |   312.652 ns |  0.99 |    0.01 |      1.2817 |              4128 B |
                        |      |         |           |              |              |              |       |         |             |                     |
    HumanizeToTitleCase |  Clr |     Clr |      1000 | 378,527.0 ns | 9,590.224 ns | 9,418.882 ns |  1.00 |    0.00 |    291.5039 |            918825 B |
    HumanizeToTitleCase | Core |    Core |      1000 | 210,195.4 ns |   983.641 ns |   920.099 ns |  0.56 |    0.01 |     19.0430 |             60232 B |

jtmueller avatar Nov 19 '18 04:11 jtmueller

I added a handful more optimizations: https://github.com/Humanizr/Humanizer/compare/master...jtmueller:span-optimized

It's difficult to take this too much further, mainly because System.Text.RegularExpressions does not yet support Span<char> or ReadOnlySpan<char>. Humanizer is fairly heavy on regular expressions, and every time we hit one, we have to allocate a string. In some code paths this puts a serious crimp in the amount of allocation and copying we are able to avoid.

The Regex thing is not insoluble, but it is non-trivial. Options include...

  • Live with it.
  • Convince Microsoft to add ReadOnlySpan<char> overloads to the Regex class and wait for a future .NET Core release.
  • Add the overloads for them and submit a pull request.
  • Manually write code that does the equivalent of each regex in terms of ReadOnlySpan<char>.
  • Generate such code? I wonder if you could use Regex.CompileToAssembly to generate an assembly with a class for each Regex, then decompile that assembly into C#, then modify the C# to use ReadOnlySpan<char>?
    • UPDATE: All that stuff is possible, except for modifying the C# to use Span - the generated code simply inherits from Regex, so you're back to needing a string before you can call any methods.

Tracking issue for Span support in Regex Experimental PR that adds such support in a private branch

jtmueller avatar Nov 19 '18 06:11 jtmueller

If it helps, I'm happy targeting .NET Standard 2.0+ That should resolve Span issues? It can multi-target netcoreapp2.1 if there's benefit to doing so.

clairernovotny avatar Nov 22 '18 14:11 clairernovotny

The span stuff doesn't appear in .NET Standard until 2.1, which has yet to be officially released. Targeting that will definitely be preferable to targeting netcoreapp2.1 when it becomes possible, but it doesn't really change anything else. Keep in mind that .NET Framework will not support .NET Standard 2.1, so there would still be multi-targeting.

If my numbers are correct, Span gives us a 50% performance boost in some cases, particularly with long strings, and reduces memory allocations and GC pauses. It could be even better, but the fact that most strings in Humanizer end up passing through a Regex sooner or later, and the fact that Regex doesn't support Span, limits the optimizations that can be done.

Short of Span support showing up in Regex in some future version of .NET Core, we'd have to rewrite parts of Humanizer to not use Regex at all in order to fully realize the potential improvements.

jtmueller avatar Nov 26 '18 19:11 jtmueller

FYI, Regex supports Span since .NET 7

https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.enumeratematches?view=net-7.0

I have no idea how to deal with multi targets though.

tiakun avatar Jul 06 '23 10:07 tiakun

some changes that should facilitate this

  • netstandard1.0 dropped
  • Polyfill added to unlock a bunch of new c# syntactical sugar and add some missing apis https://github.com/SimonCropp/Polyfill. let me know if any more polyfills are required
  • System.ValueTuple added https://www.tutorialsteacher.com/csharp/valuetuple
  • System.Collections.Immutable added so we can use https://steven-giesel.com/blogPost/34e0fd95-0b3f-40f2-ba2a-36d1d4eb5601
  • System.Memory added so we can us spans and memory features
  • a Benchmarks project, that leverages BenchmarkDotNet, has been added. tested with these enum improvements https://github.com/Humanizr/Humanizer/pull/1409

so i think this unlocks the usage of span to improve perf.

any objections to closing this and letting the pull requests flow?

SimonCropp avatar Feb 18 '24 11:02 SimonCropp