commons-lang
commons-lang copied to clipboard
[LANG-1576] refine StringUtils.chomp
as title.
Coverage increased (+0.0003%) to 94.958% when pulling 46350d4300c33f1821ea7d7cc561e4e348233736 on xenoamess-fork:refine_chomp into 8d35d66ba9fe71e9f7bc853425e4bce13c961283 on apache:master.
If I understand the benchmark correctly, the New method is approx 3% slower for Test1 and about 2% faster for Test2. Does not seem like a huge gain.
Does not seem like a huge gain.
Yes, it isn't.
The JMH results are very long and obscure the thread of the PR.
Is it possible to post just the summary inline, with a link to the full results in an attachment? That would make it much easier to follow the PR.
The JMH results are very long and obscure the thread of the PR.
Is it possible to post just the summary inline, with a link to the full results in an attachment? That would make it much easier to follow the PR.
@sebbASF Yes, you are right. I will try ubuntu pastebin.
As a result: I re-refine the codes and redone the jmh test. Sorry for the delay because I was quite busy yesterday, and the test takes too long time(several hours on my pc.) full jmh result at: https://pastebin.ubuntu.com/p/wbsqtzrkyC/ In short:
Benchmark Mode Cnt Score Error Units
StringUtilsChompTest.test1New avgt 25 2.368 ? 0.011 ns/op
StringUtilsChompTest.test1Old avgt 25 2.434 ? 0.102 ns/op
StringUtilsChompTest.test2New avgt 25 2.364 ? 0.010 ns/op
StringUtilsChompTest.test2Old avgt 25 2.368 ? 0.013 ns/op
StringUtilsChompTest.test3New avgt 25 2.355 ? 0.007 ns/op
StringUtilsChompTest.test3Old avgt 25 2.556 ? 0.035 ns/op
StringUtilsChompTest.test4New avgt 25 6.352 ? 0.149 ns/op
StringUtilsChompTest.test4Old avgt 25 6.494 ? 0.363 ns/op
StringUtilsChompTest.test5New avgt 25 6.272 ? 0.086 ns/op
StringUtilsChompTest.test5Old avgt 25 6.248 ? 0.035 ns/op
StringUtilsChompTest.test6New avgt 25 6.304 ? 0.079 ns/op
StringUtilsChompTest.test6Old avgt 25 6.298 ? 0.061 ns/op
StringUtilsChompTest.testStringsNew avgt 25 72976.756 ?304.822 ns/op
StringUtilsChompTest.testStringsOld avgt 25 78290.023 ?657.792 ns/op
The test shows it is almost even speed when ends with \r or \n, but 6.7% faster when ends with no \r nor \n,which I think is the most cases we use the function. If I still done something stupidly wrong just tell me, will try to fix it when have any time. Thanks.
performance tests refined. thanks for help from @aherbert full test result at : https://pastebin.ubuntu.com/p/mC3wTgsCKT/ in short,
Benchmark Mode Cnt Score Error Units
StringUtilsChompTest.test10_Random_Strings_New avgt 5 67574.592 ? 9156.972 ns/op
StringUtilsChompTest.test10_Random_Strings_Old avgt 5 76213.928 ?12127.719 ns/op
StringUtilsChompTest.test1_Empty_New avgt 5 2.375 ? 0.087 ns/op
StringUtilsChompTest.test1_Empty_Old avgt 5 2.387 ? 0.085 ns/op
StringUtilsChompTest.test2_No_Chomp_New avgt 5 2.370 ? 0.120 ns/op
StringUtilsChompTest.test2_No_Chomp_Old avgt 5 2.386 ? 0.104 ns/op
StringUtilsChompTest.test3_No_Chomp_New avgt 5 2.419 ? 0.096 ns/op
StringUtilsChompTest.test3_No_Chomp_Old avgt 5 2.377 ? 0.162 ns/op
StringUtilsChompTest.test4_R_New avgt 5 2.402 ? 0.106 ns/op
StringUtilsChompTest.test4_R_Old avgt 5 2.361 ? 0.023 ns/op
StringUtilsChompTest.test5_N_New avgt 5 2.357 ? 0.005 ns/op
StringUtilsChompTest.test5_N_Old avgt 5 2.397 ? 0.167 ns/op
StringUtilsChompTest.test6_R_N_New avgt 5 6.013 ? 0.245 ns/op
StringUtilsChompTest.test6_R_N_Old avgt 5 6.133 ? 0.788 ns/op
StringUtilsChompTest.test7_a_N_New avgt 5 6.407 ? 0.343 ns/op
StringUtilsChompTest.test7_a_N_Old avgt 5 6.717 ? 1.407 ns/op
StringUtilsChompTest.test8_a_N_New avgt 5 6.678 ? 0.752 ns/op
StringUtilsChompTest.test8_a_N_Old avgt 5 6.580 ? 0.208 ns/op
StringUtilsChompTest.test9_a_R_N_New avgt 5 6.498 ? 0.672 ns/op
StringUtilsChompTest.test9_a_R_N_Old avgt 5 6.770 ? 2.015 ns/op
@aherbert Hi. I changed the jmh codes according to your suggestions, and re-run performance test. full test at: https://pastebin.ubuntu.com/p/cqNKS35zSP/ in short:
Benchmark (data) (name) Mode Cnt Score Error Units
StringUtilsChompTest.singleString NULL old avgt 5 2.737 ? 0.166 ns/op
StringUtilsChompTest.singleString NULL new avgt 5 2.789 ? 0.213 ns/op
StringUtilsChompTest.singleString CHAR0 old avgt 5 3.227 ? 0.459 ns/op
StringUtilsChompTest.singleString CHAR0 new avgt 5 3.168 ? 0.233 ns/op
StringUtilsChompTest.singleString CHAR0_CR old avgt 5 3.706 ? 0.035 ns/op
StringUtilsChompTest.singleString CHAR0_CR new avgt 5 3.674 ? 0.480 ns/op
StringUtilsChompTest.singleString CHAR0_LF old avgt 5 3.656 ? 0.410 ns/op
StringUtilsChompTest.singleString CHAR0_LF new avgt 5 3.682 ? 0.525 ns/op
StringUtilsChompTest.singleString CHAR0_CR_LF old avgt 5 9.538 ? 1.558 ns/op
StringUtilsChompTest.singleString CHAR0_CR_LF new avgt 5 9.359 ? 0.499 ns/op
StringUtilsChompTest.singleString CHAR1 old avgt 5 3.880 ? 0.531 ns/op
StringUtilsChompTest.singleString CHAR1 new avgt 5 3.803 ? 0.451 ns/op
StringUtilsChompTest.singleString CHAR1_CR old avgt 5 17.378 ? 2.431 ns/op
StringUtilsChompTest.singleString CHAR1_CR new avgt 5 21.752 ? 23.793 ns/op
StringUtilsChompTest.singleString CHAR1_LF old avgt 5 19.025 ? 4.690 ns/op
StringUtilsChompTest.singleString CHAR1_LF new avgt 5 18.087 ? 2.450 ns/op
StringUtilsChompTest.singleString CHAR1_CR_LF old avgt 5 17.940 ? 1.500 ns/op
StringUtilsChompTest.singleString CHAR1_CR_LF new avgt 5 18.130 ? 1.195 ns/op
StringUtilsChompTest.singleString CHAR2 old avgt 5 4.216 ? 1.192 ns/op
StringUtilsChompTest.singleString CHAR2 new avgt 5 4.341 ? 0.310 ns/op
StringUtilsChompTest.singleString CHAR2_CR old avgt 5 17.308 ? 2.704 ns/op
StringUtilsChompTest.singleString CHAR2_CR new avgt 5 18.087 ? 2.312 ns/op
StringUtilsChompTest.singleString CHAR2_LF old avgt 5 18.527 ? 2.699 ns/op
StringUtilsChompTest.singleString CHAR2_LF new avgt 5 18.200 ? 1.919 ns/op
StringUtilsChompTest.singleString CHAR2_CR_LF old avgt 5 17.692 ? 0.991 ns/op
StringUtilsChompTest.singleString CHAR2_CR_LF new avgt 5 18.285 ? 2.505 ns/op
StringUtilsChompTest.test_Random_Strings_New N/A N/A avgt 5 66999.815 ?10053.205 ns/op
StringUtilsChompTest.test_Random_Strings_Old N/A N/A avgt 5 76290.287 ?15986.436 ns/op
Hi. I re-changed some codes and run a performance test. In short,
Benchmark (data) (name) Mode Cnt Score Error Units
StringUtilsChompTest.singleString NULL old avgt 5 2.751 ? 0.091 ns/op
StringUtilsChompTest.singleString NULL new avgt 5 2.767 ? 0.182 ns/op
StringUtilsChompTest.singleString CHAR0 old avgt 5 3.687 ? 2.040 ns/op
StringUtilsChompTest.singleString CHAR0 new avgt 5 3.221 ? 0.070 ns/op
StringUtilsChompTest.singleString CHAR0_CR old avgt 5 3.757 ? 0.249 ns/op
StringUtilsChompTest.singleString CHAR0_CR new avgt 5 3.718 ? 0.171 ns/op
StringUtilsChompTest.singleString CHAR0_LF old avgt 5 3.780 ? 0.362 ns/op
StringUtilsChompTest.singleString CHAR0_LF new avgt 5 3.829 ? 0.456 ns/op
StringUtilsChompTest.singleString CHAR0_CR_LF old avgt 5 9.466 ? 0.550 ns/op
StringUtilsChompTest.singleString CHAR0_CR_LF new avgt 5 9.506 ? 0.441 ns/op
StringUtilsChompTest.singleString CHAR1 old avgt 5 4.035 ? 1.104 ns/op
StringUtilsChompTest.singleString CHAR1 new avgt 5 3.780 ? 0.239 ns/op
StringUtilsChompTest.singleString CHAR1_CR old avgt 5 18.910 ? 2.074 ns/op
StringUtilsChompTest.singleString CHAR1_CR new avgt 5 19.492 ? 1.942 ns/op
StringUtilsChompTest.singleString CHAR1_LF old avgt 5 19.429 ? 1.860 ns/op
StringUtilsChompTest.singleString CHAR1_LF new avgt 5 22.080 ? 6.952 ns/op
StringUtilsChompTest.singleString CHAR1_CR_LF old avgt 5 22.026 ? 2.968 ns/op
StringUtilsChompTest.singleString CHAR1_CR_LF new avgt 5 18.868 ? 1.075 ns/op
StringUtilsChompTest.singleString CHAR2 old avgt 5 4.068 ? 0.168 ns/op
StringUtilsChompTest.singleString CHAR2 new avgt 5 3.896 ? 0.361 ns/op
StringUtilsChompTest.singleString CHAR2_CR old avgt 5 17.723 ? 0.809 ns/op
StringUtilsChompTest.singleString CHAR2_CR new avgt 5 19.235 ? 0.502 ns/op
StringUtilsChompTest.singleString CHAR2_LF old avgt 5 21.363 ? 1.176 ns/op
StringUtilsChompTest.singleString CHAR2_LF new avgt 5 21.182 ? 1.029 ns/op
StringUtilsChompTest.singleString CHAR2_CR_LF old avgt 5 21.532 ? 1.032 ns/op
StringUtilsChompTest.singleString CHAR2_CR_LF new avgt 5 21.420 ? 2.569 ns/op
StringUtilsChompTest.test_Random_Strings_New N/A N/A avgt 5 70565.068 ?2096.854 ns/op
StringUtilsChompTest.test_Random_Strings_Old N/A N/A avgt 5 77950.673 ?2115.093 ns/op
detailed report at https://pastebin.ubuntu.com/p/bTNkPGpSpV/
@sebbASF yes, the performance test shows the original condition struction is not bad as I thought at beginning, and revert to original condition struction will not harm performance. so according to the "minimal change" principle, I changed the codes to looks more likely to original.
Thanks - much easier to see what has been changed now. i.e. cache the string length, and don't use substring unless it is needed.
I don't think the RandomStrings test is a fair benchmark. The behaviour of the method depends only on the last one or two characters (and the length). AFAICT the strings are not random and don't have a mix of line-endings. So I think the test only measures the efficiency of String.substring where nothing needs to be dropped.
@sebbASF
I don't think the RandomStrings test is a fair benchmark.
Actually it is fair. See the string array generation function of that test.
Sorry, I see now that the strings do have a mix of CR and LF endings (or neither). However, the strings are all of length 2. The strings are not representative of the likely use cases.
As to the change itself, it looks fine, but IMO the benchmark needs some work.
@sebbASF Hi.
Sorry, I see now that the strings do have a mix of CR and LF endings (or neither). However, the strings are all of length 2. The strings are not representative of the likely use cases.
The behaviour of the method depends only on the last one or two characters (and the length). And I don't think add more chars will help. If we increase the length to some normal length of our usecases (for example 6) it will be a really large test.
I think the current huge list of input strings is more about testing functionality rather than performance.
The method only cares about 3 characters: CR, LF or something else (unless it has a bug). So the performance test needs to check those in various combinations. I think that is just 9 combinations.
It also needs to test the length, because that is used when doing the substring. The method is likely to be used with textual input so it would make sense to try with a selection of lengths. Not sure what the maximum should be, probably at least 1000, maybe considerably more. It might make sense to do these as separate tests to see if the length affects the performance.
@sebbASF
I think the current huge list of input strings is more about testing functionality rather than performance.
The method only cares about 3 characters: CR, LF or something else (unless it has a bug). So the performance test needs to check those in various combinations. I think that is just 9 combinations.
Not really. More cases are:
- null string.
- string of length 0 ("")
- string of length 1 who ends with '\r'
- string of length 1 who ends with '\n'
- string of length 1 who ends with normal char
While 3,4,5 be actually handled by a same if, I'd prefer hanle them seperately.
So do 1 and 2.
It also needs to test the length, because that is used when doing the substring. The method is likely to be used with textual input so it would make sense to try with a selection of lengths. Not sure what the maximum should be, probably at least 1000, maybe considerably more. It might make sense to do these as separate tests to see if the length affects the performance.
So you mean we should add test for some specific length of strings? For example, "a"*1024, "a"*10240" and "a"*102400"? Fine, then I will add/rerun it if you need this data, at later today.
You are correct - I was forgetting that the characters could be missing.
As to testing the length, yes I think we do need to test for various lengths. Longer substrings may take longer to create; if that is true it should show that the new code is better.
Hi. I added some more lengths of Strings to the benchmark, as suggested by @sebbASF . Thanks. full benchmark at https://pastebin.ubuntu.com/p/yPH4xnKqZd/ In short,
Benchmark (data) (name) Mode Cnt Score Error Units
StringUtilsChompTest.singleString NULL old avgt 5 3.246 ? 0.091 ns/op
StringUtilsChompTest.singleString NULL new avgt 5 3.265 ? 0.153 ns/op
StringUtilsChompTest.singleString CHAR0 old avgt 5 3.901 ? 0.251 ns/op
StringUtilsChompTest.singleString CHAR0 new avgt 5 3.863 ? 0.769 ns/op
StringUtilsChompTest.singleString CHAR0_CR old avgt 5 4.393 ? 0.519 ns/op
StringUtilsChompTest.singleString CHAR0_CR new avgt 5 4.335 ? 0.393 ns/op
StringUtilsChompTest.singleString CHAR0_LF old avgt 5 4.279 ? 0.295 ns/op
StringUtilsChompTest.singleString CHAR0_LF new avgt 5 4.315 ? 0.230 ns/op
StringUtilsChompTest.singleString CHAR0_CR_LF old avgt 5 10.768 ? 0.643 ns/op
StringUtilsChompTest.singleString CHAR0_CR_LF new avgt 5 9.901 ? 0.136 ns/op
StringUtilsChompTest.singleString CHAR1 old avgt 5 4.358 ? 0.257 ns/op
StringUtilsChompTest.singleString CHAR1 new avgt 5 3.989 ? 0.117 ns/op
StringUtilsChompTest.singleString CHAR1_CR old avgt 5 19.181 ? 0.715 ns/op
StringUtilsChompTest.singleString CHAR1_CR new avgt 5 19.502 ? 1.590 ns/op
StringUtilsChompTest.singleString CHAR1_LF old avgt 5 20.168 ? 1.003 ns/op
StringUtilsChompTest.singleString CHAR1_LF new avgt 5 19.578 ? 0.575 ns/op
StringUtilsChompTest.singleString CHAR1_CR_LF old avgt 5 19.984 ? 1.332 ns/op
StringUtilsChompTest.singleString CHAR1_CR_LF new avgt 5 19.834 ? 1.017 ns/op
StringUtilsChompTest.singleString CHAR2 old avgt 5 4.408 ? 0.098 ns/op
StringUtilsChompTest.singleString CHAR2 new avgt 5 4.033 ? 0.231 ns/op
StringUtilsChompTest.singleString CHAR2_CR old avgt 5 19.056 ? 1.116 ns/op
StringUtilsChompTest.singleString CHAR2_CR new avgt 5 18.928 ? 0.903 ns/op
StringUtilsChompTest.singleString CHAR2_LF old avgt 5 21.658 ? 2.624 ns/op
StringUtilsChompTest.singleString CHAR2_LF new avgt 5 20.125 ? 1.185 ns/op
StringUtilsChompTest.singleString CHAR2_CR_LF old avgt 5 19.865 ? 0.865 ns/op
StringUtilsChompTest.singleString CHAR2_CR_LF new avgt 5 20.033 ? 1.695 ns/op
StringUtilsChompTest.singleString CHAR1024 old avgt 5 4.469 ? 0.186 ns/op
StringUtilsChompTest.singleString CHAR1024 new avgt 5 4.051 ? 0.146 ns/op
StringUtilsChompTest.singleString CHAR1024_CR old avgt 5 102.333 ? 1.823 ns/op
StringUtilsChompTest.singleString CHAR1024_CR new avgt 5 102.237 ? 0.913 ns/op
StringUtilsChompTest.singleString CHAR1024_LF old avgt 5 104.155 ? 2.604 ns/op
StringUtilsChompTest.singleString CHAR1024_LF new avgt 5 103.365 ? 2.392 ns/op
StringUtilsChompTest.singleString CHAR1024_CR_LF old avgt 5 103.665 ? 4.241 ns/op
StringUtilsChompTest.singleString CHAR1024_CR_LF new avgt 5 103.453 ? 1.482 ns/op
StringUtilsChompTest.singleString CHAR10240 old avgt 5 4.772 ? 1.314 ns/op
StringUtilsChompTest.singleString CHAR10240 new avgt 5 4.064 ? 0.085 ns/op
StringUtilsChompTest.singleString CHAR10240_CR old avgt 5 981.277 ? 15.790 ns/op
StringUtilsChompTest.singleString CHAR10240_CR new avgt 5 985.230 ? 13.679 ns/op
StringUtilsChompTest.singleString CHAR10240_LF old avgt 5 983.508 ? 15.570 ns/op
StringUtilsChompTest.singleString CHAR10240_LF new avgt 5 1000.211 ? 48.633 ns/op
StringUtilsChompTest.singleString CHAR10240_CR_LF old avgt 5 990.806 ? 7.380 ns/op
StringUtilsChompTest.singleString CHAR10240_CR_LF new avgt 5 990.001 ? 9.229 ns/op
StringUtilsChompTest.singleString CHAR102400 old avgt 5 4.474 ? 0.142 ns/op
StringUtilsChompTest.singleString CHAR102400 new avgt 5 4.154 ? 0.325 ns/op
StringUtilsChompTest.singleString CHAR102400_CR old avgt 5 10077.261 ? 616.639 ns/op
StringUtilsChompTest.singleString CHAR102400_CR new avgt 5 10002.896 ? 85.774 ns/op
StringUtilsChompTest.singleString CHAR102400_LF old avgt 5 10081.275 ? 64.053 ns/op
StringUtilsChompTest.singleString CHAR102400_LF new avgt 5 10085.129 ? 43.548 ns/op
StringUtilsChompTest.singleString CHAR102400_CR_LF old avgt 5 10044.759 ? 68.778 ns/op
StringUtilsChompTest.singleString CHAR102400_CR_LF new avgt 5 10097.769 ? 101.311 ns/op
StringUtilsChompTest.test_Random_Strings_New N/A N/A avgt 5 76270.389 ?1885.347 ns/op
StringUtilsChompTest.test_Random_Strings_Old N/A N/A avgt 5 82757.374 ?9869.516 ns/op
@garydgregory rebased. please find some time to review. thanks.