[stdlib] performance optimizations in `Array.replaceSubrange`

Open oxy opened this issue 2 years ago • 15 comments

This PR is intended to generally improve performance for replaceSubrange.

Currently, it simplifies the main codepath to avoid unneeded branches. Additional plans:

  • [ ] write benchmarks for replaceSubrange in a variety of contexts (replacing 1 element, 10%, 50%, and 100%, and growing/shrinking)
  • [ ] add an additional branch for replaceSubrange that doesn't modify an existing Array in place and instead constructs a new Array with copy/move (for non-unique / growth cases)
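The second item could be prototyped roughly as follows. This is only a sketch under assumptions: `replacingSubrangeOutOfPlace` is a hypothetical free function, not standard library API and not code from this PR, and it uses `reserveCapacity` plus `append(contentsOf:)` in place of the raw copy/move operations a real implementation would likely use.

```swift
// Hypothetical helper (not stdlib API): build a fresh array rather than
// mutating `source` in place, as one might for non-unique or growing buffers.
func replacingSubrangeOutOfPlace<C: Collection>(
  in source: [C.Element],
  _ subrange: Range<Int>,
  with newElements: C
) -> [C.Element] {
  precondition(subrange.lowerBound >= 0 && subrange.upperBound <= source.count,
               "subrange out of bounds")
  var result: [C.Element] = []
  // Reserve room for the final contents up front.
  result.reserveCapacity(source.count - subrange.count + newElements.count)
  result.append(contentsOf: source[..<subrange.lowerBound])  // untouched prefix
  result.append(contentsOf: newElements)                     // replacement
  result.append(contentsOf: source[subrange.upperBound...])  // untouched suffix
  return result
}

// Usage: the result should match the in-place API.
var expected = [0, 1, 2, 3, 4]
expected.replaceSubrange(1..<3, with: [9, 9, 9])
let actual = replacingSubrangeOutOfPlace(in: [0, 1, 2, 3, 4], 1..<3, with: [9, 9, 9])
```

Building a new buffer avoids shifting the tail inside the old one, which is only a win when the old storage could not be reused anyway (shared storage, or not enough capacity).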

oxy avatar May 26 '23 00:05 oxy

@swift-ci please test

oxy avatar May 26 '23 00:05 oxy

@swift-ci please test

oxy avatar May 26 '23 02:05 oxy

@swift-ci please benchmark

Significant changes with the original commit:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
ArrayAppendToGeneric 293.333 500.896 +70.8% 0.59x (?)
ArrayAppendSequence 410.0 662.0 +61.5% 0.62x (?)
Dictionary4 156.0 192.5 +23.4% 0.81x (?)
UTF8Decode_InitFromCustom_contiguous 129.0 157.182 +21.8% 0.82x (?)
UTF8Decode_InitDecoding 129.077 157.167 +21.8% 0.82x (?)
ArrayAppendGenericStructs 1462.5 1770.0 +21.0% 0.83x (?)
Dictionary4OfObjects 183.5 217.857 +18.7% 0.84x
UTF8Decode_InitFromCustom_noncontiguous 250.5 279.375 +11.5% 0.90x (?)
FindString.Loop1.Substring 277.875 307.143 +10.5% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.smallContiguousRepeated 176.0 98.563 -44.0% 1.79x
FlattenListFlatMap 4324.0 2842.0 -34.3% 1.52x (?)
Array.removeAll.keepingCapacity.Object 6.63 5.224 -21.2% 1.27x (?)
RangeAssignment 155.643 134.688 -13.5% 1.16x (?)
Set.isDisjoint.Int.Empty 51.2 45.743 -10.7% 1.12x (?)
FlattenListLoop 1625.0 1478.0 -9.0% 1.10x (?)
Set.subtracting.Empty.Box 21.688 19.823 -8.6% 1.09x (?)
PrefixWhileSequence 181.4 168.75 -7.0% 1.07x (?)
PrefixWhileAnySequence 181.222 169.0 -6.7% 1.07x (?)
Set.isDisjoint.Seq.Int.Empty 53.125 49.6 -6.6% 1.07x (?)
NSStringConversion.Rebridge.LongUTF8 31.2 29.156 -6.6% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
NaiveRangeReplaceableCollectionConformance.o 11824 13644 +15.4% 0.87x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 3302 2751 -16.7% 1.20x
ArrayRemoveAll.o 7880 7594 -3.6% 1.04x
IndexPathTest.o 9941 9685 -2.6% 1.03x
RemoveWhere.o 14692 14390 -2.1% 1.02x
PopFrontGeneric.o 2470 2422 -1.9% 1.02x
MirrorTest.o 11588 11428 -1.4% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous 126.833 157.429 +24.1% 0.81x (?)
UTF8Decode_InitDecoding 127.615 156.917 +23.0% 0.81x (?)
UTF8Decode_InitFromCustom_noncontiguous 287.571 315.333 +9.7% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
StringWithCString2 0.002 0.0 -66.7% 3.00x
NaiveRRC.append.smallContiguousRepeated 176.0 94.0 -46.6% 1.87x
RemoveWhereSwapInts 15.354 11.52 -25.0% 1.33x (?)
Array.removeAll.keepingCapacity.Object 6.88 5.173 -24.8% 1.33x (?)

Code size: -Osize

Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2998 2658 -11.3% 1.13x
NaiveRangeReplaceableCollectionConformance.o 11616 11058 -4.8% 1.05x
ArrayRemoveAll.o 7350 7019 -4.5% 1.05x
IndexPathTest.o 7305 7008 -4.1% 1.04x
RemoveWhere.o 12449 12106 -2.8% 1.03x
PopFrontGeneric.o 2427 2381 -1.9% 1.02x
MirrorTest.o 11460 11300 -1.4% 1.01x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 135.75 167.333 +23.3% 0.81x (?)
UTF8Decode_InitFromCustom_contiguous 136.083 166.923 +22.7% 0.82x (?)
NSStringConversion.InlineBuffer.ASCII 5282.0 6144.0 +16.3% 0.86x (?)
NSStringConversion.InlineBuffer.UTF8 3170.0 3605.0 +13.7% 0.88x (?)
ArrayOfGenericPOD2 1050.0 1145.0 +9.0% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
DataCreateMedium 159400.0 138800.0 -12.9% 1.15x (?)
CharacterLiteralsLarge 447.25 397.2 -11.2% 1.13x (?)
CxxStringConversion.cxx.to.swift 162.333 144.667 -10.9% 1.12x (?)
PopFrontArrayGeneric 3160.0 2890.0 -8.5% 1.09x (?)
BinaryFloatingPointPropertiesBinade 55.0 51.2 -6.9% 1.07x (?)
Calculator 930.0 867.0 -6.8% 1.07x (?)

Code size: -swiftlibs

oxy avatar Jun 05 '23 18:06 oxy

@swift-ci please benchmark

oxy avatar Jun 05 '23 20:06 oxy

Adding the extra branches to skip .deinitialize / .moveInitialize helped with some of the regressions:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
StringWithCString2 0.0 0.002 +200.0% 0.33x (?)
InsertCharacterEndIndex 90.455 101.118 +11.8% 0.89x (?)
InsertCharacterEndIndexNonASCII 28.406 31.613 +11.3% 0.90x (?)
InsertCharacterTowardsEndIndex 103.438 113.267 +9.5% 0.91x (?)
FindString.Loop1.Substring 277.8 303.143 +9.1% 0.92x (?)
FlattenListLoop 1205.0 1313.0 +9.0% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.largeContiguous 0.124 0.0 -99.2% 125.00x (?)
NaiveRRC.append.smallContiguousRepeated 177.9 98.588 -44.6% 1.80x
LessSubstringSubstringGenericComparable 29.724 22.364 -24.8% 1.33x (?)
LessSubstringSubstring 29.625 22.318 -24.7% 1.33x (?)
EqualSubstringSubstring 30.6 23.077 -24.6% 1.33x (?)
EqualSubstringString 30.16 22.864 -24.2% 1.32x
EqualStringSubstring 30.375 23.179 -23.7% 1.31x (?)
EqualSubstringSubstringGenericEquatable 29.96 22.97 -23.3% 1.30x (?)
UTF8Decode_InitFromData 167.583 138.455 -17.4% 1.21x (?)
UTF8Decode_InitFromBytes 170.889 143.0 -16.3% 1.20x (?)
StringComparison_longSharedPrefix 246.3 209.545 -14.9% 1.18x (?)
NormalizedIterator_fastPrenormal 553.023 486.531 -12.0% 1.14x (?)
Breadcrumbs.UTF16ToIdx.longASCII 43.581 39.352 -9.7% 1.11x (?)
SortStringsUnicode 2390.0 2165.0 -9.4% 1.10x (?)
NormalizedIterator_latin1 182.1 167.636 -7.9% 1.09x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 210.545 195.0 -7.4% 1.08x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 217.875 201.875 -7.3% 1.08x (?)
SubstringEqualString 181.0 168.2 -7.1% 1.08x (?)
RangeAssignment 155.5 145.0 -6.8% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
StringSplitting.o 36094 37118 +2.8% 0.97x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 3302 2767 -16.2% 1.19x
ArrayRemoveAll.o 7880 7611 -3.4% 1.04x
NaiveRangeReplaceableCollectionConformance.o 11824 11472 -3.0% 1.03x
IndexPathTest.o 9941 9733 -2.1% 1.02x
PopFrontGeneric.o 2470 2422 -1.9% 1.02x
RemoveWhere.o 14692 14407 -1.9% 1.02x
MirrorTest.o 11588 11444 -1.2% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
StringBuilderLong 778.333 916.842 +17.8% 0.85x (?)
DropLastCountableRangeLazy 4.992 5.802 +16.2% 0.86x (?)
InsertCharacterEndIndex 90.105 101.278 +12.4% 0.89x (?)
InsertCharacterEndIndexNonASCII 28.469 31.0 +8.9% 0.92x (?)
String.replaceSubrange.String 10.013 10.887 +8.7% 0.92x (?)
String.replaceSubrange.ArrChar.Small 35.781 38.654 +8.0% 0.93x (?)
InsertCharacterTowardsEndIndex 118.714 128.071 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.smallContiguousRepeated 172.9 93.923 -45.7% 1.84x
EqualSubstringString 29.844 22.422 -24.9% 1.33x (?)
EqualSubstringSubstringGenericEquatable 29.923 22.576 -24.6% 1.33x (?)
EqualSubstringSubstring 29.844 22.576 -24.4% 1.32x (?)
EqualStringSubstring 29.826 22.633 -24.1% 1.32x
LessSubstringSubstring 30.387 23.088 -24.0% 1.32x
LessSubstringSubstringGenericComparable 30.378 23.186 -23.7% 1.31x
Array.removeAll.keepingCapacity.Object 6.875 5.432 -21.0% 1.27x (?)
UTF8Decode_InitFromData 167.5 138.385 -17.4% 1.21x (?)
UTF8Decode_InitFromBytes 175.125 146.8 -16.2% 1.19x
StringComparison_longSharedPrefix 246.9 208.545 -15.5% 1.18x (?)
Breadcrumbs.UTF16ToIdx.longASCII 44.673 39.297 -12.0% 1.14x (?)
SubstringEqualString 183.778 164.8 -10.3% 1.12x (?)
StringFromLongWholeSubstringGeneric 6.007 5.477 -8.8% 1.10x (?)
StringComparison_latin1 336.833 312.0 -7.4% 1.08x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 217.625 202.0 -7.2% 1.08x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 210.636 195.636 -7.1% 1.08x (?)
SubstringEquatable 316.286 295.5 -6.6% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
StringSplitting.o 36281 37222 +2.6% 0.97x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2998 2690 -10.3% 1.11x
ArrayRemoveAll.o 7350 7040 -4.2% 1.04x
IndexPathTest.o 7305 7016 -4.0% 1.04x
NaiveRangeReplaceableCollectionConformance.o 11616 11273 -3.0% 1.03x
RemoveWhere.o 12449 12142 -2.5% 1.03x
PopFrontGeneric.o 2427 2381 -1.9% 1.02x
MirrorTest.o 11460 11316 -1.3% 1.01x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
Array.removeAll.keepingCapacity.Object 5.61 6.798 +21.2% 0.83x (?)
String.replaceSubrange.String 11.102 12.425 +11.9% 0.89x (?)
InsertCharacterTowardsEndIndex 131.462 144.5 +9.9% 0.91x (?)
ArrayAppendLatin1Substring 21984.0 24036.0 +9.3% 0.91x (?)
ArrayAppendAsciiSubstring 21780.0 23760.0 +9.1% 0.92x (?)
ArrayAppendUTF16Substring 21792.0 23748.0 +9.0% 0.92x (?)
InsertCharacterEndIndex 135.5 147.077 +8.5% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
PopFrontArrayGeneric 3158.0 2433.846 -22.9% 1.30x (?)
LessSubstringSubstringGenericComparable 33.0 25.962 -21.3% 1.27x (?)
EqualSubstringSubstringGenericEquatable 32.96 25.968 -21.2% 1.27x (?)
LessSubstringSubstring 34.69 27.5 -20.7% 1.26x
EqualSubstringSubstring 34.327 27.469 -20.0% 1.25x (?)
EqualSubstringString 34.5 27.636 -19.9% 1.25x (?)
EqualStringSubstring 34.043 27.455 -19.4% 1.24x (?)
UTF8Decode_InitFromData 169.2 139.538 -17.5% 1.21x (?)
UTF8Decode_InitFromBytes 173.667 145.0 -16.5% 1.20x (?)
DataCreateMedium 159500.0 138700.0 -13.0% 1.15x (?)
DataCreateSmall 21850.0 19390.0 -11.3% 1.13x (?)
RangeAssignment 11807.0 10685.0 -9.5% 1.11x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 221.1 203.9 -7.8% 1.08x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 228.429 210.818 -7.7% 1.08x (?)
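
As a rough illustration of what "extra branches to skip .deinitialize / .moveInitialize" could look like, here is a minimal sketch over a raw buffer. `replaceElements` and its parameters are hypothetical and do not reproduce the PR's code; the caller is assumed to guarantee enough capacity for the new count.

```swift
// Hypothetical sketch: replace `subrange` of an initialized buffer in place.
// Each pointer operation is guarded so that it is skipped when it has no work.
func replaceElements<T>(
  in base: UnsafeMutablePointer<T>,
  count: Int,
  subrange: Range<Int>,
  with newValues: [T]
) -> Int {
  let eraseCount = subrange.count
  let insertCount = newValues.count
  let tailCount = count - subrange.upperBound
  let hole = base + subrange.lowerBound

  // Skip deinitialization when nothing is being removed (pure insertion).
  if eraseCount > 0 {
    hole.deinitialize(count: eraseCount)
  }
  // Skip the tail move when there is no tail or the hole keeps its size.
  if tailCount > 0 && insertCount != eraseCount {
    (hole + insertCount).moveInitialize(from: hole + eraseCount, count: tailCount)
  }
  // Skip initialization when nothing is being inserted (pure removal).
  if insertCount > 0 {
    newValues.withUnsafeBufferPointer { source in
      hole.initialize(from: source.baseAddress!, count: insertCount)
    }
  }
  return count - eraseCount + insertCount
}

// Usage: a buffer with spare capacity holding [0, 1, 2, 3, 4].
let storage = UnsafeMutablePointer<Int>.allocate(capacity: 8)
for i in 0..<5 { (storage + i).initialize(to: i) }
let newCount = replaceElements(in: storage, count: 5, subrange: 1..<3, with: [9, 9, 9])
let contents = Array(UnsafeBufferPointer(start: storage, count: newCount))
storage.deinitialize(count: newCount)
storage.deallocate()
```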

oxy avatar Jun 05 '23 21:06 oxy

@swift-ci please benchmark

Another commit, another set of benchmarks:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
StringWithCString2 0.0 0.002 +200.0% 0.33x (?)
ArrayAppendGenericStructs 1442.0 1710.0 +18.6% 0.84x (?)
NormalizedIterator_emoji 331.429 377.92 +14.0% 0.88x (?)
String.replaceSubrange.Substring.Small 39.286 43.75 +11.4% 0.90x (?)
FindString.Loop1.Substring 278.625 306.0 +9.8% 0.91x (?)
NormalizedIterator_nonBMPSlowestPrenormal 418.75 458.889 +9.6% 0.91x (?)
InsertCharacterEndIndex 90.5 99.05 +9.4% 0.91x (?)
InsertCharacterTowardsEndIndex 103.588 112.933 +9.0% 0.92x (?)
String.replaceSubrange.ArrChar.Small 36.444 39.462 +8.3% 0.92x (?)
InsertCharacterStartIndex 255.278 275.625 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.smallContiguousRepeated 178.7 102.625 -42.6% 1.74x
FlattenListLoop 1621.0 1021.0 -37.0% 1.59x (?)
LessSubstringSubstring 29.875 22.42 -25.0% 1.33x (?)
LessSubstringSubstringGenericComparable 29.719 22.422 -24.6% 1.33x (?)
EqualSubstringSubstring 30.6 23.308 -23.8% 1.31x (?)
RangeAssignment 155.444 118.643 -23.7% 1.31x (?)
EqualSubstringString 30.051 22.97 -23.6% 1.31x (?)
EqualSubstringSubstringGenericEquatable 29.938 23.056 -23.0% 1.30x (?)
EqualStringSubstring 30.276 23.4 -22.7% 1.29x
UTF8Decode_InitFromData 171.455 136.833 -20.2% 1.25x (?)
UTF8Decode_InitFromBytes 172.6 140.0 -18.9% 1.23x
Set.isDisjoint.Int.Empty 51.2 45.74 -10.7% 1.12x (?)
CxxStringConversion.cxx.to.swift 156.333 140.5 -10.1% 1.11x (?)
Data.init.Sequence.809B.Count.RE.I 22.963 20.788 -9.5% 1.10x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 210.636 191.364 -9.1% 1.10x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 218.0 198.222 -9.1% 1.10x (?)
Set.subtracting.Empty.Box 21.64 19.824 -8.4% 1.09x (?)
SortStringsUnicode 2387.5 2192.5 -8.2% 1.09x (?)
Data.init.Sequence.809B.Count.RE 23.042 21.333 -7.4% 1.08x (?)
FlattenListFlatMap 3033.0 2816.0 -7.2% 1.08x (?)
Set.isDisjoint.Seq.Int.Empty 53.13 49.36 -7.1% 1.08x (?)
ArraySetElement 306.5 286.143 -6.6% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
StringSplitting.o 36094 37118 +2.8% 0.97x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 3302 2703 -18.1% 1.22x
NaiveRangeReplaceableCollectionConformance.o 11824 10704 -9.5% 1.10x
ArrayRemoveAll.o 7880 7611 -3.4% 1.04x
IndexPathTest.o 9941 9733 -2.1% 1.02x
PopFrontGeneric.o 2470 2422 -1.9% 1.02x
RemoveWhere.o 14692 14407 -1.9% 1.02x
MirrorTest.o 11588 11444 -1.2% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
SuffixAnySequence 98.75 1587.0 +1507.1% 0.06x
SuffixSequence 115.692 1698.0 +1367.7% 0.07x
SuffixSequenceLazy 115.455 1607.0 +1291.9% 0.07x
SuffixCountableRangeLazy 5.249 9.0 +71.4% 0.58x (?)
ArrayAppendGenericStructs 1064.0 1756.667 +65.1% 0.61x (?)
PrefixAnySeqCntRangeLazy 121.077 134.25 +10.9% 0.90x (?)
NormalizedIterator_nonBMPSlowestPrenormal 415.254 458.293 +10.4% 0.91x (?)
NormalizedIterator_emoji 331.04 365.0 +10.3% 0.91x (?)
InsertCharacterEndIndex 89.947 99.105 +10.2% 0.91x (?)
String.replaceSubrange.Substring.Small 39.818 43.5 +9.2% 0.92x (?)
String.replaceSubrange.ArrChar.Small 35.815 39.08 +9.1% 0.92x (?)
StringEnumRawValueInitialization 450.0 488.8 +8.6% 0.92x (?)
FindString.Loop1.Substring 283.0 307.143 +8.5% 0.92x (?)
InsertCharacterTowardsEndIndex 118.429 127.923 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.largeContiguous 0.39 0.104 -73.1% 3.72x (?)
NaiveRRC.append.smallContiguousRepeated 176.0 102.688 -41.7% 1.71x
EqualSubstringSubstringGenericEquatable 29.923 22.606 -24.5% 1.32x
EqualSubstringString 29.844 22.6 -24.3% 1.32x (?)
EqualSubstringSubstring 29.833 22.606 -24.2% 1.32x (?)
EqualStringSubstring 29.826 22.621 -24.2% 1.32x (?)
LessSubstringSubstring 30.375 23.286 -23.3% 1.30x (?)
LessSubstringSubstringGenericComparable 30.385 23.515 -22.6% 1.29x (?)
RangeAssignment 158.455 126.818 -20.0% 1.25x (?)
UTF8Decode_InitFromData 167.4 134.417 -19.7% 1.25x (?)
UTF8Decode_InitFromBytes 171.0 137.417 -19.6% 1.24x (?)
PrefixAnyCollection 137.273 110.733 -19.3% 1.24x (?)
PrefixWhileAnyCollectionLazy 147.769 121.143 -18.0% 1.22x (?)
StringComparison_longSharedPrefix 246.3 206.1 -16.3% 1.20x (?)
DropFirstAnySeqCntRange 120.714 107.333 -11.1% 1.12x (?)
DropFirstAnySeqCRangeIter 120.625 107.333 -11.0% 1.12x (?)
Breadcrumbs.MutatedUTF16ToIdx.Mixed 210.5 191.364 -9.1% 1.10x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 217.5 198.111 -8.9% 1.10x (?)
Array.removeAll.keepingCapacity.Object 5.841 5.361 -8.2% 1.09x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
Suffix.o 17680 22991 +30.0% 0.77x
StringSplitting.o 36281 37222 +2.6% 0.97x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2998 2615 -12.8% 1.15x
ArrayRemoveAll.o 7350 7040 -4.2% 1.04x
IndexPathTest.o 7305 7016 -4.0% 1.04x
NaiveRangeReplaceableCollectionConformance.o 11616 11258 -3.1% 1.03x
RemoveWhere.o 12449 12142 -2.5% 1.03x
PopFrontGeneric.o 2427 2381 -1.9% 1.02x
MirrorTest.o 11460 11316 -1.3% 1.01x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
SubstringRemoveFirst1 0.143 0.167 +16.7% 0.86x (?)
String.replaceSubrange.Substring.Small 40.98 45.68 +11.5% 0.90x (?)
String.replaceSubrange.ArrChar.Small 36.92 41.143 +11.4% 0.90x (?)
InsertCharacterTowardsEndIndex 131.769 142.385 +8.1% 0.93x (?)
ArrayOfGenericPOD2 1049.0 1130.0 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.largeContiguous 109.0 0.458 -99.6% 237.47x
NaiveRRC.init.largeContiguous 103.75 0.49 -99.5% 211.31x
RangeAssignment 11501.0 5693.0 -50.5% 2.02x
NaiveRRC.append.smallContiguousRepeated 2286.0 1668.0 -27.0% 1.37x (?)
PopFrontArrayGeneric 3160.0 2406.667 -23.8% 1.31x (?)
EqualSubstringSubstringGenericEquatable 32.88 25.935 -21.1% 1.27x (?)
LessSubstringSubstring 34.966 27.719 -20.7% 1.26x (?)
UTF8Decode_InitFromData 167.9 133.727 -20.4% 1.26x (?)
LessSubstringSubstringGenericComparable 32.548 26.0 -20.1% 1.25x (?)
UTF8Decode_InitFromBytes 172.667 138.333 -19.9% 1.25x (?)
EqualSubstringString 34.417 27.687 -19.6% 1.24x (?)
EqualSubstringSubstring 34.25 27.667 -19.2% 1.24x
EqualStringSubstring 34.182 27.857 -18.5% 1.23x (?)
DataCreateMedium 160600.0 137200.0 -14.6% 1.17x (?)
DataCreateSmall 21700.0 19030.0 -12.3% 1.14x (?)
Breadcrumbs.MutatedIdxToUTF16.Mixed 228.0 208.182 -8.7% 1.10x (?)

oxy avatar Jun 05 '23 22:06 oxy

@swift-ci please benchmark

oxy avatar Jun 06 '23 01:06 oxy

@swift-ci please benchmark

oxy avatar Jun 06 '23 18:06 oxy

@swift-ci please benchmark Apple Silicon

oxy avatar Jun 07 '23 19:06 oxy

@swift-ci Please Apple Silicon benchmark

The Apple Silicon results are less noisy, but they also highlight a different regression (???)

Performance (arm64): -O

Regression OLD NEW DELTA RATIO
RemoveWhereQuadraticString 167.0 210.583 +26.1% 0.79x (?)
NSStringConversion.InlineBuffer.UTF8 469.667 505.5 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.smallContiguousRepeated 75.762 55.438 -26.8% 1.37x (?)
SIMDReduce.Int32x16.Initializer 13.036 11.075 -15.0% 1.18x (?)
Set.isDisjoint.Seq.Empty.Box 45.241 39.0 -13.8% 1.16x (?)
ObserverForwarderStruct 203.846 186.154 -8.7% 1.10x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
StringSplitting.o 27959 28675 +2.6% 0.98x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2889 2197 -24.0% 1.31x
NaiveRangeReplaceableCollectionConformance.o 8022 7190 -10.4% 1.12x
ArrayRemoveAll.o 6060 5704 -5.9% 1.06x
IndexPathTest.o 7978 7710 -3.4% 1.03x
RemoveWhere.o 11280 10916 -3.2% 1.03x
MirrorTest.o 8490 8310 -2.1% 1.02x
PopFrontGeneric.o 2009 1973 -1.8% 1.02x

Performance (arm64): -Osize

Regression OLD NEW DELTA RATIO
SuffixSequenceLazy 88.409 704.0 +696.3% 0.13x
SuffixAnySequence 88.409 678.0 +666.9% 0.13x
SuffixSequence 88.423 677.667 +666.4% 0.13x
NSStringConversion.InlineBuffer.UTF8 469.667 506.0 +7.7% 0.93x
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.largeContiguous 0.389 0.0 -99.7% 390.00x
NaiveRRC.append.smallContiguousRepeated 77.25 57.0 -26.2% 1.36x
RangeAssignment 155.727 143.643 -7.8% 1.08x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
Suffix.o 13712 17404 +26.9% 0.79x
StringSplitting.o 24583 25027 +1.8% 0.98x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2617 2205 -15.7% 1.19x
ArrayRemoveAll.o 6096 5708 -6.4% 1.07x
NaiveRangeReplaceableCollectionConformance.o 7994 7494 -6.3% 1.07x
IndexPathTest.o 6194 5902 -4.7% 1.05x
RemoveWhere.o 10036 9656 -3.8% 1.04x
MirrorTest.o 8130 7946 -2.3% 1.02x
PopFrontGeneric.o 2065 2021 -2.1% 1.02x

Performance (arm64): -Onone

Regression OLD NEW DELTA RATIO
ArrayOfGenericPOD2 855.0 1170.0 +36.8% 0.73x (?)
Set.filter.Int100.24k 62.575 69.714 +11.4% 0.90x (?)
Set.filter.Int100.20k 52.596 58.488 +11.2% 0.90x
Set.filter.Int100.16k 42.586 47.241 +10.9% 0.90x
Set.filter.Int100.28k 75.03 83.1 +10.8% 0.90x (?)
StringWordBuilderReservingCapacity 811.667 875.0 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.largeContiguous 53.167 0.278 -99.5% 190.57x
NaiveRRC.init.largeContiguous 53.295 0.288 -99.5% 184.42x
RangeAssignment 5628.0 2854.0 -49.3% 1.97x
NaiveRRC.append.smallContiguousRepeated 1234.5 957.5 -22.4% 1.29x
PopFrontArrayGeneric 2456.0 2061.25 -16.1% 1.19x (?)

oxy avatar Jun 07 '23 20:06 oxy

@swift-ci please benchmark

oxy avatar Jun 08 '23 07:06 oxy

@swift-ci please apple silicon benchmark

oxy avatar Jun 08 '23 07:06 oxy

[Removed one of the three branches that could have introduced the regression in the Suffix benchmarks. I'm still working on understanding the optimizer's behavior on the other two, to potentially fix the regression.]

oxy avatar Jun 08 '23 07:06 oxy

@swift-ci please benchmark

oxy avatar Jun 08 '23 15:06 oxy

@swift-ci please apple silicon benchmark

oxy avatar Jun 08 '23 15:06 oxy

@swift-ci please benchmark

oxy avatar Jun 09 '23 18:06 oxy

@swift-ci please apple silicon benchmark

oxy avatar Jun 09 '23 18:06 oxy

@swift-ci please benchmark

harlanhaskins avatar Jun 18 '23 19:06 harlanhaskins

@swift-ci please apple silicon benchmark

oxy avatar Aug 02 '23 17:08 oxy

Tested again, same results:

Performance (arm64): -O

Regression OLD NEW DELTA RATIO
Set.isSuperset.Seq.Empty.Int 34.303 40.55 +18.2% 0.85x
DataAppendBytesSmall 121.667 134.143 +10.3% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.smallContiguousRepeated 75.727 55.529 -26.7% 1.36x
ObserverForwarderStruct 219.583 197.308 -10.1% 1.11x (?)
NSStringConversion.InlineBuffer.UTF8 500.5 464.333 -7.2% 1.08x (?)
StringInterpolationSmall 533.889 496.923 -6.9% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
NaiveRangeReplaceableCollectionConformance.o 8090 9326 +15.3% 0.87x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2885 2165 -25.0% 1.33x
ArrayRemoveAll.o 6136 5792 -5.6% 1.06x
IndexPathTest.o 7974 7686 -3.6% 1.04x
RemoveWhere.o 11152 10760 -3.5% 1.04x
MirrorTest.o 8546 8330 -2.5% 1.03x
PopFrontGeneric.o 1997 1961 -1.8% 1.02x

Performance (arm64): -Osize

Regression OLD NEW DELTA RATIO
SuffixSequenceLazy 87.773 703.333 +701.3% 0.12x
SuffixAnySequence 87.714 677.0 +671.8% 0.13x
SuffixSequence 87.773 677.0 +671.3% 0.13x
ArrayInClass 156.034 187.239 +20.0% 0.83x
DistinctClassFieldAccesses 35.928 42.169 +17.4% 0.85x
ArraySetElement 218.545 249.75 +14.3% 0.88x
Array2D 5586.286 6237.333 +11.7% 0.90x
DropLastAnySeqCntRange 278.714 310.714 +11.5% 0.90x (?)
DropLastAnySeqCRangeIter 278.625 310.571 +11.5% 0.90x
DataAppendBytesSmall 134.143 146.636 +9.3% 0.91x (?)
PrefixWhileSequence 214.9 233.444 +8.6% 0.92x (?)
PrefixWhileAnySequence 214.9 233.444 +8.6% 0.92x
DictionaryBridgeToObjC_Bridge 6.489 6.988 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.append.smallContiguousRepeated 77.278 58.5 -24.3% 1.32x (?)
RangeAssignment 164.667 143.538 -12.8% 1.15x
DataAppendArray 2256.41 2052.941 -9.0% 1.10x (?)
NSStringConversion.InlineBuffer.UTF8 500.667 465.333 -7.1% 1.08x (?)
StringInterpolationSmall 549.565 512.0 -6.8% 1.07x (?)
FindString.Rec3.Array 95.192 88.9 -6.6% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
Suffix.o 13684 17352 +26.8% 0.79x
StringSplitting.o 24283 24535 +1.0% 0.99x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2617 2197 -16.0% 1.19x
NaiveRangeReplaceableCollectionConformance.o 7970 7282 -8.6% 1.09x
ArrayRemoveAll.o 6152 5784 -6.0% 1.06x
IndexPathTest.o 6186 5886 -4.8% 1.05x
RemoveWhere.o 9976 9544 -4.3% 1.05x
MirrorTest.o 8214 8006 -2.5% 1.03x
PopFrontGeneric.o 2053 2009 -2.1% 1.02x

Performance (arm64): -Onone

Regression OLD NEW DELTA RATIO
ArrayAppendGenericStructs 706.154 872.727 +23.6% 0.81x (?)
PopFrontArrayGeneric 2461.25 2994.545 +21.7% 0.82x
DataCreateMedium 88400.0 101300.0 +14.6% 0.87x
RandomDoubleLCG 15110.0 17028.0 +12.7% 0.89x (?)
DataCreateSmall 12215.0 13700.0 +12.2% 0.89x (?)
Set.filter.Int100.24k 62.575 69.714 +11.4% 0.90x (?)
Set.filter.Int100.20k 52.587 58.469 +11.2% 0.90x (?)
Set.filter.Int100.16k 42.604 47.24 +10.9% 0.90x
Set.filter.Int100.28k 75.03 83.067 +10.7% 0.90x
TypeName 830.0 897.5 +8.1% 0.92x (?)
StringWordBuilderReservingCapacity 812.5 874.286 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NaiveRRC.init.largeContiguous 53.417 0.296 -99.4% 179.86x
NaiveRRC.append.largeContiguous 53.467 0.3 -99.4% 177.63x
RangeAssignment 5662.0 2852.0 -49.6% 1.99x
NaiveRRC.append.smallContiguousRepeated 1224.5 974.0 -20.5% 1.26x (?)
RawBuffer.1000.findLast 66422.0 55569.0 -16.3% 1.20x
RawBuffer.128.findLast 9080.0 7733.0 -14.8% 1.17x (?)
RawBuffer.39.findLast 3307.0 2924.0 -11.6% 1.13x (?)
ObjectiveCBridgeStubToNSString 1220.0 1127.143 -7.6% 1.08x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).
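
The DELTA and RATIO columns are not defined in the report itself. The sketch below assumes DELTA is the relative change from OLD to NEW in percent and RATIO is OLD divided by NEW; these definitions are inferred from the table values, not taken from the CI tooling.

```swift
// Assumed definitions, inferred from the tables rather than from the CI docs.
func delta(old: Double, new: Double) -> Double {
  (new - old) / old * 100  // percent change; positive means slower or larger
}

func ratio(old: Double, new: Double) -> Double {
  old / new                // below 1.0 is a regression, above 1.0 an improvement
}

// The ArrayAppendToGeneric row from the first x86_64 -O table.
let d = delta(old: 293.333, new: 500.896)
let r = ratio(old: 293.333, new: 500.896)
```

For that row the two functions give about +70.8% and 0.59x, matching the reported values.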

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini9,1
  Total Number of Cores: 8 (4 performance and 4 efficiency)
  Memory: 16 GB

oxy avatar Aug 02 '23 19:08 oxy

@swift-ci please test

oxy avatar Aug 03 '23 02:08 oxy

Marking as ready for review.

The performance regressions for SuffixSequence and friends result from an unfortunate optimizer decision, early in the SIL phase, not to inline Sequence.suffix, because some parts of the call graph further down are now shared. Since this only occurs in -Osize and is likely due to how the Suffix benchmark module is compiled (and, as far as I can tell, is unlikely to affect real programs), I think this should be okay to merge performance-wise.

oxy avatar Aug 08 '23 22:08 oxy

@swift-ci please test

oxy avatar Aug 09 '23 17:08 oxy

@swift-ci please test

oxy avatar Aug 10 '23 00:08 oxy

@swift-ci please test

oxy avatar Aug 10 '23 22:08 oxy

@swift-ci Please clean test Linux platform

oxy avatar Aug 14 '23 16:08 oxy

I haven't been able to reproduce the test failures locally in an Ubuntu 20.04 Intel container that ran the same buildbot_linux preset. I'm not sure what the source of the failure is, since it appears to crash in libc in the stdlibUnittest code, but it passes locally.

It doesn't appear to be flakiness, since multiple runs on multiple platforms had the test pass locally but the test still fails in CI.

oxy avatar Aug 18 '23 19:08 oxy

@swift-ci please test

Catfish-Man avatar Feb 28 '24 21:02 Catfish-Man

I think something in the past ~half year fixed it - I rebased onto main and could not reproduce the failures with buildbot_linux anymore! Hopefully the same is true in CI...

oxy avatar Feb 28 '24 21:02 oxy

@swift-ci please test

EDIT: I finally know what tests are failing in release mode and why:

NoBoundsCheck/EvilShrinkage/*:

  • expected/old release behavior: just keep forming indexes and reading past reported endIndex
  • old DebugAssert behavior: _debugPrecondition to abort on collection shrinkage
  • new behavior: stop at endIndex, see that buffer is underfull, always abort

BoundsChecked/EvilGrowth/*:

  • expected/old release behavior: stop reading at first reported count, ignore growth
  • old DebugAssert behavior: _debugPrecondition to abort on collection growth
  • new behavior: continues reading past the old count, always aborts because the hole we created in the array is not large enough (abort generated by UMBP.initialize(fromContentsOf:))
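
To make the two cases concrete, here is a simplified stand-in for the kind of collection these tests feed to `replaceSubrange`. `MisreportingCollection` is hypothetical and much simpler than the real test types, which are not reproduced here: it just reports a `count` that disagrees with the number of elements iteration yields.

```swift
// Hypothetical stand-in for the "evil" test collections.
struct MisreportingCollection: Collection {
  let storage: [Int]
  let reportedCount: Int

  var startIndex: Int { 0 }
  var endIndex: Int { storage.count }
  func index(after i: Int) -> Int { i + 1 }
  subscript(position: Int) -> Int { storage[position] }

  // Deliberately violates the Collection contract: `count` disagrees with
  // the number of elements between startIndex and endIndex.
  var count: Int { reportedCount }
}

// Counts elements by iterating, which walks indices up to endIndex
// and never consults `count`.
func iteratedCount<C: Collection>(_ c: C) -> Int {
  var n = 0
  for _ in c { n += 1 }
  return n
}

// "Shrinkage": fewer elements than reported. "Growth": more than reported.
let shrinking = MisreportingCollection(storage: [1, 2], reportedCount: 4)
let growing = MisreportingCollection(storage: [1, 2, 3, 4], reportedCount: 2)
```

Code that sizes a hole from `count` and then fills it by iterating to `endIndex` ends up with an underfull buffer in the first case and an overfull one in the second. Those are the two aborts described above, so neither value is passed to `replaceSubrange` here.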

oxy avatar Feb 28 '24 21:02 oxy