[stdlib] performance optimizations in `Array.replaceSubrange`
This PR aims to improve the performance of `replaceSubrange` in general.
So far, it simplifies the main code path to avoid unneeded branches. Additional plans:
- [ ] write benchmarks for replaceSubrange in a variety of contexts (replacing 1 element, 10%, 50%, and 100%, and growing/shrinking)
- [ ] add an additional branch for replaceSubrange that doesn't modify an existing Array in place and instead constructs a new Array with copy/move (for non-unique / growth cases)
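
The scenarios listed for the planned benchmarks could be sketched roughly as follows. This is only an illustration, not the benchmark intended for the Swift benchmark suite: `replacingPrefix` and its parameters are hypothetical names, and the sizes are arbitrary.

```swift
// Hypothetical helper (not part of the stdlib or the benchmark suite):
// replaces the first `percent`% of `array` with `newCount` placeholder elements.
func replacingPrefix(of array: [Int], percent: Int, newCount: Int) -> [Int] {
  // `copy` shares storage with `array` until it is mutated, so the
  // replaceSubrange call below exercises the non-unique (copying) case.
  var copy = array
  let removed = array.count * percent / 100
  copy.replaceSubrange(0..<removed, with: repeatElement(-1, count: newCount))
  return copy
}

let base = Array(0..<100)

// Replace a single element of a freshly created array (same size).
var single = Array(0..<100)
single.replaceSubrange(0..<1, with: [-1])

let sameSize = replacingPrefix(of: base, percent: 10, newCount: 10)    // 10%, same size
let growing = replacingPrefix(of: base, percent: 50, newCount: 100)    // 50%, growing
let shrinking = replacingPrefix(of: base, percent: 100, newCount: 10)  // 100%, shrinking
```

A real benchmark would also need to vary the element type (trivial vs. reference-counted), since the regressions reported in this thread differ between those cases.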
@swift-ci please test
@swift-ci please test
@swift-ci please benchmark
Significant changes with the original commit:
Performance (x86_64): -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ArrayAppendToGeneric | 293.333 | 500.896 | +70.8% | 0.59x (?) |
| ArrayAppendSequence | 410.0 | 662.0 | +61.5% | 0.62x (?) |
| Dictionary4 | 156.0 | 192.5 | +23.4% | 0.81x (?) |
| UTF8Decode_InitFromCustom_contiguous | 129.0 | 157.182 | +21.8% | 0.82x (?) |
| UTF8Decode_InitDecoding | 129.077 | 157.167 | +21.8% | 0.82x (?) |
| ArrayAppendGenericStructs | 1462.5 | 1770.0 | +21.0% | 0.83x (?) |
| Dictionary4OfObjects | 183.5 | 217.857 | +18.7% | 0.84x |
| UTF8Decode_InitFromCustom_noncontiguous | 250.5 | 279.375 | +11.5% | 0.90x (?) |
| FindString.Loop1.Substring | 277.875 | 307.143 | +10.5% | 0.90x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.smallContiguousRepeated | 176.0 | 98.563 | -44.0% | 1.79x |
| FlattenListFlatMap | 4324.0 | 2842.0 | -34.3% | 1.52x (?) |
| Array.removeAll.keepingCapacity.Object | 6.63 | 5.224 | -21.2% | 1.27x (?) |
| RangeAssignment | 155.643 | 134.688 | -13.5% | 1.16x (?) |
| Set.isDisjoint.Int.Empty | 51.2 | 45.743 | -10.7% | 1.12x (?) |
| FlattenListLoop | 1625.0 | 1478.0 | -9.0% | 1.10x (?) |
| Set.subtracting.Empty.Box | 21.688 | 19.823 | -8.6% | 1.09x (?) |
| PrefixWhileSequence | 181.4 | 168.75 | -7.0% | 1.07x (?) |
| PrefixWhileAnySequence | 181.222 | 169.0 | -6.7% | 1.07x (?) |
| Set.isDisjoint.Seq.Int.Empty | 53.125 | 49.6 | -6.6% | 1.07x (?) |
| NSStringConversion.Rebridge.LongUTF8 | 31.2 | 29.156 | -6.6% | 1.07x (?) |
Code size: -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRangeReplaceableCollectionConformance.o | 11824 | 13644 | +15.4% | 0.87x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 3302 | 2751 | -16.7% | 1.20x |
| ArrayRemoveAll.o | 7880 | 7594 | -3.6% | 1.04x |
| IndexPathTest.o | 9941 | 9685 | -2.6% | 1.03x |
| RemoveWhere.o | 14692 | 14390 | -2.1% | 1.02x |
| PopFrontGeneric.o | 2470 | 2422 | -1.9% | 1.02x |
| MirrorTest.o | 11588 | 11428 | -1.4% | 1.01x |
Performance (x86_64): -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| UTF8Decode_InitFromCustom_contiguous | 126.833 | 157.429 | +24.1% | 0.81x (?) |
| UTF8Decode_InitDecoding | 127.615 | 156.917 | +23.0% | 0.81x (?) |
| UTF8Decode_InitFromCustom_noncontiguous | 287.571 | 315.333 | +9.7% | 0.91x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringWithCString2 | 0.002 | 0.0 | -66.7% | 3.00x |
| NaiveRRC.append.smallContiguousRepeated | 176.0 | 94.0 | -46.6% | 1.87x |
| RemoveWhereSwapInts | 15.354 | 11.52 | -25.0% | 1.33x (?) |
| Array.removeAll.keepingCapacity.Object | 6.88 | 5.173 | -24.8% | 1.33x (?) |
Code size: -Osize
| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2998 | 2658 | -11.3% | 1.13x |
| NaiveRangeReplaceableCollectionConformance.o | 11616 | 11058 | -4.8% | 1.05x |
| ArrayRemoveAll.o | 7350 | 7019 | -4.5% | 1.05x |
| IndexPathTest.o | 7305 | 7008 | -4.1% | 1.04x |
| RemoveWhere.o | 12449 | 12106 | -2.8% | 1.03x |
| PopFrontGeneric.o | 2427 | 2381 | -1.9% | 1.02x |
| MirrorTest.o | 11460 | 11300 | -1.4% | 1.01x |
Performance (x86_64): -Onone
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| UTF8Decode_InitDecoding | 135.75 | 167.333 | +23.3% | 0.81x (?) |
| UTF8Decode_InitFromCustom_contiguous | 136.083 | 166.923 | +22.7% | 0.82x (?) |
| NSStringConversion.InlineBuffer.ASCII | 5282.0 | 6144.0 | +16.3% | 0.86x (?) |
| NSStringConversion.InlineBuffer.UTF8 | 3170.0 | 3605.0 | +13.7% | 0.88x (?) |
| ArrayOfGenericPOD2 | 1050.0 | 1145.0 | +9.0% | 0.92x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| DataCreateMedium | 159400.0 | 138800.0 | -12.9% | 1.15x (?) |
| CharacterLiteralsLarge | 447.25 | 397.2 | -11.2% | 1.13x (?) |
| CxxStringConversion.cxx.to.swift | 162.333 | 144.667 | -10.9% | 1.12x (?) |
| PopFrontArrayGeneric | 3160.0 | 2890.0 | -8.5% | 1.09x (?) |
| BinaryFloatingPointPropertiesBinade | 55.0 | 51.2 | -6.9% | 1.07x (?) |
| Calculator | 930.0 | 867.0 | -6.8% | 1.07x (?) |
Code size: -swiftlibs
@swift-ci please benchmark
Adding the extra branches to skip .deinitialize / .moveInitialize helped with some of the regressions:
Performance (x86_64): -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringWithCString2 | 0.0 | 0.002 | +200.0% | 0.33x (?) |
| InsertCharacterEndIndex | 90.455 | 101.118 | +11.8% | 0.89x (?) |
| InsertCharacterEndIndexNonASCII | 28.406 | 31.613 | +11.3% | 0.90x (?) |
| InsertCharacterTowardsEndIndex | 103.438 | 113.267 | +9.5% | 0.91x (?) |
| FindString.Loop1.Substring | 277.8 | 303.143 | +9.1% | 0.92x (?) |
| FlattenListLoop | 1205.0 | 1313.0 | +9.0% | 0.92x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.largeContiguous | 0.124 | 0.0 | -99.2% | 125.00x (?) |
| NaiveRRC.append.smallContiguousRepeated | 177.9 | 98.588 | -44.6% | 1.80x |
| LessSubstringSubstringGenericComparable | 29.724 | 22.364 | -24.8% | 1.33x (?) |
| LessSubstringSubstring | 29.625 | 22.318 | -24.7% | 1.33x (?) |
| EqualSubstringSubstring | 30.6 | 23.077 | -24.6% | 1.33x (?) |
| EqualSubstringString | 30.16 | 22.864 | -24.2% | 1.32x |
| EqualStringSubstring | 30.375 | 23.179 | -23.7% | 1.31x (?) |
| EqualSubstringSubstringGenericEquatable | 29.96 | 22.97 | -23.3% | 1.30x (?) |
| UTF8Decode_InitFromData | 167.583 | 138.455 | -17.4% | 1.21x (?) |
| UTF8Decode_InitFromBytes | 170.889 | 143.0 | -16.3% | 1.20x (?) |
| StringComparison_longSharedPrefix | 246.3 | 209.545 | -14.9% | 1.18x (?) |
| NormalizedIterator_fastPrenormal | 553.023 | 486.531 | -12.0% | 1.14x (?) |
| Breadcrumbs.UTF16ToIdx.longASCII | 43.581 | 39.352 | -9.7% | 1.11x (?) |
| SortStringsUnicode | 2390.0 | 2165.0 | -9.4% | 1.10x (?) |
| NormalizedIterator_latin1 | 182.1 | 167.636 | -7.9% | 1.09x (?) |
| Breadcrumbs.MutatedUTF16ToIdx.Mixed | 210.545 | 195.0 | -7.4% | 1.08x (?) |
| Breadcrumbs.MutatedIdxToUTF16.Mixed | 217.875 | 201.875 | -7.3% | 1.08x (?) |
| SubstringEqualString | 181.0 | 168.2 | -7.1% | 1.08x (?) |
| RangeAssignment | 155.5 | 145.0 | -6.8% | 1.07x (?) |
Code size: -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringSplitting.o | 36094 | 37118 | +2.8% | 0.97x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 3302 | 2767 | -16.2% | 1.19x |
| ArrayRemoveAll.o | 7880 | 7611 | -3.4% | 1.04x |
| NaiveRangeReplaceableCollectionConformance.o | 11824 | 11472 | -3.0% | 1.03x |
| IndexPathTest.o | 9941 | 9733 | -2.1% | 1.02x |
| PopFrontGeneric.o | 2470 | 2422 | -1.9% | 1.02x |
| RemoveWhere.o | 14692 | 14407 | -1.9% | 1.02x |
| MirrorTest.o | 11588 | 11444 | -1.2% | 1.01x |
Performance (x86_64): -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringBuilderLong | 778.333 | 916.842 | +17.8% | 0.85x (?) |
| DropLastCountableRangeLazy | 4.992 | 5.802 | +16.2% | 0.86x (?) |
| InsertCharacterEndIndex | 90.105 | 101.278 | +12.4% | 0.89x (?) |
| InsertCharacterEndIndexNonASCII | 28.469 | 31.0 | +8.9% | 0.92x (?) |
| String.replaceSubrange.String | 10.013 | 10.887 | +8.7% | 0.92x (?) |
| String.replaceSubrange.ArrChar.Small | 35.781 | 38.654 | +8.0% | 0.93x (?) |
| InsertCharacterTowardsEndIndex | 118.714 | 128.071 | +7.9% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.smallContiguousRepeated | 172.9 | 93.923 | -45.7% | 1.84x |
| EqualSubstringString | 29.844 | 22.422 | -24.9% | 1.33x (?) |
| EqualSubstringSubstringGenericEquatable | 29.923 | 22.576 | -24.6% | 1.33x (?) |
| EqualSubstringSubstring | 29.844 | 22.576 | -24.4% | 1.32x (?) |
| EqualStringSubstring | 29.826 | 22.633 | -24.1% | 1.32x |
| LessSubstringSubstring | 30.387 | 23.088 | -24.0% | 1.32x |
| LessSubstringSubstringGenericComparable | 30.378 | 23.186 | -23.7% | 1.31x |
| Array.removeAll.keepingCapacity.Object | 6.875 | 5.432 | -21.0% | 1.27x (?) |
| UTF8Decode_InitFromData | 167.5 | 138.385 | -17.4% | 1.21x (?) |
| UTF8Decode_InitFromBytes | 175.125 | 146.8 | -16.2% | 1.19x |
| StringComparison_longSharedPrefix | 246.9 | 208.545 | -15.5% | 1.18x (?) |
| Breadcrumbs.UTF16ToIdx.longASCII | 44.673 | 39.297 | -12.0% | 1.14x (?) |
| SubstringEqualString | 183.778 | 164.8 | -10.3% | 1.12x (?) |
| StringFromLongWholeSubstringGeneric | 6.007 | 5.477 | -8.8% | 1.10x (?) |
| StringComparison_latin1 | 336.833 | 312.0 | -7.4% | 1.08x (?) |
| Breadcrumbs.MutatedIdxToUTF16.Mixed | 217.625 | 202.0 | -7.2% | 1.08x (?) |
| Breadcrumbs.MutatedUTF16ToIdx.Mixed | 210.636 | 195.636 | -7.1% | 1.08x (?) |
| SubstringEquatable | 316.286 | 295.5 | -6.6% | 1.07x (?) |
Code size: -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringSplitting.o | 36281 | 37222 | +2.6% | 0.97x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2998 | 2690 | -10.3% | 1.11x |
| ArrayRemoveAll.o | 7350 | 7040 | -4.2% | 1.04x |
| IndexPathTest.o | 7305 | 7016 | -4.0% | 1.04x |
| NaiveRangeReplaceableCollectionConformance.o | 11616 | 11273 | -3.0% | 1.03x |
| RemoveWhere.o | 12449 | 12142 | -2.5% | 1.03x |
| PopFrontGeneric.o | 2427 | 2381 | -1.9% | 1.02x |
| MirrorTest.o | 11460 | 11316 | -1.3% | 1.01x |
Performance (x86_64): -Onone
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Array.removeAll.keepingCapacity.Object | 5.61 | 6.798 | +21.2% | 0.83x (?) |
| String.replaceSubrange.String | 11.102 | 12.425 | +11.9% | 0.89x (?) |
| InsertCharacterTowardsEndIndex | 131.462 | 144.5 | +9.9% | 0.91x (?) |
| ArrayAppendLatin1Substring | 21984.0 | 24036.0 | +9.3% | 0.91x (?) |
| ArrayAppendAsciiSubstring | 21780.0 | 23760.0 | +9.1% | 0.92x (?) |
| ArrayAppendUTF16Substring | 21792.0 | 23748.0 | +9.0% | 0.92x (?) |
| InsertCharacterEndIndex | 135.5 | 147.077 | +8.5% | 0.92x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| PopFrontArrayGeneric | 3158.0 | 2433.846 | -22.9% | 1.30x (?) |
| LessSubstringSubstringGenericComparable | 33.0 | 25.962 | -21.3% | 1.27x (?) |
| EqualSubstringSubstringGenericEquatable | 32.96 | 25.968 | -21.2% | 1.27x (?) |
| LessSubstringSubstring | 34.69 | 27.5 | -20.7% | 1.26x |
| EqualSubstringSubstring | 34.327 | 27.469 | -20.0% | 1.25x (?) |
| EqualSubstringString | 34.5 | 27.636 | -19.9% | 1.25x (?) |
| EqualStringSubstring | 34.043 | 27.455 | -19.4% | 1.24x (?) |
| UTF8Decode_InitFromData | 169.2 | 139.538 | -17.5% | 1.21x (?) |
| UTF8Decode_InitFromBytes | 173.667 | 145.0 | -16.5% | 1.20x (?) |
| DataCreateMedium | 159500.0 | 138700.0 | -13.0% | 1.15x (?) |
| DataCreateSmall | 21850.0 | 19390.0 | -11.3% | 1.13x (?) |
| RangeAssignment | 11807.0 | 10685.0 | -9.5% | 1.11x (?) |
| Breadcrumbs.MutatedUTF16ToIdx.Mixed | 221.1 | 203.9 | -7.8% | 1.08x (?) |
| Breadcrumbs.MutatedIdxToUTF16.Mixed | 228.429 | 210.818 | -7.7% | 1.08x (?) |
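
For readers following along, the idea of skipping `.deinitialize` / `.moveInitialize` mentioned before these results can be sketched as below. This is a minimal illustration and not the PR's actual diff: `replaceInPlace` is a hypothetical helper, the exact skip conditions are assumptions, and the caller is assumed to own uniquely referenced storage with enough capacity.

```swift
// Hypothetical helper, not the stdlib implementation: replaces `subrange`
// of the first `count` initialized elements at `base` and returns the new count.
func replaceInPlace(
  _ base: UnsafeMutablePointer<Int>,
  count: Int,
  capacity: Int,
  subrange: Range<Int>,
  with newElements: [Int]
) -> Int {
  let eraseCount = subrange.count
  let insertCount = newElements.count
  let tailCount = count - subrange.upperBound
  let newCount = count - eraseCount + insertCount
  precondition(newCount <= capacity, "caller must provide enough capacity")

  // Assumed branch 1: nothing to destroy, so skip deinitialize.
  if eraseCount > 0 {
    (base + subrange.lowerBound).deinitialize(count: eraseCount)
  }
  // Assumed branch 2: the tail is empty or stays where it is, so skip the move.
  if tailCount > 0 && insertCount != eraseCount {
    (base + subrange.lowerBound + insertCount)
      .moveInitialize(from: base + subrange.upperBound, count: tailCount)
  }
  // Fill the hole with the new elements.
  if insertCount > 0 {
    newElements.withUnsafeBufferPointer { src in
      (base + subrange.lowerBound).initialize(from: src.baseAddress!, count: insertCount)
    }
  }
  return newCount
}

let capacity = 8
let storage = UnsafeMutablePointer<Int>.allocate(capacity: capacity)
for i in 0..<5 { (storage + i).initialize(to: i) }

let newCount = replaceInPlace(
  storage, count: 5, capacity: capacity, subrange: 1..<3, with: [10, 11, 12])
let result = Array(UnsafeBufferPointer(start: storage, count: newCount))

storage.deinitialize(count: newCount)
storage.deallocate()
```

Whether such branches pay off depends on how often the zero-count cases occur and on what the optimizer does with the extra control flow, which is what the benchmark runs in this thread are probing.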
@swift-ci please benchmark
Another commit, another set of benchmarks:
Performance (x86_64): -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringWithCString2 | 0.0 | 0.002 | +200.0% | 0.33x (?) |
| ArrayAppendGenericStructs | 1442.0 | 1710.0 | +18.6% | 0.84x (?) |
| NormalizedIterator_emoji | 331.429 | 377.92 | +14.0% | 0.88x (?) |
| String.replaceSubrange.Substring.Small | 39.286 | 43.75 | +11.4% | 0.90x (?) |
| FindString.Loop1.Substring | 278.625 | 306.0 | +9.8% | 0.91x (?) |
| NormalizedIterator_nonBMPSlowestPrenormal | 418.75 | 458.889 | +9.6% | 0.91x (?) |
| InsertCharacterEndIndex | 90.5 | 99.05 | +9.4% | 0.91x (?) |
| InsertCharacterTowardsEndIndex | 103.588 | 112.933 | +9.0% | 0.92x (?) |
| String.replaceSubrange.ArrChar.Small | 36.444 | 39.462 | +8.3% | 0.92x (?) |
| InsertCharacterStartIndex | 255.278 | 275.625 | +8.0% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.smallContiguousRepeated | 178.7 | 102.625 | -42.6% | 1.74x |
| FlattenListLoop | 1621.0 | 1021.0 | -37.0% | 1.59x (?) |
| LessSubstringSubstring | 29.875 | 22.42 | -25.0% | 1.33x (?) |
| LessSubstringSubstringGenericComparable | 29.719 | 22.422 | -24.6% | 1.33x (?) |
| EqualSubstringSubstring | 30.6 | 23.308 | -23.8% | 1.31x (?) |
| RangeAssignment | 155.444 | 118.643 | -23.7% | 1.31x (?) |
| EqualSubstringString | 30.051 | 22.97 | -23.6% | 1.31x (?) |
| EqualSubstringSubstringGenericEquatable | 29.938 | 23.056 | -23.0% | 1.30x (?) |
| EqualStringSubstring | 30.276 | 23.4 | -22.7% | 1.29x |
| UTF8Decode_InitFromData | 171.455 | 136.833 | -20.2% | 1.25x (?) |
| UTF8Decode_InitFromBytes | 172.6 | 140.0 | -18.9% | 1.23x |
| Set.isDisjoint.Int.Empty | 51.2 | 45.74 | -10.7% | 1.12x (?) |
| CxxStringConversion.cxx.to.swift | 156.333 | 140.5 | -10.1% | 1.11x (?) |
| Data.init.Sequence.809B.Count.RE.I | 22.963 | 20.788 | -9.5% | 1.10x (?) |
| Breadcrumbs.MutatedUTF16ToIdx.Mixed | 210.636 | 191.364 | -9.1% | 1.10x (?) |
| Breadcrumbs.MutatedIdxToUTF16.Mixed | 218.0 | 198.222 | -9.1% | 1.10x (?) |
| Set.subtracting.Empty.Box | 21.64 | 19.824 | -8.4% | 1.09x (?) |
| SortStringsUnicode | 2387.5 | 2192.5 | -8.2% | 1.09x (?) |
| Data.init.Sequence.809B.Count.RE | 23.042 | 21.333 | -7.4% | 1.08x (?) |
| FlattenListFlatMap | 3033.0 | 2816.0 | -7.2% | 1.08x (?) |
| Set.isDisjoint.Seq.Int.Empty | 53.13 | 49.36 | -7.1% | 1.08x (?) |
| ArraySetElement | 306.5 | 286.143 | -6.6% | 1.07x (?) |
Code size: -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringSplitting.o | 36094 | 37118 | +2.8% | 0.97x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 3302 | 2703 | -18.1% | 1.22x |
| NaiveRangeReplaceableCollectionConformance.o | 11824 | 10704 | -9.5% | 1.10x |
| ArrayRemoveAll.o | 7880 | 7611 | -3.4% | 1.04x |
| IndexPathTest.o | 9941 | 9733 | -2.1% | 1.02x |
| PopFrontGeneric.o | 2470 | 2422 | -1.9% | 1.02x |
| RemoveWhere.o | 14692 | 14407 | -1.9% | 1.02x |
| MirrorTest.o | 11588 | 11444 | -1.2% | 1.01x |
Performance (x86_64): -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| SuffixAnySequence | 98.75 | 1587.0 | +1507.1% | 0.06x |
| SuffixSequence | 115.692 | 1698.0 | +1367.7% | 0.07x |
| SuffixSequenceLazy | 115.455 | 1607.0 | +1291.9% | 0.07x |
| SuffixCountableRangeLazy | 5.249 | 9.0 | +71.4% | 0.58x (?) |
| ArrayAppendGenericStructs | 1064.0 | 1756.667 | +65.1% | 0.61x (?) |
| PrefixAnySeqCntRangeLazy | 121.077 | 134.25 | +10.9% | 0.90x (?) |
| NormalizedIterator_nonBMPSlowestPrenormal | 415.254 | 458.293 | +10.4% | 0.91x (?) |
| NormalizedIterator_emoji | 331.04 | 365.0 | +10.3% | 0.91x (?) |
| InsertCharacterEndIndex | 89.947 | 99.105 | +10.2% | 0.91x (?) |
| String.replaceSubrange.Substring.Small | 39.818 | 43.5 | +9.2% | 0.92x (?) |
| String.replaceSubrange.ArrChar.Small | 35.815 | 39.08 | +9.1% | 0.92x (?) |
| StringEnumRawValueInitialization | 450.0 | 488.8 | +8.6% | 0.92x (?) |
| FindString.Loop1.Substring | 283.0 | 307.143 | +8.5% | 0.92x (?) |
| InsertCharacterTowardsEndIndex | 118.429 | 127.923 | +8.0% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.largeContiguous | 0.39 | 0.104 | -73.1% | 3.72x (?) |
| NaiveRRC.append.smallContiguousRepeated | 176.0 | 102.688 | -41.7% | 1.71x |
| EqualSubstringSubstringGenericEquatable | 29.923 | 22.606 | -24.5% | 1.32x |
| EqualSubstringString | 29.844 | 22.6 | -24.3% | 1.32x (?) |
| EqualSubstringSubstring | 29.833 | 22.606 | -24.2% | 1.32x (?) |
| EqualStringSubstring | 29.826 | 22.621 | -24.2% | 1.32x (?) |
| LessSubstringSubstring | 30.375 | 23.286 | -23.3% | 1.30x (?) |
| LessSubstringSubstringGenericComparable | 30.385 | 23.515 | -22.6% | 1.29x (?) |
| RangeAssignment | 158.455 | 126.818 | -20.0% | 1.25x (?) |
| UTF8Decode_InitFromData | 167.4 | 134.417 | -19.7% | 1.25x (?) |
| UTF8Decode_InitFromBytes | 171.0 | 137.417 | -19.6% | 1.24x (?) |
| PrefixAnyCollection | 137.273 | 110.733 | -19.3% | 1.24x (?) |
| PrefixWhileAnyCollectionLazy | 147.769 | 121.143 | -18.0% | 1.22x (?) |
| StringComparison_longSharedPrefix | 246.3 | 206.1 | -16.3% | 1.20x (?) |
| DropFirstAnySeqCntRange | 120.714 | 107.333 | -11.1% | 1.12x (?) |
| DropFirstAnySeqCRangeIter | 120.625 | 107.333 | -11.0% | 1.12x (?) |
| Breadcrumbs.MutatedUTF16ToIdx.Mixed | 210.5 | 191.364 | -9.1% | 1.10x (?) |
| Breadcrumbs.MutatedIdxToUTF16.Mixed | 217.5 | 198.111 | -8.9% | 1.10x (?) |
| Array.removeAll.keepingCapacity.Object | 5.841 | 5.361 | -8.2% | 1.09x (?) |
Code size: -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Suffix.o | 17680 | 22991 | +30.0% | 0.77x |
| StringSplitting.o | 36281 | 37222 | +2.6% | 0.97x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2998 | 2615 | -12.8% | 1.15x |
| ArrayRemoveAll.o | 7350 | 7040 | -4.2% | 1.04x |
| IndexPathTest.o | 7305 | 7016 | -4.0% | 1.04x |
| NaiveRangeReplaceableCollectionConformance.o | 11616 | 11258 | -3.1% | 1.03x |
| RemoveWhere.o | 12449 | 12142 | -2.5% | 1.03x |
| PopFrontGeneric.o | 2427 | 2381 | -1.9% | 1.02x |
| MirrorTest.o | 11460 | 11316 | -1.3% | 1.01x |
Performance (x86_64): -Onone
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| SubstringRemoveFirst1 | 0.143 | 0.167 | +16.7% | 0.86x (?) |
| String.replaceSubrange.Substring.Small | 40.98 | 45.68 | +11.5% | 0.90x (?) |
| String.replaceSubrange.ArrChar.Small | 36.92 | 41.143 | +11.4% | 0.90x (?) |
| InsertCharacterTowardsEndIndex | 131.769 | 142.385 | +8.1% | 0.93x (?) |
| ArrayOfGenericPOD2 | 1049.0 | 1130.0 | +7.7% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.largeContiguous | 109.0 | 0.458 | -99.6% | 237.47x |
| NaiveRRC.init.largeContiguous | 103.75 | 0.49 | -99.5% | 211.31x |
| RangeAssignment | 11501.0 | 5693.0 | -50.5% | 2.02x |
| NaiveRRC.append.smallContiguousRepeated | 2286.0 | 1668.0 | -27.0% | 1.37x (?) |
| PopFrontArrayGeneric | 3160.0 | 2406.667 | -23.8% | 1.31x (?) |
| EqualSubstringSubstringGenericEquatable | 32.88 | 25.935 | -21.1% | 1.27x (?) |
| LessSubstringSubstring | 34.966 | 27.719 | -20.7% | 1.26x (?) |
| UTF8Decode_InitFromData | 167.9 | 133.727 | -20.4% | 1.26x (?) |
| LessSubstringSubstringGenericComparable | 32.548 | 26.0 | -20.1% | 1.25x (?) |
| UTF8Decode_InitFromBytes | 172.667 | 138.333 | -19.9% | 1.25x (?) |
| EqualSubstringString | 34.417 | 27.687 | -19.6% | 1.24x (?) |
| EqualSubstringSubstring | 34.25 | 27.667 | -19.2% | 1.24x |
| EqualStringSubstring | 34.182 | 27.857 | -18.5% | 1.23x (?) |
| DataCreateMedium | 160600.0 | 137200.0 | -14.6% | 1.17x (?) |
| DataCreateSmall | 21700.0 | 19030.0 | -12.3% | 1.14x (?) |
| Breadcrumbs.MutatedIdxToUTF16.Mixed | 228.0 | 208.182 | -8.7% | 1.10x (?) |
@swift-ci please benchmark
@swift-ci please benchmark
@swift-ci please benchmark Apple Silicon
@swift-ci Please Apple Silicon benchmark
The Apple Silicon results are less noisy, but they also highlight a different regression (???)
Performance (arm64): -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RemoveWhereQuadraticString | 167.0 | 210.583 | +26.1% | 0.79x (?) |
| NSStringConversion.InlineBuffer.UTF8 | 469.667 | 505.5 | +7.6% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.smallContiguousRepeated | 75.762 | 55.438 | -26.8% | 1.37x (?) |
| SIMDReduce.Int32x16.Initializer | 13.036 | 11.075 | -15.0% | 1.18x (?) |
| Set.isDisjoint.Seq.Empty.Box | 45.241 | 39.0 | -13.8% | 1.16x (?) |
| ObserverForwarderStruct | 203.846 | 186.154 | -8.7% | 1.10x (?) |
Code size: -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| StringSplitting.o | 27959 | 28675 | +2.6% | 0.98x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2889 | 2197 | -24.0% | 1.31x |
| NaiveRangeReplaceableCollectionConformance.o | 8022 | 7190 | -10.4% | 1.12x |
| ArrayRemoveAll.o | 6060 | 5704 | -5.9% | 1.06x |
| IndexPathTest.o | 7978 | 7710 | -3.4% | 1.03x |
| RemoveWhere.o | 11280 | 10916 | -3.2% | 1.03x |
| MirrorTest.o | 8490 | 8310 | -2.1% | 1.02x |
| PopFrontGeneric.o | 2009 | 1973 | -1.8% | 1.02x |
Performance (arm64): -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| SuffixSequenceLazy | 88.409 | 704.0 | +696.3% | 0.13x |
| SuffixAnySequence | 88.409 | 678.0 | +666.9% | 0.13x |
| SuffixSequence | 88.423 | 677.667 | +666.4% | 0.13x |
| NSStringConversion.InlineBuffer.UTF8 | 469.667 | 506.0 | +7.7% | 0.93x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.largeContiguous | 0.389 | 0.0 | -99.7% | 390.00x |
| NaiveRRC.append.smallContiguousRepeated | 77.25 | 57.0 | -26.2% | 1.36x |
| RangeAssignment | 155.727 | 143.643 | -7.8% | 1.08x (?) |
Code size: -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Suffix.o | 13712 | 17404 | +26.9% | 0.79x |
| StringSplitting.o | 24583 | 25027 | +1.8% | 0.98x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2617 | 2205 | -15.7% | 1.19x |
| ArrayRemoveAll.o | 6096 | 5708 | -6.4% | 1.07x |
| NaiveRangeReplaceableCollectionConformance.o | 7994 | 7494 | -6.3% | 1.07x |
| IndexPathTest.o | 6194 | 5902 | -4.7% | 1.05x |
| RemoveWhere.o | 10036 | 9656 | -3.8% | 1.04x |
| MirrorTest.o | 8130 | 7946 | -2.3% | 1.02x |
| PopFrontGeneric.o | 2065 | 2021 | -2.1% | 1.02x |
Performance (arm64): -Onone
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ArrayOfGenericPOD2 | 855.0 | 1170.0 | +36.8% | 0.73x (?) |
| Set.filter.Int100.24k | 62.575 | 69.714 | +11.4% | 0.90x (?) |
| Set.filter.Int100.20k | 52.596 | 58.488 | +11.2% | 0.90x |
| Set.filter.Int100.16k | 42.586 | 47.241 | +10.9% | 0.90x |
| Set.filter.Int100.28k | 75.03 | 83.1 | +10.8% | 0.90x (?) |
| StringWordBuilderReservingCapacity | 811.667 | 875.0 | +7.8% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.largeContiguous | 53.167 | 0.278 | -99.5% | 190.57x |
| NaiveRRC.init.largeContiguous | 53.295 | 0.288 | -99.5% | 184.42x |
| RangeAssignment | 5628.0 | 2854.0 | -49.3% | 1.97x |
| NaiveRRC.append.smallContiguousRepeated | 1234.5 | 957.5 | -22.4% | 1.29x |
| PopFrontArrayGeneric | 2456.0 | 2061.25 | -16.1% | 1.19x (?) |
@swift-ci please benchmark
@swift-ci please apple silicon benchmark
[Removed one of the three branches that could have introduced the regression in the Suffix benchmarks. I'm working on a better understanding of the optimizer's behavior on the other two to potentially fix the regression.]
@swift-ci please benchmark
@swift-ci please apple silicon benchmark
@swift-ci please benchmark
@swift-ci please apple silicon benchmark
@swift-ci please benchmark
@swift-ci please apple silicon benchmark
Test again, same results:
Performance (arm64): -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Set.isSuperset.Seq.Empty.Int | 34.303 | 40.55 | +18.2% | 0.85x |
| DataAppendBytesSmall | 121.667 | 134.143 | +10.3% | 0.91x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.smallContiguousRepeated | 75.727 | 55.529 | -26.7% | 1.36x |
| ObserverForwarderStruct | 219.583 | 197.308 | -10.1% | 1.11x (?) |
| NSStringConversion.InlineBuffer.UTF8 | 500.5 | 464.333 | -7.2% | 1.08x (?) |
| StringInterpolationSmall | 533.889 | 496.923 | -6.9% | 1.07x (?) |
Code size: -O
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRangeReplaceableCollectionConformance.o | 8090 | 9326 | +15.3% | 0.87x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2885 | 2165 | -25.0% | 1.33x |
| ArrayRemoveAll.o | 6136 | 5792 | -5.6% | 1.06x |
| IndexPathTest.o | 7974 | 7686 | -3.6% | 1.04x |
| RemoveWhere.o | 11152 | 10760 | -3.5% | 1.04x |
| MirrorTest.o | 8546 | 8330 | -2.5% | 1.03x |
| PopFrontGeneric.o | 1997 | 1961 | -1.8% | 1.02x |
Performance (arm64): -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| SuffixSequenceLazy | 87.773 | 703.333 | +701.3% | 0.12x |
| SuffixAnySequence | 87.714 | 677.0 | +671.8% | 0.13x |
| SuffixSequence | 87.773 | 677.0 | +671.3% | 0.13x |
| ArrayInClass | 156.034 | 187.239 | +20.0% | 0.83x |
| DistinctClassFieldAccesses | 35.928 | 42.169 | +17.4% | 0.85x |
| ArraySetElement | 218.545 | 249.75 | +14.3% | 0.88x |
| Array2D | 5586.286 | 6237.333 | +11.7% | 0.90x |
| DropLastAnySeqCntRange | 278.714 | 310.714 | +11.5% | 0.90x (?) |
| DropLastAnySeqCRangeIter | 278.625 | 310.571 | +11.5% | 0.90x |
| DataAppendBytesSmall | 134.143 | 146.636 | +9.3% | 0.91x (?) |
| PrefixWhileSequence | 214.9 | 233.444 | +8.6% | 0.92x (?) |
| PrefixWhileAnySequence | 214.9 | 233.444 | +8.6% | 0.92x |
| DictionaryBridgeToObjC_Bridge | 6.489 | 6.988 | +7.7% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.append.smallContiguousRepeated | 77.278 | 58.5 | -24.3% | 1.32x (?) |
| RangeAssignment | 164.667 | 143.538 | -12.8% | 1.15x |
| DataAppendArray | 2256.41 | 2052.941 | -9.0% | 1.10x (?) |
| NSStringConversion.InlineBuffer.UTF8 | 500.667 | 465.333 | -7.1% | 1.08x (?) |
| StringInterpolationSmall | 549.565 | 512.0 | -6.8% | 1.07x (?) |
| FindString.Rec3.Array | 95.192 | 88.9 | -6.6% | 1.07x (?) |
Code size: -Osize
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Suffix.o | 13684 | 17352 | +26.8% | 0.79x |
| StringSplitting.o | 24283 | 24535 | +1.0% | 0.99x |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| RangeAssignment.o | 2617 | 2197 | -16.0% | 1.19x |
| NaiveRangeReplaceableCollectionConformance.o | 7970 | 7282 | -8.6% | 1.09x |
| ArrayRemoveAll.o | 6152 | 5784 | -6.0% | 1.06x |
| IndexPathTest.o | 6186 | 5886 | -4.8% | 1.05x |
| RemoveWhere.o | 9976 | 9544 | -4.3% | 1.05x |
| MirrorTest.o | 8214 | 8006 | -2.5% | 1.03x |
| PopFrontGeneric.o | 2053 | 2009 | -2.1% | 1.02x |
Performance (arm64): -Onone
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| ArrayAppendGenericStructs | 706.154 | 872.727 | +23.6% | 0.81x (?) |
| PopFrontArrayGeneric | 2461.25 | 2994.545 | +21.7% | 0.82x |
| DataCreateMedium | 88400.0 | 101300.0 | +14.6% | 0.87x |
| RandomDoubleLCG | 15110.0 | 17028.0 | +12.7% | 0.89x (?) |
| DataCreateSmall | 12215.0 | 13700.0 | +12.2% | 0.89x (?) |
| Set.filter.Int100.24k | 62.575 | 69.714 | +11.4% | 0.90x (?) |
| Set.filter.Int100.20k | 52.587 | 58.469 | +11.2% | 0.90x (?) |
| Set.filter.Int100.16k | 42.604 | 47.24 | +10.9% | 0.90x |
| Set.filter.Int100.28k | 75.03 | 83.067 | +10.7% | 0.90x |
| TypeName | 830.0 | 897.5 | +8.1% | 0.92x (?) |
| StringWordBuilderReservingCapacity | 812.5 | 874.286 | +7.6% | 0.93x (?) |

| Improvement | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| NaiveRRC.init.largeContiguous | 53.417 | 0.296 | -99.4% | 179.86x |
| NaiveRRC.append.largeContiguous | 53.467 | 0.3 | -99.4% | 177.63x |
| RangeAssignment | 5662.0 | 2852.0 | -49.6% | 1.99x |
| NaiveRRC.append.smallContiguousRepeated | 1224.5 | 974.0 | -20.5% | 1.26x (?) |
| RawBuffer.1000.findLast | 66422.0 | 55569.0 | -16.3% | 1.20x |
| RawBuffer.128.findLast | 9080.0 | 7733.0 | -14.8% | 1.17x (?) |
| RawBuffer.39.findLast | 3307.0 | 2924.0 | -11.6% | 1.13x (?) |
| ObjectiveCBridgeStubToNSString | 1220.0 | 1127.143 | -7.6% | 1.08x (?) |
Code size: -swiftlibs
How to read the data
The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.
Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).
Hardware Overview
Model Name: Mac mini
Model Identifier: Macmini9,1
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB
@swift-ci please test
Marking as ready for review.
The performance regressions for SuffixSequence and friends result from an unfortunate optimizer decision early in the SIL phase: it no longer inlines `Sequence.suffix`, because some parts of the call graph further down are now shared. Since this only occurs in -Osize, is likely due to how the Suffix benchmark module is compiled, and (as far as I can tell) is unlikely to affect real programs, I think this should be okay to merge performance-wise.
@swift-ci please test
@swift-ci please test
@swift-ci please test
@swift-ci Please clean test Linux platform
I haven't been able to reproduce the test failures locally in an Ubuntu 20.04 Intel container that ran the same buildbot_linux preset. I'm not sure what the source of the failure is: it appears to crash in libc in the stdlibUnittest code, but it passes locally.
It doesn't appear to be flakiness, since multiple runs on multiple platforms had the test pass locally but the test still fails in CI.
@swift-ci please test
I think something in the past ~half year fixed it - I rebased onto main and could not reproduce the failures with buildbot_linux anymore! Hopefully the same is true in CI...
@swift-ci please test
EDIT: I finally know what tests are failing in release mode and why:
NoBoundsCheck/EvilShrinkage/*:
- expected/old release behavior: just keep forming indexes and reading past reported endIndex
- old DebugAssert behavior: _debugPrecondition to abort on collection shrinkage
- new behavior: stop at endIndex, see that buffer is underfull, always abort
BoundsChecked/EvilGrowth/*:
- expected/old release behavior: stop reading at first reported count, ignore growth
- old DebugAssert behavior: _debugPrecondition to abort on collection growth
- new behavior: continues reading past the old count, always aborts because the hole we created in the array is not large enough (abort generated by `UnsafeMutableBufferPointer.initialize(fromContentsOf:)`)
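
To make the two failure modes above concrete, here is a minimal sketch of a collection whose reported `count` disagrees with what iteration actually yields. `MiscountingCollection` and `iteratedCount` are hypothetical names used only for illustration; the real test collections may produce the inconsistency differently (for example, by changing size over time).

```swift
// Hypothetical stand-in for the "evil" test collections: `count` disagrees
// with the number of elements that iteration actually produces.
struct MiscountingCollection: Collection {
  let reportedCount: Int
  let actualCount: Int

  var startIndex: Int { 0 }
  var endIndex: Int { actualCount }
  var count: Int { reportedCount }  // deliberately inconsistent with endIndex
  func index(after i: Int) -> Int { i + 1 }
  subscript(position: Int) -> Int { position }
}

// Counts elements by walking the collection from startIndex to endIndex.
func iteratedCount<C: Collection>(_ c: C) -> Int {
  var n = 0
  for _ in c { n += 1 }
  return n
}

// "Growth": more elements arrive than `count` promised.
let grown = MiscountingCollection(reportedCount: 3, actualCount: 5)
// "Shrinkage": fewer elements arrive than `count` promised.
let shrunk = MiscountingCollection(reportedCount: 5, actualCount: 3)
```

Code that sizes a hole from `count` and then fills it by iterating to `endIndex` will either overrun the hole (growth) or leave it underfull (shrinkage), which matches the two aborts described in the list above.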