[SR-14516] Swift generates subscript.read accessors that allocate!

Open weissi opened this issue 4 years ago • 3 comments

Previous ID SR-14516
Radar rdar://problem/77008933
Original Reporter @weissi
Type Bug

Attachment: Download

Environment

Apple Swift version 5.4 (swiftlang-1205.0.22.2 clang-1205.0.19.29)

Additional Detail from JIRA
Votes 0
Component/s Compiler
Labels Bug, 5.4Regression, Optimizer
Assignee None
Priority Medium

md5: 06a0e70a5391143833033eeeb67e35f7

Issue Description:

The following program (sorry, this used to be 1,700 lines before creduce reduced it) allocates 10,000 times:

extension CircularBuffer {
    @inline(never)
    func F() {
        _ = first!     // << this allocates as it calls generic specialization <repro.NIOAny> of repro.CircularBuffer.subscript.read : (repro.CircularBuffer<A>.Index) -> A
    }
}
protocol p {
    init()
}
struct CircularBuffer<d: p>{
    var e: ContiguousArray<d?> = [.init()]
    struct Index: Comparable {
        var f: Int = 0
        var i: Int {
            return f
        }
        static func == (a: Index, j: Index) -> Bool {
                return a.i > j.i
        }
        static func < (aa: Index, j: Index) -> Bool {
                return aa.i < j.i
        }
    }
    func index(after: Index) -> Index {
        return index(after, offsetBy: 1)
    }
    subscript(k: Index) -> d {
        get {
            return e[k.i]!
        }
    }
    var startIndex = Index()
    var endIndex = Index()
}
extension CircularBuffer: Collection {}

enum XY {
    case X
    case Y
}

struct NIOAny: p {
    let av: aw = .ay(())
    enum aw {
        case ax(XY)
        case ay(Any)
        init<az>(a: az) {
            self = .ay(a)
        }
    }
}

let b = CircularBuffer<NIOAny>()
for _ in 0..<10000 {
    b.F()
}

Here's how it goes:

  • b.F calls b.first (which is provided automatically from Collection)

  • b.first calls the subscript's read accessor, which contains an unconditional malloc (for coroutines, I think) 🙁

                       _$s5repro14CircularBufferVyxAC5IndexVyx_GcirAA6NIOAnyV_Tg5:        // generic specialization <repro.NIOAny> of repro.CircularBuffer.subscript.read : (repro.CircularBuffer<A>.Index) -> A
0x0000000100002110         push       rbp                                       ; CODE XREF=_$s5repro14CircularBufferV1FyyFAA6NIOAnyV_Tg5+34
0x0000000100002111         mov        rbp, rsp
0x0000000100002114         push       r15
0x0000000100002116         push       r14
0x0000000100002118         push       r12
0x000000010000211a         push       rbx
0x000000010000211b         sub        rsp, 0x30
0x000000010000211f         mov        r14, rdx
0x0000000100002122         mov        r15, rsi
0x0000000100002125         mov        r12, rdi
0x0000000100002128         mov        edi, 0x21                                 ; argument "size" for method imp___stubs__malloc
0x000000010000212d         call       imp___stubs__malloc                       ; malloc

Read accessors really shouldn't allocate, as that defeats many other optimisations.
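
To make the call chain above concrete, here is a minimal sketch of the same shape: a getter-only subscript on a custom Collection, reached through a generic `first`-style property. `TinyBuffer` and `firstSketch` are hypothetical names made up for illustration, and `firstSketch` only approximates what the standard library's `first` does. This sketch is not claimed to reproduce the malloc; the creduce'd program above is the actual repro.

```swift
// Hypothetical, simplified stand-in for the repro's CircularBuffer.
struct TinyBuffer<Element>: Collection {
    var storage: ContiguousArray<Element?>

    var startIndex: Int { 0 }
    var endIndex: Int { storage.count }

    func index(after i: Int) -> Int { i + 1 }

    // Getter-only subscript, same shape as the one in the repro.
    subscript(position: Int) -> Element {
        get { storage[position]! }
    }
}

extension Collection {
    // Assumption: an approximation of the standard library's `first`.
    // Generic code like this reaches the subscript through the Collection
    // protocol requirement rather than by calling the getter directly.
    var firstSketch: Element? {
        let start = startIndex
        return start != endIndex ? self[start] : nil
    }
}

let buffer = TinyBuffer<String>(storage: ["a", "b"])
print(buffer.firstSketch ?? "empty")
```

The point of the sketch is only that `F()` never calls the subscript getter by name: it goes through `first`, and per the disassembly above the specialized `subscript.read` is where the malloc ends up.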

Full program attached.

Repro:

$ swiftc -O ~/tmp/repro.swift && sudo ~/devel/swift-nio/dev/malloc-aggregation.d -c ./repro
dtrace: system integrity protection is on, some features will not be available


=====
This will collect stack shots of allocations and print it when you exit dtrace.
So go ahead, run your tests and then press Ctrl+C in this window to see the aggregated result
=====
[...]

              libsystem_malloc.dylib`malloc
              repro`specialized CircularBuffer.subscript.read+0x22
              repro`specialized CircularBuffer.F()+0x27
              repro`main+0x7c
              libdyld.dylib`start+0x1
              0x1
            10000

See how we get 10,000 allocations (malloc) through the subscript.read by calling F 10,000 times?

This seems to be a Swift 5.4 regression. We found it in the NIO CI, where a few allocation counter tests regress, but only on 5.4.

Also affects

Swift version 5.4-dev (LLVM 7a20f40c45aca5d, Swift 031b848b7092c06)

In case you're into creduce, this is the interestingness test I used (for Linux):

#!/bin/bash

set -eu

swiftc -O repro.swift
objdump -d repro |  grep -A14 '14CircularBufferVyxAC5IndexVyx_GcirAA6NIOAnyV_Tg5>:' | grep -q malloc

weissi, Apr 21 '21 21:04

CC @eeckstein & @rjmccall because this seems to be coro related.

weissi, Apr 21 '21 21:04

@swift-ci create

weissi, Apr 22 '21 10:04

2024 edition:

cat repro.swift | swiftc -module-name T -O -emit-assembly - | grep -A1000 '^_$s1T14CircularBufferVyxAC5IndexVyx_Gcir:' | sed -e 1d -e '/^_/q' | grep malloc
	bl	_malloc

Still calls malloc on today's Swift 6 preview.

weissi, May 03 '24 16:05

CC: @nate-chandler

tbkka, May 06 '24 17:05