dynasm-rs

Feature request: compile-time resolution of "super-local" label

Open mkeeter opened this issue 3 years ago • 3 comments

I noticed that hashmap lookups are taking a decent amount of JIT time when using local labels.

For example, if I manually compute jumps in this code:

dynasm!(ops
    // Basically the same as MinRegReg
    ; zip2 v4.s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; zip1 v5.s2, V(rhs_reg).s2, V(lhs_reg).s2
    ; fcmgt v5.s2, v5.s2, v4.s2
    ; fmov x15, d5

    ; tst x15, #0x1_0000_0000
    ; b.ne >lhs

    ; tst x15, #0x1
    ; b.eq >both

    // LHS < RHS
    ; fmov D(out_reg), D(rhs_reg)
    ; mov w16, #CHOICE_RIGHT
    ; b >end

    // RHS < LHS
    ;lhs:
    ; fmov D(out_reg), D(lhs_reg)
    ; mov w16, #CHOICE_LEFT
    ; b >end

    ;both:
    ; fmax V(out_reg).s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; mov w16, #CHOICE_BOTH

    ;end:
    ; strb w16, [x0], #1 // post-increment
)

I end up with something like this:

dynasm!(ops
    ; zip2 v4.s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; zip1 v5.s2, V(rhs_reg).s2, V(lhs_reg).s2
    ; fcmgt v5.s2, v5.s2, v4.s2
    ; fmov x15, d5

    ; tst x15, #0x1_0000_0000
    ; b.ne #24 // -> lhs

    ; tst x15, #0x1
    ; b.eq #28 // -> both

    // LHS < RHS
    ; fmov D(out_reg), D(rhs_reg)
    ; mov w16, #CHOICE_RIGHT
    ; b #24 // -> end

    // <- lhs (when RHS < LHS)
    ; fmov D(out_reg), D(lhs_reg)
    ; mov w16, #CHOICE_LEFT
    ; b #12 // -> end

    // <- both
    ; fmax V(out_reg).s2, V(lhs_reg).s2, V(rhs_reg).s2
    ; mov w16, #CHOICE_BOTH

    // <- end
    ; strb w16, [x0], #1 // post-increment
)

In my codebase, this reduces the time spent in dynasm by about 30%, which is a decent chunk of performance!

It would be great to introduce a new flavor of label which is only valid during a single dynasm! block; the branch offset could then be computed at compile-time instead of runtime.
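To illustrate the request, here is a minimal sketch of how offsets could be resolved ahead of time for a block like the one above. It assumes every instruction is 4 bytes wide (as on AArch64) and that labels are dense numeric ids smaller than the number of items. The names `Item`, `resolve`, `ISSUE_BLOCK`, and `OFFSETS` are hypothetical and not part of dynasm-rs; a real implementation would live inside the proc-macro rather than in a `const fn`.

```rust
/// One entry in a straight-line block of fixed-width instructions.
/// Hypothetical model type, not a dynasm-rs API.
#[derive(Clone, Copy)]
pub enum Item {
    /// An instruction that does not reference a label.
    Plain,
    /// A branch to the label with this numeric id.
    Branch(usize),
    /// The definition of a label; it occupies no bytes.
    Label(usize),
}

/// Assumption: every instruction is 4 bytes wide.
pub const INSN_BYTES: i64 = 4;

/// For each item, returns the branch offset in bytes relative to the
/// branch instruction itself, or `None` if the item is not a branch.
/// Label ids must be smaller than `N`.
pub const fn resolve<const N: usize>(items: &[Item; N]) -> [Option<i64>; N] {
    // Pass 1: record the byte position of every label definition.
    let mut label_pos = [-1i64; N];
    let mut pos = 0i64;
    let mut i = 0;
    while i < N {
        match items[i] {
            Item::Label(id) => label_pos[id] = pos,
            _ => pos += INSN_BYTES,
        }
        i += 1;
    }

    // Pass 2: turn each branch into a relative byte offset.
    let mut out = [None; N];
    let mut pos = 0i64;
    let mut i = 0;
    while i < N {
        match items[i] {
            Item::Branch(id) => {
                out[i] = Some(label_pos[id] - pos);
                pos += INSN_BYTES;
            }
            Item::Plain => pos += INSN_BYTES,
            Item::Label(_) => {}
        }
        i += 1;
    }
    out
}

// The block from this issue, with lhs = 0, both = 1, end = 2.
pub const ISSUE_BLOCK: [Item; 20] = [
    Item::Plain, Item::Plain, Item::Plain, Item::Plain, // zip2, zip1, fcmgt, fmov
    Item::Plain, Item::Branch(0),                       // tst, b.ne >lhs
    Item::Plain, Item::Branch(1),                       // tst, b.eq >both
    Item::Plain, Item::Plain, Item::Branch(2),          // fmov, mov, b >end
    Item::Label(0),                                     // lhs:
    Item::Plain, Item::Plain, Item::Branch(2),          // fmov, mov, b >end
    Item::Label(1),                                     // both:
    Item::Plain, Item::Plain,                           // fmax, mov
    Item::Label(2),                                     // end:
    Item::Plain,                                        // strb
];

// Evaluated by the compiler, so no lookup happens at run time.
pub const OFFSETS: [Option<i64>; 20] = resolve(&ISSUE_BLOCK);
```

The four branch entries in `OFFSETS` come out as 24, 28, 24, and 12, matching the hand-computed immediates in the second snippet.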

mkeeter, Sep 05 '22 00:09

Yes. Keep in mind that the original DynASM project for C (used as backend in LuaJIT 1.x, used as frontend in ) does not use a hash map for labels. It is a plain array.

I know this is hard given the constraints of Rust proc-macros, but ideally we would move in that direction.

Techcable, Sep 05 '22 01:09

It wouldn't necessarily have to be a new type of label; it would be possible to guarantee that local labels within a single block are always resolved at compile time. It'll be quite annoying to implement, though.

But before we try that: the default LabelRegistry just uses the standard library's default hasher (SipHash, which is built to resist hash-flooding attacks rather than to be fast) in its label HashMaps. You could try benchmarking it with FnvHasher instead.
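As a rough sketch of what swapping the hasher looks like, the following plugs a hand-written FNV-1a hasher into a standard `HashMap`. The `Fnv1a` type and the `FnvLabelMap` alias are hypothetical stand-ins written here so the example needs no external crate; they are not the `fnv` crate's `FnvHasher` or dynasm-rs's actual registry types.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher (hypothetical stand-in for `FnvHasher`).
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // Standard 64-bit FNV offset basis.
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // FNV-1a: xor the byte in, then multiply by the FNV prime.
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A label map that uses FNV-1a instead of the default SipHash.
/// The key and value types are assumptions made for this example.
pub type FnvLabelMap = HashMap<&'static str, usize, BuildHasherDefault<Fnv1a>>;

/// Builds a small map holding the three labels from the issue's block.
pub fn example_labels() -> FnvLabelMap {
    let mut labels = FnvLabelMap::default();
    labels.insert("lhs", 44);
    labels.insert("both", 56);
    labels.insert("end", 64);
    labels
}
```

The map is used exactly like a default `HashMap`; only the third type parameter changes, which makes it cheap to benchmark both variants against each other.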

CensoredUsername, Sep 05 '22 11:09

Idea: it wouldn't be hard to special-case labels of length 1 so that they become array lookups and bypass the internal hashmaps, if recent changes do not alleviate the bottleneck enough.
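A minimal sketch of that special case, assuming labels map to a single `usize` offset: one-byte names index directly into a fixed array, and everything else falls back to a hashmap. `LabelTable` and its methods are hypothetical names for illustration, not the dynasm-rs `LabelRegistry` API.

```rust
use std::collections::HashMap;

/// Label table with a fast path for one-character names.
/// Hypothetical type, not part of dynasm-rs.
pub struct LabelTable {
    /// Slots for single-byte label names, indexed by the byte value.
    short: [Option<usize>; 128],
    /// Fallback for every other label name.
    long: HashMap<&'static str, usize>,
}

impl LabelTable {
    pub fn new() -> Self {
        LabelTable {
            short: [None; 128],
            long: HashMap::new(),
        }
    }

    /// Returns the array slot for a one-byte name, or `None` otherwise.
    /// A one-byte `str` is always ASCII, so the index is below 128.
    fn short_slot(name: &str) -> Option<usize> {
        match name.as_bytes() {
            [b] => Some(usize::from(*b)),
            _ => None,
        }
    }

    /// Records the offset at which `name` is defined.
    pub fn define(&mut self, name: &'static str, offset: usize) {
        match Self::short_slot(name) {
            Some(slot) => self.short[slot] = Some(offset),
            None => {
                self.long.insert(name, offset);
            }
        }
    }

    /// Looks up a label; one-byte names never touch the hashmap.
    pub fn lookup(&self, name: &str) -> Option<usize> {
        match Self::short_slot(name) {
            Some(slot) => self.short[slot],
            None => self.long.get(name).copied(),
        }
    }
}
```

With this layout, renaming hot labels to single characters would be enough to take the hashing cost out of the lookup path, at the price of a fixed 128-entry array per table.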

CensoredUsername, Oct 10 '22 20:10

As discussed in the related pull request, you can use dynamic labels to skip the hashmap lookup overhead if you want to measure the theoretical speedup. That would be useful to know before proceeding with this.

CensoredUsername, Jan 31 '23 16:01

Closing this due to inactivity after requested information.

CensoredUsername, Mar 13 '24 16:03