heapless
heapless copied to clipboard
SingleCore mode of spsc::Queue lost
During the const generics port, the SingleCore mode of the queue has been removed, but this is not mentioned in the changelog (like was done for the support for different index sizes).
Is it possible to reintroduce that mode? No need to generate all those memory barrier instructions if they are unneeded.
Hi, this was chosen to make deriving correctness of the implementation easier.
The issue with the old SingleCore mode is that it's not possible to auto detect, and even so, not quite sure it's correct on all Cortex-M MCUs.
Re-adding it from a coding standpoint is not an issue, showing correctness for all the targets is.
Would you feel up to the task to try and show correctness for the SingleCore impl and make a small write-up about it with motivation?
Just out of curiosity, how much performance did you loose?
In my tests it only impacts 3 clock cycles (Cortex-M3) on enqueue for example, so I thought no-one would miss the feature.
Would you feel up to the task to try and show correctness for the SingleCore impl and make a small write-up about it with motivation?
Hehe, that's out of my league, unfortunately.
I haven't measured speed as much as saw a 0.3k code size increase after upgrading to 0.7 and could trace that back to an (apparently unrolled) loop in a dequeue operation, where one of the additional instructions was a dmb sy.
If you're interested I can post the disassembly of the old/new comparison.
Hehe I understand :) Please do, is like to have a look. I'd only expect a few bytes difference, not 0 .3k. Unless the compiler is not quite as good at optimizing const generics code yet..
Sorry for taking a while. The biggest single increase is in a loop
for &byte in some_slice {
queue.enqueue_unchecked(byte);
}
It appears to be unrolled by the compiler to work in increments of 16. (Current nightly, lto = true, release with no specific opt-level set.)
The size of the loop is 412 bytes in 0.7 vs 194 bytes in 0.6.
heapless 0.7:
8004644: f810 3f10 ldrb.w r3, [r0, #16]!
8004648: f10b 0208 add.w r2, fp, #8
800464c: f8db 6004 ldr.w r6, [fp, #4]
8004650: 5593 strb r3, [r2, r6]
8004652: 1c73 adds r3, r6, #1
[repeated block starts here]
8004654: f3bf 8f5f dmb sy
8004658: f36f 239f bfc r3, #10, #22
800465c: f8cb 3004 str.w r3, [fp, #4]
8004660: 7843 ldrb r3, [r0, #1]
8004662: f8db 6004 ldr.w r6, [fp, #4]
8004666: 5593 strb r3, [r2, r6]
8004668: 1c73 adds r3, r6, #1
[15 more copies of repeated block]
800479e: f3bf 8f5f dmb sy
80047a2: f36f 229f bfc r2, #10, #22
80047a6: f8cb 2004 str.w r2, [fp, #4]
80047aa: f100 0210 add.w r2, r0, #16
80047ae: 428a cmp r2, r1
80047b0: f47f af48 bne.w 8004644 <_ZN16display_firmware18__cortex_m_rt_main17hfb688fea963e9033E+0x1358>
heapless 0.6 (sorry for the long snippet, but there doesn't seem to be a literally repeated block since the compiler is happily using lots of registers):
80042d8: 4606 mov r6, r0
80042da: f10b 0304 add.w r3, fp, #4
80042de: f36f 269f bfc r6, #10, #22
80042e2: f811 5f10 ldrb.w r5, [r1, #16]!
80042e6: 559d strb r5, [r3, r6]
80042e8: 1c46 adds r6, r0, #1
80042ea: f36f 269f bfc r6, #10, #22
80042ee: 784d ldrb r5, [r1, #1]
80042f0: 788c ldrb r4, [r1, #2]
80042f2: 559d strb r5, [r3, r6]
80042f4: 1c85 adds r5, r0, #2
80042f6: f100 0609 add.w r6, r0, #9
80042fa: f36f 259f bfc r5, #10, #22
80042fe: 78ca ldrb r2, [r1, #3]
8004300: f891 e004 ldrb.w lr, [r1, #4]
8004304: f36f 269f bfc r6, #10, #22
8004308: 555c strb r4, [r3, r5]
800430a: 1cc4 adds r4, r0, #3
800430c: f36f 249f bfc r4, #10, #22
8004310: 794d ldrb r5, [r1, #5]
8004312: f891 8009 ldrb.w r8, [r1, #9]
8004316: 551a strb r2, [r3, r4]
8004318: 1d02 adds r2, r0, #4
800431a: f36f 229f bfc r2, #10, #22
800431e: 7a0c ldrb r4, [r1, #8]
8004320: f803 e002 strb.w lr, [r3, r2]
8004324: 1d42 adds r2, r0, #5
8004326: f36f 229f bfc r2, #10, #22
800432a: 549d strb r5, [r3, r2]
800432c: 1d82 adds r2, r0, #6
800432e: f36f 229f bfc r2, #10, #22
8004332: 798d ldrb r5, [r1, #6]
8004334: 549d strb r5, [r3, r2]
8004336: 1dc2 adds r2, r0, #7
8004338: f36f 229f bfc r2, #10, #22
800433c: 79cd ldrb r5, [r1, #7]
800433e: 549d strb r5, [r3, r2]
8004340: f100 0208 add.w r2, r0, #8
8004344: f36f 229f bfc r2, #10, #22
8004348: 549c strb r4, [r3, r2]
800434a: f100 020a add.w r2, r0, #10
800434e: f36f 229f bfc r2, #10, #22
8004352: f803 8006 strb.w r8, [r3, r6]
8004356: 7a8e ldrb r6, [r1, #10]
8004358: 549e strb r6, [r3, r2]
800435a: f100 020b add.w r2, r0, #11
800435e: f36f 229f bfc r2, #10, #22
8004362: 7ace ldrb r6, [r1, #11]
8004364: 549e strb r6, [r3, r2]
8004366: f100 020c add.w r2, r0, #12
800436a: f36f 229f bfc r2, #10, #22
800436e: 7b0e ldrb r6, [r1, #12]
8004370: 549e strb r6, [r3, r2]
8004372: f100 020d add.w r2, r0, #13
8004376: f36f 229f bfc r2, #10, #22
800437a: 7b4e ldrb r6, [r1, #13]
800437c: 549e strb r6, [r3, r2]
800437e: f100 020e add.w r2, r0, #14
8004382: f36f 229f bfc r2, #10, #22
8004386: 7b8e ldrb r6, [r1, #14]
8004388: 549e strb r6, [r3, r2]
800438a: f100 020f add.w r2, r0, #15
800438e: 3010 adds r0, #16
8004390: f36f 229f bfc r2, #10, #22
8004394: 7bce ldrb r6, [r1, #15]
8004396: 549e strb r6, [r3, r2]
8004398: f101 0210 add.w r2, r1, #16
800439c: 4562 cmp r2, ip
800439e: d19b bne.n 80042d8 <_ZN16display_firmware18__cortex_m_rt_main17hdabc9134f3bbe990E+0x104c>