heapless icon indicating copy to clipboard operation
heapless copied to clipboard

Implement Copy for Vec when T implements Copy

Open qiuchengxuan opened this issue 4 years ago • 9 comments

Since user should be aware Vec are based on array, implement Copy would be convienent

qiuchengxuan avatar Aug 22 '21 05:08 qiuchengxuan

Emm seems not that easy than I expected

qiuchengxuan avatar Aug 22 '21 05:08 qiuchengxuan

I would very much appreciate this feature. This would enable heapless::String to be Copy, which would be super convenient.

sffc avatar Dec 21 '21 01:12 sffc

This would be super convenient in many cases indeed. If only we could impl<T: !Copy, const N: usize> Drop for Vec<T, N> or impl<T: Drop, const N: usize> Drop for Vec<T, N>.

I guess it would be maintenance burden to have a CopyVec that bounds on T: Copy?

nickray avatar Feb 20 '22 22:02 nickray

Perhaps you guys could make some try to implement Copy for Vec?

qiuchengxuan avatar Feb 21 '22 13:02 qiuchengxuan

Can't implement Copy if Vec should allow drop. https://stackoverflow.com/questions/51704063/why-does-rust-not-allow-the-copy-and-drop-traits-on-one-type You can't conditionally implement Drop either so the choice is to implement Copy conditionally on copy types or respect Drop. Respecting Drop is a reasonable choice, as behind the scenes the item is wrapped in a ManuallyDrop, less risk of unexpected stuff happening to an unsuspecting developer.

That leaves either Generics or macros, Generics are breaking while macros aren't. I threw together a split implementation and implemented Copy on String here https://github.com/MarcusGrass/heapless/tree/copy_vec

Imo both generics and macros are pretty horrible in general but it'd be a really nice feature to have.

MarcusGrass avatar Mar 23 '22 18:03 MarcusGrass

Copy-ing a Vec<T> is more expensive than Clone-ing a Vec<T> except in the case where the Vec> is full.

The reason is that a Copy will copy all the uninitialized data (O(capacity)) in the stack buffer from the parent to the child. OTOH, a Clone will only copy the initialized data (O(len)) from the parent to the child. This can be seen in the following machine code.

#[no_mangle]
fn _start() {
    unsafe {
        (&(clone as usize) as *const usize).read_volatile();
        (&(copy as usize) as *const usize).read_volatile();
    }
}

#[no_mangle]
fn clone(vec: &Vec<u8, 1024>) {
    let mut vec = vec.clone();
    black_box(&mut vec);
}

#[no_mangle]
fn copy(vec: &Vec<u8, 1024>) {
    let mut vec2 = *vec;
    black_box(&mut vec2);
}

fn black_box<T>(val: &mut T) {
    unsafe { asm!("// {0}", in(reg) val) }
}
00020100 <clone>:
   20100:              b5d0             push    {r4, r6, r7, lr}
   20102:              af02             add     r7, sp, #8
   20104:              f5ad 6d81        sub.w   sp, sp, #1032 ; reserve 1028 bytes of stack space
   20108:              2300             movs    r3, #0
   2010a:              c802             ldmia   r0!, {r1}
   2010c:              9301             str     r3, [sp, #4]
   2010e:              aa01             add     r2, sp, #4
   20110:       /--/-X b141             cbz     r1, 20124 <clone+0x24> ; for loop to copy r1 (self.len) bytes of data
   20112:       |  |   4413             add     r3, r2
   20114:       |  |   f810 4b01        ldrb.w  r4, [r0], #1
   20118:       |  |   3901             subs    r1, #1
   2011a:       |  |   711c             strb    r4, [r3, #4]
   2011c:       |  |   9b01             ldr     r3, [sp, #4]
   2011e:       |  |   3301             adds    r3, #1
   20120:       |  |   9301             str     r3, [sp, #4]
   20122:       |  \-- e7f5             b.n     20110 <clone+0x10>
   20124:       \----> a801             add     r0, sp, #4
   20126:              f50d 6d81        add.w   sp, sp, #1032
   2012a:              bdd0             pop     {r4, r6, r7, pc}

0002012c <copy>:
   2012c:       b5d0            push    {r4, r6, r7, lr}
   2012e:       af02            add     r7, sp, #8
   20130:       f5ad 6d81       sub.w   sp, sp, #1032
   20134:       ac01            add     r4, sp, #4
   20136:       4601            mov     r1, r0
   20138:       f240 4204       movw    r2, #1028
   2013c:       4620            mov     r0, r4
   2013e:       f000 f803       bl      20148 <__aeabi_memcpy4> ; will copy r2 (1028) bytes of data
   20142:       f50d 6d81       add.w   sp, sp, #1032
   20146:       bdd0            pop     {r4, r6, r7, pc}

Aside from the implementation complication, the fact that Copy is less performant than Clone makes me inclined towards not adding such trait implementation.


semi on-topic: given that we cannot use 'trait specialization' on stable there are some reasons for adding a CopyVec<T: Copy> (even if it doesn't implement Copy): as destructors don't matter we can change the source code implementation to get better optimizations. for instance, the clone avoid would be optimized to a memcpy if one used a CopyVec<u8, N>. having both CopyVec and Vec is not a great user experience and would be more maintenance work (cannot use macros / generics as some functions need different source code implement to be properly optimized) so I would prefer to just wait for 'trait specialization' to be stabilized.

japaric avatar May 10 '22 12:05 japaric

Copy-ing a Vec<T> is more expensive than Clone-ing a Vec<T> except in the case where the Vec> is full.

The reason is that a Copy will copy all the uninitialized data (O(capacity)) in the stack buffer from the parent to the child. OTOH, a Clone will only copy the initialized data (O(len)) from the parent to the child. This can be seen in the following machine code.

#[no_mangle]
fn _start() {
    unsafe {
        (&(clone as usize) as *const usize).read_volatile();
        (&(copy as usize) as *const usize).read_volatile();
    }
}

#[no_mangle]
fn clone(vec: &Vec<u8, 1024>) {
    let mut vec = vec.clone();
    black_box(&mut vec);
}

#[no_mangle]
fn copy(vec: &Vec<u8, 1024>) {
    let mut vec2 = *vec;
    black_box(&mut vec2);
}

fn black_box<T>(val: &mut T) {
    unsafe { asm!("// {0}", in(reg) val) }
}
00020100 <clone>:
  20100:              b5d0             push    {r4, r6, r7, lr}
  20102:              af02             add     r7, sp, #8
  20104:              f5ad 6d81        sub.w   sp, sp, #1032 ; reserve 1028 bytes of stack space
  20108:              2300             movs    r3, #0
  2010a:              c802             ldmia   r0!, {r1}
  2010c:              9301             str     r3, [sp, #4]
  2010e:              aa01             add     r2, sp, #4
  20110:       /--/-X b141             cbz     r1, 20124 <clone+0x24> ; for loop to copy r1 (self.len) bytes of data
  20112:       |  |   4413             add     r3, r2
  20114:       |  |   f810 4b01        ldrb.w  r4, [r0], #1
  20118:       |  |   3901             subs    r1, #1
  2011a:       |  |   711c             strb    r4, [r3, #4]
  2011c:       |  |   9b01             ldr     r3, [sp, #4]
  2011e:       |  |   3301             adds    r3, #1
  20120:       |  |   9301             str     r3, [sp, #4]
  20122:       |  \-- e7f5             b.n     20110 <clone+0x10>
  20124:       \----> a801             add     r0, sp, #4
  20126:              f50d 6d81        add.w   sp, sp, #1032
  2012a:              bdd0             pop     {r4, r6, r7, pc}

0002012c <copy>:
  2012c:       b5d0            push    {r4, r6, r7, lr}
  2012e:       af02            add     r7, sp, #8
  20130:       f5ad 6d81       sub.w   sp, sp, #1032
  20134:       ac01            add     r4, sp, #4
  20136:       4601            mov     r1, r0
  20138:       f240 4204       movw    r2, #1028
  2013c:       4620            mov     r0, r4
  2013e:       f000 f803       bl      20148 <__aeabi_memcpy4> ; will copy r2 (1028) bytes of data
  20142:       f50d 6d81       add.w   sp, sp, #1032
  20146:       bdd0            pop     {r4, r6, r7, pc}

Aside from the implementation complication, the fact that Copy is less performant than Clone makes me inclined towards not adding such trait implementation.

semi on-topic: given that we cannot use 'trait specialization' on stable there are some reasons for adding a CopyVec<T: Copy> (even if it doesn't implement Copy): as destructors don't matter we can change the source code implementation to get better optimizations. for instance, the clone avoid would be optimized to a memcpy if one used a CopyVec<u8, N>. having both CopyVec and Vec is not a great user experience and would be more maintenance work (cannot use macros / generics as some functions need different source code implement to be properly optimized) so I would prefer to just wait for 'trait specialization' to be stabilized.

I understand and agree with you, but in my situation which Vec are defined with small capacity in a relatively large struct, copy cost doesn't matter, instead target binary size matters.

And thanks for your reply

I think we can wait till an optimal solution exists

qiuchengxuan avatar May 10 '22 13:05 qiuchengxuan

thanks for the input I can see the perf/size trade-off and see how the perf hit may not matter much for small capacities and almost full Vecs.

I'd now be in favor of a Copy implementation provided that the trade-off and difference in perf vs Clone is documented. I would still like to wait for a way to implement this that does not require duplicating code or macros

japaric avatar May 10 '22 13:05 japaric

Copy for String would be very convenient.

robertbastian avatar Jan 28 '24 16:01 robertbastian