Results: 1 comment by Sakari N

The loop copies eight 16-byte blocks per iteration (8 x 16 = 128 bytes), so incrementing the source and destination pointers by 128 bytes on each iteration is correct.
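
To make the arithmetic concrete, here is a minimal portable C sketch of such a loop. It is not the code under discussion: the function name copy_128_per_iter and the constants are hypothetical, and a plain 16-byte memcpy stands in for whatever 16-byte load/store the real loop uses. It also assumes the length is a multiple of 128 and that the buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constants for illustration: 8 blocks of 16 bytes each. */
enum {
    BLOCK_SIZE = 16,
    BLOCKS_PER_ITER = 8,
    STRIDE = BLOCK_SIZE * BLOCKS_PER_ITER /* 128 bytes per iteration */
};

/* Hypothetical helper: copies n bytes from src to dst.
 * Assumes n is a multiple of 128 and the buffers do not overlap. */
void copy_128_per_iter(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n % STRIDE == 0);

    for (size_t done = 0; done < n; done += STRIDE) {
        /* Copy eight 16-byte blocks at offsets 0, 16, ..., 112. */
        for (int b = 0; b < BLOCKS_PER_ITER; ++b) {
            memcpy(dst + b * BLOCK_SIZE, src + b * BLOCK_SIZE, BLOCK_SIZE);
        }
        /* Everything up to offset 127 has been copied, so both
         * pointers advance by the full 128 bytes. */
        src += STRIDE;
        dst += STRIDE;
    }
}
```

If the pointers advanced by less than 128, part of the data would be copied twice; if they advanced by more, some bytes would be skipped.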