smack icon indicating copy to clipboard operation
smack copied to clipboard

Vector operations

Open michael-emmi opened this issue 8 years ago • 3 comments

Here’s a fun piece of code that we cannot handle at all.

#include <immintrin.h>
#include <smack.h>

int main() {
  __m128i A = _mm_set_epi32(13,12,11,10);
  __m128i B = _mm_set_epi32(23,22,21,20);

  A = _mm_shuffle_epi32(A,2*1 + 3*4 + 2*16 + 3*64);
  B = _mm_shuffle_epi32(B,2*1 + 3*4 + 2*16 + 3*64);

  __m128i C = _mm_blend_epi16(A,B,0xf0);

  assert(C[0] != 0);
  return 0;
}

I am able to compile this as follows:

clang -S -emit-llvm a.c -I/usr/local/share/smack/include -mpclmul -mmmx -msse -msse2 -msse3 -msse4.1 -msse4.2 -maes
llvm2bpl a.ll -ll b.ll -entry-points=main

Note that besides the fact that we don’t handle vector instructions well — especially shufflevector — it seems that we also do not handle regular (e.g., arithmetic) operations on vector types correctly.

michael-emmi avatar Apr 28 '16 02:04 michael-emmi

I'm very interested in working on this, for using ct-verif with some vectorized Rust cipher implementations. However, I'll definitely need some pointers on where this sort of change would need to happen.

emberian avatar Apr 06 '17 17:04 emberian

I can imagine this being implemented in a few ways, with varying degrees of possibly-premature optimization. The first would be to encode LLVM vectors with Boogie maps. For instance,

// arbitrarily-sized vectors of i16 elements
type v.i16 = [int] i16;

// some vector operations
function $add.v4.i16(x: v.i16, y: v.i16) returns (v.i16);
function $insertelement.v4.i16(x: v.i16, val: i16, idx: int) returns (v.i16);
function $extractelement.v4.i16(x: v.i16, idx: int) returns (i16);
function $shufflevector.v4.i16.m6(x: v.i16, y: v.i16, …);

// some vector-type variables and operations
var x, y, z: v.16;
y := $insertelement.v4.i16(x, 13, 1);
z := $add.v4.i16(x, y);
assert $extractelement.v4.i16(z, 1) == 20;

where, e.g., the .m6 suffix fixes that shuffle operation to masks of size 6.

The second would be to avoid using maps; for instance,

// some vector operations
procedure $add.v4.i16(x.1: i16, x.2: i16, …, y.4: i16) returns (z.1: i16, …, z.4: i16);
procedure $insertelement.v4.i16(x.1: i16, …, x.4: i16, val: i16, idx: int) returns (y.1: i16, …, y.4: i16);
procedure $extractelement.i16(x.1: i16, …, x.4: i16, idx: int) returns (val: i16);
procedure $shufflevector.v4.i16.m6(x.1: i16, …, y.4: i16, …) returns (z.1: i16, …, z.6: i16);

// some vector-type variables and operations
var w: i16;
var x.1, x.2, x.3, x.4: i16;
var y.1, y.2, y.3, y.4: i16;
var z.1, z.2, z.3, z.4: i16;
call y.1, y.2, y.3, y.4 := $insertelement.v4.i16(x.1, x.2, x.3, x.4, 13, 1);
call z.1, z.2, z.3, z.4 := $add.v4.i16(x.1, x.2, x.3, x.4, y.1, y.2, y.3, y.4);
call w := $extractelement.v4.i16(z.1, z.2, z.3, z.4, 1);
assert w == 20;

where here we must use procedures because Boogie functions cannot return multiple values. Besides that quirk, this is essentially just an expansion of the map-based encoding.

The third way I can imagine would encode vectors with opaque types,

// length 4 vectors of i16 elements
type v4.i16;
var alloc.v4.i16: [v4.i16] bool;
function $sel.v4.i16(x: v4.i16, idx: int) returns (i16);
procedure $alloc.v4.i16() returns (x: v4.i16);
modifies alloc;
ensures !old(alloc.v4.i16(x));
ensures alloc.v4.i16(x);

// some vector operations
procedure $add.v4.i16(x: v4.i16, y: v4.i16) returns (z: v4.i16);
procedure $insertelement.v4.i16(x: v4.i16, val: i16, idx: int) returns (y: v4.i16);
procedure $extractelement.v4.i16(x: v4.i16, idx: int) returns (val: i16);
procedure $shufflevector.v4.i16.m6(x: v4.i16, y: v4.i16, …);

// some vector-type variables and operations
var w: i16;
var x, y, z: v4.16;
call y := $insertelement.v4.i16(x, 13, 1);
call z := $add.v4.i16(x, y);
call w := $extractelement.v4.i16(z, 1);
assert w== 20;

and the operations would allocate unique vectors and constrain them using $sel, e.g.,

procedure $add.v4.i16(x: v4.i16, y: v4.i16) returns (z: v4.i16) {
    z := alloc.v4.i16();
    assume $sel.v4.i16(z, 0) == $sel.v4.i16(x, 0) + $sel.v4.i16(y, 0);
    assume $sel.v4.i16(z, 1) == $sel.v4.i16(x, 1) + $sel.v4.i16(y, 1);
    assume $sel.v4.i16(z, 2) == $sel.v4.i16(x, 2) + $sel.v4.i16(y, 2);
    assume $sel.v4.i16(z, 3) == $sel.v4.i16(x, 3) + $sel.v4.i16(y, 3);
}

On the one hand, my guess is that the third option would be the most inefficient, and that the second option would be the most efficient. On the other hand, the first option seems to be the most elegant and easy to deal with. I am not really sure how to decide whether the first or second is best, since it is hard to predict the performance difference.

Does anybody have any strong reason to favor one over the others?

michael-emmi avatar Apr 25 '17 02:04 michael-emmi

By the way, since the mask argument to shufflevector is a constant vector, we can optimize our encoding by creating a distinct $shufflevector function for each distinct mask. For instance, the following function shuffles with the mask <1,5,3>:

function $shufflevector.v4.i16.m1.m5.m3(x: v.i16, y: v.i16) returns (v.i16) {
  empty[0 := x[1]][1 := y[1]][2 := x[3]]
}
const empty: v.i16;

i.e., taking the second, sixth, and fourth elements of x and y concatenated.

michael-emmi avatar Apr 25 '17 02:04 michael-emmi