jasmin icon indicating copy to clipboard operation
jasmin copied to clipboard

checksafety - integration in libjade issues

Open tfaoliveira opened this issue 2 years ago • 20 comments

In this link, you can find the results of a libjade workflow run that runs all the examples through the -checksafety analysis. I explain the issues I found next, and then we can split these into different issues if appropriate.

Within the attached artifact, libjade-logs-src.tar.gz, there are some *.error files corresponding to executions that returned error:

libjade-logs-src$ find . -name "*.error" | wc -l 
53
libjade-logs-src$ grep -nr "Violation" | wc -l 
14
libjade-logs-src$ grep -nr "124" | wc -l 
9
libjade-logs-src$ grep -nr "Assertion failed" | wc -l 
30

In the current status of Libjade-main / Jasmin-main, all implementations fail to check:

  • 9 are timeouts (set to 30 minutes for this first try, which can be increased later);
  • 30 are exception File "src/safety/safetyAbsExpr.ml", line 1055;
  • 14 are possible safety violations.

Regarding the implementations that reach the timeout: 7 out of 9, include ChaCha/Salsa reference code. This might indicate that there could a pattern in the code (rounds loop maybe?) that doesn't allow the safety checker to conclude relatively quickly. Any suggestions on how to handle these/thoughts if they can be improved?

crypto_stream/salsa20/salsa20/amd64/ref/.ci/checksafety.error
crypto_stream/salsa20/salsa2012/amd64/ref/.ci/checksafety.error
crypto_stream/chacha/chacha20-ietf/amd64/ref/.ci/checksafety.error
crypto_stream/chacha/chacha12/amd64/ref/.ci/checksafety.error
crypto_stream/chacha/chacha20/amd64/ref/.ci/checksafety.error
crypto_stream/xsalsa20/amd64/ref/.ci/checksafety.error
crypto_secretbox/xsalsa20poly1305/amd64/ref/.ci/checksafety.error
crypto_hash/sha512/amd64/ref/.ci/checksafety.error
crypto_xof/shake256/amd64/spec/.ci/checksafety.error

Regarding src/safety/safetyAbsExpr.ml, line 1055: is this error related to something that can be changed in the code?


Regarding possible safety violations: I believe most, if not all, might be related to the following isolated example. For instance, the following program fails to check:

inline fn __keccak_init_ref() -> stack u64[25]
{
  stack u64[25] state;
  reg u64 i t;

  _,_,_,_,_, t = #set0();
  i = 0;
  while (i < 25)
  { state[(int)i] = t;
    i += 1;
  }
 
  return state;
}

inline fn __theta_sum_ref(stack u64[25] a) -> reg u64[5]
{
  inline int x y;
  reg u64[5] c;

  // C[x] = A[x, 0]
  for x=0 to 5
  { c[x] = a[x + 0]; }

  // C[x] ^= A[x,1] ^ A[x,2] ^ A[x,3] ^ A[x,4]
  for y=1 to 5
  { for x=0 to 5
    { c[x] ^= a[x + y*5]; }
  }

  return c;
}

export fn test(reg u64 rp)
{
  inline int i;
  stack u64[25] state;
  reg u64[5] c;

  state = __keccak_init_ref();
  c = __theta_sum_ref(state);
 
  for i=0 to 4
  { [rp + i*8] = c[i]; }
}

And if we update __keccak_init_ref to the following code, the safety checker goes through:

inline fn __keccak_init_ref() -> stack u64[25]
{
  stack u64[25] state;
  inline int i;
  reg u64 t;

  _,_,_,_,_, t = #set0();
  for i=0 to 25
  { state[i] = t; }
 
  return state;
}

I'm unsure if this is related to this issue.

tfaoliveira avatar Dec 06 '22 13:12 tfaoliveira

I'm unsure if this is related to this https://github.com/jasmin-lang/jasmin/issues/67. I think it’s not related. Here the problem is about proving that an array is properly initialized; there it’s about relations between loop indices to prove that an access is in bounds.

vbgl avatar Dec 06 '22 13:12 vbgl

Here are some details about sha512.

Given enough time (a bit more than 30mn on my laptop) the analysis terminates and rises two alarms, both at line 291 of sha512.jinc. The first one is about sblocks being initialized, the second about oblocks + t being in bounds.

Are these alarms expected? Let’s look at them in turn.

The sblocks array is initialized in function __lastblocks_ref. I encourage the readers to give a look to the function and to try to convince themselves that indeed, it properly initialized said array. First bytes 0 to inlen are set, in increasing order; then from inlen to j (for some j), again in increasing order; then from j to j+8, in decreasing order; finally from j+8 to 256. Even assuming that the general problem of initialization of arrays in while loops can be solved this function remains full of tricks.

In order to prove that oblocks + t is in bounds, we have to understand that the loop does not iterate too much. (Sorry, I can’t find a compelling argument that this is hard.)

What can we do? If quick and automated checking of program safety is a goal, we can work in two complementary directions: improve the safety checker, or rewrite the programs. For the initialization issue, it might be enough to zero-initialize the whole array at the beginning of the function with a for loop. I’d be curious to know whether this has a noticeable impact in practice.

The second issue can also be easily workaround by using a regular loop counter.

With these two small modifications to the jasmin code, safety checking goes through in about 20mn.

vbgl avatar Dec 07 '22 11:12 vbgl

Hi @vbgl, thanks for the corresponding PR. I just pushed the same modifications for sha256. CI should be green for that example during the afternoon.

I also increased the timeout to 90m, allowing to check crypto_xof/shake256/amd64/spec/.

Regarding improvements to the safety checker: do you think it could be reasonable to support annotations that would allow the safety checker to handle array initializations using while loops? Maybe a 'special' while loop? What concerns me is code size, especially for future ARM implementations.

Any idea what is causing exception File "src/safety/safetyAbsExpr.ml", line 1055?

tfaoliveira avatar Dec 09 '22 12:12 tfaoliveira

Another question, by running jasminc --help:

  -checksafety : automatically check for safety
  -safetyparam parameter for automatic safety verification:
    format: "f_1>param_1|f_2>param_2|..." where each param_i is of the form:
    pt_1,...,pt_n;len_1,...,len_k
    pt_1,...,pt_n: input pointers of f_i
    len_1,...,len_k: input lengths of f_i
  -safetyconfig [filename]: use filename (JSON) as configuration file for the safety checker
  -safetymakeconfigdoc [dir]: make the safety checker configuration docs in [dir]

Is there extended documentation/usage examples of -safetyparam, -safetyconfig, and -safetymakeconfigdoc? I would like to understand if I could take advantage of these for some examples (or all of them) and support them in libjade infrastruture;

Currently, I'm running all the examples with just -checksafety in this branch.

tfaoliveira avatar Dec 09 '22 12:12 tfaoliveira

The assertion failure is related to array copies (which should not be too hard to fix). See #308.

There is some explanation about -safetyparam in the wiki.

Of course you should use the -safetyparam option:

time …/jasminc -checksafety -noinsertarraycopy -safetyparam "jade_xof_shake256_amd64_spec>;" shake256_amd64_spec.jazz

terminates with “no safety violation” in about 6mn on my laptop (ok, the inferred precondition is a bit optimistic, but still a non-trivial result).

vbgl avatar Dec 09 '22 15:12 vbgl

With -safetyparam "jade_xof_shake256_amd64_spec>output,input;output_length,input_length", the result is more precise (and as fast).

vbgl avatar Dec 09 '22 15:12 vbgl

PR #312 should fix the assertion failures.

vbgl avatar Dec 12 '22 09:12 vbgl

With -safetyparam "jade_xof_shake256_amd64_spec>output,input;output_length,input_length", the result is more precise (and as fast).

Do you know if it is possible to specify the information from the cmd line argument jade_xof_shake256_amd64_spec>output,input;output_length,input_length within a JSON file to be used with -safetyconfig? This would simplify the integration: my current idea would be to produce template JSON files and preprocess them with the contents of */include/api.h (for fixed-length arguments).

tfaoliveira avatar Dec 14 '22 22:12 tfaoliveira

PR #312 should fix the assertion failures.

Thanks!! I updated the CI machine and I will give it a try

tfaoliveira avatar Dec 14 '22 22:12 tfaoliveira

Do you know if it is possible to specify the information from the cmd line argument […] within a JSON file to be used with -safetyconfig?

It is not possible: -safetyconfig is for something else.

But feel free to suggest the interface of your liking.

vbgl avatar Dec 15 '22 07:12 vbgl

If this is the case, I can integrate it as it currently is (using -safetyparam option and corresponding syntax), and then I will update the Makefiles/scripts when necessary. Meanwhile, could you point me to some (extended) documentation of the options related to -safetyconfig?

I will write some notes next about possible improvements.


I found the input_range field in the JSON documentation (produced with -safetymakeconfigdoc) hence the last question. It is documented as follows:

  "input_range": {
    "default": {},
    "description": {
      "descr": "Assumed ranges for inputs.",
      "example": {
        "var1": { "min": "0", "max": "100" },
        "var2": { "min": "200", "max": "300" }
      }
    }
  }

My current understanding is that this field can define the ranges of some input variables (for instance, the minimum and maximum length of an array -- I still don't know if it is possible to scope some variable's interval to a given function: in a small example that I wrote, definitions like "var1": { "min": "0", "max": "100" }, seem to be global and variables must have different names across different functions).

If my understanding is correct, it would also be nice to have something like input_pointers which would allow specifying the properties on -safetyparam.

Even better, it would be to separate the declaration of properties of a given program and the configuration itself:

  • -safetyconfig would do what it currently does minus the input_range field;
  • -safety_someothername_ would allow specifying properties about the program on a function-level basis.

I think this would have the advantage that we could (more quickly) search (automatically, maybe) for configurations that perform faster. Additionally, suppose we need to use input_range as it currently is. In that case, it might be more challenging to avoid using default configurations from the past (for instance, config files would need to be generated and then patched for each CI run).

tfaoliveira avatar Dec 15 '22 13:12 tfaoliveira

To the best of my knowledge, the -safetyconfig option is not documented beyond what is output by -safetymakeconfigdoc.

vbgl avatar Dec 15 '22 14:12 vbgl

PR #312 should fix the assertion failures.

That was not enough… PR #317 should help.

vbgl avatar Dec 16 '22 21:12 vbgl

Hi, I merged the safety branch from libjade into the main branch (there's still some work to do regarding parameter generation for the -safetyparam option, based on the contents of api.h files).

Meanwhile, I logged in the machine that is running this job and found:

"libjade/src/crypto_kem/kyber/common/amd64/avx2/shuffle.jinc", line 23 (8-22):
warning: vector suffix simplified from 8u32 to 256
Default checker parameters.

Analyzing function jade_kem_kyber_kyber768_amd64_avx2_keypair
Fatal error: exception File "src/safety/safetyAbsExpr.ml", line 764, characters 11-17: Assertion failed
Command exited with non-zero status 2
13242.83user 4.84system 3:40:49elapsed 99%CPU (0avgtext+0avgdata 272128maxresident)k
0inputs+408outputs (0major+2455143minor)pagefaults 0swaps

Which seems to correspond to this assert false.

Is this something that can be fixed on the compiler or the implementation?

tfaoliveira avatar Jan 28 '23 11:01 tfaoliveira

Is this an ArrayInit?

vbgl avatar Jan 28 '23 12:01 vbgl

I don't understand your question. The exception does not seem to mention what in the source code caused this.

tfaoliveira avatar Jan 28 '23 17:01 tfaoliveira

You may want to always set OCAMLRUNPARAM=b in your CI machines; this will provide much clearer exception stack traces.

Is it the case that you’ve let a machine compute during more than three hours and a half to prove a theorem that you know is false?

vbgl avatar Feb 03 '23 11:02 vbgl

You may want to always set OCAMLRUNPARAM=b in your CI machines; this will provide much clearer exception stack traces.

Thanks, will likely do that early next week.

Is it the case that you’ve let a machine compute during more than three hours and a half to prove a theorem that you know is false?

Did the previous report contribute to the next PR? https://github.com/jasmin-lang/jasmin/pull/343/ If that is the case, I consider the CPU time well spent.

I think that, for the moment, I will continue to include all libjade implementations in these runs as they might provide insightful feedback for improvements and consistently check that updates to the compiler or the libjade source code do not cause any changes in the expected behavior (even if that expected behavior is a "possible safety violation" when the corresponding execution ends).

tfaoliveira avatar Feb 03 '23 12:02 tfaoliveira

Did the previous report contribute to the next PR? #343

Not at all. Your previous report is worthless.

I think we know that most implementations of kyber are unsafe (unaligned memory accesses).

vbgl avatar Feb 03 '23 12:02 vbgl

Not at all. Your previous report is worthless.

Thanks for the feedback. Do you expect that https://github.com/jasmin-lang/jasmin/pull/343/ will eliminate the previously reported error?

tfaoliveira avatar Feb 03 '23 12:02 tfaoliveira