riscv-crypto icon indicating copy to clipboard operation
riscv-crypto copied to clipboard

Reconsider destructive encoding form for AES instructions

Open David-Horner opened this issue 4 years ago • 7 comments

The Concern

The Public Review raised the concern that

  1. opcode space was being wasted unnecessarily
  2. encoding did not directly support the preferred and customary use of the instructions.

The Public Review frozen specification has three 5 bits fields: destination, rd; source 1, rs1; and source 2, rs2 Thus any of the 32 general purpose X registers can be independently used as destination and either source.

Recommendation:

The AES instructions encoding specify only 2 X registers. The destination register rd will also define the rs1 source register. This is termed a destructive encoding. The remaining source register field is defined as in the frozen spec.

This encoding provides a factor of 32 reduction in opcode use for each of the AES instructions. It also directly supports the preferred and recommended concise instruction sequence of the RISCV crypto design.

David-Horner avatar Oct 22 '21 11:10 David-Horner

The initial post on ISA-DEV is posted verbatim here:

The rationale document "The design of scalar AES Instruction Set Extensions for RISC-V" intentionally avoids discussing encoding.

Nor does the public review explicitly justify the specific encodings.

Never the less, the design intentionally gravitates to a destructive format for the crypto operations.

The reason is clear:

Instructions designed to require the retention of less state information,

 1) thus fewer general purpose registers are required for the algorithm, reducing register pressure

 2) successive instructions can be fused (as intermediate results can be discarded) for greater performance, and

 3) defer register file writeback and improve power efficiency.

These characteristics are especially evident in the rationale document code for saes.v3.encsm and variants (that is the RV32 Zk instructions aes32dsi, aes32dsmi, aes32es, aes32esmi)

There the destination is always a source register (specifically rd = rs1 in RV32 Zk)

Thus the usual and customary usage form for these instructions will be rd = rs1.

The other forms of the instruction will only very rarely be used if ever.

So why provide them?

These 4 instructions are in actuality 16 variants of the instruction, with 2 bits determining the byte to select from rs2.

By dropping the rs1 field and defining it to be rd, we buy back 5 bits, so that these 16 variants consume only 1/8th the encoding space of a typical two source instruction (say, like add).

This is particularly beneficial to such a domain specific application. The expectation is that future functionality will want 32bit instruction encoding; not only in crypto post quantum?) , but also in other fields.

If effort is made to reduce the encoding space of specifically targeted macro operations, room will be left for future "requirements"

The same argument applies to the RV64 variant, which uses a significantly different AES implementation.

The RV64 version code unrolled the loop to remove unnecessary moves, but no evidence of opcode optimization consideration is present.

Never the less, the same destructive, rd = rs1 approach can be used by introducing only one additional move per iteration.

Specifically, the first of the aes64* instruction pair use a C.mv to the aes64* rs1/rd register to avoid the loss of either of the two source data pair registers.

This is a very reasonable tradeoff when

  1. the C.mv can readily be fused in a high performance implementation

  2. the C.mv comprises a small fraction of cycles in the implementation given aes64* cycle count

  3. one less working register is used [this is also possible with the proposed non-destructive form, but becomes obvious when looking for reduced encoding opportunities]

  4. 5 by 4 instruction encoding code points are conserved, preserved for the future.

David-Horner avatar Oct 22 '21 11:10 David-Horner

For the sake of keeping as much information in one place as possible...

There have been numerous prior discussions on this issue long before we hit public review. As a TG, we actually started with the current design, then changed to exactly this proposed scheme, then back again.

Issues #65 and #75 talk about this at length. Issue #75 in particular shows us going "around the loop" after receiving input from others and having our perception of available opcode space updated.

In the end, the task group accepted the recommendations of the arch review committee, and ended up with the scheme as proposed during public review. We were told that opcode space was not as tight as we initially supposed, and that the cost of introducing this new idiom (sourcing rs1 from rd) was not worth the cost of the opcode space.

There are material advantages in terms of static code size (and runtime cache performance) to having rd distinct from rs1 in some cases, as it can make the core of the AES round function half the (code) size, at the expense of four move instructions per round. This is mentioned in #75.

I'll record this in our summary of public review issues. Ultimately it will be down to the architecture review committee to decide if the change should go ahead.

ben-marshall avatar Oct 22 '21 13:10 ben-marshall

encoding did not directly support the preferred and customary use of the instructions.

Unless I'm misunderstanding, I'm not sure this is true. The current encoding does support the expected common use case, and others. Your proposal is to constrain the range of expressible source/destination register addresses to only the expected common use case in order to save opcode space.

ben-marshall avatar Oct 22 '21 13:10 ben-marshall

On Fri., Oct. 22, 2021, 09:58 Ben Marshall, @.***> wrote:

encoding did not directly support the preferred and customary use of the instructions.

Unless I'm misunderstanding, I'm not sure this is true. The current encoding does support the expected common use case, and others.

That is why it says "directly" support. It would be similar to having a shift immediate instruction that could specify > 64 bits on RV64 and RV32. Completely useless in practice and potentially confusing.

.

Your proposal is to constrain the range of expressible source/destination

register addresses to only the expected common use case in order to save opcode space.

Constraining the register usage even if it did not reduce the number of encoding bits is still valuable; it directs the coder to the optimal formulation. It is fortuitous that both benefits are achieved.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/riscv/riscv-crypto/issues/135#issuecomment-949656949, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFAWIKORQPWEJ6FCGIDGZNDUIFUWVANCNFSM5GQLG6GQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

David-Horner avatar Oct 23 '21 02:10 David-Horner

On 2021-10-22 10:10 p.m., David-Horner wrote:

On Fri., Oct. 22, 2021, 09:58 Ben Marshall, @.***> wrote:

encoding did not directly support the preferred and customary use of the instructions.

Unless I'm misunderstanding, I'm not sure this is true. The current encoding does support the expected common use case, and others.

That is why it says "directly" support. It would be similar to having a shift immediate instruction that could specify > 64 bits on RV64 and RV32. Completely useless in practice and potentially confusing.

I realize that this analogy appears contrived and an exercise in hyperbole.

I have attempted to find [either in the literature or in my imagination] examples that are closer in egregiousness and substance to the current overly permissive/redundant/wasteful/unnecessary encoding.

I have discovered some further instances, any of which could be debated as to their clarity and application to this instant situation.

The gist is no analogy or comparable situation is going to dissuade or persuade if the facts in the situation are not compelling enough for a person's specific mindset.

I hope the analogy presented is sufficient to illustrate the point, flawed though it be, that for a given operation there are optimal encodings and that form should follow function.

Specifically, our AES instructions are an example in which

  1. novel encoding can provide an overall benefit to the ecosystem,

2)  minimalist formulation is often beneficial and directly follows from the instruction design, and

  1. that destructive encoding, in particular, is an appropriate feature/infrastructure to add to the  RISCV ISA.

Note:

   The Vector Extension adds such an encoding, although apparently in that case for no other benefit than to reduce encoding space.

      [certainly one could argue for other tangible benefits, such as the uncompelling assertion that the encoding acknowledges known patterns of use of Multiply-Add instructions]

   Just because the Vector Extension could be tainted with a charge of "only" adopting destructive encoding because of opcode space pressure, we are not also guilty as co-accused.

.

Your proposal is to constrain the range of expressible source/destination

register addresses to only the expected common use case in order to save opcode space.

Constraining the register usage even if it did not reduce the number of encoding bits is still valuable; it directs the coder to the optimal formulation. It is fortuitous that both benefits are achieved.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub

https://github.com/riscv/riscv-crypto/issues/135#issuecomment-949656949,

or unsubscribe

https://github.com/notifications/unsubscribe-auth/AFAWIKORQPWEJ6FCGIDGZNDUIFUWVANCNFSM5GQLG6GQ

. Triage notifications on the go with GitHub Mobile for iOS

https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675

or Android

https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/riscv/riscv-crypto/issues/135#issuecomment-950040924, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFAWIKKJNN6H5IRTZ7Q6N7LUIIKPZANCNFSM5GQLG6GQ. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

David-Horner avatar Oct 23 '21 10:10 David-Horner

I've tried to capture all of this in an update to the public review summary. I couldn't capture everything you've written, so have linked to this issue and tried to summarise.

In terms of what happens now, I'll share the public review summary with the wider TG shortly and we can see what discussion arises there.

I don't think there is anything drastically new here to what we've discussed & considered before. So given the arguments for doing this remain the same, I'd like to figure out if the wider context we're having this discussion in has changed, since that would be a motivation to revisit things.

Edited to add: You didn't mention the SM4 instructions in your issue at all. However, they have the same encoding and usage pattern, so I have broadened the scope of your issue slightly in the public review summary to include them. I hope this is okay.

ben-marshall avatar Oct 27 '21 09:10 ben-marshall

I have reviewed the publicly available discussions on rt format and destructive format in particular.

Arguments against the "rt" encoding appear sound, although I don't fully follow the #75 waste issue. Using rd=0 as a trigger/signal that rs1 is both source and destination presents legitimate issues, not unexpected, as the encoding appeared to be using tricks to reduce encoding size.

However, discussion on the merit of saving/wasting 32bit opcode space is particular sparse.

No substantiated justification is given that an encoding that retained rd as destination and was used as a source was problematic. Andrew specifically said the muxing is not an issue.

Further, Andrew's guidance, to use a standard R format with hardware enforced rd = rs1, was not followed. No discussion on why it was not acceptable/obeyed/implemented.

I believe these last two failings should be addressed. Certainly justification should be documented as it is absent from the public narrative.

Additionally, I maintain that the issue is not just retaining 32bit opcode space, although that is an appropriate objective.

Rather, the current unnecessary degrees of freedom create a increased validation/analysis burden that is inappropriate for a security oriented extension. Side-channel concerns may previously have been out of scope when these issues were discussed; it is good that they have recently been embraced. So it is understandable/forgivable/defensible/excusable that such scrutiny was not previously afforded this issue. Further, nothing will make an implementation errata proof, but reducing degrees of freedom will be a better defence against them than a multitude of unnecessary options.

David-Horner avatar Nov 01 '21 07:11 David-Horner

This issue is related to Scalar Crypto and was decided as a part of the review and ratification process. There is nothing else to be done and therefore this is being closed.

kdockser avatar Oct 29 '22 22:10 kdockser