AES and RSA (bn_mul) optimizations for Visual Studio 64bit

Open orlx opened this issue 6 years ago • 5 comments

Description

These changes improve mbedtls performance when compiling with Visual Studio for 64-bit targets by adding code paths that use intrinsic functions, in particular:

  • AES encryption/decryption using AES-NI
  • bignum multiplication using 128-bit multiply (umul) and add-with-carry (adc) instructions
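
To make the second bullet concrete, here is a minimal sketch of a multiply-accumulate loop over 64-bit limbs. It is not the code from this PR: the function name `sketch_mul_add` is hypothetical, and the use of the MSVC intrinsics `_umul128` and `_addcarry_u64` is an assumption about how such a path could be written. On other compilers the sketch falls back to `unsigned __int128`, a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER) && !defined(__clang__) && defined(_M_X64)
#include <intrin.h>
#define SKETCH_USE_MSVC_INTRINSICS 1
#endif

/* Hypothetical helper (not an mbedtls function):
 * computes d[0..n-1] += s[0..n-1] * b and returns the carry-out limb. */
uint64_t sketch_mul_add(uint64_t *d, const uint64_t *s, size_t n, uint64_t b)
{
    uint64_t carry = 0;
    size_t i;

    assert(d != NULL && s != NULL);

    for (i = 0; i < n; i++) {
#if defined(SKETCH_USE_MSVC_INTRINSICS)
        uint64_t hi, lo;
        unsigned char c;

        lo = _umul128(s[i], b, &hi);           /* 64x64 -> 128-bit product */
        c = _addcarry_u64(0, lo, carry, &lo);  /* add the incoming carry limb */
        _addcarry_u64(c, hi, 0, &hi);          /* propagate into the high half */
        c = _addcarry_u64(0, lo, d[i], &d[i]); /* accumulate into d[i] */
        _addcarry_u64(c, hi, 0, &carry);       /* high half becomes next carry */
#else
        /* Portable fallback: s*b + d + carry always fits in 128 bits. */
        unsigned __int128 t = (unsigned __int128) s[i] * b + d[i] + carry;

        d[i] = (uint64_t) t;
        carry = (uint64_t) (t >> 64);
#endif
    }
    return carry;
}
```

Both branches compute the same result; the intrinsic branch only spells out the multiply and carry chain that the compiler would otherwise have to infer.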

This PR also switches the timing code to the TSC, because the resolution of QueryPerformanceCounter is too low and it doesn't return CPU cycles.
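
A minimal sketch of TSC-based timing, assuming the `__rdtsc()` intrinsic (available from `<intrin.h>` on MSVC and `<x86intrin.h>` on GCC/Clang for x86). The names `sketch_read_counter` and `sketch_cycles_per_byte` are hypothetical and not taken from the PR or from the mbedtls benchmark program. On non-x86 targets the sketch falls back to `clock()`, which does not count CPU cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define SKETCH_HAVE_RDTSC 1
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define SKETCH_HAVE_RDTSC 1
#endif

/* Hypothetical helper: read a free-running counter. */
uint64_t sketch_read_counter(void)
{
#if defined(SKETCH_HAVE_RDTSC)
    return (uint64_t) __rdtsc();  /* time-stamp counter, in TSC ticks */
#else
    return (uint64_t) clock();    /* fallback only: NOT CPU cycles */
#endif
}

/* Hypothetical helper: integer cycles/byte for a measured interval.
 * Unsigned subtraction keeps the delta correct across one wraparound. */
uint64_t sketch_cycles_per_byte(uint64_t start, uint64_t end, uint64_t bytes)
{
    assert(bytes > 0);
    return (end - start) / bytes;
}
```

Typical use would be to call `sketch_read_counter()` before and after processing a buffer and pass both values, together with the number of bytes processed, to `sketch_cycles_per_byte()`.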

Status

READY/IN DEVELOPMENT

Requires Backporting

NO

  • This PR is a new feature/enhancement

Migrations

NO

Additional comments

I ran the compat.sh tests (with some modifications that are not part of this PR) on Windows in a WSL/bash environment, and they pass. I did notice some instability when shutting down the client (./tests/compat.sh: 1052: kill: No such process), but it is not related to this PR.

As far as I understand, the tests are not run in CI under Windows, so I'd like to know how to ensure the ongoing correctness of the code I'm submitting.

Related to testing: current versions of clang, and probably gcc, appear to support intrinsic functions for AES-NI, CLMUL and ADC. Would it be feasible to switch the inline assembly to intrinsics, which would also deduplicate the code for compilers other than MSVC on Windows? I'm a bit worried about possible compiler/optimizer issues, though.
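
For illustration only, a compiler-neutral AES-NI path written with intrinsics could look roughly like the sketch below. It is not the code from this PR. All `sketch_` names are hypothetical, the round keys are assumed to be already expanded into a flat 176-byte buffer (key expansion is not shown), and the `target("aes,sse2")` attribute is an assumption that lets GCC/Clang compile the function without `-maes`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <cpuid.h>
#include <emmintrin.h>
#include <wmmintrin.h>
#define SKETCH_AESNI 1
#define SKETCH_TARGET_AES __attribute__((target("aes,sse2")))
#elif defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#include <emmintrin.h>
#include <wmmintrin.h>
#define SKETCH_AESNI 1
#define SKETCH_MSVC_CPUID 1
#define SKETCH_TARGET_AES
#else
#define SKETCH_TARGET_AES
#endif

/* Runtime check: CPUID leaf 1, ECX bit 25 reports AES-NI. */
int sketch_aesni_supported(void)
{
#if !defined(SKETCH_AESNI)
    return 0;
#elif defined(SKETCH_MSVC_CPUID)
    int info[4];
    __cpuid(info, 1);
    return (info[2] >> 25) & 1;
#else
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    return (int) ((c >> 25) & 1);
#endif
}

/* Encrypt one block with 11 pre-expanded AES-128 round keys (rk: 176 bytes).
 * Returns 0 on success, -1 if AES-NI is not available. */
SKETCH_TARGET_AES
int sketch_aes128_encrypt_block(const uint8_t *rk, const uint8_t in[16],
                                uint8_t out[16])
{
#if defined(SKETCH_AESNI)
    __m128i x;
    int r;

    if (!sketch_aesni_supported())
        return -1;
    x = _mm_loadu_si128((const __m128i *) in);
    x = _mm_xor_si128(x, _mm_loadu_si128((const __m128i *) rk));
    for (r = 1; r < 10; r++)
        x = _mm_aesenc_si128(x, _mm_loadu_si128((const __m128i *) (rk + 16 * r)));
    x = _mm_aesenclast_si128(x, _mm_loadu_si128((const __m128i *) (rk + 160)));
    _mm_storeu_si128((__m128i *) out, x);
    return 0;
#else
    (void) rk; (void) in; (void) out;
    return -1;
#endif
}

/* Decrypt one block using the same encryption round keys. The inner keys are
 * converted with AESIMC on the fly; real code would precompute them. */
SKETCH_TARGET_AES
int sketch_aes128_decrypt_block(const uint8_t *rk, const uint8_t in[16],
                                uint8_t out[16])
{
#if defined(SKETCH_AESNI)
    __m128i x;
    int r;

    if (!sketch_aesni_supported())
        return -1;
    x = _mm_loadu_si128((const __m128i *) in);
    x = _mm_xor_si128(x, _mm_loadu_si128((const __m128i *) (rk + 160)));
    for (r = 9; r >= 1; r--)
        x = _mm_aesdec_si128(x, _mm_aesimc_si128(
                _mm_loadu_si128((const __m128i *) (rk + 16 * r))));
    x = _mm_aesdeclast_si128(x, _mm_loadu_si128((const __m128i *) rk));
    _mm_storeu_si128((__m128i *) out, x);
    return 0;
#else
    (void) rk; (void) in; (void) out;
    return -1;
#endif
}
```

The same source compiles with MSVC, GCC and Clang on x86-64, which is the deduplication benefit raised above. Whether the generated code matches the hand-written assembly would still need to be measured per compiler.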

Todos

  • [ ] Tests
  • [ ] Documentation
  • [ ] Changelog updated
  • [ ] Backported

Steps to test or reproduce

On 64-bit Windows, build with Visual Studio and compare the output of .\programs\test\benchmark.exe without and with the PR. Selected numbers on my machine:

before

  AES-CBC-128              :     203930 KiB/s,         15 cycles/byte
  AES-CBC-192              :     171486 KiB/s,         18 cycles/byte
  AES-CBC-256              :     159248 KiB/s,         19 cycles/byte
  AES-XTS-128              :     175882 KiB/s,         17 cycles/byte
  AES-XTS-256              :     138802 KiB/s,         22 cycles/byte
  AES-GCM-128              :      95727 KiB/s,         33 cycles/byte
  AES-GCM-192              :      90064 KiB/s,         35 cycles/byte
  AES-GCM-256              :      85229 KiB/s,         37 cycles/byte
  AES-CCM-128              :     102824 KiB/s,         31 cycles/byte
  AES-CCM-192              :      90658 KiB/s,         35 cycles/byte
  AES-CCM-256              :      80782 KiB/s,         39 cycles/byte
  CTR_DRBG (NOPR)          :     174899 KiB/s,         18 cycles/byte
  CTR_DRBG (PR)            :     121910 KiB/s,         26 cycles/byte
  RSA-2048                 :    8674  public/s
  RSA-2048                 :     212 private/s
  RSA-4096                 :    2173  public/s
  RSA-4096                 :      33 private/s

after

  AES-CBC-128              :     432099 KiB/s,          7 cycles/byte
  AES-CBC-192              :     385591 KiB/s,          8 cycles/byte
  AES-CBC-256              :     360928 KiB/s,          9 cycles/byte
  AES-XTS-128              :     407371 KiB/s,          8 cycles/byte
  AES-XTS-256              :     336302 KiB/s,          9 cycles/byte
  AES-GCM-128              :     190789 KiB/s,         17 cycles/byte
  AES-GCM-192              :     171782 KiB/s,         18 cycles/byte
  AES-GCM-256              :     173023 KiB/s,         18 cycles/byte
  AES-CCM-128              :     260800 KiB/s,         12 cycles/byte
  AES-CCM-192              :     241114 KiB/s,         13 cycles/byte
  CTR_DRBG (NOPR)          :     397238 KiB/s,          8 cycles/byte
  CTR_DRBG (PR)            :     262009 KiB/s,         12 cycles/byte
  RSA-2048                 :   19703  public/s
  RSA-2048                 :     458 private/s
  RSA-4096                 :    5197  public/s
  RSA-4096                 :      76 private/s

orlx avatar Jul 30 '18 12:07 orlx

AWESOME! :D I have a question. Is AES-NI not supported in 32 bit mode?

mrsshr avatar Aug 23 '18 08:08 mrsshr

@mrsshr No, it is not.

orlx avatar Aug 23 '18 14:08 orlx

Hi @orlx,

thanks a lot for your contribution! :+1: We will look into it and come back to you afterwards.

Kind regards, Hanno

hanno-becker avatar Aug 23 '18 14:08 hanno-becker

@mrsshr the inline assembler code for GCC and Clang appears to be 32-bit clean (It even limits itself to the first six FPU regs!).

Given the proper ISA support, it works without issues in protected mode. 🤷🏻‍♂️

despair86 avatar Sep 04 '18 01:09 despair86

@hanno-arm any update on this?

AndreasReich avatar Dec 05 '18 12:12 AndreasReich

We are now converting older PRs to draft PRs where the following conditions are met: They have not been updated in the last 3 months, and they need more than non-trivial work to complete.

tom-daubney-arm avatar Jul 07 '23 09:07 tom-daubney-arm