chacha-avr

A public domain constant-time implementation of ChaCha20 for 8-bit AVR microcontrollers.


2018-06-10 Markku-Juhani O. Saarinen

A public domain implementation of ChaCha20 for 8-bit AVR microcontrollers, just because you never know when you might need one! We implement the raw ChaCha permutation and include ChaCha20 keystream block generation per RFC 7539 as a higher-level use case.

The actual assembly language implementation is in chacha_core_avr.S; the rest is basically for serial output, testing, and benchmarking.
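
For orientation, here is a minimal portable C sketch of what the permutation and the keystream block function compute, written from the description in RFC 7539. It is not the code in chacha_core_avr.S or chacha_core_c.c; the function names (quarter_round, chacha_perm_ref, chacha20_block_ref) and the rounds parameter are assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Read a 32-bit little-endian word from a byte buffer. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ChaCha quarter round on four state words (RFC 7539, Section 2.1). */
void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Raw permutation on the 16-word state (hypothetical name).
   rounds must be even: 8 for ChaCha8, 20 for ChaCha20. */
void chacha_perm_ref(uint32_t s[16], int rounds)
{
    for (int i = 0; i < rounds; i += 2) {
        /* column round */
        quarter_round(&s[0], &s[4], &s[8],  &s[12]);
        quarter_round(&s[1], &s[5], &s[9],  &s[13]);
        quarter_round(&s[2], &s[6], &s[10], &s[14]);
        quarter_round(&s[3], &s[7], &s[11], &s[15]);
        /* diagonal round */
        quarter_round(&s[0], &s[5], &s[10], &s[15]);
        quarter_round(&s[1], &s[6], &s[11], &s[12]);
        quarter_round(&s[2], &s[7], &s[8],  &s[13]);
        quarter_round(&s[3], &s[4], &s[9],  &s[14]);
    }
}

/* One 64-byte ChaCha20 keystream block (RFC 7539, Section 2.3):
   constants, 256-bit key, 32-bit block counter, 96-bit nonce. */
void chacha20_block_ref(uint8_t out[64], const uint8_t key[32],
                        uint32_t counter, const uint8_t nonce[12])
{
    uint32_t init[16], s[16];

    init[0] = 0x61707865; init[1] = 0x3320646e;   /* "expand 32-byte k" */
    init[2] = 0x79622d32; init[3] = 0x6b206574;
    for (int i = 0; i < 8; i++)
        init[4 + i] = load32_le(key + 4 * i);
    init[12] = counter;
    for (int i = 0; i < 3; i++)
        init[13 + i] = load32_le(nonce + 4 * i);

    memcpy(s, init, sizeof s);
    chacha_perm_ref(s, 20);

    /* Feed-forward: add the initial state, then serialize little-endian. */
    for (int i = 0; i < 16; i++) {
        uint32_t w = s[i] + init[i];
        out[4 * i + 0] = (uint8_t)(w);
        out[4 * i + 1] = (uint8_t)(w >> 8);
        out[4 * i + 2] = (uint8_t)(w >> 16);
        out[4 * i + 3] = (uint8_t)(w >> 24);
    }
}
```

Calling chacha_perm_ref with rounds set to 8 gives the 8-round permutation used by ChaCha8. The block function can be checked against the test vector in Section 2.3.2 of RFC 7539.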

Comparison with a fast C implementation

An unrolled C implementation of the same permutation is included in chacha_core_c.c and the Makefile contains instructions for swapping it in.

With the C version swapped in, the ChaCha8 block operation requires 18163 ticks, i.e. about 3.6 times as long as the roughly 5052 ticks of the assembly version. Furthermore, the C code is 2594 bytes larger. My handwritten assembly language permutation is only 324 bytes. This means that you can get high-security cryptography running in less than half a kilobyte of code altogether.
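
The ratio quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, the 5052-tick figure is taken from the steady-state runs in the sample output under "Running on Arduino", and the 64-byte divisor is the size of one ChaCha keystream block.

```c
#include <stdint.h>

/* Ratio of two tick counts, e.g. C ticks divided by assembly ticks. */
double tick_ratio(uint32_t slow_ticks, uint32_t fast_ticks)
{
    return (double)slow_ticks / (double)fast_ticks;
}

/* Ticks spent per keystream byte, given ticks per 64-byte block. */
double ticks_per_byte(uint32_t ticks_per_block)
{
    return (double)ticks_per_block / 64.0;
}
```

For example, tick_ratio(18163, 5052) is just under 3.6, and ticks_per_byte(5052) comes to about 79 ticks per byte for the assembly version.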

Running on Arduino

The flashing code is incorporated into the Makefile. You need to install at least the avr-gcc and avrdude packages (just apt install on Debian/Ubuntu).

Plug the Arduino into a USB port and run make flash. This compiles the code, flashes it, and dumps the UART output to your standard output with some stty magic. The most relevant settings are in the Makefile, so edit that first if something fails. The default settings work with my Arduino Mega 2560 on Ubuntu 18.04.

$ make flash
mkdir -p obj/
avr-gcc -Wall -Ofast -mmcu=atmega2560 -DF_CPU=16000000 -c chacha_core_avr.S -o obj/chacha_core_avr.o
mkdir -p obj/

.. more stuff.. and then the interesting bit:

stty -F /dev/ttyACM0 raw icanon eof \^d 38400
cat < /dev/ttyACM0
Run #00 5073 ticks / block
Run #01 5063 ticks / block
Run #02 5057 ticks / block
Run #03 5055 ticks / block
Run #04 5053 ticks / block
Run #05 5052 ticks / block
Run #06 5052 ticks / block
Run #07 5051 ticks / block
Run #08 5052 ticks / block
Run #09 5052 ticks / block

That's a success. The long hex string in the output should match the keystream from Section 2.4.2 of RFC 7539 and can be used to verify correctness.

The performance numbers are for a ChaCha8 block operation, not ChaCha20.

Running on SIMAVR

If you don't have an Arduino lying about, you can use simavr and test the code with make sim.

This cycle-perfect simulator even simulates the UART, so the output will be basically equivalent to the above:

simavr -v -v -v -m atmega2560 chacha-avr
Loaded 3822 .text at address 0x0
Loaded 72 .data
UART: 0 configured to 0019 = 2403.8462 bps (x1), 8 data 1 stop
UART: Roughly 4576 usec per byte
Run #00 5073 ticks / block.
Run #01 5063 ticks / block.
^Csignal caught, simavr terminating
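
The 0019 in the UART line above looks like the baud rate divisor (UBRR) programmed by the test code. As a rough cross-check against the 38400 setting used with stty, the standard AVR formula for normal-speed asynchronous mode can be sketched as follows. The function name is hypothetical, and normal-speed mode is an assumption based on the "(x1)" in the log; the bps figure printed by the simulator is its own calculation and is not reproduced here.

```c
#include <stdint.h>

/* Baud rate of an AVR USART in normal-speed asynchronous mode (U2X = 0):
   baud = f_cpu / (16 * (UBRR + 1)). */
double avr_uart_baud(uint32_t f_cpu_hz, uint16_t ubrr)
{
    return (double)f_cpu_hz / (16.0 * ((double)ubrr + 1.0));
}
```

With the F_CPU of 16000000 from the compiler command line and a divisor of 0x0019, this gives roughly 38462 bps, which is within a fraction of a percent of 38400.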

Have fun

Cheers, -markku