Feature request: uClock nano - low memory mode

Open awonak opened this issue 8 months ago • 17 comments

I have been thinking about ways to help reduce memory usage with this library that might be useful to other folks. In my current use case, I only want to use uClock for the internal / external clock sync, and use setOnOutputPPQN to drive all of my app's logic. I experimented by commenting out all of the onSyncXXCallback and all associated counters, as well as the shuffle code that I don't need for my app. This freed up 1830 bytes of program memory and 130 of dynamic memory. For smaller microcontrollers like the Arduino Nano, this is a huge savings and frees me up to keep adding more features.

Can we make a low memory mode feature like this available to users? Is this an idea you would be interested in pursuing? Could we possibly wrap some of these allocation and implementation blocks in a flag like #ifndef UCLOCK_NANO to allow us to use the library in a very simple, low memory mode? Do you have any other ideas that would help us reduce memory allocation of features our firmware doesn't use?

awonak avatar Jun 21 '25 04:06 awonak

It looks like my assumption was incorrect that using the #ifndef UCLOCK_NANO would reduce program memory usage. It helps reduce dynamic memory, but the program memory is relatively unchanged.

awonak avatar Jun 21 '25 05:06 awonak

Yes, this is a common issue for some libraries used on microcontrollers that have limited RAM. To address the memory footprint in another project (uCtrl), I’ve used macro definitions to selectively include or exclude features at compile time.

A similar approach could be applied here by modularizing the library using configurable macros—just like in uCtrl. For example:

#define UCLOCK_NO_SHUFFLE
#define UCLOCK_NO_SYNC_CALLBACKS
// ... and so on

This would allow users to reduce memory usage by disabling unused features.
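A minimal sketch of how such a flag could gate data as well as code. Only the macro name comes from the example above; the structs, members, and helper are hypothetical and are not uClock's actual internals.

```cpp
#include <cstddef>
#include <cstdint>

// Normally supplied by the user or the build system; defined here to show the effect.
#define UCLOCK_NO_SHUFFLE

// Hypothetical shuffle state: per-step offsets plus bookkeeping.
struct ShuffleState {
    int8_t  step[16];
    uint8_t size;
    bool    active;
};

// Hypothetical clock state. The shuffle member exists only when the feature is enabled.
struct ClockState {
    uint32_t tick = 0;
    float    tempo = 120.0f;
#ifndef UCLOCK_NO_SHUFFLE
    ShuffleState shuffle{};
#endif
};

// Reference layout with every feature compiled in, for comparison only.
struct ClockStateFull {
    uint32_t     tick = 0;
    float        tempo = 120.0f;
    ShuffleState shuffle{};
};

// Bytes of per-instance state removed by the flags that are currently defined.
constexpr std::size_t bytesSaved() {
    return sizeof(ClockStateFull) - sizeof(ClockState);
}
```

Removing the `#define` at the top makes both layouts identical, so `bytesSaved()` drops to zero.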

I'll mark this as a feature request and plan to expand on it as soon as possible.

midilab avatar Jun 21 '25 12:06 midilab

It looks like my assumption was incorrect that using the #ifndef UCLOCK_NANO would reduce program memory usage. It helps reduce dynamic memory, but the program memory is relatively unchanged.

One of the reasons I haven’t implemented macro-based feature selection in uClock yet is that most features don’t consume much code memory. However, they can impact runtime memory usage, which is more critical on constrained systems.

That said, I do believe it's worthwhile to optimize the library for selective features at compile time. Even saving a few bytes can make a significant difference in resource-limited environments—sometimes those few bytes can save lives!

midilab avatar Jun 21 '25 12:06 midilab

I think my assumptions of how the compiler works here might be incorrect. I attempted implementing the feature restriction define flags here: https://github.com/awonak/uClock/commit/f72c725ba12bb61cd0c086038847e1e7ff642292

When I build my library against main/develop here is my compiled memory usage:

Sketch uses 27958 bytes (91%) of program storage space. Maximum is 30720 bytes.
Global variables use 1755 bytes (85%) of dynamic memory, leaving 293 bytes for local variables. Maximum is 2048 bytes.

When I enable the flags implemented above (and I confirm the additional sync callbacks are no longer available), here is my compiled memory usage:

Sketch uses 27958 bytes (91%) of program storage space. Maximum is 30720 bytes.
Global variables use 1755 bytes (85%) of dynamic memory, leaving 293 bytes for local variables. Maximum is 2048 bytes.

Given I do not see any change, I wonder if the compiler is already taking care of the task of removing the unused code?

However, I tested another branch where I just remove all of the unused code, and I do see memory usage reduction. https://github.com/awonak/uClock/commit/6ed1218e6e43f41cce89466711d1828ac003af19

Sketch uses 26544 bytes (86%) of program storage space. Maximum is 30720 bytes.
Global variables use 1663 bytes (81%) of dynamic memory, leaving 385 bytes for local variables. Maximum is 2048 bytes.

Perhaps the first implementation using DEFINE blocks to omit unused code will still help in the area of dynamic memory used in the stack (vs global or program), but I'm not sure how to test that.

awonak avatar Jun 21 '25 18:06 awonak

The Arduino build system does not propagate #define macros from .ino files to external libraries. During compilation, all .ino files are combined into a single translation unit, which is then compiled separately from the libraries. As a result, any macro defined in the .ino file, or even in a header located within the src/ directory of the sketch, won’t affect the library code directly.

To make a macro available to the library, it needs to be defined in a header file that’s part of the library itself, or passed as a compiler flag via build_flags (when using platforms like PlatformIO).

This limitation makes modularizing libraries using macros quite inconvenient for the end user on the Arduino platform. Since they can’t simply define macros in their .ino file or sketch-level headers, they’re often forced to modify the library source directly or use more advanced build systems like PlatformIO.
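One common pattern that works with compiler flags is an overridable default inside the library header. The sketch below assumes a macro such as EXT_INTERVAL_BUFFER_SIZE (the name and the default of 128 are taken from the 2.2.1 test notes later in this thread); the helper function is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Library-side default. It applies only when the build system has not already
// supplied a value, for example through a -DEXT_INTERVAL_BUFFER_SIZE=1 compiler flag.
#ifndef EXT_INTERVAL_BUFFER_SIZE
#define EXT_INTERVAL_BUFFER_SIZE 128
#endif

static_assert(EXT_INTERVAL_BUFFER_SIZE >= 1, "the buffer needs at least one slot");

// The array size is fixed at compile time from whichever value won.
volatile uint32_t ext_interval_buffer[EXT_INTERVAL_BUFFER_SIZE];

// Hypothetical helper: number of slots actually compiled in.
constexpr std::size_t extIntervalSlots() {
    return sizeof(ext_interval_buffer) / sizeof(ext_interval_buffer[0]);
}
```

With PlatformIO the override would go in build_flags; in the Arduino IDE there is no equally simple per-sketch hook, which is the inconvenience described above.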

midilab avatar Jun 22 '25 07:06 midilab

I (independently) implemented #define USE_SHUFFLE and #define USE_MANY_SYNCS and came by to see if anyone wanted a copy :) It freed about 75 bytes of SRAM, which isn't nothing on my Nano. The biggest saver, though, is the 4 bytes per array element in the input sync tempo averager: volatile uint32_t ext_interval_buffer[EXT_INTERVAL_BUFFER_SIZE];

Happy to contribute this, would you like a PR? Cheers!
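The arithmetic behind the buffer saving can be checked with a few lines. This is only a worked calculation; the function names are hypothetical. Shrinking from 128 slots to 1 frees 127 × 4 = 508 bytes, which matches the 2.2.1 compile reports later in the thread (672 bytes of globals versus 164).

```cpp
#include <cstddef>
#include <cstdint>

// Bytes occupied by an interval buffer of uint32_t entries.
constexpr std::size_t intervalBufferBytes(std::size_t slots) {
    return slots * sizeof(uint32_t);
}

// Bytes freed by shrinking the buffer from `from` slots to `to` slots.
constexpr std::size_t bytesFreed(std::size_t from, std::size_t to) {
    return intervalBufferBytes(from) - intervalBufferBytes(to);
}
```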

tuulikauri avatar Oct 10 '25 15:10 tuulikauri

Hi @tuulikauri,

Thank you for your implementation! I've made some optimizations on the develop branch to reduce the size of unused structures. I'm still exploring alternatives to a macro-based solution, as the Arduino IDE can be challenging for beginners to navigate with macros. If we can't find a suitable alternative, we can proceed with the macro-based approach. Significant changes have been made in the develop branch, which I'm preparing to merge into main for the next release. I'm currently finalizing the basic documentation for this.

@awonak , Could you please test the develop branch to evaluate how much SRAM is saved? The implementation now only allocates sync signatures defined during setup(), and the EXT_INTERVAL_BUFFER_SIZE defaults to 1 unless a higher value is specified in setup(). Let me know your feedback!

midilab avatar Oct 10 '25 18:10 midilab

the Arduino IDE can be challenging for beginners to navigate with macros

Yeah, completely agree... I also looked for other ways to change the settings with #define without diving into the library's files, but my research found that the *.ino files are compiled separately, so #define and other preprocessor directives don't carry through to library files in the Arduino IDE unless you edit the library files and put them there. If we find an alternate way, that would be great.

In lieu of that I'd suggest a clearly defined section for editing user preferences; and documenting this in the Wiki for the library. A configuration section clearly delineated and obvious like this would be ideal for beginners to search for and edit confidently.

[Image: Config Area]

tuulikauri avatar Oct 15 '25 04:10 tuulikauri

Excellent work! I'll try it out this weekend and report back.

awonak avatar Oct 15 '25 17:10 awonak

@awonak , @tuulikauri , could you guys please check the status of v2.3.0 for memory footprint? I have made some changes without macros to reduce RAM and code size. Depending on the results, we can go for macro-based feature selection in the next release.

midilab avatar Nov 08 '25 13:11 midilab

2.2.1 was the base for my custom library, tuulClock.h where I added TIMER2 functionality (for use with AltSoftSerial, which needs TIMER1) and compiler directives for USE_SHUFFLE and USE_ALL_SYNCS.

Results on a nano compile (note, I'm not actually uploading this code. Just compiling in the IDE, with the board selected as Nano - the used memory is reported on compile, not on upload).

Is there something specific you want done with the hardware tests? I don't have a reproducible crash rig, but it was happening around 85% memory usage, which is when I started looking into freeing 100 bytes :)

Bottom line is the library consumes SRAM as:

  • 164 bytes with EXT_INTERVAL_BUFFER_SIZE set to 1 and no compiler directives
  • 89 bytes by using compiler directives to remove shuffle and many of the syncs, and set EXT_INTERVAL_BUFFER_SIZE to 1, on baseline 2.2.1.
  • 99 bytes by using 2.3.0. Nice!

This is the space consumed by shuffle and many syncs, so I'm interested to poke into it at some point and see if there's still room to save more space with directives, on top of the changes in 2.3.0:

//With: #define USE_SHUFFLE //costs 21 bytes of dynamic memory (SRAM) on a Nano
//With: #define USE_ALL_SYNCS //costs 54 bytes of dynamic memory (SRAM) on a Nano

Here's my test notes with various configurations of settings in uClock.h.

//No includes: Sketch uses 444 bytes (1%) of program storage space. Maximum is 30720 bytes.                 Global variables use 9 bytes (0%) of dynamic memory, leaving 2039 bytes for local variables. Maximum is 2048 bytes.

//#include <LiquidCrystal.h> //Sketch uses 444 bytes (1%) of program storage space. Maximum is 30720 bytes.   Global variables use 9 bytes (0%) of dynamic memory, leaving 2039 bytes for local variables. Maximum is 2048 bytes.
//#include <MIDI.h> //Sketch uses 444 bytes (1%) of program storage space. Maximum is 30720 bytes.            Global variables use 9 bytes (0%) of dynamic memory, leaving 2039 bytes for local variables. Maximum is 2048 bytes.
//#include "Adafruit_seesaw.h"//Sketch uses 1658 bytes (5%) of program storage space. Maximum is 30720 bytes. Global variables use 122 bytes (5%) of dynamic memory, leaving 1926 bytes for local variables. Maximum is 2048 bytes.
//#include <AltSoftSerial.h> //Sketch uses 1084 bytes (3%) of program storage space. Maximum is 30720 bytes.  Global variables use 169 bytes (8%) of dynamic memory, leaving 1879 bytes for local variables. Maximum is 2048 bytes.

//#include <tuulClock.h> //Sketch uses 4680 bytes (15%) of program storage space. Maximum is 30720 bytes.     Global variables use 675 bytes (32%) of dynamic memory, leaving 1373 bytes for local variables. Maximum is 2048 bytes.
//Baseline:
//Sketch uses 4680 bytes (15%) of program storage space. Maximum is 30720 bytes.                            Global variables use 675 bytes (32%) of dynamic memory, leaving 1373 bytes for local variables. Maximum is 2048 bytes.
//without #define USE_SHUFFLE
//Sketch uses 4360 bytes (14%) of program storage space. Maximum is 30720 bytes.                            Global variables use 654 bytes (31%) of dynamic memory, leaving 1394 bytes for local variables. Maximum is 2048 bytes.
//Without #define USE_ALL_SYNCS
//Sketch uses 3472 bytes (11%) of program storage space. Maximum is 30720 bytes.                            Global variables use 600 bytes (29%) of dynamic memory, leaving 1448 bytes for local variables. Maximum is 2048 bytes.
//With EXT_INTERVAL_BUFFER_SIZE 1 //128 volatile uint32_t array size.... 4 bytes each. Saves the clock message time diffs in handleExternalClock(), called atomically by clockMe
//Sketch uses 3338 bytes (10%) of program storage space. Maximum is 30720 bytes.                            Global variables use 92 bytes (4%) of dynamic memory, leaving 1956 bytes for local variables. Maximum is 2048 bytes.

//With: #define USE_UCLOCK_SOFTWARE_TIMER //dont use any hardware timers
//x#define USE_TIMER2 //uses arduino nano TIMER2. Probably works on other 328p avr boards but untested. 
//x#define USE_SHUFFLE //costs 21 bytes of dynamic memory (SRAM) on a Nano 
//x#define USE_ALL_SYNCS //costs 54 bytes of dynamic memory (SRAM) on a Nano
//With: #define EXT_INTERVAL_BUFFER_SIZE 1 //128
//Sketch uses 676 bytes (2%) of program storage space. Maximum is 30720 bytes.                              Global variables use 89 bytes (4%) of dynamic memory, leaving 1959 bytes for local variables. Maximum is 2048 bytes.

//With: #define USE_UCLOCK_SOFTWARE_TIMER //dont use any hardware timers
//x#define USE_TIMER2 //uses arduino nano TIMER2. Probably works on other 328p avr boards but untested. 
//With: #define USE_SHUFFLE //costs 21 bytes of dynamic memory (SRAM) on a Nano 
//With: #define USE_ALL_SYNCS //costs 54 bytes of dynamic memory (SRAM) on a Nano
//With: #define EXT_INTERVAL_BUFFER_SIZE 1 //128
//Sketch uses 948 bytes (3%) of program storage space. Maximum is 30720 bytes.                              Global variables use 164 bytes (8%) of dynamic memory, leaving 1884 bytes for local variables. Maximum is 2048 bytes.


//All: Sketch uses 6532 bytes (21%) of program storage space. Maximum is 30720 bytes.                       Global variables use 948 bytes (46%) of dynamic memory, leaving 1100 bytes for local variables. Maximum is 2048 bytes.
//All, after tuulClock cleanup: Sketch uses 5190 bytes (16%) of program storage space. Maximum is 30720 bytes. Global variables use 365 bytes (17%) of dynamic memory, leaving 1683 bytes for local variables. Maximum is 2048 bytes.


//#include <uClock.h> //2.2.1
//With: #define USE_UCLOCK_SOFTWARE_TIMER //dont use any hardware timers
//With: #define EXT_INTERVAL_BUFFER_SIZE 128
//Sketch uses 1086 bytes (3%) of program storage space. Maximum is 30720 bytes.                             Global variables use 672 bytes (32%) of dynamic memory, leaving 1376 bytes for local variables. Maximum is 2048 bytes.
//With: #define EXT_INTERVAL_BUFFER_SIZE 1
//Sketch uses 948 bytes (3%) of program storage space. Maximum is 30720 bytes.                              Global variables use 164 bytes (8%) of dynamic memory, leaving 1884 bytes for local variables. Maximum is 2048 bytes.

#include <uClock.h> //2.3.0
//With: #define USE_UCLOCK_SOFTWARE_TIMER //dont use any hardware timers
//It seems there isnt #define EXT_INTERVAL_BUFFER_SIZE anymore, instead its a variable
//Sketch uses 1654 bytes (5%) of program storage space. Maximum is 30720 bytes.                             Global variables use 99 bytes (4%) of dynamic memory, leaving 1949 bytes for local variables. Maximum is 2048 bytes.


void setup() {
  // put your setup code here, to run once:
}

void loop() {
  // put your main code here, to run repeatedly:
}

tuulikauri avatar Nov 08 '25 20:11 tuulikauri

Further update; I opened a separate issue for the 2.3.0 update; but just focussing on some further SRAM results here:

Ported my TIMER2 feature into 2.3.0 so that I could compile my project code on baseline 2.3.0 library.

2.3.0 mod with TIMER2: Sketch uses 20812 bytes (67%) of program storage space. Maximum is 30720 bytes. Global variables use 1182 bytes (57%) of dynamic memory, leaving 866 bytes for local variables. Maximum is 2048 bytes.

2.2.1 with #defines to remove extra sync and shuffle and TIMER2: Sketch uses 20328 bytes (66%) of program storage space. Maximum is 30720 bytes. Global variables use 1253 bytes (61%) of dynamic memory, leaving 795 bytes for local variables. Maximum is 2048 bytes.

Compiling the AcidStepSequencer.ino example. 2.3.0: Sketch uses 6556 bytes (21%) of program storage space. Maximum is 30720 bytes. Global variables use 387 bytes (18%) of dynamic memory, leaving 1661 bytes for local variables. Maximum is 2048 bytes.

2.2.1: Sketch uses 5624 bytes (18%) of program storage space. Maximum is 30720 bytes. Global variables use 452 bytes (22%) of dynamic memory, leaving 1596 bytes for local variables. Maximum is 2048 bytes.

tuulikauri avatar Nov 10 '25 17:11 tuulikauri

hi @tuulikauri ,

TIMER2 is an 8-bit timer; it's not an option for the uClock architecture, which demands at minimum a 16-bit timer. The ATmega328P has only one 16-bit timer, which is the one uClock uses by default.

If you can't free up TIMER1 in your project, you can try the software implementation, or make the other library use TIMER2.

You can use uClock with TIMER2, but it's not recommended: the whole clock algorithm expects a timer with 16-bit precision.

midilab avatar Nov 10 '25 18:11 midilab

@tuulikauri ,

Thanks for the tests so far.

Is there something specific you want done with the hardware tests? I don't have a reproducible crash rig, but it was happening around 85% memory usage, which is when I started looking into freeing 100 bytes :)

I just wanted to get an idea of how v2.2.1 and v2.3.0 compare in SRAM and code size across different projects.

Now I think there is no longer any need for a macro to remove onSync: everything is encapsulated in a small piece of code and a data structure that holds X onSync callbacks, removing the repetitive code for each setOnSyncXX.
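A rough sketch of what a single table of sync callbacks could look like, in place of one setter and counter per resolution. This is not the v2.3.0 code: the class, the fixed capacity of 4, and the internal resolution of 96 PPQN are all assumptions made for illustration, and it uses a fixed array where the library reportedly allocates only what setup() asks for.

```cpp
#include <cstdint>

using SyncCallback = void (*)(uint32_t tick);

// One entry per requested sync resolution.
struct SyncSlot {
    uint16_t     ppqn;
    SyncCallback callback;
};

class SyncTable {
public:
    static constexpr uint8_t kMaxSlots = 4;  // hypothetical cap

    // Register a callback for a given resolution; returns false when full or invalid.
    bool add(uint16_t ppqn, SyncCallback callback) {
        if (count_ >= kMaxSlots || callback == nullptr || ppqn == 0) return false;
        slots_[count_].ppqn = ppqn;
        slots_[count_].callback = callback;
        ++count_;
        return true;
    }

    // Called once per internal tick; fires each slot on its own divider.
    void dispatch(uint32_t tick, uint16_t internalPpqn) const {
        for (uint8_t i = 0; i < count_; ++i) {
            const uint16_t divider = internalPpqn / slots_[i].ppqn;
            if (divider != 0 && tick % divider == 0) {
                slots_[i].callback(tick / divider);
            }
        }
    }

private:
    SyncSlot slots_[kMaxSlots] = {};
    uint8_t  count_ = 0;
};

// Small demo: count how often a 24 PPQN callback fires during one beat of 96 ticks.
static uint32_t g_sync24Hits = 0;
static void onSync24(uint32_t) { ++g_sync24Hits; }

uint32_t demoSync24HitsPerBeat() {
    SyncTable table;
    table.add(24, onSync24);
    g_sync24Hits = 0;
    for (uint32_t tick = 0; tick < 96; ++tick) table.dispatch(tick, 96);
    return g_sync24Hits;
}
```

The point of the design is that adding another resolution costs one table entry, not another copy of the counter and callback code.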

For shuffle, I think it's still worth creating a macro, since it takes up a large part of the code size.

For everything SRAM related, I'm now using an alloc-once-and-forever policy on the heap to control shuffle and onSync data. If the user doesn't set up any of those explicitly, no memory is used. The same goes for EXT_INTERVAL_BUFFER_SIZE, which is now a dynamic allocation if the user requests a larger buffer; otherwise no memory is wasted.
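The allocate-once idea can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the class name, the single inline slot, and the reserve() method are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Interval averager that costs one inline slot by default and moves to the heap
// only if a larger buffer is requested. The heap block is intentionally never
// freed ("alloc once and forever"), which avoids fragmentation on small MCUs.
class IntervalBuffer {
public:
    IntervalBuffer() = default;
    IntervalBuffer(const IntervalBuffer&) = delete;
    IntervalBuffer& operator=(const IntervalBuffer&) = delete;

    // Meant to be called at most once, from setup(). Returns false if the buffer
    // was already grown, the request is not larger than 1, or allocation fails.
    bool reserve(std::size_t slots) {
        if (heap_ != nullptr || slots <= 1) return false;
        void* memory = std::calloc(slots, sizeof(uint32_t));
        if (memory == nullptr) return false;
        heap_ = static_cast<uint32_t*>(memory);
        capacity_ = slots;
        head_ = 0;
        count_ = 0;
        return true;
    }

    // Store the newest interval, overwriting the oldest once the buffer is full.
    void push(uint32_t interval) {
        data()[head_] = interval;
        head_ = (head_ + 1) % capacity_;
        if (count_ < capacity_) ++count_;
    }

    // Mean of the stored intervals, or 0 when nothing has been stored yet.
    uint32_t average() const {
        if (count_ == 0) return 0;
        uint64_t sum = 0;
        for (std::size_t i = 0; i < count_; ++i) sum += data()[i];
        return static_cast<uint32_t>(sum / count_);
    }

    std::size_t capacity() const { return capacity_; }

private:
    uint32_t* data() { return heap_ != nullptr ? heap_ : &inline_; }
    const uint32_t* data() const { return heap_ != nullptr ? heap_ : &inline_; }

    uint32_t    inline_ = 0;
    uint32_t*   heap_ = nullptr;
    std::size_t capacity_ = 1;
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

A sketch that never calls reserve() pays only for the inline slot and the bookkeeping fields.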

midilab avatar Nov 10 '25 19:11 midilab

it's not an option for the uClock architecture, which demands at minimum a 16-bit timer.

Thanks for the ideas! Yes, I created a TIMER2 implementation for uClock that I'm using on my Nano; I needed this to support two serial ports on the Nano as AltSoftSerial consumes TIMER1. It uses an integer counter and 8 bit timer, in lieu of a big 16 bit counter. Code below in case you wanted to look. I'm not a professional dev and make no claim this works as nicely as the original 16 bit TIMER1, or on any hardware other than my Nano; but happy to support getting this feature into uClock library if you are interested and need anything further!

uClock.cpp edits

    //
    // General Arduino AVRs port
    //
    #if defined(ARDUINO_ARCH_AVR)
        #ifdef USE_TIMER2
            #include "platforms/avr2.h"
        #else
            #include "platforms/avr.h"
        #endif
        #define UCLOCK_PLATFORM_FOUND
    #endif

avr2.h, in its entirety

// Timer2 implementation by Tuuli, with info from
// https://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-7810-Automotive-Microcontrollers-ATmega328P_Datasheet.pdf

#include <Arduino.h>

#define ATOMIC(X) noInterrupts(); X; interrupts();

// want a different avr clock support?
// TODO: use macro guards to set up different AVR clock frequencies at compile time
#define AVR_CLOCK_FREQ	16000000

// forward declaration of uClockHandler
void uClockHandler();
volatile uint8_t cycleCounter = 0;
uint8_t desiredCycles = 0;
uint8_t remainder = 0;

// AVR ISR Entrypoint
ISR(TIMER2_OVF_vect){  
  ++cycleCounter;
}

ISR(TIMER2_COMPB_vect)
{
	if (cycleCounter >= desiredCycles){ 
		ATOMIC(
			TCNT2 = 0; 			
			cycleCounter = 0;
		)
	
    uClockHandler();
	}
}

void initTimer(uint32_t init_clock)
{
	// 8bits Timer2 init							
	// begin at 120bpm (48.0007680122882 Hz)
	uint32_t desiredTimerCounts = 41665; // = 16000000 / (8 * 48.0007680122882) - 1 (must be <65536)
	desiredCycles = desiredTimerCounts / 256;
	remainder = desiredTimerCounts % 256;
	
	cycleCounter = 0;
	
	ATOMIC(	
		TCCR2A = 0; // set entire TCCR2A register to 0
		TCCR2B = 0; // same for TCCR2B

		TCNT2 = 0; // initialize counter value to 0
		OCR2B = remainder - 1;	
		
		//Does NOT use CTC; instead cycles repeatedly and counts overflow cycles, and uses COMPB to trigger upon the remainder; when the full cycle overflow counter "cycleCounter" has been met.
		
		// Set CS21 only (CS22 = 0, CS20 = 0) for 8 prescaler
		TCCR2B |= B00000010;  
		// Enable Timer Overflow Interrupt; TOIE2, bit 0
		TIMSK2 |= B00000001;  
		// Enable Timer COMPB Interrupt; OCIE2B, bit 2
		TIMSK2 |= B00000100;  
	)
}

void setTimer(uint32_t us_interval)
{
    float tick_hertz_interval = 1/((float)us_interval/1000000);

    uint32_t desiredTimerCounts;
    uint8_t tccr = 0;
	
    // 8bits avr timer setup
    if ((desiredTimerCounts = AVR_CLOCK_FREQ / ( tick_hertz_interval * 8 )) < 65535) {
        // 8 prescaler
        tccr |= (0 << CS22) | (1 << CS21) | (0 << CS20);
    } else if ((desiredTimerCounts = AVR_CLOCK_FREQ / ( tick_hertz_interval * 32 )) < 65535) {
        // 32 prescaler
        tccr |= (0 << CS22) | (1 << CS21) | (1 << CS20);
    } else if ((desiredTimerCounts = AVR_CLOCK_FREQ / ( tick_hertz_interval * 64 )) < 65535) {
        // 64 prescaler
        tccr |= (1 << CS22) | (0 << CS21) | (0 << CS20);
    } else if ((desiredTimerCounts = AVR_CLOCK_FREQ / ( tick_hertz_interval * 128 )) < 65535) {
        // 128 prescaler
        tccr |= (1 << CS22) | (0 << CS21) | (1 << CS20);
	} else if ((desiredTimerCounts = AVR_CLOCK_FREQ / ( tick_hertz_interval * 256 )) < 65535) {
        // 256 prescaler
        tccr |= (1 << CS22) | (1 << CS21) | (0 << CS20);
    } else if ((desiredTimerCounts = AVR_CLOCK_FREQ / ( tick_hertz_interval * 1024 )) < 65535) {
        // 1024 prescaler
        tccr |= (1 << CS22) | (1 << CS21) | (1 << CS20);				
    } else {
        // tempo not achievable
        return;
    }
			
	desiredCycles = desiredTimerCounts / 256;
	remainder = desiredTimerCounts % 256;
	
	cycleCounter = 0;
	
    ATOMIC(
        TCCR2B = 0;        
		OCR2B = remainder - 1;
        TCCR2B |= tccr;
    )
}
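The cycles/remainder split used in the listing above can be checked off-target. The helper names below are hypothetical; the arithmetic mirrors `desiredTimerCounts / 256`, `desiredTimerCounts % 256`, and `OCR2B = remainder - 1`. One case that may be worth verifying on hardware is a remainder of 0, where the 8-bit subtraction wraps to 255.

```cpp
#include <cstdint>

// Full overflow cycles plus the leftover counts within the last cycle.
struct TimerSplit {
    uint8_t cycles;
    uint8_t remainder;
};

// Same split as in initTimer()/setTimer(): counts / 256 and counts % 256.
constexpr TimerSplit splitCounts(uint32_t counts) {
    return TimerSplit{static_cast<uint8_t>(counts / 256),
                      static_cast<uint8_t>(counts % 256)};
}

// Value that `remainder - 1` produces once stored in the 8-bit OCR2B register.
constexpr uint8_t compareValue(uint8_t remainder) {
    return static_cast<uint8_t>(remainder - 1);
}
```

For the 120 BPM starting value of 41665 counts this gives 162 full cycles and a remainder of 193.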

please check the status of v2.3.0 for memory footprint

Was there any other testing that would be helpful, or was it just the compiled memory sizes you were after? Happy to help!

Thanks for the library!

tuulikauri avatar Nov 10 '25 19:11 tuulikauri

Thanks for the ideas! Yes, I created a TIMER2 implementation for uClock that I'm using on my Nano; I needed this to support two serial ports on the Nano as AltSoftSerial consumes TIMER1. It uses an integer counter and 8 bit timer, in lieu of a big 16 bit counter. Code below in case you wanted to look. I'm not a professional dev and make no claim this works as nicely as the original 16 bit TIMER1, or on any hardware other than my Nano; but happy to support getting this feature into uClock library if you are interested and need anything further!

TIMER2 will work, but there is an issue with the time conversion range. A 16-bit timer can handle finer-grained tempos like 120.0, 120.1, 120.2, 120.3, whereas an 8-bit timer can't reach some of those fine-grained BPMs. As a purely illustrative example (I didn't do the math for valid values), the real BPMs available with 8 bits might be 119.8, 120.3, 120.7, 130.1.

For internal sync that is no problem, but when syncing externally you will have trouble whenever the external source sends a timing that your 8-bit timer can't achieve, so the sync will fall apart.

I decided not to add TIMER2 some time ago because of that limitation. The tests I made back then with 8 bits didn't reach the level of quality I was expecting, especially for external sync, but also because finer-grained BPM settings weren't possible.

uClock v1.0 was implemented with an 8-bit timer, but with a different approach: it interrupted at a constant rate, at the minimum interval, so we could count cycles and set a precise BPM by knowing exactly when to call a tick. But it interrupted too often, leaving almost no room to implement more complex sequencer logic without running into interrupt overflow issues. v2.0 changed that by using a 16-bit timer that interrupts only at tick time, giving full precision for BPM generation and lots of room to code a sequencer. As far as I know, old hardware sequencers and newer ones all take the same approach: a minimum of a 16-bit timer.

@doctea created the software timer, which is not as precise as a hardware timer but solves the problem of using uClock with other libraries that use the 16-bit timer. Maybe you can try it out to see how your code goes with it in case you start having issues with your 8-bit timer.

Was there any other testing that would be helpful, or was it just the compiled memory sizes you were after? Happy to help!

So far I think it is OK.

But I'm still curious about how much you saved with v2.3.0 with a shuffle macro, in case you have done it.

midilab avatar Nov 10 '25 19:11 midilab

Ah, thank you very much for that insight on the 8-bit timers! Yes, I did some thinking in spreadsheets, on the basis that it should not just assume hardware interrupts are free. I used some VERY rough estimates of the clock cycles needed to execute the ISRs, then looked at how often and how long interrupt handling should take, and therefore what reasonable limits on using TIMER2 should be. It was a few months ago now and I'd have to revisit it, but the basic idea was to set the accuracy with the prescaler settings, so that the BPM is flexible but the overflow interrupts don't consume too much time. For starters, I decided a prescaler of 1 was unacceptable (an overflow every 256 clock cycles at 16 MHz means an interrupt every 16 us); the minimum is 8. It will be interesting to try with external sync, which I have not yet implemented. Thanks for that, and yes, I will fall back to the software timer in case of issues.
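The overflow timing mentioned above is easy to reproduce. This is just the arithmetic for an 8-bit counter (256 counts per overflow) at 16 MHz; the function names are hypothetical.

```cpp
#include <cstdint>

// Microseconds between overflows of an 8-bit timer for a given prescaler.
constexpr uint32_t overflowPeriodUs(uint32_t prescaler, uint32_t cpuHz = 16000000) {
    return static_cast<uint32_t>(256ULL * prescaler * 1000000ULL / cpuHz);
}

// Overflow interrupts per second for the same settings (rounded down).
constexpr uint32_t overflowsPerSecond(uint32_t prescaler, uint32_t cpuHz = 16000000) {
    return cpuHz / (256 * prescaler);
}
```

A prescaler of 1 gives an overflow every 16 us (62500 interrupts per second), while a prescaler of 8 stretches that to 128 us.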

But I'm still curious about how much you saved with v2.3.0 with a shuffle macro, in case you have done it.

I haven't done it (yet) and a few quick diffs showed how much had changed between 2.3.0 and 2.2.1. But previously it was 21 bytes, so given all the optimizations I would imagine that would be the upper end of what was available. I will share the results if I implement it! Thanks again!

tuulikauri avatar Nov 10 '25 20:11 tuulikauri