trezor-firmware Decrease micropython code size

Connected with https://github.com/trezor/trezor-firmware/issues/1947

Decreases the size of compiled micropython code by around 48 kb, and str1 symbol in .flash2 by 6 kb.

Makes a lot of code changes with the goal of making it more space-efficient, so the resulted compiled bytecode is smaller. It does it by consistently applying some strategies of code-size-reduction like inlining small functions or caching global variables in a local scope (more below).

The biggest decreases were achieved in the automatically generated data (tokens.py, networks.py or coininfo.py), where one small thing (like removing the keyword arguments) makes a huge difference.

Includes a tool for automatic detection of some size-decrease strategies (under core/tools/upy_size). Its README covers all the used strategies in more details.

Micropython function sizes before and after.

Tree of size changes shows differences in specific directories/files.

It would be hardly possible without https://github.com/trezor/trezor-firmware/pull/2327, which offered quick feedback on any code change.

There is still some potential for further decreases (in order of ~ 5 kb), which would mean inlining even more one-time functions (the longer ones) or making the Message types not enforce keyword-only arguments.

All the changes are up to discussion and we may decide to not include some of those, as optimization towards one side (size efficiency) worsens some other factors (like readability).

Sep 22 '22 10:09 grdddj

All further space-decreasing suggestions - https://gist.github.com/grdddj/917fcf293a56c4a374618588994bb189

Sep 22 '22 15:09 grdddj

I have separated the upysize tool into its own repository, and therefore removed it completely from this branch.

It can be installed by pip install git+https://github.com/grdddj/upy_size, which will create upysize command.

It takes the file/directory as an argument and will analyze all the python files inside. (No possibility of excluding files/strategies yet.)

What was left in firmware repo is the upysize_ignore.json, which defines which warnings should be ignored (only function inlining so far).

To use this ignore file, usage is then upysize -i tools/upysize_ignore.json src/apps

Sep 29 '22 10:09 grdddj

Is there anything you want me to review?

Sep 30 '22 15:09 onvej-sl

Is there anything you want me to review?

@onvej-sl I do not think so, it (hopefully) did not change any logic. Sorry for not removing the request for your review earlier

Oct 03 '22 07:10 grdddj

Sorry for not removing the request for your review earlier

No problem. :)

Oct 03 '22 11:10 onvej-sl

please figure out if not-top-level const()s have an effect on code size

please figure out how many total bytes is saved by "if-elif-else" change

let's change # cache to # local_cache_attribute so that it is searchable

let's not use global-symbol-in-local-function caching: it's rather confusing and saves maybe 50 bytes overall (that thing where you cache a non-imported symbol, e.g., another function from the same file)

Oct 06 '22 13:10 matejcik

please figure out if not-top-level const()s have an effect on code size

They do not have any positive impact, on the contrary, if we do not have to import it, it saves space. Non-top-level usages removed in d3ca4c6

please figure out how many total bytes is saved by "if-elif-else" change

label = {
    0: "north",
    90: "east",
    180: "south",
    270: "west",
}.get(rotation)
if label is None:
    raise wire.DataError("Unsupported display rotation")

is saving 19 bytes over the if-elif-else counterpart. Together with some other usages of dict.get it is saving around 50 bytes, so I agree we can get rid of these small gains and removing this possibly-confusing optimization. Done in 0b0b8c5

let's change # cache to # local_cache_attribute so that it is searchable

Labeling of all used caching strategies done in 880a052

let's not use global-symbol-in-local-function caching: it's rather confusing and saves maybe 50 bytes overall (that thing where you cache a non-imported symbol, e.g., another function from the same file)

Majority of these things deleted in a2489f6. Very frequently used symbols (like DataError being thrown 15 times in one function) solved by re-importing, instead of aliasing. The only aliases I kept are the VeryLongEnumNames, which make sense both from the space-wise and readability-wise perspective.

In 1517905 I am decreasing the granularity (alignment) of flash sections from 512 to 64 bytes. This saves up to 450 bytes and also explains the phenomenon in the comment above, when decreasing of size in a module was accompanied by the same increase if str1 - that was the .flash2 section aligning to 512 bytes.

Oct 10 '22 15:10 grdddj

ed1690f introduces a simple tool for better understanding of bytecode in frozen_mpy.c - translator of the opcodes into the human-readable symbols.

Thanks to it we can easily quantify the cost of any operation - import, function call, attribute access, global symbol lookup etc.

Some od these insights documented in https://github.com/trezor/upysize/commit/d915a441d9c88f125d67c31cbfdb352ca42537c5

Oct 11 '22 13:10 grdddj

I added local_cache_attribute.py tool that can convert code back and forth between representations of the local-cache-attribute thing.

The expected usage from this point is:

run python local_cache_attribute.py core/src --reverse --simplify By default, for rename foo = msg.foo, the tool will convert to # local_cache_attribute: msg.foo. If the name is not the last component (i.e, foo in this case), the comment will be # local_cache_attribute: msg.foo -> some_other_name. The --simplify option tells the tool to also strip names like msg_foo, so that on re-generation, the variable name will be just foo instead of msg_foo.
fix problems -- in several cases the local cache attribute is something more complex than "a_b_c = a.b.c", such as msg_ecdsa_curve_name = msg.ecdsa_curve_name or "secp256k1". In such cases I'd prefer the "local_cache_attribute" comment be removed, and perhaps the variable name simplified (in the example, curve_name would be very sufficient).
re-run as necessary
run python local_cache_attribute.py core/src
fix problems -- in some cases the simplification is not appropriate, so you can manually add the -> reasonable_name instruction to the local_cache_attribute functions
run make style to fix stray formatting

Please check that this tool does what it is supposed to do; if yes, please apply the results to the appropriate commits.

Please also squash fixups and tidy up the WIP commits before the next (probably final) review round.

Oct 17 '22 14:10 matejcik

Please check that this tool does what it is supposed to do; if yes, please apply the results to the appropriate commits.

It was mostly working fine, apart from couple edge-cases. These limitations summed up in the docstring of the script.

Please also squash fixups and tidy up the WIP commits before the next (probably final) review round.

Force-pushed with cleaned-up commits.

Oct 19 '22 14:10 grdddj

Possible idea - we could also comment all the inside-function imports with something like # local_import and have a local_import.py script to "turn it on and off".

Even without the script it makes sense to flag all those imports so they are easily searchable.

Oct 20 '22 10:10 grdddj

Even without the script it makes sense to flag all those imports so they are easily searchable.

possibly, yeah. but with the move to Rust, we'll stop needing function-local imports for run-time properties, so we will be free to consider all function-local imports for the local_import feature

Oct 20 '22 11:10 matejcik

it seems that there is missing simplification in a lot of places?

I am not seeing this, if simplification means "not including the object name in the cached name". The only places where it is done are for example self_account_path or msg_safety_checks where account_path or safety_checks are already defined symbols in the current scope

Oct 24 '22 10:10 grdddj

Force-pushed after rebase on master.

It was a little painful to resolve all the cardano and btc merge conflicts.

I also noticed that the _write_uint writer function is not behaving correctly for bigendians (used only in tezos and stellar), so I have added core/tests/test_apps.common.writers.py

Nov 08 '22 17:11 grdddj

I also noticed that the _write_uint writer function is not behaving correctly for bigendians (used only in tezos and stellar), so I have added core/tests/test_apps.common.writers.py

i'm thinking we might even want to replace the helper with w.extend(num.to_bytes(8, "big")), but that would cause an allocation in all cases which we kind of want to avoid. for that matter, the reversed() call is definitely allocating. an option is to replace the original wrong range with range(bits - 8, -1, -8). still the range call is allocating, where the original code didn't. (the original code was not completely allocation free either: operations with numbers > 2^29 (i think) will cause an allocation. still, that would usually not be a problem)

for now let's keep it this way, but let's also have a look at the number of allocations between these variants

Nov 09 '22 11:11 matejcik

trezor-firmware trezor-firmware copied to clipboard

Decrease micropython code size

trezor-firmware
trezor-firmware copied to clipboard