rgbds icon indicating copy to clipboard operation
rgbds copied to clipboard

Improve performance of emitting data

Open ISSOtm opened this issue 1 year ago • 4 comments

Primarily, avoids reserving the max size for ROM sections' data. Also performs bulk reads.

Fixes #1483

ISSOtm avatar Aug 22 '24 23:08 ISSOtm

Note that the PR is, by itself, complete, but that the design of both INCBIN functions sucks; I will refactor it to perform bulk reads and updates of the various section-related variables. (I anticipate that this should provide a further performance boost when building INCBIN-heavy code, though this has yet to be measured.)

ISSOtm avatar Aug 22 '24 23:08 ISSOtm

There are CI failures, and Sylvie pointed me out some further improvements this PR could use, so I've marked it as draft for the time being.

ISSOtm avatar Aug 22 '24 23:08 ISSOtm

Yay! All green! The claimed performance improvements are yet to be measured, though.

ISSOtm avatar Aug 23 '24 03:08 ISSOtm

Times to build pokecrystal:

  1. master = 4.79s
  2. "Avoid reserving the max size for ROM sections' data" = 4.75s
  3. "Make section size and related variables 16-bit" = 4.81s
  4. "Reduce the size of some enums and structs" = 4.75s
  5. "Bulk file reads in INCBIN" = 4.74s

Just like the attempt in #1488 (which builds pokecrystal in 4.78s), this doesn't actually improve performance. I think we should focus on code quality (correctness, simplicity, consistency, DRY, ease of maintenance, etc) over trying to optimize; just avoid pessimizing things. :P

(For example, if reordering the struct fields doesn't provide a visible improvement, then IMO it's not worth losing their semantic arrangement. Also, I wonder if using unsigned chars for enums and uint16_ts for some sizes is slightly harming performance, since the native register size is 64-bit anyway so these quantities get shifted/masked redundantly? (And please let's not use uint_fast16_t.))

Rangi42 avatar Aug 27 '24 06:08 Rangi42

Since the CPU is able to load only individual data pieces, and I believe that operation is not slower than loading the “native size”, I don't think using smaller data units can harm performance. What definitely could is “packing“ the structs so they don't have any padding anymore, since it forces unaligned reads (and makes pointers to unaligned fields invalid). But I eliminated almost all padding anyway, so we don't need to try.

ISSOtm avatar Sep 04 '24 00:09 ISSOtm