
Copy in-use resources from SPI flash to built-in flash

Open pipe01 opened this issue 1 year ago • 6 comments

Since reading files from the SPI flash is very slow, we could copy the files that the current watchface needs into the internal flash and use them from there instead of from the SPI flash. This could be implemented as an LVGL file system, although I don't think we need to use LittleFS and could instead use a trivial flat file system structure.

For example, when the Casio watchface is selected, the three font files it uses would be copied into the internal flash, and when the watchface is shown it would load the files from there much quicker.
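To make the "trivial flat file system" idea a bit more concrete, here is a minimal sketch of what a directory lookup could look like. `FlatEntry` and `FindResource` are hypothetical names invented for illustration; none of this is existing InfiniTime or LVGL code, and the entry layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical directory entry: a fixed-size name plus the location of the data.
struct FlatEntry {
  char name[24];        // NUL-terminated file name (illustrative size)
  std::uint32_t offset; // byte offset of the data from the start of the region
  std::uint32_t size;   // data length in bytes
};

// Linear search over the directory, which is fine for a handful of watchface
// resources. Returns nullptr when the name is not present.
inline const FlatEntry* FindResource(const FlatEntry* table, std::size_t count, const char* name) {
  for (std::size_t i = 0; i < count; i++) {
    if (std::strncmp(table[i].name, name, sizeof(table[i].name)) == 0) {
      return &table[i];
    }
  }
  return nullptr;
}
```

With only the current watchface's files present, the table stays tiny, so there is no need for anything smarter than a linear scan.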

Alternatively, we could implement a sort of caching LVGL file system that sits in between the LittleFS file system and the LVGL adapter that caches some files in the internal flash, but that might be trickier due to fragmentation.

pipe01 avatar Dec 10 '24 20:12 pipe01

I've thought about this a bit too. The one thing I'm worried about is the durability of the inbuilt flash, which is only rated for 10K cycles. So I think we'd want to avoid a cache setup where resources get swapped in and out often. An architecture that copies in watchface files on watchface change could be good though, I think, since the watchface is not changed often.

mark9064 avatar Dec 10 '24 23:12 mark9064

Following up from this, it appears that there are definitely some performance issues with the external flash. The font files for G7710 are less than 10K in total. In theory, reading 10K from the flash should take no more than 10ms (SPI bus is 1MB/s).

I suspect LittleFS issues with LVGL doing many file ops, but I'd need to test. If optimising resource loading is possible, I'd prefer that to reflashing the internal flash.

mark9064 avatar Jan 10 '25 17:01 mark9064

I've done some research on this.

TLDR test conditions

  • LittleFS updated to latest
  • Filesystem formatted after update
  • Threadsafe mode enabled
  • Metric is number of ticks to load all 3 fonts on my modified version of G7710 (I have modified it to include seconds so the main digit font is 95 rather than 115) - So not directly comparable to stock G7710

I implemented a new external font format that places the font on flash in the exact memory layout that the CPU uses. This is done by compiling the font and parsing the .o ELF. The result is that loading each font file requires 1 stat (to check filesize), 1 read (the entire file), and then a few lines of C++ to fix all of the internal pointers in the font structure. This improves performance and also (AFAIK) consumes a little less memory as the entire font is contained within one contiguous allocation (haven't tested this too much).
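The pointer fix-up step can be sketched roughly like this. This is a simplified stand-in, not the actual loader: it assumes a hypothetical `FontBlobHeader` with a single pointer field that is stored on flash as a byte offset from the start of the file, whereas the real font structure has several internal pointers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header at the start of the blob. On flash, `bitmap` holds a byte
// offset from the start of the blob; after fix-up it holds a real address.
struct FontBlobHeader {
  std::uint32_t glyphCount;
  std::uintptr_t bitmap;
};

// Rebase the stored offset onto the address of the buffer the file was read into.
// Returns false if the blob is too small or the offset points outside it.
inline bool FixUpFont(std::uint8_t* blob, std::size_t size) {
  if (size < sizeof(FontBlobHeader)) {
    return false;
  }
  FontBlobHeader header;
  std::memcpy(&header, blob, sizeof(header)); // copy out to avoid alignment assumptions
  if (header.bitmap >= size) {
    return false;
  }
  header.bitmap += reinterpret_cast<std::uintptr_t>(blob);
  std::memcpy(blob, &header, sizeof(header));
  return true;
}

// Read back the fixed-up pointer.
inline const std::uint8_t* Bitmap(const std::uint8_t* blob) {
  FontBlobHeader header;
  std::memcpy(&header, blob, sizeof(header));
  return reinterpret_cast<const std::uint8_t*>(header.bitmap);
}
```

The point of the design is that the whole file lands in one allocation with a single read, and the only per-load work is adding the buffer's base address to each stored offset.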

Do note that total load time depends on the internal metadata state of LittleFS. As writes are performed (e.g. changing the watchface in the settings menu, which rewrites settings.bin), this state fills up and loading becomes slower. Eventually it becomes full and is flushed, and then FS performance improves. This metadata is on-disk/persistent, and I don't know of a way to flush it (lfs_fs_gc does not work for this).

I modified G7710 to include a tick count at the top, which measures the time just to load the fonts with xTaskGetTickCount() before and after.
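The measurement itself is just a tick count taken before and after the load. A minimal sketch follows, with the tick source passed in as a parameter so it builds without FreeRTOS; on the watch it would wrap xTaskGetTickCount(). `MeasureTicks` is a hypothetical helper name, and a 32-bit tick counter is assumed.

```cpp
#include <cstdint>

// Measure how many ticks `work` takes, using `now` as the tick source.
// Assumption: the tick counter is an unsigned 32-bit value.
template <typename TickSource, typename Work>
std::uint32_t MeasureTicks(TickSource now, Work work) {
  const std::uint32_t start = now();
  work();
  // Unsigned subtraction stays correct across a single counter wrap-around.
  return now() - start;
}
```

On the device, `work` would be the code that loads the three fonts, and the returned value is what gets displayed at the top of the watchface.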

Performance:

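| Condition                   | Fastest load (ticks, approx.) | Slowest load (ticks, approx.) |
|-----------------------------|-------------------------------|-------------------------------|
| lv_font_load                | 260                           | 400                           |
| New format                  | 130                           | 300                           |
| New format + 128 cache size | 95                            | 225                           |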

Comparing identical metadata states at an arbitrary metadata utilisation (I don't know how to read it), 267 ticks with the LVGL format corresponded to 155 ticks with the new format. (Rebooting to change the loading method is fine as LittleFS is never written to.)

Firstly, performance is poor in all scenarios. There are only 6250 bytes to read; theoretically this should take 6.25ms (7 ticks) (SPI bus is 1MB/s), but I'd expect some filesystem overhead. 20 ticks would be good, but 50 would still be acceptable. The absolute fastest performance LittleFS can offer being 95 ticks is not ideal. That said, 95 feels significantly better than 260 and especially 400 (which is embarrassing honestly, and it's not even loading as much data as the stock G7710, as the fonts in my version with seconds are smaller). Also of note is how much the performance varies based on LittleFS's internal state. I don't know what's happening here really.
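For reference, the theoretical figure works out as below. This sketch assumes 1MB/s means 1,000,000 bytes per second and assumes a 1024 Hz tick rate, which is not stated in the thread; a 1000 Hz tick would also round up to 7.

```cpp
#include <cstdint>

// Assumption: "1MB/s" is 1,000,000 bytes per second.
constexpr std::uint32_t kBusBytesPerSecond = 1000000;
// Assumption: ticks run at 1024 Hz (not stated in the thread).
constexpr std::uint32_t kTickRateHz = 1024;

// Ideal transfer time in microseconds, ignoring all filesystem and command overhead.
constexpr std::uint32_t ReadTimeMicros(std::uint32_t bytes) {
  return static_cast<std::uint32_t>(static_cast<std::uint64_t>(bytes) * 1000000 / kBusBytesPerSecond);
}

// Convert microseconds to ticks, rounding up to a whole tick.
constexpr std::uint32_t MicrosToTicksCeil(std::uint32_t micros) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(micros) * kTickRateHz + 999999) / 1000000);
}

static_assert(ReadTimeMicros(6250) == 6250, "6250 bytes take 6.25 ms at 1 MB/s");
static_assert(MicrosToTicksCeil(6250) == 7, "6.25 ms rounds up to 7 ticks");
```

Against that baseline of 7 ticks, even the best measured case of 95 ticks is more than ten times slower than the raw bus would allow.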

At this point, I'm quite convinced that LittleFS in its current form is not fit for purpose when it comes to loading fonts. Fonts are expected to be rewritten at most once every InfiniTime update, and having fonts load quickly is important (thus the power loss tolerance and wear levelling features of LittleFS are not important for resources). I think using the external flash is fine for fonts, but that LittleFS isn't performant enough for this usecase.

I'm not sure what the next move here is. Some ideas:

  • Shrink LittleFS, and then use the space at the end for resources in a simpler filesystem. A map of the resources could be stored in LittleFS and/or kept in RAM
  • Ditch LittleFS entirely for some other FS? But I'm not seeing a lot of better options
  • Do a deep dive into LittleFS internals to try and optimise the caching / read path. A lot of people are looking into write performance, but for us it seems that reading is the issue

Very interested to hear everyone's thoughts. @JF002, maybe you'd be interested in reading this too.

mark9064 avatar Jan 27 '25 01:01 mark9064

Very interesting, thank you for the research. It does seem like LittleFS in its current state doesn't work great for the resources use case, maybe it would be worth changing lfs_config values and seeing if it improves meaningfully? If not, I could get behind a very simple FS for fonts/resources only. Given how read heavy our workload is I don't think wear leveling is mandatory. I would also be curious to see raw SPI read performance, perhaps the issue goes deeper than LittleFS?

pipe01 avatar Jan 27 '25 01:01 pipe01

The cache size improves performance a bit, as seen above, but I don't think we can realistically increase it much past 128: LittleFS needs two cache_size buffers which are alive constantly, plus one for each open file. I have just tried to test 256, but the filesystem immediately corrupts (LFS_ERR_CORRUPT). No clue why. Points to more broken internals :(. Changing the lookahead size has no impact, which makes sense as it's for writes. Disabling inline files had no impact. Also, I am now sure that the slowdown is caused by uncompacted metadata, as calling lfs_fs_gc with a low compact_threshold caused the time to drop down to the minimum (around 90). So the metadata_max parameter could be used to help bound maximum time at the cost of more wasted space (and possibly slower block scanning) - see https://github.com/littlefs-project/littlefs/issues/827#issuecomment-1559926210 and https://github.com/littlefs-project/littlefs/issues/965#issuecomment-2059696525
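To put a number on the cache cost, here is a rough sketch of the buffer arithmetic described above: two permanent cache_size buffers plus one per open file. `CacheRamBytes` is a hypothetical helper, and it counts only these buffers, not the lookahead buffer or any other LittleFS state.

```cpp
#include <cstddef>

// RAM taken by cache buffers alone: two that live for the lifetime of the
// mounted filesystem, plus one per currently open file.
constexpr std::size_t CacheRamBytes(std::size_t cacheSize, std::size_t openFiles) {
  return (2 + openFiles) * cacheSize;
}

static_assert(CacheRamBytes(128, 1) == 384, "cache_size 128 with one open file");
static_assert(CacheRamBytes(16, 1) == 48, "cache_size 16 with one open file");
```

So going from 16 to 128 costs a few hundred bytes with a single file open, and the cost grows with every additional open file.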

And that's pretty much all of the available parameters


So this is the configuration I ended up with:

  • fixed name_max to be 50 (previously it was actually 255, because it uses a compiler define for the actual struct size)
  • set metadata_max to 256
  • cache_size still 128
  • still using custom loader
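Written out as values, the settings in the list above look like this. `ResourceFsTuning` and `kTuning` are hypothetical names used so the sketch stands alone; in littlefs the three values correspond to the `cache_size`, `name_max` and `metadata_max` fields of `lfs_config`, and everything else in the real config is left out here.

```cpp
#include <cstdint>

// Hypothetical holder for the three values from the list above. In littlefs
// these map to the lfs_config fields of the same names.
struct ResourceFsTuning {
  std::uint32_t cache_size;
  std::uint32_t name_max;
  std::uint32_t metadata_max;
};

constexpr ResourceFsTuning kTuning = {
  128, // cache_size: unchanged from the earlier test
  50,  // name_max: previously it was effectively 255
  256, // metadata_max: helps bound the maximum load time
};
```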

G7710 now loads in 23 ticks!

Downsizing the cache to 16 sees load times around 40 ticks.

Using the original loader + cache size 16 takes around 150 ticks.

So nevermind! It looks like LittleFS can stay, which is great news. I think the original cache size still performs fine, but I might test 32/64 to see how they do. Additionally, load times are now stable and don't vary with the internal filesystem state. Thanks for the recommendation to try all the parameters! I probably would have overlooked metadata_max otherwise

mark9064 avatar Jan 28 '25 01:01 mark9064

This is all done code-wise, but there's a nasty bug (LittleFS, I suspect) preventing fonts bigger than 1 sector (4K) from being loaded. Once I've got that figured out I'll open a PR.

mark9064 avatar Feb 07 '25 21:02 mark9064