
API heap problems SP 8

Open mayrmartin opened this issue 2 years ago • 1 comment

Description 😯

The API runs out of heap memory in deployment. This is probably due to missing cache invalidation: a cache-invalidation function exists, but it only gets executed on API restart.

There are several ways to solve this issue, e.g. using a TTL or a ring buffer, or even a complete rewrite. If a complete rewrite is chosen as the solution, consider 3rd-party libraries (and also have a look at their data structures and limitations).

NB: This affects TruBudget 1.x and 2.0, so make sure to apply any changes to both versions.
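To make the TTL idea concrete, here is a minimal sketch of lazy TTL eviction, assuming a hand-rolled cache class (`TtlCache` is illustrative only, not TruBudget's actual cache API):

```typescript
// Sketch only: a TTL cache that evicts expired entries lazily on read,
// so stale entries cannot keep the heap growing unboundedly.
interface Entry<V> {
  value: V;
  expiresAt: number; // epoch millis after which the entry is stale
}

class TtlCache<V> {
  private entries = new Map<string, Entry<V>>();

  constructor(private ttlMs: number) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (Date.now() > entry.expiresAt) {
      // Lazy eviction: drop the stale entry instead of serving it
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

A real implementation would additionally need a periodic sweep (or size bound), since lazy eviction alone never frees keys that are no longer read.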

mayrmartin avatar Apr 19 '22 09:04 mayrmartin

TODOs:

  • Check whether the cache is working as expected via the load-test script
  • Find and implement a solution to invalidate cache regularly

Stezido avatar May 05 '22 10:05 Stezido

Writing a couple of ideas down:

Remove cache

  • Reads without the cache take a long time; it is unclear whether the blockchain can handle that many I/O requests

Improve existing cache

  • In the mutex, replace polling with events
  • Fine-grained cache locking (get, set, lock by key): don't lock the whole cache when it isn't necessary
  • Eviction strategy for managing cache size (TTL, least recently used, ...)
  • Separate aggregation functionality from caching. If there are updates on stream 1, aggregated projects (subprojects, workflowitems) in the cache prevent us from locking just the stream 1 cache; the aggregates need to be locked too in this case.
  • At the moment, the cache locks and checks for updates on every API request. Is there a more suitable heuristic for when to check the blockchain for updates? Using walletnotify or walletnotifynew, a message could be passed to the cache through a socket, pipe, or Docker IPC, informing it about new data on the chain (see for example https://github.com/openkfw/TruBudget/pull/1384). The problem is that if data is requested immediately, it might not be in the cache yet (at the moment each individual request for data checks the blockchain for cache updates, which guarantees the client does not receive stale data).
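The per-key locking point above can be sketched as a keyed async mutex, so a refresh of one stream's cache entry no longer serializes all other reads (a sketch only; `KeyedMutex` and `runExclusive` are hypothetical names, not TruBudget's mutex implementation):

```typescript
// Sketch: one lock per cache key instead of one lock for the whole cache.
// Callers for the same key queue up; callers for different keys run concurrently.
class KeyedMutex {
  // Tail of the wait queue per key: resolves when the current holder releases
  private tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const turn = new Promise<void>((resolve) => { release = resolve; });
    this.tails.set(key, turn); // next caller for this key waits on us
    await previous;            // wait for the previous holder of this key
    try {
      return await task();
    } finally {
      release();               // hand the key's lock to the next waiter
    }
  }
}
```

Because waiters chain on promises rather than polling, this also addresses the "replace polling by events" bullet.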

Cache granularity

A logical progression from the coarsest to a finer granularity would be to split the cache by streams. However, if the cache gets big, evicting a whole stream seems too severe. The data clients need on the main page of the TB UI is a bare minimum that should be available at all times; this data needs to be identified and kept "hot".

Reimplement cache with lru-cache npm library

  • Some of the above points still apply, except we don't have to implement lower-level cache functionality like eviction, locking mechanisms, etc. ourselves
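For illustration, the core behavior the lru-cache package would give us (bounded size with least-recently-used eviction) looks roughly like this hand-rolled sketch, which exploits the fact that a JavaScript `Map` iterates keys in insertion order; in practice we would use the library, which adds TTL, size calculation, and more:

```typescript
// Sketch of LRU semantics only; not the lru-cache package's actual API.
class Lru<V> {
  private map = new Map<string, V>();

  constructor(private max: number) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the LRU entry
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```

The size bound is exactly what caps heap usage: no matter how many streams are read, at most `max` entries stay resident.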

Remove cache from server, save chain snapshots on chain

Use DB for data reads, sync with chain using walletnotify


SamuelPull avatar May 23 '23 14:05 SamuelPull

Let's pursue the option of removing the cache from the server and saving chain snapshots on the chain; this seems like the best option at the moment.

I can imagine the following TODOs (might change in time after further analysis):

  • [x] periodically save the results of eventsourcing each stream as a separate event (as JSON) on the chain. A new event type could be introduced, e.g. `project_snapshot_published`
  • [x] each time a snapshot is published, eventsourcing is performed since the latest snapshot, and only the newest events are applied on top of the previous snapshot; this means the eventsourcing needs to be updated as well
  • [ ] publishing normal events works as it did previously
  • [ ] the in-memory API cache can be removed; during read operations, just the snapshot needs to be read and the latest x events applied
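The snapshot-based read path described above could be sketched like this (a minimal sketch under the assumption of a `project_snapshot_published` event type as proposed above; the other event names and the `ProjectState` shape are hypothetical, not TruBudget's actual stream schema):

```typescript
// Sketch: eventsourcing that resumes from the latest snapshot instead of
// replaying the whole stream, so no in-memory cache is needed for reads.
interface ProjectState {
  name: string;
}

type StreamEvent =
  | { type: "project_created"; name: string }
  | { type: "project_renamed"; name: string }
  | { type: "project_snapshot_published"; state: ProjectState };

function sourceProject(events: StreamEvent[]): ProjectState | undefined {
  // 1. Scan backwards for the most recent snapshot
  let startIndex = 0;
  let state: ProjectState | undefined;
  for (let i = events.length - 1; i >= 0; i--) {
    const e = events[i];
    if (e.type === "project_snapshot_published") {
      state = { ...e.state };
      startIndex = i + 1;
      break;
    }
  }
  // 2. Apply only the events published after that snapshot
  for (const e of events.slice(startIndex)) {
    if (e.type === "project_created") state = { name: e.name };
    else if (e.type === "project_renamed" && state) state.name = e.name;
  }
  return state;
}
```

Read cost then scales with the number of events since the last snapshot rather than with the full stream length, which is what makes removing the in-memory cache feasible.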

Things that still need clarification:

  • [ ] think about how to avoid redundant updates
  • [ ] think about how often the snapshot needs to be published
  • [ ] what happens during provisioning

georgimld avatar Jun 14 '23 15:06 georgimld