Minimise use and processing of large arrays
We previously identified CPU-intensive tasks that process large arrays, in particular the lodash differenceWith method comparing two very large arrays. The intersection call used in the getLocalDataObjectsByBagId() state API endpoint could also be problematic.
We should generally avoid this.
Prefer Map or Set if we are storing the data long term in memory.
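As a rough illustration of the Set-based approach, the sketch below finds the ids that appear in one collection but not in another with a single pass and constant-time lookups, rather than comparing every pair of elements with a custom comparator. The helper name missingIds and the use of plain string ids are assumptions for the example, not code from the repository.

```typescript
// Hypothetical helper: returns the ids found in `remote` that are absent from `local`.
// Building the Set costs O(local) once; each lookup afterwards is O(1) on average.
function missingIds(remote: Iterable<string>, local: Iterable<string>): string[] {
  const localSet = new Set(local)
  const missing: string[] = []
  for (const id of remote) {
    if (!localSet.has(id)) {
      missing.push(id)
    }
  }
  return missing
}
```

If the local ids are already kept in a long-lived Set, the function can take that Set directly and skip the copy.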
For processing, try to fetch data in chunks (GraphQL queries with paging and result set limits). Async generators can make programming around this approach more efficient.
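A minimal sketch of the chunked approach, assuming a hypothetical fetchPage(offset, limit) function that wraps a paged query and resolves to at most limit items. The generator yields one page at a time, so the caller never holds the full result set in memory.

```typescript
// Hypothetical signature for a paged query; not an existing API in the codebase.
type FetchPage<T> = (offset: number, limit: number) => Promise<T[]>

// Yields pages until the source returns an empty or short page.
async function* paginate<T>(fetchPage: FetchPage<T>, limit = 1000): AsyncGenerator<T[]> {
  let offset = 0
  while (true) {
    const page = await fetchPage(offset, limit)
    if (page.length === 0) return
    yield page
    // A short page means there is nothing left to fetch.
    if (page.length < limit) return
    offset += page.length
  }
}
```

A consumer would then write for await (const page of paginate(fetchPage)) { ... } and process each page before the next one is requested.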
Another place where we "produce" large arrays is fs.promises.readdir(), when reading the list of objects in the uploads folder. It's okay to do it on startup, but look for places where we might do it more frequently, like in the state API endpoint getLocalDataStats().
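One possible alternative for frequently called code paths is fs.promises.opendir(), which returns a directory handle that can be iterated entry by entry instead of materialising the whole listing as an array. The sketch below only counts regular files; the function name countUploads is hypothetical and this is not how getLocalDataStats() is actually implemented.

```typescript
import { promises as fsp } from 'fs'

// Hypothetical helper: counts regular files in `dir` without building an array of names.
async function countUploads(dir: string): Promise<number> {
  let count = 0
  const handle = await fsp.opendir(dir)
  // Iterating the Dir handle reads entries lazily and closes the handle when the loop ends.
  for await (const entry of handle) {
    if (entry.isFile()) {
      count++
    }
  }
  return count
}
```

Caching the result and updating it as objects are added or removed would avoid touching the filesystem on every request at all.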
Some examples in https://github.com/Joystream/joystream/pull/5026
We can fetch the updates incrementally, even for the cleanup. We don't need to get a full list at all times; for example, we can get a list of events for deleted objects since the last run.
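The incremental cleanup idea could look roughly like the sketch below. It assumes a hypothetical fetchDeletedSince(afterBlock) query that returns only deletion events newer than the given block, and a Set of locally stored object ids; the event shape and all names are illustrative assumptions.

```typescript
// Hypothetical event shape and query signature, for illustration only.
interface DeletedEvent {
  objectId: string
  block: number
}
type FetchDeletedSince = (afterBlock: number) => Promise<DeletedEvent[]>

class IncrementalCleanup {
  // Cursor: highest block number already processed.
  private lastBlock = 0

  constructor(private localIds: Set<string>, private fetchDeletedSince: FetchDeletedSince) {}

  // Applies only the deletions reported since the previous run and returns the ids removed locally.
  async run(): Promise<string[]> {
    const events = await this.fetchDeletedSince(this.lastBlock)
    const removed: string[] = []
    for (const ev of events) {
      if (this.localIds.delete(ev.objectId)) {
        removed.push(ev.objectId)
      }
      if (ev.block > this.lastBlock) {
        this.lastBlock = ev.block
      }
    }
    return removed
  }
}
```

Each run does work proportional to the number of new events rather than to the total number of stored objects. In practice the cursor would need to be persisted so that it survives restarts.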