
Minimise use and processing of large arrays

Open mnaamani opened this issue 1 year ago • 1 comments

We previously identified CPU-intensive tasks that process large arrays, in particular the lodash differenceWith method when it compares two very large arrays. The intersection call used in the getLocalDataObjectsByBagId() state API endpoint could also be problematic.

We should generally avoid this.

Prefer Map or Set if we are storing the data in memory long term.
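As a rough illustration of why a Set helps here: a comparator-based difference over two arrays compares every pair of elements, while a Set lookup makes the same operation a single pass over each input. The sketch below assumes the objects can be compared by a string id; the helper name differenceById is made up for this example and is not part of the codebase.

```typescript
// Hypothetical helper (not from the codebase): returns the ids that are in
// `a` but not in `b`. Building the Set is one pass over `b`, and each lookup
// is O(1) on average, so the total work is roughly |a| + |b| operations
// instead of |a| * |b| comparator calls.
function differenceById(a: Iterable<string>, b: Iterable<string>): string[] {
  const exclude = new Set(b)
  const result: string[] = []
  for (const id of a) {
    if (!exclude.has(id)) {
      result.push(id)
    }
  }
  return result
}

// Example: ids we hold locally that are no longer expected.
const obsolete = differenceById(['1', '2', '3'], ['2'])
```

If the expected ids are already kept in a long-lived Set, the `new Set(b)` step disappears as well.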

For processing, try to fetch data in chunks (gql queries with paging and result set limits). Async generators can make programming around this approach more efficient.
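A minimal sketch of the chunked approach, assuming an offset/limit style of paging. The fetchPage callback stands in for whatever gql query is actually used; its name and signature are assumptions for illustration only.

```typescript
// Hypothetical callback: fetches at most `limit` items starting at `offset`.
type FetchPage<T> = (offset: number, limit: number) => Promise<T[]>

// Yields one page at a time, so the caller only ever holds `limit` items
// in memory instead of the full result set.
async function* paginate<T>(fetchPage: FetchPage<T>, limit = 1000): AsyncGenerator<T[]> {
  let offset = 0
  while (true) {
    const page = await fetchPage(offset, limit)
    if (page.length === 0) {
      return
    }
    yield page
    // A short page means the server has no more results.
    if (page.length < limit) {
      return
    }
    offset += page.length
  }
}

// Usage: process each page as it arrives and let it be garbage collected.
async function countAll<T>(fetchPage: FetchPage<T>, limit: number): Promise<number> {
  let total = 0
  for await (const page of paginate(fetchPage, limit)) {
    total += page.length
  }
  return total
}
```

Offset paging can miss or repeat items if the underlying data changes between requests, so a cursor-based query may be a better fit where the API supports it.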

Another place where we "produce" large arrays is fs.promises.readdir(), when reading the list of objects in the uploads folder. It's okay to do it on startup, but look for places where we might do it more frequently, like in the state API endpoint getLocalDataStats().
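One possible shape for this, sketched under the assumption that file names in the uploads folder are the object ids: read the directory once at startup, keep the ids in a Set, and update it whenever an object is stored or removed. Frequently called endpoints can then answer from memory. The class and method names are hypothetical.

```typescript
// Hypothetical in-memory index of locally stored object ids.
class LocalObjectIndex {
  private ids = new Set<string>()

  // Call once at startup with the result of the single directory listing.
  load(names: Iterable<string>): void {
    this.ids = new Set(names)
  }

  // Call after an object has been written to disk.
  add(id: string): void {
    this.ids.add(id)
  }

  // Call after an object has been removed from disk.
  // Returns true if the id was present.
  delete(id: string): boolean {
    return this.ids.delete(id)
  }

  has(id: string): boolean {
    return this.ids.has(id)
  }

  // Cheap enough to call on every stats request.
  get size(): number {
    return this.ids.size
  }
}

// Usage: the array literal stands in for the startup directory listing.
const index = new LocalObjectIndex()
index.load(['10', '11', '12'])
index.add('13')
index.delete('10')
```

This only stays accurate if every write and delete goes through the index, so it assumes nothing else modifies the folder while the node is running.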

Some examples in https://github.com/Joystream/joystream/pull/5026

mnaamani avatar Jan 19 '24 07:01 mnaamani

We can fetch the updates incrementally, even for the cleanup. We don't need to get a full list at all times; for example, we can get a list of events for deleted objects since the last run.
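Roughly, the incremental cleanup could look like the sketch below. It assumes deletion events can be queried by block number and that local ids are tracked in a Set; the DeletionEvent shape and the fetchDeletionsSince callback are placeholders, not an existing API.

```typescript
// Hypothetical event shape: an object deleted at a given block.
interface DeletionEvent {
  blockNumber: number
  objectId: string
}

// Hypothetical query: returns deletion events strictly after `afterBlock`.
type FetchDeletionsSince = (afterBlock: number) => Promise<DeletionEvent[]>

// Applies only the deletions that happened since the last run and returns
// the new cursor to persist for the next run.
async function applyDeletionsSince(
  localIds: Set<string>,
  lastProcessedBlock: number,
  fetchDeletionsSince: FetchDeletionsSince
): Promise<number> {
  const events = await fetchDeletionsSince(lastProcessedBlock)
  let cursor = lastProcessedBlock
  for (const event of events) {
    localIds.delete(event.objectId)
    cursor = Math.max(cursor, event.blockNumber)
  }
  return cursor
}
```

The work per run is then proportional to the number of new events rather than to the total number of objects.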

kdembler avatar Feb 02 '24 10:02 kdembler