mwdb-core
Multiple object relations performance
Feature Category
- [ ] Correctness
- [ ] User Interface / User Experience
- [X] Performance
- [ ] Other (please explain)
Describe the problem
A file with 1400 related samples takes around 30-40s to load.
Clicking the Relations tab in file details crashes the browser tab with this number of relations.
Describe the solution you'd like
Implement some kind of pagination on related samples.
OR/AND
Improve the process of getting related samples from the db e.g. optimize the db joins.
Describe alternatives you've considered
Increasing number of processes available.
Increasing requests timeouts.
Both helped slightly, but scaling this way only works for a while before we start hitting the limits again.
Hi,
thanks for creating a new issue!
I can see the problem with the Relations tab, but I don't fully understand the solutions you suggested.
I have some questions:
- could you describe your idea for paginating related samples?
- displaying a graph with 1400 nodes and making it readable is generally a hard task. How would you like MWDB to display it?
Could you describe your use case, please? Maybe we can find another solution, which doesn't require rendering the graph.
Or maybe some kind of warning would be a good solution?
Something like: "The graph you want to display is very big, which may cause performance issues. Are you sure? Yes | No"
Relations are eagerly loaded by GET /api/object/{identifier} (see parents and children in the schema). If there are thousands of children, MWDB loads information about all of them.
There are a few things we can do in the API:
- add a limit for `parents`/`children` served by ObjectItemResponseSchema (https://github.com/CERT-Polska/mwdb-core/blob/master/mwdb/schema/object.py#L96), e.g. to 50 objects
- allow loading more relations via `/api/{type}/{identifier}/relations`, but we need to redesign that endpoint a bit:
  - it should allow loading entries in the same way as listing endpoints (accepting older_than and count), but we need to ensure that entries are in a stable order (e.g. sorted by `relation.creation_time`)
  - maybe it should additionally be divided into `/api/{type}/{identifier}/relations/parents` and `/api/{type}/{identifier}/relations/children`
- use that pagination in the UI and mwdblib
For the graph: there could be an extra node labelled "more" or "999 more..." that loads the next chunk of children/parents into the graph when clicked.
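The placeholder-node idea could look roughly like the sketch below. The function name `visible_graph_nodes` and the label format are assumptions for illustration only; the real UI is not written in Python and may structure this differently.

```python
from typing import List, Sequence


def visible_graph_nodes(child_ids: Sequence[str], loaded: int) -> List[str]:
    """Return labels for the loaded children plus a placeholder for the rest."""
    nodes = list(child_ids[:loaded])
    remaining = len(child_ids) - len(nodes)
    if remaining > 0:
        # Hypothetical placeholder node; clicking it would raise `loaded`
        # by one chunk and re-render the graph.
        nodes.append(f"{remaining} more...")
    return nodes
```

For example, with 1049 children and 50 loaded, the last node would read "999 more..."; once everything is loaded, no placeholder is added.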
I guess we will opt for removing relationship information from ObjectItemResponseSchema and adding endpoints that allow iterating over relations.
Initiated by https://github.com/CERT-Polska/mwdb-core/pull/881