
Multiple object relations performance

Status: Open. pokorny-martin opened this issue 2 years ago • 4 comments

Feature Category

  • [ ] Correctness
  • [ ] User Interface / User Experience
  • [X] Performance
  • [ ] Other (please explain)

Describe the problem

A file with 1400 related samples takes around 30–40 s to load.
Clicking the Relations tab in the file details view crashes the browser tab when a file has this many relations.

Describe the solution you'd like

Implement some kind of pagination for related samples.
AND/OR
Improve how related samples are fetched from the database, e.g. by optimizing the joins.

Describe alternatives you've considered

Increasing the number of available processes. Increasing request timeouts.
Both helped slightly, but scaling like this only buys us time before we start hitting the limits again.

pokorny-martin, Mar 15 '23 10:03

Hi, thanks for creating a new issue! I can see the problem with the Relations tab, but I don't fully understand the solutions you suggested. I have some questions:

  1. Could you describe your idea for paginating related samples?
  2. Displaying a graph with 1400 nodes in a readable way is a hard task in general. How would you like MWDB to display it?

Could you describe your use case, please? Maybe we can find another solution that doesn't require rendering the graph.

Or maybe some kind of warning would be a good solution? Something like: "The graph you want to display is very big, which may cause performance issues. Are you sure? Yes | No"

Repumba, Mar 15 '23 14:03

Relations are eagerly loaded by GET /api/object/{identifier} (see parents and children in the schema). If there are thousands of children, MWDB loads information about all of them.
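To make the eager behaviour concrete, here is a minimal Python sketch of serializing an object with and without a cap on embedded relations. `Obj`, `brief`, `object_item`, `relation_limit` and the `*_truncated` flags are hypothetical names for illustration, not MWDB's actual schema classes or fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Obj:
    """Hypothetical in-memory stand-in for an MWDB object and its relations."""
    id: str
    type: str = "file"
    parents: List["Obj"] = field(default_factory=list)
    children: List["Obj"] = field(default_factory=list)


def brief(obj: Obj) -> Dict[str, str]:
    # Minimal per-relation entry; a real schema would serialize more fields.
    return {"id": obj.id, "type": obj.type}


def object_item(obj: Obj, relation_limit: Optional[int] = None) -> Dict[str, Any]:
    """Serialize obj; relation_limit=None mimics eager loading of every relation."""

    def capped(items: List[Obj]) -> List[Dict[str, str]]:
        shown = items if relation_limit is None else items[:relation_limit]
        return [brief(o) for o in shown]

    def truncated(items: List[Obj]) -> bool:
        return relation_limit is not None and len(items) > relation_limit

    return {
        **brief(obj),
        "parents": capped(obj.parents),
        "children": capped(obj.children),
        # Hypothetical flags telling the client it must page through the rest.
        "parents_truncated": truncated(obj.parents),
        "children_truncated": truncated(obj.children),
    }


if __name__ == "__main__":
    sample = Obj("sample", children=[Obj(f"child{i:04d}") for i in range(1400)])
    print(len(object_item(sample)["children"]))                     # 1400
    print(len(object_item(sample, relation_limit=50)["children"]))  # 50
```

Slicing in Python only shrinks the response; every related object has still been loaded. A real cap would have to be applied in the database query itself (e.g. with a LIMIT) so that only the first N relations are fetched.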

There are a few things we can do in the API:

  • limit the parents/children served by ObjectItemResponseSchema (https://github.com/CERT-Polska/mwdb-core/blob/master/mwdb/schema/object.py#L96), e.g. to 50 objects
  • allow loading more relations via /api/{type}/{identifier}/relations, but we need to redesign that endpoint a bit:
    • it should allow loading entries in the same way as the listing endpoints (accepting older_than and count), but we need to ensure that entries come in a stable order (e.g. sorted by relation.creation_time)
    • maybe it should additionally be split into /api/{type}/{identifier}/relations/parents and /api/{type}/{identifier}/relations/children
  • use that pagination in the UI and in mwdblib
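The paginated endpoint from the second bullet could be backed by a keyset query along the lines of the sketch below. It assumes a simplified, hypothetical `relation(parent_id, child_id, creation_time)` table in SQLite rather than MWDB's real models, and treats `older_than` as the identifier of the last child on the previous page. Sorting by `creation_time` with `child_id` as a tiebreaker gives a stable order even when several relations share a timestamp.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_demo_db(n_children: int) -> sqlite3.Connection:
    """Build an in-memory DB with one parent and n_children relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relation ("
        " parent_id TEXT NOT NULL,"
        " child_id TEXT NOT NULL,"
        " creation_time INTEGER NOT NULL,"
        " PRIMARY KEY (parent_id, child_id))"
    )
    # Index matching the sort order, which should let SQLite serve pages from it.
    conn.execute(
        "CREATE INDEX ix_relation_order"
        " ON relation (parent_id, creation_time, child_id)"
    )
    # i // 2 gives pairs of children the same creation_time, to exercise the tiebreak.
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?)",
        [("parent", f"child{i:04d}", i // 2) for i in range(n_children)],
    )
    return conn


def list_children(
    conn: sqlite3.Connection,
    parent_id: str,
    older_than: Optional[str] = None,
    count: int = 50,
) -> List[Tuple[str, int]]:
    """Return up to `count` (child_id, creation_time) rows, newest first,
    strictly after the child identified by `older_than`."""
    where = "parent_id = ?"
    params: list = [parent_id]
    if older_than is not None:
        # Resolve the cursor to its sort key, then continue strictly after it.
        row = conn.execute(
            "SELECT creation_time FROM relation WHERE parent_id = ? AND child_id = ?",
            (parent_id, older_than),
        ).fetchone()
        if row is None:
            raise ValueError(f"{older_than!r} is not a child of {parent_id!r}")
        where += " AND (creation_time < ? OR (creation_time = ? AND child_id < ?))"
        params += [row[0], row[0], older_than]
    query = (
        f"SELECT child_id, creation_time FROM relation WHERE {where}"
        " ORDER BY creation_time DESC, child_id DESC LIMIT ?"
    )
    return conn.execute(query, params + [count]).fetchall()


if __name__ == "__main__":
    conn = make_demo_db(1400)
    page = list_children(conn, "parent")
    print(page[0])  # ('child1399', 699)
    print(list_children(conn, "parent", older_than=page[-1][0], count=2))
    # [('child1349', 674), ('child1348', 674)]
```

Unlike an OFFSET, the cursor condition does not make the database skip over all earlier rows on every request, and new relations added while a client is paging do not shift the entries it has not seen yet.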

For the graph: there could be an extra node labelled "more" or "999 more..." that loads the next chunk of children/parents into the graph when clicked.
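The "N more..." placeholder node could work roughly like this on the client side. This sketch assumes a hypothetical `fetch(parent_id, offset, count)` callback that returns one page of child identifiers plus the total number of children (an offset is used only for brevity; a real client would pass an `older_than` cursor), and it models only graph state (edges and labels), not rendering. `RelationGraph` and `MORE_SUFFIX` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical page fetcher: (parent_id, offset, count) -> (child_ids, total_children).
Fetch = Callable[[str, int, int], Tuple[List[str], int]]

MORE_SUFFIX = "::more"


class RelationGraph:
    """Graph state that loads children in chunks behind an 'N more...' node."""

    def __init__(self, fetch: Fetch, chunk: int = 50) -> None:
        self.fetch = fetch
        self.chunk = chunk
        self.loaded: Dict[str, List[str]] = {}  # parent -> children loaded so far
        self.total: Dict[str, int] = {}  # parent -> total children reported

    def expand(self, parent_id: str) -> None:
        """Load the next chunk of children of parent_id into the graph."""
        loaded = self.loaded.setdefault(parent_id, [])
        page, total = self.fetch(parent_id, len(loaded), self.chunk)
        loaded.extend(page)
        self.total[parent_id] = total

    def click(self, node_id: str) -> None:
        # Only the placeholder reacts: it loads the next chunk for its parent.
        if node_id.endswith(MORE_SUFFIX):
            self.expand(node_id[: -len(MORE_SUFFIX)])

    def _remaining(self, parent_id: str) -> int:
        return self.total[parent_id] - len(self.loaded[parent_id])

    def edges(self) -> List[Tuple[str, str]]:
        result: List[Tuple[str, str]] = []
        for parent, children in self.loaded.items():
            result.extend((parent, child) for child in children)
            if self._remaining(parent) > 0:
                result.append((parent, parent + MORE_SUFFIX))
        return result

    def labels(self) -> Dict[str, str]:
        return {
            parent + MORE_SUFFIX: f"{self._remaining(parent)} more..."
            for parent in self.loaded
            if self._remaining(parent) > 0
        }


if __name__ == "__main__":
    children = [f"child{i:04d}" for i in range(1400)]

    def fake_fetch(parent_id: str, offset: int, count: int) -> Tuple[List[str], int]:
        return children[offset : offset + count], len(children)

    graph = RelationGraph(fake_fetch)
    graph.expand("sample")
    print(graph.labels())  # {'sample::more': '1350 more...'}
    graph.click("sample::more")
    print(graph.labels())  # {'sample::more': '1300 more...'}
    print(len(graph.edges()))  # 101
```

Keeping a total per parent lets the placeholder show an exact count. If the API returned pages without a total, the label could fall back to a plain "more..." and disappear once a short page comes back.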

psrok1, Mar 16 '23 14:03

I guess we will opt for removing relationship information from ObjectItemResponseSchema and adding endpoints that allow iterating over relations.

Initiated by https://github.com/CERT-Polska/mwdb-core/pull/881
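A client consuming such endpoints (for example the UI or mwdblib) could then iterate over relations page by page. This is a minimal sketch: the `/relations/children` and `/relations/parents` paths follow the split proposed earlier in the thread rather than an existing route, `fetch_json` is a hypothetical stand-in for an authenticated HTTP GET returning decoded JSON, and it assumes `older_than` takes the identifier of the last entry already received.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: (path, query params) -> decoded JSON list of entries.
FetchJson = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def iter_relations(
    fetch_json: FetchJson,
    object_type: str,
    identifier: str,
    direction: str = "children",
    count: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every parent or child of an object, one page of `count` at a time."""
    if direction not in ("parents", "children"):
        raise ValueError(f"unknown direction: {direction!r}")
    # Path shape proposed in the thread; not an existing MWDB route.
    path = f"/api/{object_type}/{identifier}/relations/{direction}"
    older_than: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"count": count}
        if older_than is not None:
            params["older_than"] = older_than
        page = fetch_json(path, params)
        yield from page
        if len(page) < count:
            return  # a short (or empty) page means nothing is left
        older_than = page[-1]["id"]  # continue after the last entry received
```

Because it is a generator, the caller only pays for the pages it actually consumes. With `count=50`, iterating over all 1400 children takes 29 requests: 28 full pages and a final empty one that signals the end.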

psrok1, Oct 11 '23 16:10