opencti
Extremely slow ingestion rate for Hygiene and ImportExternal
Description
The ingestion rate for these two connectors seems very slow, even when giving the workers more resources (CPU and RAM). Other connectors have no issue and are fast, but the queues for these two connectors keep building up.
Environment
- OS (where OpenCTI server runs): docker
- OpenCTI version: 5.12.29
- OpenCTI client: python
- Other environment details:
Reproducible Steps
Steps to create the smallest reproducible scenario: None
Expected Output
Actual Output
Additional information
Listen queue is building up on both connectors, not the push queue
Screenshots (optional)
@brianyschae it may be worth checking whether Redis is throwing errors
https://github.com/OpenCTI-Platform/opencti/issues/4936
We are still chasing the issue. Is there anyone who can give us some direction?
We found some logs related to message queue, for example, "('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))" and "message": "ConnectionClosedByBroker", ...
Our suspicion is that our system recently started ingesting an unusual amount of data per day (~500K/d), and that the connectors, or the system feeding them, cannot cope with this ingestion volume.
- Does this suspicion make sense?
- Does anyone know how scalable these connectors are?
Also,
- How frequently are these connectors triggered to run? That is, if a new batch of data is pushed to the queue the next day (while the old data is still being processed), will another trigger create a new instance of the connector?
- How does the trigger mechanism work?
- Is it triggered for each observable or for a batch of data?
Sorry for all the questions. We are trying to pinpoint the issue with a clear understanding of the codebase, which we do not work with day to day. Any direction, diagram, or link would be super helpful!
Hello,
The hygiene connector is known to be able to handle 1 enrichment / sec. If you need more, please just spawn multiple hygiene connectors with the same ID and same token.
Kind regards, Samuel
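As a rough back-of-the-envelope check of the numbers in this thread (about 500K items per day against roughly 1 enrichment per second per connector), the sketch below estimates how many connector instances would be needed. It assumes that every ingested item triggers exactly one enrichment and that throughput scales linearly with the number of instances; neither is confirmed here, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def connectors_needed(items_per_day: int, rate_per_sec: float = 1.0) -> int:
    """Minimum number of identical instances needed to keep up with the load.

    Assumes (hypothetically) one enrichment per item and linear scaling.
    """
    per_instance_per_day = rate_per_sec * SECONDS_PER_DAY
    return math.ceil(items_per_day / per_instance_per_day)


def backlog_growth_per_day(items_per_day: int, instances: int,
                           rate_per_sec: float = 1.0) -> float:
    """Number of items left unprocessed in the queue after one day."""
    capacity = instances * rate_per_sec * SECONDS_PER_DAY
    return max(0, items_per_day - capacity)


# One instance handles at most 86,400 items per day, so 500K/day needs 6.
print(connectors_needed(500_000))          # 6
print(backlog_growth_per_day(500_000, 1))  # 413600.0 items of backlog per day
```

Under these assumptions a single instance falls behind by roughly 413,600 items per day, which would be consistent with a listen queue that keeps growing.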
@SamuelHassine thank you for the answer.
- Does ImportExternalReference also have the same speed?
In our deployment, adding more pods should do the job I guess. Thank you!
Enrichment connectors do their work sequentially (not in parallel). ImportExternalReference downloads a page and generates a PDF, so in some cases I think it can be even slower.
We will work on more parallel processing in the upcoming weeks.
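To illustrate the difference between sequential and parallel processing mentioned above, here is a minimal generic sketch. It is not OpenCTI's connector code: `fake_enrich` is a hypothetical stand-in for a single I/O-bound enrichment call, and the thread pool only shows the general idea of overlapping waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_enrich(observable: str) -> str:
    """Hypothetical stand-in for one I/O-bound enrichment call."""
    time.sleep(0.05)  # simulate waiting on a network request
    return f"enriched:{observable}"


def run_sequential(observables):
    """Process one item at a time; total time grows with the item count."""
    return [fake_enrich(o) for o in observables]


def run_parallel(observables, workers=4):
    """Process items concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_enrich, observables))


items = [f"obs-{i}" for i in range(8)]
# Both approaches return the same results; only the elapsed time differs.
assert run_sequential(items) == run_parallel(items)
```

With eight items and four workers, the parallel version waits for about two rounds of calls instead of eight, which is the same effect as running several connector instances side by side.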