pycti library has problem with non-ASCII characters
Description
In a stream connector, we use the "self.helper.listen_stream()" function to listen to a stream. The problem is that the data received by the connector is truncated when it contains a non-ASCII character.
The use case is as follows:
- I send File Observables in a stream.
- Some of my Files have non-ASCII characters in the "x_opencti_additional_names" field because the file name itself contains non-ASCII characters.
- The pycti function truncates the response at the position of the non-ASCII character.
Seen in the stream: (screenshot)
Retrieved by my connector: (screenshot)
Environment
OCTI 6.2.16
Reproducible Steps
Steps to create the smallest reproducible scenario:
- Create a live stream with the filters "Entity type: File AND label: test-bug".
- Create a File with only an MD5 hash (no author, no marking, etc., to avoid noise in the stream).
- Run a stream connector in debug mode, listening to your stream, with a breakpoint at the place where it processes the retrieved data.
- Add in the "name" field of the File: 2020ë ì°êµ¬ ì 문ì ë° ìììë¶ì¼ ê²½ë ¥ì¬ì ì ë° ëª¨ì§ìê°.hwp
- Add the label "test-bug" on the File to send it in the stream.
- Look at the connector side for the data retrieved. -> truncated data
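For the last step, the check at the breakpoint can be sketched as below. This is a minimal sketch, assuming the event payload is available as a JSON string (like "msg.data" in the connector callback). `check_payload` is a hypothetical helper, not part of pycti, and the sample payloads are made up for illustration.

```python
import json


def check_payload(raw: str) -> tuple[bool, str]:
    """Return (ok, detail) for a JSON string received from the stream."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        # A truncated payload typically fails here (unterminated string,
        # missing closing brace, etc.).
        return False, f"invalid JSON at char {exc.pos}: {exc.msg}"
    return True, "complete"


# Illustrative payloads (hypothetical, not real stream output).
full = json.dumps({"data": {"name": "rapport_é.hwp"}}, ensure_ascii=False)
cut = full[: full.index("é")]  # simulate truncation at the non-ASCII character

print(check_payload(full))
print(check_payload(cut))
```

Calling the helper on the payload inside the callback should make the truncation visible without inspecting the string by hand.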
Expected Output
Have the whole data, like what I have in my stream
@richard-julien : Are you aware of this? I'll try to reproduce it.
I remember one case that we were not able to reproduce. If we have a good repro case, we need to fix that :)
Ping me for the repro case if needed @romain-filigran
@romain-filigran & @Lhorus6 did you manage to reproduce? Is that an issue that we need to handle?
Yes I did it myself (screenshots are mine)
Reproduced this morning with @Megafredo: the issue is that the lib used to read SSE events truncates the message, i.e. the 'msg.data' content at this line is already truncated (the closing } of the JSON data is missing) https://github.com/OpenCTI-Platform/client-python/blob/b539fff059434471aead2144b89b66db78c7f9f4/pycti/connector/opencti_connector_helper.py#L631
So I think the fix must be made upstream, in the filigran-sseclient lib.
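The thread does not say how the SSE lib loses the data. As an illustration only, one common way a byte-stream reader can drop content around non-ASCII characters is by decoding each network chunk separately, so a multi-byte UTF-8 sequence split across two chunks is lost. The sketch below uses only the standard library; the payload and function names are hypothetical, and it is not a description of the filigran-sseclient code.

```python
import codecs

# Hypothetical payload; "é" is encoded as two bytes in UTF-8.
payload = '{"name": "é.hwp"}'.encode("utf-8")

# Split the byte stream in the middle of the two-byte sequence for "é".
split_at = payload.index("é".encode("utf-8")) + 1
chunks = [payload[:split_at], payload[split_at:]]


def decode_per_chunk(parts):
    """Decode every chunk on its own: partial sequences are dropped."""
    return "".join(p.decode("utf-8", errors="ignore") for p in parts)


def decode_incrementally(parts):
    """Keep decoder state between chunks so split sequences are rejoined."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(p) for p in parts)
    return text + decoder.decode(b"", final=True)


print(decode_per_chunk(chunks))
print(decode_incrementally(chunks))
```

If the root cause is of this kind, keeping decoder state across reads (or decoding only once a full event has been buffered) is the usual remedy. Whether that applies here has to be checked against the lib itself.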
https://github.com/OpenCTI-Platform/client-python/pull/795