pycti library has a problem with non-ASCII characters

Open • Lhorus6 opened this issue • 3 comments

Description

In a stream connector, we use the "self.helper.listen_stream()" function to listen to a stream. The problem is that the data is truncated when it contains non-ASCII characters.

The use case is as follows:

  • I send File Observables in a stream.
  • Some of my Files have non-ASCII characters in the "x_opencti_additional_names" field because their file names contain non-ASCII characters.
  • The pycti function returns truncated data when these non-ASCII characters are present.
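One well-known way non-ASCII text gets cut short is offered here only as a hypothesis, not as the confirmed cause: a reader that treats a character count as a byte count. ASCII characters take one byte in UTF-8, but each Hangul syllable takes three, so such a reader stops before the end of the payload and drops its closing brace. The sketch uses a hypothetical `read_n_bytes` helper and a made-up one-field payload; it is not pycti code.

```python
import json


def read_n_bytes(wire: bytes, n: int) -> bytes:
    """Hypothetical reader: returns the first n bytes of the payload."""
    return wire[:n]


payload = json.dumps({"x_opencti_additional_names": ["연구.hwp"]}, ensure_ascii=False)
wire = payload.encode("utf-8")

# Each Hangul syllable is 3 bytes in UTF-8, so the byte length exceeds
# the character length by 2 bytes per syllable.
print(len(payload), len(wire))  # 42 46

# The bug being illustrated: using the character count as a byte count.
truncated = read_n_bytes(wire, len(payload)).decode("utf-8")
print(truncated)  # {"x_opencti_additional_names": ["연구.hw

# Using the actual byte count recovers the complete, parseable JSON.
complete = read_n_bytes(wire, len(wire)).decode("utf-8")
print(json.loads(complete) == json.loads(payload))  # True
```

In this example the cut happens to land on a character boundary; with other inputs the same bug can also split a multi-byte character and make strict decoding fail.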

Seen in the stream

[Screenshot: the event as seen in the stream]

Retrieved by my connector

[Screenshot 2024-08-29 220415: the event as retrieved by the connector]

Environment

OCTI 6.2.16

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Create a live stream with the filters "Entity type: File AND label: test-bug".
  2. Create a File with only an MD5 hash (no author, no marking, etc., to avoid noise in the stream).
  3. Run a stream connector that listens to your stream in debug mode, with a breakpoint where it processes the retrieved data.
  4. Add in the "name" field of the File: 2020년 연구 ì „ë¬¸ì› 및 수자원분야 ê²½ë ¥ì‚¬ì› ì„ ë°œ 모집요강.hwp
  5. Add the label "test-bug" to the File so that it is sent in the stream.
  6. On the connector side, inspect the retrieved data: it is truncated.
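For step 6, a small helper can make the truncation check explicit instead of relying on visual inspection in the debugger. The sketch assumes the received event data is available as a `str` (for example the `msg.data` value in pycti's stream loop); the name `inspect_event_data` and the fields it reports are hypothetical.

```python
import json


def inspect_event_data(data: str) -> dict:
    """Hypothetical debugging helper: summarize a received event payload.

    Call it at the breakpoint from step 3 with the received data string.
    """
    try:
        json.loads(data)
        valid = True
    except json.JSONDecodeError:
        valid = False
    return {
        "valid_json": valid,
        # A complete OpenCTI event payload is a JSON object, so it should end with "}".
        "ends_with_brace": data.rstrip().endswith("}"),
        "non_ascii_chars": sum(1 for ch in data if ord(ch) > 127),
        "char_length": len(data),
        "utf8_byte_length": len(data.encode("utf-8")),
    }


complete = '{"name": "모집요강.hwp"}'
truncated = complete[:-5]  # simulate the missing tail: '{"name": "모집요강.'

report = inspect_event_data(truncated)
print(report["valid_json"], report["ends_with_brace"])  # False False
print(inspect_event_data(complete)["valid_json"])  # True
```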

Expected Output

Receive the complete data, identical to what I see in my stream.

Lhorus6 • Aug 29 '24 20:08

@richard-julien: Are you aware of this? I'll try to reproduce it.

romain-filigran • Sep 04 '24 07:09

I remember one case where we were not able to reproduce it. If we have a good repro case, we need to fix it :)

richard-julien • Sep 04 '24 11:09

Ping me for the repro case if needed, @romain-filigran.

Lhorus6 • Sep 04 '24 12:09

@romain-filigran & @Lhorus6, did you manage to reproduce it? Is this an issue that we need to handle?

nino-filigran • Sep 30 '24 07:09

Yes, I reproduced it myself (the screenshots are mine).

Lhorus6 • Sep 30 '24 22:09

Reproduced this morning with @Megafredo: the issue is that the library used to read SSE events truncates the message. The 'msg.data' content at this line is already truncated (the closing } of the JSON data is missing): https://github.com/OpenCTI-Platform/client-python/blob/b539fff059434471aead2144b89b66db78c7f9f4/pycti/connector/opencti_connector_helper.py#L631

So I think the fix must be made upstream, in the filigran-sseclient library.
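The thread does not show how filigran-sseclient reads its input, so what follows only illustrates a general pitfall. It does not describe that library or any eventual fix. An SSE client that decodes each network chunk on its own can mangle non-ASCII text whenever a multi-byte UTF-8 character is split across two chunks. The sketch parses `data:` lines from an iterable of byte chunks with Python's standard `codecs.getincrementaldecoder`, which holds back an incomplete character until the rest of it arrives. The name `iter_sse_data` is hypothetical.

```python
import codecs
from typing import Iterable, Iterator, List


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical sketch: yield the data of each SSE event from raw byte chunks.

    Handles only "data:" lines and blank-line event boundaries with LF or
    CRLF line endings; "event", "id", "retry" and comment lines are ignored.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    data_lines: List[str] = []
    for chunk in chunks:
        # The incremental decoder keeps an incomplete multi-byte sequence
        # pending until the remaining bytes arrive in a later chunk.
        buffer += decoder.decode(chunk)
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")
            if not line:
                # A blank line dispatches the accumulated event.
                if data_lines:
                    yield "\n".join(data_lines)
                    data_lines = []
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # Per the SSE format, one leading space after the colon is dropped.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
    # Raises UnicodeDecodeError if the stream ended in the middle of a character.
    decoder.decode(b"", final=True)


payload = '{"name": "모집요강.hwp"}'
raw = f"event: update\ndata: {payload}\n\n".encode("utf-8")
# Cut the byte stream one byte into the 3-byte encoding of "모".
cut = raw.index("모".encode("utf-8")) + 1
chunks = [raw[:cut], raw[cut:]]

events = list(iter_sse_data(chunks))
print(events == [payload])  # True

# Decoding each chunk separately (ignoring errors) silently drops "모".
naive = "".join(c.decode("utf-8", errors="ignore") for c in chunks)
print("모" in naive)  # False
```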

aHenryJard • Oct 17 '24 08:10

https://github.com/OpenCTI-Platform/client-python/pull/795

flavienSindou • Dec 24 '24 12:12