trice
Processing with `trice l` appears to be capped at a certain size
Is your feature request related to a problem? Please describe.
I'm implementing the backend data handling for a fleet of IoT devices that use trice for logging. Part of this involves taking the binary messages we receive from the devices, decoding them, and inserting the resulting textual logs into a database so we can filter and visualize the logs in Grafana.
The way this works is that a script writes the binary data to a file and runs the trice l command to decode each file, capturing the output, which I then insert into the database.
What I've noticed is that if the input binary is "large", trice only reads part of the file before stopping. It doesn't stop with a non-zero return code; it just stops. For example, I had a 50K input binary file, and the result was only 20K of textual log. Since this seemed too small, I tried taking just the first 4K of that 50K file, and the output was exactly the same, so trice was clearly ignoring most of the file.
I've found that if I limit the files to 2K it seems to work fine.
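For reference, the workaround of capping input files at 2K and shelling out to the trice CLI per chunk can be sketched roughly like this. This is a hedged sketch, not our actual script: the switches used here (-i for the til.json id list, -p FILE with -args for reading from a file) are assumptions about the CLI and may differ by version, so check trice help -log before relying on them.

```python
import subprocess
import tempfile
from pathlib import Path

MAX_CHUNK = 2048  # empirical size below which trice reads the whole file


def chunks(data: bytes, size: int = MAX_CHUNK):
    """Split a recombined binary payload into pieces trice handles reliably."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def decode_chunk(chunk: bytes, til: str = "til.json") -> str:
    """Write one binary chunk to a temp file and decode it with the trice CLI.

    NOTE: the exact switches (-i, -p FILE, -args) are an assumption about
    the trice command line -- verify against `trice help -log`.
    """
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
        f.write(chunk)
        name = f.name
    try:
        result = subprocess.run(
            ["trice", "l", "-i", til, "-p", "FILE", "-args", name],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    finally:
        Path(name).unlink()
```

Spawning one subprocess per chunk is exactly the overhead discussed later in this thread, which is why an in-process decoder would help.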
So I'm wondering whether you are aware of a limit in the tool that could explain this? If so, would it be possible to lift it?
We're on a version just after 0.56.4; the git SHA is 2f519e051fd92e7c1a0e0ddc5fb9872d72d11d39.
Describe the solution you'd like
A trice command that doesn't have a limit on the input binary size, or at least one that can handle files bigger than 2K.
Describe alternatives you've considered
Additional context
I can control the size of the binary files. Our devices send very small binary chunks, but for efficiency I recombine them on the backend so that I can process them faster; right now I've limited that recombination so it won't create files larger than 2K.
To investigate this, it would be helpful if you could provide a binary log file that causes the problem, together with the til.json file. If your data is sensitive, I could try to generate such data myself. In any case, please give details about the trice switches you used.
The trice tool was written with reading continuous binary data in mind, not reading files, so I didn't test that case intensively. I am sorry for that and will try to fix it as soon as possible.
Thanks for the quick response @rokath. Having thought about this a little more, I realized that I can't recombine the files the way I was doing, because I lose important information. The trice tool itself is very fast; the primary inefficiency comes from using Python to script the processing.
But what would be very useful for our use case is a library version of the tool, as that would allow us to decode the binary files natively instead of forking out a trice subprocess. Do you think exposing the decoding code as a (Go) library would be a lot of work?
Am I right that you are asking for a Go package you can use in your own Go program to decode the binary data, @andnofence? That exists and is called "trexDecoder". It should also be possible to use the trice tool with CLI switches inside a script. I could add some CLI switches to help with your task; for that I need to understand more details. If you can, avoid Python in time-critical paths. BTW: with some programming know-how, learning Go is child's play. It is an incredibly powerful language.
Have you noticed the display server option? It could be the solution for you.
The binary encoding is documented, which allows you to write your own decoder in any environment. For that it may be easier to use COBS instead of TCOBS, or simply to use an escape character for framing.
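For anyone taking the write-your-own-decoder route with COBS framing, the unstuffing step itself is small. Here is a minimal sketch of a plain COBS decoder in Python (generic COBS, not the trice-specific TCOBS variant, which additionally compresses runs):

```python
def cobs_decode(encoded: bytes) -> bytes:
    """Decode one COBS-encoded frame (without its trailing zero delimiter)."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]  # distance to the next (removed) zero byte
        if code == 0:
            raise ValueError("unexpected zero inside COBS frame")
        i += 1
        out += encoded[i:i + code - 1]  # copy code-1 literal bytes
        i += code - 1
        if code < 0xFF and i < len(encoded):
            out.append(0)  # a non-maximal block encodes a zero byte
    return bytes(out)


def split_frames(stream: bytes):
    """Yield decoded frames; frames are delimited by zero bytes on the wire."""
    for frame in stream.split(b"\x00"):
        if frame:
            yield cobs_decode(frame)
```

For example, the frame b"\x03\x11\x22\x02\x33" decodes to b"\x11\x22\x00\x33". Because COBS guarantees the encoded stream is zero-free, splitting on zero bytes is a safe way to recover frame boundaries after data loss.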
The display server is interesting, although it would add quite a bit of complexity to the pipeline. To give some context: we collect binary payloads from our devices in the field and store them in our database. Then a job runs in the cloud, pulls these binary payloads out, and feeds them to the trice CLI to decode them. The overhead of this is fine for now, but it might become a problem as data volumes grow.
Thanks for the tip on trexDecoder, I will definitely check that out. If we can speed things up by a factor of 10 using native Go, I think that will be good enough for a very long time. It's been quite some years since I wrote much Go code, but I agree with you, it was one of the easiest languages I've picked up!
The trice tool supports the -p and -args switches, which control where it reads the binary data from. Internally a Reader is used. Maybe it makes sense to extend the existing options with a database read to avoid the additional job.
If you store the IoT device ID with its binary data, you can sort the log lines later (-prefix).
If getting the IoT device ID is not possible, you could add it to the logs, for example by using a few target timestamp bits.
Depending on your use case, you could avoid target timestamps entirely and record the storage time instead.
Integrating database reading into the tool wouldn't help for my use case; the overhead primarily comes from having to create a shell subprocess for the trice command. Library support (in either Go or Python) means everything can happen in-process, which is a lot more efficient, so I'll look into the trexDecoder code when I have some time.
I noticed you have a task for Grafana support (#146), and we actually use Grafana to visualize the data. I think the data pipeline stack will be unique to each company, but Python is generally the language of choice for data pipelines, so having the decoder available as a Python library would probably be popular. I've never wrapped Go code in a Python library, but that might be an option that avoids reimplementing the decoding code in Python.
Can we close this?