viztracer
Write a compressor/decompressor for the trace log file
Currently the trace log file is huge, which is fine on local machines. However, that makes it difficult to share the trace file over the network or to store it in the cloud.
Most of the information in the trace file is duplicated, so we should be able to get a very decent compression ratio for the trace file.
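To get a rough feel for how much the duplication helps, here is a minimal sketch that runs a general-purpose compressor (zlib) over a synthetic trace. The event contents are made up for illustration, and zlib is only a stand-in here, not the format vcompressor uses.

```python
import json
import zlib

# Synthetic trace: a handful of function names repeated many times.
# The field values are invented for illustration only.
events = [
    {"ph": "X", "pid": 1, "tid": 1, "ts": i * 10.0, "dur": 5.0,
     "name": f"func_{i % 4} (example.py:{i % 4})", "cat": "FEE"}
    for i in range(10_000)
]
raw = json.dumps({"traceEvents": events}).encode("utf-8")

# General-purpose compression; a dedicated format could exploit the
# structure of the events further.
packed = zlib.compress(raw, level=9)
ratio = len(raw) / len(packed)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, ratio={ratio:.1f}x")

# The round trip must give back the exact original bytes.
restored = zlib.decompress(packed)
```

The exact ratio depends entirely on the trace, so treat any number from this sketch as a ballpark rather than a benchmark.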
I'm trying to write a compressor for VizTracer, but I don't know how to get started with a project that mixes C and Python. Could you tell me a simple way to set up a debug environment for "vcompressor"? As I see it, I only need to focus on implementing the compression itself.
You can start with the trace-log-compressor branch, then follow this documentation to set up the environment. You should be able to run viztracer --compress <your_result.json> to trigger the existing function.
The example test case for vcompressor just fails because the input vdb_multithread.json is missing.
I tried to generate this file using tests/data/vdb_multithread.py with the --vdb option and only got a 2.35 KB JSON file. I expected it to be larger than the existing multithread.json (105.22 KB), since the help for the --vdb option says that it adds overhead.
By the way, it seems that CI is disabled for the new branch trace-log-compressor in this repo. Is that intentional?
vdb_multithread.json is checked in and the example test case uses get_json_file_path("vdb_multithread.json") to locate it in the test/data directory. Did you run the test and it failed for you?
--vdb is actually deprecated, but the help message means that it adds overhead time-wise. You can check the JSON file and figure out what is missing: FEE or file info.
The CI is disabled on push to any branch other than master. However, it is enabled for all pull requests to any branch.
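One quick way to check what a trace file does and does not contain is to count its events by phase and category. The sketch below assumes the file follows the Chrome Trace Event Format with a top-level "traceEvents" list; the "file_info" key and the sample contents are assumptions made for illustration.

```python
import json
from collections import Counter

def summarize_trace(trace: dict) -> dict:
    """Report top-level keys and event counts by phase and category."""
    events = trace.get("traceEvents", [])
    return {
        "top_level_keys": sorted(trace.keys()),
        "event_count": len(events),
        "by_phase": dict(Counter(e.get("ph", "?") for e in events)),
        "by_category": dict(Counter(e.get("cat", "(none)") for e in events)),
    }

# Tiny in-memory sample (hypothetical contents). For a real file, use:
#     with open(path) as f: trace = json.load(f)
sample = {
    "traceEvents": [
        {"ph": "M", "pid": 1, "tid": 1, "name": "process_name"},
        {"ph": "X", "pid": 1, "tid": 1, "ts": 0.0, "dur": 2.0,
         "name": "f (a.py:1)", "cat": "FEE"},
        {"ph": "X", "pid": 1, "tid": 1, "ts": 3.0, "dur": 1.0,
         "name": "g (a.py:5)", "cat": "FEE"},
    ],
    "file_info": {},  # assumed key name
}
summary = summarize_trace(sample)
print(json.dumps(summary, indent=2))
```

Running the same summary on the small and the large file side by side should make it obvious which kind of entry is absent.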
I've double-checked and confirmed that there is no vdb_multithread.json in tests/data/, which is where get_json_file_path tries to locate it.
You are correct, this is an error on my side, will fix soon.
The latest fix is pushed to the trace-log-compressor branch. There's no need to use vdb_multithread.json; it can simply use multithread.json. Please pull from the branch to your repo.
Should we consider using protobuf? I think it is good for serializing and de-serializing, and it is popular in RPC cases.
protobuf is not for compression. It's an alternative to JSON.
The current basic test fails on Windows and I'd love to leave this bug to you guys as a first issue to work on!
The document doesn't seem to describe the storage format for other events, e.g. instant events. Is that up to me to decide?
I would suggest making a PR for the protocol first. We can discuss the design, then you can implement it after the design is accepted.
For instant events, I'm considering the following format: header(header) - pid(pid) - tid(tid) - name(str) - count(uint64) - [start(ts) - scopes(str)]*
header(header) - pid(pid) - tid(tid) - name(str) - scopes(str) - count(uint64) - [start(ts)]* may be a better protocol.
You should probably get a better understanding of instant events first. For example, args is a critical part of an instant event: it contains the data that the user logs.
Since args has to be a jsonifiable object, I think we could simply dump args to a string and save it.
header(header) - pid(pid) - tid(tid) - name(str) - scopes(str) - count(uint64) - [start(ts) - args(str)]*
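To make the proposed layout concrete, here is a rough Python sketch of writing and reading one such group. The header value, the field widths (uint64 for pid/tid, float64 for ts) and the length-prefixed UTF-8 string encoding are all assumptions made for illustration; the actual protocol may define them differently.

```python
import io
import json
import struct

INSTANT_HEADER = 0x02  # hypothetical header value for an instant-event group

def write_str(buf, s):
    """Write a string as a uint32 length followed by UTF-8 bytes (assumed encoding)."""
    data = s.encode("utf-8")
    buf.write(struct.pack("<I", len(data)))
    buf.write(data)

def read_str(buf):
    (n,) = struct.unpack("<I", buf.read(4))
    return buf.read(n).decode("utf-8")

def encode_instant_group(pid, tid, name, scope, events):
    """events: list of (ts, args) pairs that share pid, tid, name and scope."""
    buf = io.BytesIO()
    buf.write(struct.pack("<BQQ", INSTANT_HEADER, pid, tid))
    write_str(buf, name)
    write_str(buf, scope)
    buf.write(struct.pack("<Q", len(events)))      # count(uint64)
    for ts, args in events:
        buf.write(struct.pack("<d", ts))           # start(ts)
        write_str(buf, json.dumps(args))           # args dumped to a string
    return buf.getvalue()

def decode_instant_group(data):
    buf = io.BytesIO(data)
    header, pid, tid = struct.unpack("<BQQ", buf.read(17))
    if header != INSTANT_HEADER:
        raise ValueError("not an instant-event group")
    name = read_str(buf)
    scope = read_str(buf)
    (count,) = struct.unpack("<Q", buf.read(8))
    events = []
    for _ in range(count):
        (ts,) = struct.unpack("<d", buf.read(8))
        events.append((ts, json.loads(read_str(buf))))
    return pid, tid, name, scope, events

group = [(1.5, {"value": 1}), (2.5, {"value": [1, 2]})]
blob = encode_instant_group(10, 20, "my_event", "g", group)
decoded = decode_instant_group(blob)
```

Grouping by pid, tid, name and scope means those fields are written once per group instead of once per event; only the timestamp and args vary.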
This is probably doable. You can check in the README first then start prototyping on it.
Maybe it would be better to store arg_name and arg_value separately. For example, if a function is called many times with the '--log_func_args' option enabled, storing the string directly may be redundant because arg_name is stored many times.
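A minimal sketch of that idea: keep the argument names once per function and store only the values for each call. The function names split_args and join_args and the sample data are hypothetical, and the sketch assumes every call to a given function logs the same argument names in the same order.

```python
import json

def split_args(calls):
    """Split per-call {arg_name: arg_value} dicts into one name list plus value rows."""
    names = list(calls[0]) if calls else []
    rows = []
    for call in calls:
        if list(call) != names:
            raise ValueError("all calls must share the same argument names, in order")
        rows.append([call[n] for n in names])
    return names, rows

def join_args(names, rows):
    """Rebuild the original per-call dicts from the name list and value rows."""
    return [dict(zip(names, row)) for row in rows]

# Hypothetical logged arguments for 1000 calls of the same function.
calls = [{"filename": "a.py", "lineno": str(i)} for i in range(1000)]
names, rows = split_args(calls)

direct_size = len(json.dumps(calls))              # names repeated in every call
split_size = len(json.dumps([names, rows]))       # names stored once
print(f"direct={direct_size} bytes, split={split_size} bytes")
```

A function called with varying keyword arguments would break the same-names assumption, so a real design would need a fallback for that case.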