Understanding Trace File
Hey,
Basic Info
We are trying to create a production-ready Triton container and collect its request logs. For that we found the Triton trace configuration, which gives us exactly what we want: metrics about each request sent to the server. The reason we are not using Prometheus and the /metrics route will be explained by my co-worker in the next comment in this issue (questions).
Usage Info
We run the container on k8s with a logstash sidecar container that reads the trace json file and sends it to Elastic (each logstash read deletes what it read from the json file, so the file doesn't grow indefinitely).
We have some questions about the intention of this feature and its functionality:
TRITON VERSION -> 22.02-py3 -> Documentation -> Trace Configurations
$ tritonserver --trace-file=/tmp/trace.json --trace-rate=100 --trace-level=TIMESTAMPS ...
TRITON VERSION -> 21.12-py3 -> Documentation -> Trace json file structure
[
{
"model_name": $string,
"model_version": $number,
"id": $number,
"parent_id": $number,
"timestamps": [
{ "name" : $string, "ns" : $number },
...
]
},
...
]
----------------------------------------------------------------------------
[
{
"model_name": "simple",
"model_version": -1,
"id":1,
"timestamps" : [
{ "name": "http recv start", "ns": 2259961222771924 },
{ "name": "http recv end", "ns": 2259961222820985 },
{ "name": "request handler start", "ns": 2259961223164078 },
{ "name": "queue start", "ns": 2259961223182400 },
{ "name": "compute start", "ns": 2259961223232405 },
{ "name": "compute end", "ns": 2259961230206777 },
{ "name": "request handler end", "ns": 2259961230211887 },
{ "name": "http send start", "ns": 2259961230529606 },
{ "name": "http send end", "ns": 2259961230543930 }
]
}
]
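As context for the questions below: from a record like the one above, the per-request metrics we care about are simple deltas between timestamps. A rough sketch in Python (the pairing of timestamp names is our own reading of the trace output, not an official mapping):

# Subset of the example record above; in practice this comes from parsing the trace json file.
record = {
    "timestamps": [
        {"name": "http recv start", "ns": 2259961222771924},
        {"name": "queue start", "ns": 2259961223182400},
        {"name": "compute start", "ns": 2259961223232405},
        {"name": "compute end", "ns": 2259961230206777},
        {"name": "http send end", "ns": 2259961230543930},
    ]
}
ts = {t["name"]: t["ns"] for t in record["timestamps"]}

queue_ns = ts["compute start"] - ts["queue start"]           # time spent waiting in the queue
compute_ns = ts["compute end"] - ts["compute start"]         # model execution time
end_to_end_ns = ts["http send end"] - ts["http recv start"]  # whole request as seen by the server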
- In the docs you set the `trace-rate` to 100, and I think the default is either 1000 or 100. As said above, we want the data given by this feature for every request (setting the `trace-rate` to 1). Is this the correct use for the intent of the feature? (As said above, logstash will delete the data it has read from the file -> a sort of retention policy.)
- We have encountered a mysterious thing (Triton 21.12-py3/21.09-py3): the logs of this trace feature aren't written to the trace json file until tritonserver exits (not the container, the command). That makes no sense to us, and we found a parameter that can be configured only in the 22.02-py3 version (we haven't checked 22.01-py3) -> the `trace-log-frequency` parameter, which would be the equivalent of a bulk setting. The problem is we can't find how to set it in the 21.09/21.12 versions. Furthermore, what value would you advise setting it to, considering the IO involved, and how does the IO affect the availability of the server, if at all?
- The previous question raises an interesting point about the features added to Triton per version. I have a PyTorch model, but the PyTorch backend with the PyTorch version I want is only shipped up to Triton 21.09-py3, and at the same time I obviously want this insane feature. The only thing we have thought to do is maybe set the `--backend-directory` parameter to point at the PyTorch backend directory shipped with 21.09-py3, but that also has possible future issues. The main one, by my understanding, is that the PyTorch backend uses the Python backend stub, which ties its dependencies to the default Python version on the server (which coincidentally seems to be 3.8 across all the versions I have used so far, but that doesn't feel future proof). It feels weird that extra features like this come at the expense of backend versioning.
- As shown in the docs, the json file data structure is great. We have tried it in the 21.12-py3 version and have gotten a different result, a usable one but not as nice to work with as the one in the docs.
[
{
"id": $number,
"model_name": $string
},
{
"id": $number,
"model_version": $number
},
{
"id": $number,
"http recv start": $number
},
...
]
This is the structure we receive. The id is the same for every entry of a traced request, so we can rebuild the structure shown in the docs, but maybe it is an error or we have forgotten to set something. What should we do?
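A minimal sketch of that regrouping (the file path is just an example): merging every entry that shares an id rebuilds a per-request record close to the documented structure.

import json
from collections import defaultdict

# Each entry in the 21.12 output carries an "id" plus one extra field,
# so merging entries by id yields one dict per traced request.
with open("/tmp/trace.json") as f:   # example path, adjust to your --trace-file
    entries = json.load(f)

merged = defaultdict(dict)
for entry in entries:
    merged[entry["id"]].update(entry)

# merged[1] -> {"id": 1, "model_name": "simple", "model_version": -1,
#               "http recv start": ..., "http recv end": ..., ...}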
Hi @bro-adm, I'll try to answer the questions in a few comments to keep it easier to read.
1. Trace Rate
In the docs you set the trace-rate to 100 and I think the default is either 1000 or 100. As said above, we want the data given by this feature per request (setting the trace-rate to 1). Is this the correct use for the intent of the feature?
Setting `--trace-rate=1` should trace every request, as you describe.
CC @GuanLuo @tanmayv25 in case you are aware of any issues with doing this.
2. Trace/log frequency
We have encountered a mysterious thing (Triton 21.12-py3/21.09-py3): the logs of this trace feature aren't written to the trace json file until tritonserver exits (not the container, the command). That makes no sense to us, and we found a parameter that can be configured only in the 22.02-py3 version (we haven't checked 22.01-py3) -> the trace-log-frequency parameter, which would be the equivalent of a bulk setting. The problem is we can't find how to set it in the 21.09/21.12 versions
Yes, the `--log-frequency` setting was introduced recently in 22.02 as you point out: https://github.com/triton-inference-server/server/pull/3849. The behavior you see in earlier versions, where the trace is only logged when the server exits, is expected; it is now the default behavior (`--log-frequency=0`) unless you set the flag to a different value. Here is the help output for the flag from `tritonserver`:
- "log_frequency" : the frequency that Triton will log the
trace output to the files. If the value is "0", Triton will only log
the trace output to ${trace_file} when shutting down. Otherwise, Triton will log
the trace output to ${trace_file}.${idx} when it collects
the specified number of traces. For example, if the log frequency is "100",
when Triton collects the 100-th trace, it logs the traces to file
"${trace_file}.0", and when it collects the 200-th trace, it logs the 101-th to
the 200-th traces to file "${trace_file}.1". Note that the file index will be
reset to 0 when "trace_file" setting is updated.
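With the trace log frequency set, a sidecar like your logstash container would then pick up the indexed files instead of a single growing file. A minimal sketch, assuming the path from the example command above and that each indexed file holds a JSON array like the structures shown earlier:

import glob
import json

# Completed traces land in indexed files: /tmp/trace.json.0, /tmp/trace.json.1, ...
# Collect them in index order.
traces = []
for path in sorted(glob.glob("/tmp/trace.json.*"),
                   key=lambda p: int(p.rsplit(".", 1)[-1])):
    with open(path) as f:
        traces.extend(json.load(f))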
CC @GuanLuo do we need a ticket to update the trace docs to include this new flag?
3. Pytorch versions
The previous question raises an interesting point about the features added to Triton per version. I have a PyTorch model, but the PyTorch backend with the PyTorch version I want is only shipped up to Triton 21.09-py3, and at the same time I obviously want this insane feature. The only thing we have thought to do is maybe set the --backend-directory parameter to point at the PyTorch backend directory shipped with 21.09-py3, but that also has possible future issues. The main one, by my understanding, is that the PyTorch backend uses the Python backend stub, which ties its dependencies to the default Python version on the server (which coincidentally seems to be 3.8 across all the versions I have used so far, but that doesn't feel future proof). It feels weird that extra features like this come at the expense of backend versioning.
@CoderHam any thoughts on this?
4. Trace JSON Output structure
As shown in the docs, the json file data structure is great. We have tried it in the 21.12-py3 version and have gotten a different result, a usable one but not as nice to work with as the one in the docs.
If I understand the question correctly, we document something like this:
[
{
"model_name": "simple",
"model_version": -1,
"id": 1,
"timestamps" : [
{ "name": "http recv start", "ns": 2259961222771924 },
{ "name": "http recv end", "ns": 2259961222820985 },
{ "name": "request handler start", "ns": 2259961223164078 },
{ "name": "queue start", "ns": 2259961223182400 },
{ "name": "compute start", "ns": 2259961223232405 },
{ "name": "compute end", "ns": 2259961230206777 },
{ "name": "request handler end", "ns": 2259961230211887 },
{ "name": "http send start", "ns": 2259961230529606 },
{ "name": "http send end", "ns": 2259961230543930 }
]
}
]
but the user is seeing it broken out pairwise like this:
[
{
"id": $number,
"model_name": $string
},
{
"id": $number,
"model_version": $number
},
{
"id": $number,
"http recv start": $number
},
...
]
@GuanLuo @tanmayv25 any thoughts on this? We may want to update the docs if we changed the structure intentionally.
@rmccorm4
- Pytorch versions
The thing is, we would think that maybe the Trace feature could come in handy as a separate image in itself that acts as a proxy container (sidecar container), or as another deployment that can receive all the requests the server gets (with the help of API gateways) when run on k8s (while still keeping the possibility of having this in the tritonserver image for those who run the server without k8s). The disadvantage is that it would have to handle all the routes exposed by the server if run as a proxy, or at least the grpc and http infer routes. The proxy solution would require more resources per server deployment, whilst Trace as a second deployment would use resources elsewhere (not per server) but would be a single point of failure. As a bonus this would also reduce the size of the server image, which in itself is great.
Side Note
We like the fact that the Trace feature writes to a file and lets us build the extra functionality around it ourselves, like a sidecar logstash container, where we have full control over where the data is written and how (instead of this feature implementing only one destination and having to expose all of that configuration itself).
Yes, the trace documentation needs to be updated; the trace log structure is no longer the same as what is described in the documentation. @rmccorm4 can you file a ticket to track this?
@bro-adm The backend is decoupled from Triton core, so you should be able to change the backend to the version that you intend to use while using the newer version of Triton. Of course you need to satisfy the backend requirement as you described, but Triton itself doesn't have too many external dependencies, so you can probably copy Triton to the environment that the desired backend can run on.
Ok, we have decided to try to migrate our models to a compatible backend version available in the newer Triton versions.
We have use cases where we set the log-frequency to 1000, but then we hit a slower inference hour and would like a time-based log-frequency parameter as well. That being said, Triton should be configurable with both log frequencies at the same time -> `--log-frequency-amount=y --log-frequency-time=x` -> once x time has passed since the last write to the file, it writes again (measured from the last write, not just a fixed write every x time).
Is this possible?
In recent triton versions there is a trace API that you can use to change the trace parameters: https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_trace.md
The trace logging is event based, so it is hard to write to the file periodically. However, the collected traces are written to the file whenever a trace setting is updated, so you can use the trace API mentioned above to trigger the write on demand.
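For example, re-posting a setting such as the log frequency through that API should flush the traces collected so far. A minimal sketch against the HTTP endpoint described in the linked extension doc; the host/port and the value shown are assumptions for illustration:

import json
import urllib.request

# POST a trace-setting update to trigger a write of the collected traces.
body = json.dumps({"log_frequency": "1000"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/v2/trace/setting",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # response contains the resulting trace settings (per the extension doc)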
Hi, we have another question regarding the trace file:
We tried using the trace file to extract some timestamps for each request (more specifically the timestamp of the request and of the response), but it turns out the timestamps in the trace file are not absolute timestamps; they are generated using `steady_clock` (a monotonic clock).
As I understand it, it is not possible to convert the timestamps in the trace file to an actual wall-clock time (datetime), which is what we need.
Is it possible to add a flag to make the timestamps in the trace file epoch timestamps instead of what they currently are?
Thanks!
I think we can add a cmdline option to let tracer convert steady_clock timestamp to system_clock timestamp. But note that there can be inaccuracy because "system time can be adjusted at any moment".
@rmccorm4 @GuanLuo @nadavs2310
Schema and Values
We are working on a k8s cluster, so with the current implementation we need to somehow manage the timestamps per node, and a timestamp has no value for us if it is relative to the node the deployment is scheduled on (especially when each replica can be on a different node).
We have found a way, with initContainers and similar tooling, to transform the monotonic timestamps into epoch timestamps. (We have implemented the schema transformation and the timestamp conversion in the logstash configuration of the sidecar containers.)
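Roughly, the conversion boils down to capturing the offset between the two clocks on the same node that produced the traces (a sketch, assuming the trace ns values come from CLOCK_MONOTONIC, which is what steady_clock typically uses on Linux):

import time

# The difference between the wall clock and the monotonic clock, captured on
# the node running tritonserver, approximates the offset in nanoseconds.
offset_ns = time.time_ns() - time.monotonic_ns()

def to_epoch_ns(trace_ns: int) -> int:
    """Convert a monotonic trace timestamp to an approximate epoch timestamp."""
    return trace_ns + offset_ns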
Trace per X Time
The trace logging is event based, so it is hard to write to the file periodically. However, the collected traces are written to the file whenever a trace setting is updated, so you can use the trace API mentioned above to trigger the write on demand.
I think we would not like the on-demand trace flush to work by changing a trace setting through the API; if an API call is needed to get the trace logs, I would prefer an API that simply means "flush the traces now". Furthermore, we would prefer the time-based logging explained above (while keeping the per-x-requests configuration). You said it is hard because the logging is event based, but I guess you have something like a trace manager that implements the per-x-requests feature, so I think it should be possible to add per-x-time logic to such a component, no?
Trace File Overflow
Each time you write the traces to the file, you append an index to the configured name -> for example, we configure the trace file to be `/<dir>/trace.json`, so each write goes to a different file -> trace.json.0, trace.json.1, trace.json.2, and so on.
The problem with this is that the server is something that should remain alive (it is production ready for most use cases), meaning that at some point in the distant future the deployed server's trace file name will overflow from the ever-growing number of characters.
Solution
We think you should always keep writing to the single configured file, `trace.json`.
Problem
Here as well the file can become really big -> so both approaches have a future problem, but the one we propose is manageable.
Explanation
The trace file on its own is not useful. Most if not all use cases will end up shipping the data elsewhere, and as explained above we do so with logstash. After every input, logstash deletes the file it read from (logstash config) -> meaning Triton would just need to re-create the file before writing to it (as it already does).
Sorry for the late reply. I misunderstood your earlier comment on trace by time, and now it makes sense to me after reading through your clarification. Reiterating the request to make sure my understanding is correct: "In addition to the frequency setting, when determining if the request should be traced, the time lapse since the last traced request should also be considered, and the request will be traced regardless if the interval is too long." Then I think this is a good enhancement to add.
Regarding the file, the reason for adding an index to the file name is to distinguish the traces written to disk, so that the user knows which file to look in for new traces and so that traces from different model trace settings (but with the same file name) won't interfere with each other. For your use case and the concern about max file name length, we may add a new trace option for the indexed naming used when writing traces to file based on log frequency, which would overwrite the file content if the file exists.
Terrific !!!
To sum up:
- log by time lapses
- index file rule
- epoch timestamps in the logs (instead of monotonic; as a CLI option)
Thanks
Hi, Is there any update on this feature request?