Question about 'UNAVAILABLE' and 'UNKNOWN' values in MSCallGraph data
Hello,
I hope this message finds you well. Firstly, I would like to express my sincere gratitude for open-sourcing the 'cluster-trace-microservices-v2022' dataset. It has been instrumental in my research efforts.
I have been exploring the MSCallGraph data and noticed that the 'um' column contains numerous 'UNAVAILABLE' and 'UNKNOWN' entries. I have a few questions about this, which I illustrate with the following snippet from the dataset:
| timestamp | traceid | service | rpc_id | rpctype | um | uminstanceid | interface | dm | dminstanceid | rt |
|---|---|---|---|---|---|---|---|---|---|---|
| 666816332 | T_20087312301 | S_38528029 | 0 | http | USER | USER | 7H3KUQVcpx | MS_3386 | MS_3386_POD_1 | 3.0 |
| 666816333 | T_20087312301 | S_38528029 | 0.1 | rpc | UNAVAILABLE | UNAVAILABLE | 7H3KUQVcpx | MS_70648 | MS_70648_POD_170 | 1.0 |
| 666816333 | T_20087312301 | S_38528029 | 0.1.1 | http | UNKNOWN | UNAVAILABLE | 7H3KUQVcpx | MS_38945 | MS_38945_POD_107 | 0.0 |
| 666816333 | T_20087312301 | S_38528029 | 0.1.1.1 | http | MS_38945 | MS_38945_POD_107 | MaC4sWx7iJ | MS_30732 | MS_30732_POD_33 | 0.0 |
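To make the pattern concrete, here is a minimal Python sketch that flags records whose 'um' is a placeholder even though the parent call (the record whose 'rpc_id' is the immediate prefix) is present in the same trace. The helper names are my own, and the idea that 'rpc_id' encodes a dot-separated call tree is my assumption, not something I found in the dataset documentation.

```python
from typing import List, Optional, Tuple

# Placeholder values seen in the 'um' / 'uminstanceid' columns.
PLACEHOLDERS = {"UNAVAILABLE", "UNKNOWN"}


def parent_rpc_id(rpc_id: str) -> Optional[str]:
    """Return the rpc_id of the calling record, e.g. '0.1.1' -> '0.1'.

    Assumes rpc_ids form a dot-separated call tree; '0' has no parent.
    """
    if "." not in rpc_id:
        return None
    return rpc_id.rsplit(".", 1)[0]


def placeholder_um_with_parent(records: List[dict]) -> List[Tuple[str, str]]:
    """Return (traceid, rpc_id) of records whose 'um' is a placeholder even though
    their parent record is present in the same trace."""
    present = {(r["traceid"], r["rpc_id"]) for r in records}
    flagged = []
    for r in records:
        parent = parent_rpc_id(r["rpc_id"])
        if r["um"] in PLACEHOLDERS and parent is not None and (r["traceid"], parent) in present:
            flagged.append((r["traceid"], r["rpc_id"]))
    return flagged


# Trace T_20087312301 from the table above (only the columns used here).
rows = [
    {"traceid": "T_20087312301", "rpc_id": "0", "um": "USER"},
    {"traceid": "T_20087312301", "rpc_id": "0.1", "um": "UNAVAILABLE"},
    {"traceid": "T_20087312301", "rpc_id": "0.1.1", "um": "UNKNOWN"},
    {"traceid": "T_20087312301", "rpc_id": "0.1.1.1", "um": "MS_38945"},
]
print(placeholder_um_with_parent(rows))
# [('T_20087312301', '0.1'), ('T_20087312301', '0.1.1')]
```

In this snippet, both placeholder rows have their parent record present, which is exactly the situation I am asking about.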
My questions are:
- What causes 'UNAVAILABLE' and 'UNKNOWN' to appear in the 'um' column? Specifically, how are records like the ones above generated, where the 'rpc_id' chain looks complete (0 → 0.1 → 0.1.1 → 0.1.1.1) but 'um' is 'UNAVAILABLE' or 'UNKNOWN'?
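- Is it possible to infer 'um' from 'dm'? For example, can we repair the data by replacing a placeholder 'um' (and 'uminstanceid') with the 'dm' (and 'dminstanceid') of the parent call, i.e., the record whose 'rpc_id' is the immediate prefix of the current one? The repaired data might look like this: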
| timestamp | traceid | service | rpc_id | rpctype | um | uminstanceid | interface | dm | dminstanceid | rt |
|---|---|---|---|---|---|---|---|---|---|---|
| 666816332 | T_20087312301 | S_38528029 | 0 | http | USER | USER | 7H3KUQVcpx | MS_3386 | MS_3386_POD_1 | 3.0 |
| 666816333 | T_20087312301 | S_38528029 | 0.1 | rpc | MS_3386 | MS_3386_POD_1 | 7H3KUQVcpx | MS_70648 | MS_70648_POD_170 | 1.0 |
| 666816333 | T_20087312301 | S_38528029 | 0.1.1 | http | MS_70648 | MS_70648_POD_170 | 7H3KUQVcpx | MS_38945 | MS_38945_POD_107 | 0.0 |
| 666816333 | T_20087312301 | S_38528029 | 0.1.1.1 | http | MS_38945 | MS_38945_POD_107 | MaC4sWx7iJ | MS_30732 | MS_30732_POD_33 | 0.0 |
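For reference, this is roughly how I would implement that repair: a minimal sketch assuming the caller of each record is the 'dm' of its parent 'rpc_id' in the same trace, which is precisely the assumption I would like to confirm. `repair_um` is a hypothetical helper of mine; records whose parent is missing or matches more than one record are left unchanged.

```python
from typing import Dict, List, Optional, Tuple

# Placeholder values seen in the 'um' / 'uminstanceid' columns.
PLACEHOLDERS = {"UNAVAILABLE", "UNKNOWN"}


def parent_rpc_id(rpc_id: str) -> Optional[str]:
    # '0.1.1' -> '0.1'; the root '0' has no parent.
    return rpc_id.rsplit(".", 1)[0] if "." in rpc_id else None


def repair_um(records: List[dict]) -> List[dict]:
    """Fill placeholder um/uminstanceid from the parent call's dm/dminstanceid.

    Heuristic (unconfirmed): the caller of a record is the 'dm' of the record
    whose rpc_id is its immediate prefix in the same trace. Records whose parent
    rpc_id is missing or matches more than one record are left unchanged.
    """
    # Index every record by (traceid, rpc_id), keeping duplicates to detect ambiguity.
    by_id: Dict[Tuple[str, str], List[dict]] = {}
    for r in records:
        by_id.setdefault((r["traceid"], r["rpc_id"]), []).append(r)

    repaired = []
    for r in records:
        fixed = dict(r)  # copy so the input rows are not mutated
        parent = parent_rpc_id(r["rpc_id"])
        candidates = by_id.get((r["traceid"], parent), []) if parent else []
        if r["um"] in PLACEHOLDERS and len(candidates) == 1:
            fixed["um"] = candidates[0]["dm"]
            fixed["uminstanceid"] = candidates[0]["dminstanceid"]
        repaired.append(fixed)
    return repaired


# Trace T_20087312301 from the first table (only the columns used here).
rows = [
    {"traceid": "T_20087312301", "rpc_id": "0", "um": "USER", "uminstanceid": "USER",
     "dm": "MS_3386", "dminstanceid": "MS_3386_POD_1"},
    {"traceid": "T_20087312301", "rpc_id": "0.1", "um": "UNAVAILABLE", "uminstanceid": "UNAVAILABLE",
     "dm": "MS_70648", "dminstanceid": "MS_70648_POD_170"},
    {"traceid": "T_20087312301", "rpc_id": "0.1.1", "um": "UNKNOWN", "uminstanceid": "UNAVAILABLE",
     "dm": "MS_38945", "dminstanceid": "MS_38945_POD_107"},
    {"traceid": "T_20087312301", "rpc_id": "0.1.1.1", "um": "MS_38945", "uminstanceid": "MS_38945_POD_107",
     "dm": "MS_30732", "dminstanceid": "MS_30732_POD_33"},
]
for r in repair_um(rows):
    print(r["rpc_id"], r["um"], r["uminstanceid"])
```

On these rows, it reproduces the 'um' and 'uminstanceid' columns of the repaired table above.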
- How is the entry-point MS of a trace determined? If a trace contains a record with 'rpc_id=0', is the entry point considered to be 'USER'? And when there is no 'rpc_id=0' record, so that 'rpc_id' starts at '0.1' with 'um' set to 'UNAVAILABLE' or 'UNKNOWN' (as in the two traces below), is the 'rpc_id=0.1' record considered the entry-point MS?
| timestamp | traceid | service | rpc_id | rpctype | um | uminstanceid | interface | dm | dminstanceid | rt |
|---|---|---|---|---|---|---|---|---|---|---|
| 666803251 | T_14180572390 | S_38528029 | 0.1 | rpc | UNAVAILABLE | UNAVAILABLE | 7H3KUQVcpx | MS_70648 | MS_70648_POD_162 | 20.0 |
| 666803254 | T_14180572390 | S_38528029 | 0.1.1 | http | UNKNOWN | UNAVAILABLE | 7H3KUQVcpx | MS_38945 | MS_38945_POD_97 | 3.0 |
| 666803254 | T_14180572390 | S_38528029 | 0.1.1.1 | http | MS_38945 | MS_38945_POD_97 | ihrQqyYug4 | MS_30732 | MS_30732_POD_0 | 3.0 |

| timestamp | traceid | service | rpc_id | rpctype | um | uminstanceid | interface | dm | dminstanceid | rt |
|---|---|---|---|---|---|---|---|---|---|---|
| 666796398 | T_1445790167 | S_156482560 | 0.1 | http | UNKNOWN | UNAVAILABLE | bHiJXTZtx1 | MS_40912 | MS_40912_POD_290 | 2.0 |
| 666796398 | T_1445790167 | S_156482560 | 0.1.1 | mc | MS_40912 | MS_40912_POD_290 | ZaYxnA3U_f | MS_58269 | MS_58269_POD_25 | 1.0 |
| 666796399 | T_1445790167 | S_156482560 | 0.1.2 | mc | MS_40912 | MS_40912_POD_290 | ZaYxnA3U_f | MS_58269 | MS_58269_POD_5 | 0.0 |
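The rule I am currently applying can be sketched as follows. It only encodes my hypothesis from the question (the record with the shallowest 'rpc_id' is the root call, and its 'dm' is the entry-point MS), so please treat it as an assumption rather than the dataset's definition. `candidate_entry_points` is a hypothetical helper; ties at the same depth are broken by the earliest timestamp.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rpc_depth(rpc_id: str) -> int:
    # '0' -> 1, '0.1' -> 2, '0.1.1' -> 3
    return len(rpc_id.split("."))


def candidate_entry_points(records: List[dict]) -> Dict[str, Tuple[str, str, bool]]:
    """Map traceid -> (root rpc_id, candidate entry-point MS, starts_with_user_call).

    Encodes the hypothesis from the question, not a documented rule: the record
    with the shallowest rpc_id is the root call and its 'dm' is the entry-point MS.
    Ties at the same depth are broken by the earliest timestamp.
    """
    by_trace: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        by_trace[r["traceid"]].append(r)

    result = {}
    for traceid, recs in by_trace.items():
        root = min(recs, key=lambda rec: (rpc_depth(rec["rpc_id"]), rec["timestamp"]))
        is_user = root["rpc_id"] == "0" and root["um"] == "USER"
        result[traceid] = (root["rpc_id"], root["dm"], is_user)
    return result


# A few rows from the tables above (only the columns used here).
rows = [
    {"traceid": "T_20087312301", "timestamp": 666816332, "rpc_id": "0", "um": "USER", "dm": "MS_3386"},
    {"traceid": "T_20087312301", "timestamp": 666816333, "rpc_id": "0.1", "um": "UNAVAILABLE", "dm": "MS_70648"},
    {"traceid": "T_14180572390", "timestamp": 666803251, "rpc_id": "0.1", "um": "UNAVAILABLE", "dm": "MS_70648"},
    {"traceid": "T_14180572390", "timestamp": 666803254, "rpc_id": "0.1.1", "um": "UNKNOWN", "dm": "MS_38945"},
    {"traceid": "T_1445790167", "timestamp": 666796398, "rpc_id": "0.1", "um": "UNKNOWN", "dm": "MS_40912"},
    {"traceid": "T_1445790167", "timestamp": 666796398, "rpc_id": "0.1.1", "um": "MS_40912", "dm": "MS_58269"},
]
for traceid, entry in candidate_entry_points(rows).items():
    print(traceid, entry)
```

Under this rule, T_20087312301 would start at MS_3386 (called by USER), while T_14180572390 and T_1445790167 would start at MS_70648 and MS_40912 with an unknown caller. That is why I would like to know whether this interpretation is correct.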
I would appreciate any insights you can provide on these matters. Thank you for your time and for maintaining this valuable dataset.
Best regards