
Microservice MSCallGraph: a call is recorded more than twice

Open MaxLanLiu opened this issue 3 years ago • 4 comments

When explaining rpcid in the MSCallGraph dataset, README says “Note that, the call via remote invocation is recorded twice with the same rpcID in the UM and DM independently.” However, within the same trace (i.e. the traceid is the same), we found more than two occurrences of the same rpcid. Does it mean that the same rpc is sampled multiple times? Or does it mean different calls? (Also, many of those have rpctype mc, which means the DM is Memcached.) Thank you!

[Screenshot: rpcid]

MaxLanLiu, Feb 18 '22 20:02

Thanks for your question. The call that is recorded twice is a remote invocation between two stateless microservices, not an mc call between a stateless microservice and a stateful microservice. In the case you mentioned, the rows should be different calls. The duplication of rpcid ("0.1.3.1.3") is caused by users’ mistakes. For example, microservices start multiple asynchronous threads to handle requests without forwarding the call context that is needed for recording the traces. As a result, the rpcid is duplicated. The correct rpcid for each duplicated "0.1.3.1.3" record should be "0.1.3.1.x" (x represents a number rather than 3); the other information, such as the response time, um, and dm, is correct. The upstream microservice of every duplicated "0.1.3.1.3" record is the downstream microservice of the call with rpcid "0.1.3.1", so you can still build call graphs on the duplicated traces. Alternatively, you can filter out the duplicated records when building a graph.
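The parent/child relationship described above can be checked with a short sketch. This is only an illustration, not tooling from the dataset maintainers. It assumes that rows are already loaded as dictionaries keyed by the column names used in this thread (traceid, rpcid, um, dm) and that the parent of an rpcid is obtained by dropping its last dot-separated component. The helper names `parent_rpcid` and `um_matches_parent_dm` and the sample rows are hypothetical.

```python
def parent_rpcid(rpcid):
    """Return the rpcid of the parent call, e.g. "0.1.3.1.3" -> "0.1.3.1".

    Returns None for a root rpcid that has no dot.
    """
    head, sep, _ = rpcid.rpartition(".")
    return head if sep else None


def um_matches_parent_dm(rows):
    """For each row, report whether its um is the dm of a parent row in the same trace."""
    # Collect the downstream microservices seen for every (traceid, rpcid).
    dm_by_call = {}
    for r in rows:
        dm_by_call.setdefault((r["traceid"], r["rpcid"]), set()).add(r["dm"])

    result = []
    for r in rows:
        parent = parent_rpcid(r["rpcid"])
        parent_dms = dm_by_call.get((r["traceid"], parent), set())
        result.append(r["um"] in parent_dms)
    return result


# Hypothetical rows: one parent call and three rows that share a duplicated rpcid.
rows = [
    {"traceid": "t1", "rpcid": "0.1.3.1", "um": "A", "dm": "B"},
    {"traceid": "t1", "rpcid": "0.1.3.1.3", "um": "B", "dm": "M1"},
    {"traceid": "t1", "rpcid": "0.1.3.1.3", "um": "B", "dm": "M2"},
    {"traceid": "t1", "rpcid": "0.1.3.1.3", "um": "B", "dm": "M3"},
]
matches = um_matches_parent_dm(rows)
```

In this sample the parent row has no recorded parent of its own, so it does not match, while all three duplicated rows can still be attached to the call "0.1.3.1" as edges from B.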

niewuya, Feb 22 '22 13:02

Hi, thank you for your response! I would like to double-check whether my understanding below is correct. If any of these claims is incorrect, could you kindly explain why? Thank you for your patience!

  1. Each unique rpcid represents a unique communication call.

  2. Within one trace, the number of occurrences of a unique rpcid is either 1 or 2. In other words, within one trace, we should not see the same rpcid in more than two rows (where the traceid is the same) in the data table.

  3. Within one trace, if a call has rpctype mc, db, or mq, then the call's rpcid should only occur once. If we see more than one occurrence, as in the screenshot below, then it must be caused by users' mistakes. Each of these rows is actually a different inter-process communication call to Memcached. [Screenshot: mc]

  4. Within one trace, if an rpcid occurs twice, then the two rows represent the same remote invocation between two stateless microservices. Hence the rpctype must be rpc or http. Additionally, the two rows must have the same um, same rpctype, same dm and same interface. The two rows might have the same or different timestamps. Further, one of the rows must have rt >= 0 and the other's rt must be <= 0. (Two examples that fit this description are in the screenshot below. The rows are taken from the file MSCallGraph_0.) [Screenshot: same rpcid]

MaxLanLiu, Feb 22 '22 20:02

Actually, I found the following: there exist two rows with the same traceid and same rpcid, but the two rows have different rpctypes (see the screenshot below). Is this also a result of users' mistakes? Thanks! [Screenshot: same rpcid with different rpctype]

MaxLanLiu, Feb 22 '22 20:02

  1. Each unique rpcid represents a unique communication call.

Yes. You are right.

  2. Within one trace, the number of occurrences of a unique rpcid is either 1 or 2. In other words, within one trace, we should not see the same rpcid in more than two rows (where the traceid is the same) in the data table.

Yes, but there are some exceptions caused by incorrect usage by users at Alibaba. You can filter out those records.

  3. Within one trace, if a call has rpctype mc, db, or mq, then the call's rpcid should only occur once. If we see more than one occurrence like what's in the screenshot below, then it must be made by users' mistakes. Each of these rows is actually a different inter-process communication call to Memcached.

Yes. You are right.

  4. Within one trace, if an rpcid occurs twice, then the two rows represent the same remote invocation between two stateless microservices. Hence the rpctype must be rpc or http. Additionally, the two rows must have the same um, same rpctype, same dm and same interface. The two rows might have same or different timestamps. Further, one of the rows must have rt >=0 and the other's rt must be <=0. (Two examples that fit this description are in the screenshot below. The rows are taken from the file MSCallGraph_0.)

Yes. It can also happen that an rpcid with rpctype rpc or http is recorded only once because a trace item is missing.

  5. Actually, I found the following: there exist two rows with the same traceid and same rpcid, but the two rows have different rpctypes (see the screenshot below). Is this also a result of users' mistakes?

Yes. Because of the complex production environment, some exceptions occur in the traces. I recommend filtering out those exceptions based on the rules in the README.
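As a rough illustration of such filtering, the rules confirmed in this thread could be applied per (traceid, rpcid) group as sketched below. This is a minimal sketch under stated assumptions, not the maintainers' code: rows are assumed to be dictionaries keyed by the column names mentioned in this thread (traceid, rpcid, rpctype, um, dm, interface, rt), and the function names, label strings, and sample rows are hypothetical.

```python
from collections import defaultdict

STATELESS_TYPES = {"rpc", "http"}                 # calls expected to be recorded twice
PAIR_FIELDS = ("um", "rpctype", "dm", "interface")  # must agree between the two rows


def classify_group(group):
    """Label the rows that share one (traceid, rpcid)."""
    rpctypes = {r["rpctype"] for r in group}
    if len(rpctypes) != 1:
        return "anomalous"  # same rpcid recorded with different rpctypes
    rpctype = next(iter(rpctypes))

    if len(group) == 1:
        # An rpc/http call recorded once means a trace item is missing.
        return "missing_pair" if rpctype in STATELESS_TYPES else "single"

    if len(group) == 2 and rpctype in STATELESS_TYPES:
        a, b = group
        same_fields = all(a[f] == b[f] for f in PAIR_FIELDS)
        low, high = sorted((a["rt"], b["rt"]))
        # One row must have rt <= 0 and the other rt >= 0.
        if same_fields and low <= 0 <= high:
            return "pair"

    # More than two rows, repeated mc/db/mq rows, or an inconsistent pair.
    return "anomalous"


def classify_rows(rows):
    """Group rows by (traceid, rpcid) and label every group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["traceid"], r["rpcid"])].append(r)
    return {key: classify_group(g) for key, g in groups.items()}


def row(rpcid, rpctype, rt, um="A", dm="B", interface="i1", traceid="t1"):
    """Build one hypothetical sample row."""
    return {"traceid": traceid, "rpcid": rpcid, "rpctype": rpctype,
            "um": um, "dm": dm, "interface": interface, "rt": rt}


sample = [
    row("0.1", "rpc", 5), row("0.1", "rpc", -5),       # consistent pair
    row("0.1.1", "mc", 1),                               # single stateful call
    row("0.1.2", "mc", 1), row("0.1.2", "mc", 2), row("0.1.2", "mc", 3),
    row("0.1.3", "http", 4),                             # missing counterpart
    row("0.1.4", "rpc", 2), row("0.1.4", "mc", 1),       # mixed rpctypes
]
labels = classify_rows(sample)
```

Groups labeled "anomalous" are the candidates to drop before building a call graph; whether to keep "missing_pair" groups depends on how much incompleteness the analysis can tolerate.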

Thanks for your interest. If there are any problems or confusion, please let me know.

niewuya, Mar 01 '22 09:03