erpc icon indicating copy to clipboard operation
erpc copied to clipboard

Arbitrated client stucks -> dead lock

Open willmann9527 opened this issue 5 years ago • 16 comments

Condition: This bug occurres if a message will be lost or the server is not yet available.

Issue: If the client invokes a procedure, erpc sends the invoke out no matter if there is someone, Right depends on the driver. After that, the arbitrated transport waits forever for a reply but this reply will never received and the process will stuck -> dead lock. As well when the server is coming up.

Solution: Replace the following timeout kWaitForever by a member variable mu32_ReceiveTimeout, which is going to be initialized in the constructor and can be set by a setter method. info->m_sem.get(Semaphore::kWaitForever);

BTW: When are you going to release a new version?

willmann9527 avatar Apr 24 '20 14:04 willmann9527

Hello @willmann9527 , thank you for reporting this issue. Agree, this needs to be solved. BTW, what transport layer are you using, is it MU_transport? As for the next planned release it should be available in June.

MichalPrincNXP avatar Apr 27 '20 07:04 MichalPrincNXP

Hello @MichalPrincNXP, thank you for the rapid reply. I use my own transport layer due to the fact that a similar to IP layer is between eRPC and the physical layer. However, how do you handle a communication from one central point to two communication partners? The thing is, p_client is generated by the erpc generator and must be reassigned before we call an erpc procedure, which results in we have to write wrappers for the generated procedures to set this pointer p_client. I guess you have also a use case where you have two clients in one application and hopefully a smarter solution.

willmann9527 avatar Apr 27 '20 08:04 willmann9527

Hello @willmann9527 , if I understand your case correctly, you would like to have one eRPC client to communicate with two independent eRPC servers, correct? Then you have two transports, each for one client-server communication, and there is not necessary to have the transport arbitrator. In fact there are two independent eRPC client instances running on the client side then.

MichalPrincNXP avatar Apr 27 '20 08:04 MichalPrincNXP

Hello @MichalPrincNXP, Yes, right. How do you implement these two independent eRPC client instances running on the client side as the generated code uses just one client pointer p_client?

willmann9527 avatar Apr 27 '20 09:04 willmann9527

Yes, you are right, this is the corner case that is not handled by the generator. Unfortunately, it is up to the user to solve now (by a script or a wrapper like you did it).

MichalPrincNXP avatar Apr 27 '20 10:04 MichalPrincNXP

Our main use-case requires multiple client-server instances. To achieve this, I added instance versions of all setup functions; they don't use the static global objects. I changed the generator to create a client specific name for p_client. In the setup code, I bind the client to the client manager instance. I'm curious, if you share the same API with two instances, how do you know which instance to bind p_client with? Is there some "higher entity" which knows this? Or do you have two different components, which each happen to use the same API but on different remote connections?

RamseyTI avatar Apr 27 '20 18:04 RamseyTI

Hi @RamseyTI, we are planning to call a procedure in front of each erpc call, which sets the global client p_client to the proper remote connection. We also considered your solution but we wanted to stay as keep as possible at the erpc repository to facilitate updates from it. Frankly speaking, if the generator is capable of taking a client parameter would be the best solution for us. Then, the setup function uses a list of clients to initialize these clients.

willmann9527 avatar Apr 28 '20 05:04 willmann9527

@willmann9527, understood. Thanks for the reply.

RamseyTI avatar Apr 28 '20 16:04 RamseyTI

Hi @RamseyTI, may I have your tailored generator (solution and/or exe) and the setup files to check if it would be work for me? That would be awesome. Thank you in advance.

willmann9527 avatar May 07 '20 08:05 willmann9527

Hello @willmann9527, There are three elements to our object binding solution:

  • Modify client source template to replace global symbol with module specific name
  • Add setup functions which take object references
  • Setup code

I modified the following template to replace the global g_Client pointer with a module specific name. I used {$commonHeaderName}_cm as the module specific name.

erpcgen/src/templates/c_client_source.template

- extern ClientManager *g_Client;
+ ClientManager *{$commonHeaderName}_cm;

- RequestContext request = g_client->createRequest(true);
+ RequestContext request = {$commonHeaderName}_cm->createRequest(true);

There are several more places in the template which need the same replacement.

Modify the client setup code to add object based functions which don't reference the internal global objects.

erpc_c/setup/erpc_client_setup.h

  void erpc_client_init(erpc_transport_t transport, erpc_mbf_t message_buffer_factory);
+ void erpc_client_init(erpc_transport_t transport, erpc_mbf_t message_buffer_factory,
+         void *cm_obj, void *cf_obj, void *crc_obj);

Make a similar change to all the functions in this header file. I've attached this file for reference, but I also made similar changes to the following files:

erpc_mbf_setup.h erpc_server_setup.cpp erpc_server_setup.h erpc_transport_setup.h

And the custom transport and message buffer factories we wrote.

The client setup code will depend on callback support and transport arbitration. The example code below does both:

static ManuallyConstructed<BasicCodecFactory> lprf_client_api_bcfobj;
static ManuallyConstructed<BasicCodec> lprf_client_api_bcobj;
static ManuallyConstructed<Crc16> lprf_client_api_crcobj;
static ManuallyConstructed<TransportArbitrator> lprf_client_taobj;
static ManuallyConstructed<ArbitratedClientManager> lprf_client_api_acmobj;

static ManuallyConstructed<SimpleServer> lprf_client_cbk_ssobj;
static ManuallyConstructed<BasicCodecFactory> lprf_client_cbk_bcfobj;
static ManuallyConstructed<Crc16> lprf_client_cbk_crcobj;

extern ClientManager *lprf_api_cm;

static void lprf_client_setup(void)
{
    erpc_transport_t transport;
    erpc_mbf_t msg_buf_fact;
    erpc_transport_t arbitrator;

    /* create shared transport, arbitrator, and server objects */
    transport = erpc_transport_uart_simplelink_init(CONFIG_UART_LPRF, 115200);
    msg_buf_fact = erpc_mbf_static_init();
    arbitrator = erpc_arbitrated_client_init_ref(transport, msg_buf_fact,
            &lprf_client_api_acmobj, &lprf_client_taobj,
            &lprf_client_api_bcfobj, &lprf_client_api_bcobj, &lprf_client_api_crcobj);
    erpc_server_init_ref(arbitrator, msg_buf_fact,
            &lprf_client_cbk_ssobj, &lprf_client_cbk_bcfobj, &lprf_client_cbk_crcobj);

    /* bind interfaces to client manager */
    lprf_api_cm = lprf_client_api_acmobj;

    /* add services to rpc server */
    erpc_add_service_to_server_ref(create_BLEmesh_cbk_service(), &lprf_client_cbk_ssobj);
    erpc_add_service_to_server_ref(create_BLEstk_cbk_service(), &lprf_client_cbk_ssobj);
}

Here is the complimentary code for the server side:

static ManuallyConstructed<SimpleServer> lprf_server_api_ssobj;
static ManuallyConstructed<BasicCodecFactory> lprf_server_api_bcfobj;
static ManuallyConstructed<Crc16> lprf_server_api_crcobj;

static ManuallyConstructed<BasicCodecFactory> lprf_server_cbk_bcfobj;
static ManuallyConstructed<BasicCodec> lprf_server_cbk_bcobj;
static ManuallyConstructed<Crc16> lprf_server_cbk_crcobj;
static ManuallyConstructed<TransportArbitrator> lprf_server_taobj;
static ManuallyConstructed<ArbitratedClientManager> lprf_server_cbk_acmobj;

extern ClientManager *lprf_cbk_cm;

static void lprf_server_setup(void)
{
    erpc_transport_t transport;
    erpc_mbf_t msg_buf_fact;
    erpc_transport_t arbitrator;

    /* create shared transport, arbitrator, and server objects */
    transport = erpc_transport_uart_simplelink_init(CONFIG_UART_LPRF, 115200);
    msg_buf_fact = erpc_mbf_static_init();
    arbitrator = erpc_arbitrated_client_init_ref(transport, msg_buf_fact,
            &lprf_server_cbk_acmobj, &lprf_server_taobj,
            &lprf_server_cbk_bcfobj, &lprf_server_cbk_bcobj, &lprf_server_cbk_crcobj);
    erpc_server_init_ref(arbitrator, msg_buf_fact,
            &lprf_server_api_ssobj, &lprf_server_api_bcfobj, &lprf_server_api_crcobj);

    /* add api services to rpc server */
    erpc_add_service_to_server_ref(create_BLEmesh_api_service(), &lprf_server_api_ssobj);
    erpc_add_service_to_server_ref(create_BLEstk_api_service(), &lprf_server_api_ssobj);

    /* bind callback interfaces to client manager */
    lprf_cbk_cm = lprf_server_cbk_acmobj;
}

This should give you an idea of how we implemented this. We are interested in up-streaming this solution, but have not found the time. Let us know if you are interested.

c_client_source.template.txt erpc_client_setup.h.txt erpc_client_setup.cpp.txt

RamseyTI avatar May 08 '20 21:05 RamseyTI

@willmann9527 For the issue of client getting stuck, in a way a eRPC client timeout for each eRPC call. I am also concerned about this same issue in my implementation. Need a mechanism in client to unblock the execution flow.

Can you share how you attempted this - maybe that sample setter function (method) code and Sem variable handling ?

Talking about this idea - "Solution: Replace the following timeout kWaitForever by a member variable mu32_ReceiveTimeout, which is going to be initialized in the constructor and can be set by a setter method. info->m_sem.get(Semaphore::kWaitForever);"

Thanks.

anirudhgargi avatar Jul 02 '20 03:07 anirudhgargi

@anirudhgargi Frankly speaking, in the end I decided against the arbitrater. But it should be not that hard. Just use a member, e.g. mu32_Timeout, and set it in the constructor to Semaphore::kWaitForever. Then set up a setter procedure, for instance SetTimeout(uint32 u32_InTimeout) { mu32_Timeout = u32_InTimeout; }. Last but not least, use the member mu32_Timeout instead of the Semaphore::kWaitForever: info->m_sem.get(mu32_Timeout);. Was that helpful?

willmann9527 avatar Jul 02 '20 06:07 willmann9527

@willmann9527 Thanks for the prompt reply. Let me implement and validate. Will share feedback soon.

anirudhgargi avatar Jul 06 '20 04:07 anirudhgargi

I think it would be better to release pending clients in erpc_transport_arbitrator::receive with something like

        erpc_status_t err = m_sharedTransport->receive(message);
        if (err)
        {
		// if we timeout, we must unblock all pending client(s)	
		if (err == kErpcStatus_Timeout)
		{	
			PendingClientInfo *client = m_clientList;
			for (; client; client = client->m_next)
			{
				if (client->m_isValid)
				{
					client->m_sem.put();
				}	
			}
		}
               return err;
        }

pgu-swir avatar Aug 26 '20 04:08 pgu-swir

Hi @pgu-swir @MichalPrincNXP do you think this jira can be closed based on merged PR?

Hadatko avatar Apr 25 '22 11:04 Hadatko

I would close it if others involved agree too. @willmann9527 @RamseyTI @anirudhgargi any comments?

MichalPrincNXP avatar Apr 26 '22 07:04 MichalPrincNXP

Closing for inactivity.

Hadatko avatar Sep 27 '23 13:09 Hadatko