Android-nRF-Mesh-Library icon indicating copy to clipboard operation
Android-nRF-Mesh-Library copied to clipboard

Segmentation feedback of acknowledged msg is mostly failing when sent to non-proxy device

Open R0m4in-dooz opened this issue 3 years ago • 38 comments


Describe the bug As title states, when sending segmented acknowledged messages to non-proxy devices, it is too often failing to construct back the Status.

To Reproduce Steps to reproduce the behavior:

  1. Provision at least 2 nodes
  2. Connect to proxy
  3. Send either CompositionDataGet, ConfigModelPublicationSet, or any msg that would result in segmented answer to the device that is not the proxy
  4. Witness the library failing to construct the Status msg -- it can totally fail or will pass it once a retransmission from the device is done

Expected behavior The library is able to construct the Status as soon as all feedback segments has been received (eg. SEG 0 == SEG N), and the non-proxy device should not have to retransmit at this point (?).

Platform details:

  • Device: any
  • OS: Android 9 and above
  • Library Version: 3.1.5
  • Mesh SDK: 4.0
  • Chipset: nrf52840 xxAA
  • MTU size: 66

Logs / Screenshots I spent some time to build enough material for you. I hope it will be enough to spot any bug, or give us hint on why this behavior.

log files suffixed with OK --> expected behavior log files suffixed with KO --> unexpected behavior: all segments received from device but no Status is being constructed. But we have it once the device is retransmitting some segments log files suffixed with full_KO --> unexpected behavior: all segments received from device but no Status is being constructed. The device seems to have received every acknowledgements as it is not retransmitting

Unfortunately, this behavior is even more problematic in our commercial application as we have implemented a FIFO for such msg with auto-retry mechanism. Meaning this kind of msg is in a FIFO, when sending a cmd, we wait for feedback and retry every 2.5 sec (retried 2 times). The success rate is below 15% because of this problem and it makes the app unusable for a lot of our features. If we increase the retry timer to 5-6sec, the success rate raises to ~60%, but it is still not enough, and is not acceptable for our customers. We may force proxy connection to raise success rate to 95%, but it is not acceptable for most features that need these segmented feedbacks, and moreover it's blocking the power of mesh n/w.

R0m4in-dooz avatar May 07 '21 15:05 R0m4in-dooz

Hi Rom4in thanks for the detailed report. I will try to take a look at this to see where the issue might be. In the mean time could you check if you are having the same problem on version 2.4.1?

roshanrajaratnam avatar May 10 '21 07:05 roshanrajaratnam

Not sure for 2.4.1, but we are using 2.3.0 on our mainline app. And yes, no problems on this version

ghost avatar May 10 '21 07:05 ghost

I tried to make code comparison between 3.1.5 and 2.3.0, but too much changes 😅 And I don't really know where to look in the codebase

ghost avatar May 10 '21 07:05 ghost

Hello @roshanrajaratnam :) any updates ?

ghost avatar May 14 '21 10:05 ghost

Hey sorry, been busy with some other tasks. Unfortunately i won't be back to work until Tuesday due to national holidays in Norway. I'll try to take a look at this next week!

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: R0m4in-dooz @.> Sent: Friday, May 14, 2021 12:27:58 PM To: NordicSemiconductor/Android-nRF-Mesh-Library @.> Cc: Rajaratnam, Roshan @.>; Mention @.> Subject: Re: [NordicSemiconductor/Android-nRF-Mesh-Library] Segmentation feedback of acknowledged msg is mostly failing when sent to non-proxy device (#426)

Hello @roshanrajaratnamhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Froshanrajaratnam&data=04%7C01%7Croshan.rajaratnam%40nordicsemi.no%7Cdb2f0299f99b4136b12e08d916c2f292%7C28e5afa2bf6f419a8cf6b31c6e9e5e8d%7C0%7C0%7C637565848823122352%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=XWvT6ztKu%2BuY4ESEV7zPu0chco9wMtn3GdSufYoLZBk%3D&reserved=0 :) any updates ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FNordicSemiconductor%2FAndroid-nRF-Mesh-Library%2Fissues%2F426%23issuecomment-841158540&data=04%7C01%7Croshan.rajaratnam%40nordicsemi.no%7Cdb2f0299f99b4136b12e08d916c2f292%7C28e5afa2bf6f419a8cf6b31c6e9e5e8d%7C0%7C0%7C637565848823132309%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=k9v5vR%2Bq7ainPfdlb6EzudIOZYCYQsT9Qz7%2BjiKLRd0%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FACCK6DOVN7VECYUVSXCTPW3TNT3K5ANCNFSM44KKHPZQ&data=04%7C01%7Croshan.rajaratnam%40nordicsemi.no%7Cdb2f0299f99b4136b12e08d916c2f292%7C28e5afa2bf6f419a8cf6b31c6e9e5e8d%7C0%7C0%7C637565848823132309%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ki%2F%2FGHMT7DDAG2twe4x4uGdbtdYZbKcSDTCJkDz%2FKw8%3D&reserved=0.

roshanrajaratnam avatar May 14 '21 17:05 roshanrajaratnam

@R0m4in-dooz I have had some training programs yesterday and today so I have not had time yet to look in to your issue. I will try my best to have a look at this during this week.

roshanrajaratnam avatar May 19 '21 16:05 roshanrajaratnam

Thanks @roshanrajaratnam, if I may help, with other logs or giving you our custom msg, feel free to ask :)

ghost avatar May 19 '21 16:05 ghost

Could you also confirm if this is happening with the nRF SDK examples as well?

roshanrajaratnam avatar May 19 '21 17:05 roshanrajaratnam

If you mean with the nrf example app, yes it has the same behavior: sometimes it's ok, sometimes it's ok after a retransmission, sometimes it never constructs the feedback

ghost avatar May 19 '21 17:05 ghost

yes example app with the nRF SDK Mesh light/switch examples?

roshanrajaratnam avatar May 19 '21 17:05 roshanrajaratnam

🤔 i only use our firmwares, based on nrf mesh sdk 4.0

ghost avatar May 19 '21 17:05 ghost

And I reproduced with publication set msg, with provisioned nodes, no configuration

ghost avatar May 19 '21 17:05 ghost

Hi @R0m4in-dooz finally found sometime to look at your issue. However looking at your logs and also after some debugging, I notice that "Message reassembly may not be completed yet!" is logged only when the message is not completed or when the reassembly might be in progress. If you notice when this is logged not all expected segments are received.

roshanrajaratnam avatar May 25 '21 21:05 roshanrajaratnam

Yeh, sometimes I expect this log but:

  • in compositiondataget_KO.log --> after receiving the 2nd segment which was missing, I indeed get the status, but after that I again receive something. Can you explain ?
V/NetworkLayer(19824): TTL for received message: 99
V/NetworkLayer(19824): Dst: 0x0001
V/NetworkLayer(19824): Received a segmented access message from: 0002
V/LowerTransportLayer(19824): SEG O: 2
V/LowerTransportLayer(19824): SEG N: 5
V/LowerTransportLayer(19824): Last SeqAuth value 511
V/LowerTransportLayer(19824): Current SeqAuth value 511
V/LowerTransportLayer(19824): Received segment message count: 6
V/BlockAcknowledgementMessage(19824): Block ack value: 3f
V/BlockAcknowledgementMessage(19824): Block ack: 63
V/BlockAcknowledgementMessage(19824): bit count: 5
V/LowerTransportLayer(19824): Block acknowledgement payload: 07FC0000003F
V/LowerTransportLayer(19824): Creating unsegmented transport control
V/LowerTransportLayer(19824): Unsegmented Lower transport control PDU 0007FC0000003F
V/NetworkLayer(19824): Encryption key: 531F92A8F77688718BB6855F2E35F772
V/NetworkLayer(19824): Privacy key: 73F0C6B5D7B3A2FB34F31A738AF2AD15
V/NetworkLayer(19824): Sequence Number: 0001DF
V/NetworkLayer(19824): Encrypted Network payload: 186235E82A8E46B432454AB4044B4B8822
V/MeshMessageState(19824): Sending acknowledgement: 0019F6A7E69E6BAA186235E82A8E46B432454AB4044B4B8822
D/DoozMeshStatusCallbacks(19824): onBlockAcknowledgementProcessed
V/AccessLayer(19824): Received Access PDU 02005900000000002800070000000400000002000210010000000400021000100310011000000400021000100310011000000300021003100110000001000210
V/ConfigCompositionDataStatus(19824):
[...]
D/fr.dooz.nordic_nrf_mesh.DoozMeshManagerCallbacks(19824): onNetworkUpdated
D/DoozMeshStatusCallbacks(19824): onMeshMessageReceived
D/DoozMeshStatusCallbacks(19824): received a ConfigCompositionDataStatus
V/MeshManagerApi(19824): MeshNetwork pdu sent: 0x0019F6A7E69E6BAA186235E82A8E46B432454AB4044B4B8822
V/MeshManagerApi(19824): Received network pdu: 0x00198C03B0B32F69B8E2E2D9ADC72551ED3F4D82ABC77139EB
V/BaseMeshMessageHandler(19824): TTL for received message: 4
V/BaseMeshMessageHandler(19824): Sequence number of received access message: 479
V/NetworkLayer(19824): TTL for received message: 4
V/NetworkLayer(19824): Received a control message that was not directed to us, so we drop it
V/DefaultNoOperationMessageState(19824): Message reassembly may not be completed yet!

is it the answer from onBlockAcknowledgementProcessed ? then why it says not directed to us + msg not completed yet ?

  • in compositiondataget_KO_2.log --> i never have my status even if all segments have been logged at some point (0,1,3,4,5 and then 1,2,3,5)

  • in configmodelpublicationset_KO.log --> I have my status after this is logged : seg 0, seg 1, seg 1, seg 0 and still after the Status is constructed i have the same as compositiondataget_KO.log

  • in configmodelpublicationset_KO_2.log --> I have my status after this is logged : seg 1, seg 0, seg 0, seg 0 seg 1 and still after the Status is constructed i have the same as compositiondataget_KO.log

  • in configmodelpublicationset_full_KO.log --> i never have my status even if seg 0 and seg 1 are logged and in the end it says

V/MeshMessageState(23796): Sending acknowledgement: 00194768EF39053AB8E96F3BD1CF16713C329C54EA78F30997
D/DoozMeshStatusCallbacks(23796): onBlockAcknowledgementProcessed
V/MeshManagerApi(23796): Received network pdu: 0x00193237EADEAC3D2F3243ED98B1A236B06FF03963AF866DCB
V/BaseMeshMessageHandler(23796): TTL for received message: 4
V/BaseMeshMessageHandler(23796): Sequence number of received access message: 105
V/NetworkLayer(23796): TTL for received message: 4
I/flutter (23796): [NordicReport] received onDataReceived
V/DefaultNoOperationMessageState(23796): Acknowledgement payload: 00000003
V/BlockAcknowledgementMessage(23796): Segment 0 of 1 received by peer
V/BlockAcknowledgementMessage(23796): Segment 1 of 1 received by peer
D/DoozMeshStatusCallbacks(23796): onBlockAcknowledgementReceived
V/MeshManagerApi(23796): MeshNetwork pdu sent: 0x00194768EF39053AB8E96F3BD1CF16713C329C54EA78F30997
V/MeshManagerApi(23796): Received network pdu: 0x001991483AB5612B2FB31BE4E04AE0FD1E1B7FA0AA17BF6647
V/BaseMeshMessageHandler(23796): TTL for received message: 4
V/BaseMeshMessageHandler(23796): Sequence number of received access message: 106
V/NetworkLayer(23796): TTL for received message: 4
V/DefaultNoOperationMessageState(23796): Acknowledgement payload: 00000003
V/BlockAcknowledgementMessage(23796): Segment 0 of 1 received by peer
V/BlockAcknowledgementMessage(23796): Segment 1 of 1 received by peer
D/DoozMeshStatusCallbacks(23796): onBlockAcknowledgementReceived

logged all segments received by peer so why no Status ?

I notice that there are a lot of

E/DefaultNoOperationMessageState(23796): Decryption failed in NetworkLayer : mac check in CCM failed
D/DoozMeshStatusCallbacks(23796): onMessageDecryptionFailed

maybe it's the cause of the issue ?

ghost avatar May 26 '21 07:05 ghost

In a mesh network you may receive messages that are not directed to you as they are being relayed, in the network layer we could find out if it was directed to us. There could be instances where you can receive such messages hence we drop it. So the segments you are referring are the once received by the device.

The decryption failure is not something I am able produce using our sample app at all. Have you done any changes? if the decryption fails yes the LowerTransportLayer will timeout eventually.

roshanrajaratnam avatar May 26 '21 09:05 roshanrajaratnam

In a mesh network you may receive messages that are not directed to you as they are being relayed, in the network layer we could find out if it was directed to us. There could be instances where you can receive such messages hence we drop it. So the segments you are referring are the once received by the device.

Ok makes sense :)

The decryption failure is not something I am able produce using our sample app at all. Have you done any changes? if the decryption fails yes the LowerTransportLayer will timeout eventually.

I will try again with sample app and give you those logs if I have it again 👍🏼 but we only added our custom ApplicationMessage on top of 3.1.5 + new build flavor which fallback to your definition (specific flutter bug fix). So you don't have any issue to have ConfigModelPublicationSetStatus when it comes from non-proxy device ?

What about composition data log KO_2 and publication set log full_KO ? You never have this behavior ?

ghost avatar May 26 '21 10:05 ghost

ok sounds good. I do not see any issues relating to CompositionDataGet or ConfigModelPublicationSet message sent via proxy or to proxy. Let me know how this goes for you.

roshanrajaratnam avatar May 26 '21 11:05 roshanrajaratnam

Hello there. Sorry I didn't have time to look into this issue as we shipped force proxy as a temporary workaround on prod and I was busy with some other priorities! We may find some time to look for it. But first we will try out the new release 🕺

ghost avatar Jun 23 '21 14:06 ghost

Hello ! After upgrading to latest release, we still have issues on segmented answers. It is behaving better tho ! I don't have anymore issues on CompositionDataGet, nor on our custom 2 segments ApplicationMessage. However, for ConfigModelPublicationSet, I still experience troubles to construct Status from segments. I think it has changed behavior and now, I always have these two logs when it fails:

D/DoozMeshStatusCallbacks(13381): onBlockAcknowledgementProcessed
E/DefaultNoOperationMessageState(13381): Decryption failed in NetworkLayer : mac check in CCM failed
D/DoozMeshStatusCallbacks(13381): onMessageDecryptionFailed

Even if later it has these two logs:

V/BlockAcknowledgementMessage(13381): Segment 0 of 1 received by peer
V/BlockAcknowledgementMessage(13381): Segment 1 of 1 received by peer

The success rate has increase so it is better, but for pubs it is still not enough and I can't revert our force proxy code workaround. I feel like our FIFO with auto-retry is messing with your "segment queue", because often, it is constructing the Status some seconds after our retries are done (3x publication set) when target device resend 1 of the segments.

ghost avatar Jul 06 '21 07:07 ghost

It is still rly better, so thank for your work on low layers :)

ghost avatar Jul 06 '21 07:07 ghost

hey again, no problem, does this happen on a custom firmware, could you share it? I am not able to reproduce this on our sdk.

roshanrajaratnam avatar Jul 06 '21 07:07 roshanrajaratnam

I will ask our embedded team

ghost avatar Jul 06 '21 07:07 ghost

@roshanrajaratnam is [email protected] a valid email to receive a Google Drive share link ?

ghost avatar Jul 06 '21 07:07 ghost

I just sent the .hex file via Google Drive share

ghost avatar Jul 06 '21 08:07 ghost

Thanks I just got it, I will take a look at it later during the day. What dk is this built for ?

roshanrajaratnam avatar Jul 06 '21 08:07 roshanrajaratnam

This one :) compatible_DK

ghost avatar Jul 06 '21 08:07 ghost

Hello ! Any update on the subject?

ghost avatar Jul 12 '21 20:07 ghost

Hi, we're all on vacations right now. Please expect some delays in responses in July.

philips77 avatar Jul 14 '21 06:07 philips77

Hi there ! I see you are closing issues due to inactivity. This issue is still there, and I'm still awaiting support from you guys. I provided a .hex file in beginning of summer and still no feedback but this issue is labelled "awaiting user input" ?? Please notify me if you find any time to try to reproduce the issue with our firmware 😁

ghost avatar Dec 09 '21 10:12 ghost