arduino-lmic icon indicating copy to clipboard operation
arduino-lmic copied to clipboard

Packet transmission fails after ~200 successful transmissions.

Open akshaim opened this issue 6 years ago • 2 comments

ABP Node starts off sending data on a 30-second interval but fails after sending around 200 packets. LMIC Debug data show “uplink delayed until …” error. Last few lines of debug log(along with comments) is attached for reference. Any help is deeply appreciated :+1:

Device Details : SX1276 + Atmega 328. Sleep using Lowpower.h.

[17:49:12.962350 25.631989] Sleep complete //node wakes up from sleep to send 199th data [17:49:12.978556 0.016206] a_xy228 a_xz233 a_yz229 [17:49:13.009979 0.031424] Enter do_send [17:49:13.025927 0.015948] 347136201: engineUpdate, opmode=0x908 [17:49:13.058001 0.032074] 347137705: Uplink data pending [17:49:13.090223 0.032222] 347139720: Considering band 0, which is available at 388 [17:49:13.153837 0.063614] 347143426: Considering band 3, which is available at 0 [17:49:13.201948 0.048111] 347147009: No channel found in band 3 [17:49:13.249569 0.047618] 347149470: Considering band 0, which is available at 388 [17:49:13.313015 0.063448] 347153184: No channel found in band 0 [17:49:13.345306 0.032292] 347155645: Considering band 1, which is available at 345710800 [17:49:13.409153 0.063846] 347159740: Considering band 2, which is available at 388 [17:49:13.472689 0.063538] 347163454: No channel found in band 2 [17:49:13.504812 0.032122] 347165915: Considering band 1, which is available at 345710800 [17:49:13.568289 0.063478] 347170019: No channel found in band 1 [17:49:13.616145 0.047856] 347172479: Airtime available at 345710800 (channel duty limit) [17:49:13.679799 0.063653] 347176574: Airtime available at 345818595 (global duty limit) [17:49:13.743777 0.063979] 347180604: Ready for uplink [17:49:13.775581 0.031803] 347183211: Updating info for TX at 347137704, airtime will be 3536. Setting available time for band 1 to 1273495552 [17:49:13.887426 0.111845] 347190039: TXMODE, freq=865062500, len=21, SF=7, BW=125, CR=4/5, IH=0 [17:49:13.966923 0.079497] 347190036: irq: dio: 0x0 flags: 0x8 [17:49:13.999007 0.032085] 347192387: Scheduled job 0x2ea, cb 0x240e ASAP [17:49:14.046888 0.047880] Packet queued [17:49:14.062758 0.015870] Leave do_send [17:49:14.078713 0.015956] 347197382: Running job 0x2ea, cb 0x240e, deadline 0 [17:49:14.142103 0.063389] 347200766: Scheduled job 0x2ea, cb 0x2421 at 347252505 [17:49:14.907146 0.765042] 347252511: Running job 0x2ea, cb 0x2421, deadline 347252505 [17:49:14.955563 0.048417] 347252684: RXMODE_SINGLE, freq=865062500, SF=7, BW=125, CR=4/5, IH=0 [17:49:15.034720 0.079158] 347252851: irq: dio: 0x1 flags: 0x80 [17:49:15.067107 0.032386] 347255267: Scheduled job 0x2ea, cb 0x2429 ASAP [17:49:15.114876 0.047770] 347258313: Running job 0x2ea, cb 0x2429, deadline 0 [17:49:15.178321 0.063445] 347261696: Scheduled job 0x2ea, cb 0x2440 at 347317981 [17:49:16.007143 0.828822] 347317987: Running job 0x2ea, cb 0x2440, deadline 347317981 [17:49:16.071155 0.064012] 347318161: RXMODE_SINGLE, freq=869525000, SF=12, BW=125, CR=4/5, IH=0 [17:49:16.182312 0.111156] 347324454: irq: dio: 0x1 flags: 0x80 [17:49:16.214699 0.032388] 347324538: Scheduled job 0x2ea, cb 0x2396 ASAP [17:49:16.262575 0.047875] 347325715: Running job 0x2ea, cb 0x2396, deadline 0 [17:49:16.310673 0.048099] Enter onEvent [17:49:16.326525 0.015852] EV_TXCOMPLETE (includes waiting for RX windows) // successful transmission [17:49:16.389758 0.063233] Leave onEvent [17:49:16.405718 0.015960] 347334226: engineUpdate, opmode=0x900 [17:49:16.437840 0.032122] Enter sleeping for 3 cycles of 8 seconds [17:49:42.069730 25.631889] Sleep complete // Node wakes up to send 200th data. [17:49:42.086105 0.016376] a_xy228 a_xz233 a_yz229 [17:49:42.117703 0.031598] Enter do_send [17:49:42.133661 0.015958] 348916902: engineUpdate, opmode=0x908 [17:49:42.165775 0.032114] 348918406: Uplink data pending [17:49:42.197666 0.031891] 348920421: Considering band 1, which is available at 347491304 [17:49:42.261485 0.063819] 348924516: Considering band 2, which is available at 388 [17:49:42.325137 0.063651] 348928221: Considering band 3, which is available at 0 [17:49:42.388628 0.063492] 348931804: No channel found in band 3 [17:49:42.420754 0.032126] 348934266: Considering band 1, which is available at 347491304 [17:49:42.484632 0.063877] 348938361: Considering band 2, which is available at 388 [17:49:42.548213 0.063582] 348942074: No channel found in band 2 [17:49:42.580477 0.032263] 348944536: Considering band 1, which is available at 347491304 [17:49:42.659480 0.079004] 348948640: No channel found in band 1 [17:49:42.691620 0.032139] 348951110: No channel found in band 0 [17:49:42.723945 0.032326] 348953570: Airtime available at 2148918405 (channel duty limit) [17:49:42.803013 0.079067] 348957730: Uplink delayed until 2148918405 //Error !!! [17:49:42.835500 0.032488] 348960530: Scheduled job 0x2ea, cb 0x473 at 2148918280 [17:49:42.898912 0.063412] Packet queued [17:49:42.914871 0.015959] Leave do_send

akshaim avatar Jun 17 '18 13:06 akshaim

It looks very much like an overflow of the ticks. For a signed 32 bit int (ostime_t), 2148918405 (see error line in log file) is a negative value.

I never understood why the original code uses a 32 bit int, let alone a signed one. Ticks must be between 15.5μs – 100 μs long. So os_getTime() will overflow after 2^32 / (1,000,000 / 100) / 3600 = 119.3 hours (for 100µs) or less than 18.5 hours for 15µs.

If I'm not mistaken, I've seen modifications of the IBM code declaring ostime_t as u8_t.

manuelbl avatar Jul 09 '18 18:07 manuelbl

I have nodes running this code, which achived far longer uptimes (months) and far higher number of transmitted packets (15.000+) without a reset / restart.

cyberman54 avatar Jul 09 '18 18:07 cyberman54