
RDM full discovery sometimes misses connected fixtures or hangs.

Open ghost opened this issue 7 years ago • 36 comments

I was talking in the IRC channel last night with @peternewman and I was finally able to grab some logs before I left the office for the day.

The RDM full discovery often misses a handful of connected fixtures, and we have to run the discovery multiple times until all fixtures appear. Once in a while a discovery will hang, which leaves our quality assurance software frozen.

Here is an olad -l 4 log where it misses 2 out of the 4 connected fixtures. https://gist.github.com/ltd9938/b502c5395b9fab2231a97a2087b02400

Here is an olad -l 4 log where it picks up all of the connected devices. https://gist.github.com/ltd9938/6e7391310fba51db07161b4c7919459c

I was unable to grab a log of the discovery hanging, but as soon as I do I will update this post.

We are running an ENTTEC RDM USB PRO with firmware 2.4 (RDM Enabled) on the latest version of OLA on Ubuntu 12.04.

ghost avatar Mar 28 '18 12:03 ghost

Is that the same four fixtures in both runs?

I'm guessing so based on plugins/usbpro/EnttecUsbProWidget.cpp:586: Enttec Pro discovery complete: 4151:0200013a,4151:0200014e,4151:020002a3,4151:020002a5

And plugins/usbpro/EnttecUsbProWidget.cpp:586: Enttec Pro discovery complete: 4151:0200013a,4151:0200014e

In which case, the question is why this happened:

plugins/usbpro/EnttecUsbProWidget.cpp:330: Sending DUB packet: 4151:02000200 - 4151:020003ff
plugins/usbpro/EnttecUsbProWidget.cpp:854: TX: 11, length 38
common/io/EPoller.cpp:306: ss process time was 0.000001
plugins/usbpro/EnttecUsbProWidget.cpp:865: RX: 5, length 25
plugins/usbpro/EnttecUsbProWidget.cpp:865: RX: 12, length 0
common/rdm/DiscoveryAgent.cpp:217: BranchComplete, got 24
common/rdm/DiscoveryAgent.cpp:321: Muting 4151:020002a7

Which means that either there's something up with one or more of your fixtures, or they're somehow generating a collision which appears as a complete and valid response to the Enttec.

To progress this, it's probably a case of adding some more debugging to BranchComplete and/or capturing the raw RDM data on the line using an analyser/logic sniffer etc, e.g. see https://www.openlighting.org/rdm-tools/rdm-analyzers/ .

peternewman avatar Mar 28 '18 22:03 peternewman

So in data terms:

4151:020002a5 = eb 55 fb 55 aa 57 aa 55 aa 57 af f5 af 57 bf 75
4151:020002a3 = eb 55 fb 55 aa 57 aa 55 aa 57 ab f7 af 57 bb 77
4151:020002a7 = eb 55 fb 55 aa 57 aa 55 aa 57 af f7 af 57 bf 77

Which I guess isn't a huge stretch looking at the data.

But with checksums as follows:

AB 5D EF 7F
AB 5D EB 7F
AB 5D FB 77

A clean collision for them seems less likely to me.
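
For reference, here's a minimal sketch (not OLA code) of how those byte strings are produced, assuming the standard E1.20 DUB response encoding: each UID byte is transmitted twice, OR'd with 0xAA and then 0x55, and the 16-bit checksum over those 12 encoded bytes is appended, encoded the same way. The last four bytes of each line above are that encoded checksum.

#include <cstdint>
#include <cstdio>
#include <vector>

// Sketch only: build the 16 data bytes of a DISC_UNIQUE_BRANCH response
// (12 encoded EUID bytes plus 4 encoded checksum bytes).
std::vector<uint8_t> EncodeDubResponse(uint16_t manufacturer, uint32_t device) {
  const uint8_t uid[6] = {
      static_cast<uint8_t>(manufacturer >> 8), static_cast<uint8_t>(manufacturer),
      static_cast<uint8_t>(device >> 24), static_cast<uint8_t>(device >> 16),
      static_cast<uint8_t>(device >> 8), static_cast<uint8_t>(device)};
  std::vector<uint8_t> out;
  uint16_t checksum = 0;
  for (uint8_t b : uid) {
    out.push_back(b | 0xAA);  // each UID byte is sent twice,
    out.push_back(b | 0x55);  // OR'd with 0xAA and then 0x55
    checksum += (b | 0xAA) + (b | 0x55);
  }
  out.push_back((checksum >> 8) | 0xAA);    // checksum high byte, encoded
  out.push_back((checksum >> 8) | 0x55);
  out.push_back((checksum & 0xFF) | 0xAA);  // checksum low byte, encoded
  out.push_back((checksum & 0xFF) | 0x55);
  return out;
}

int main() {
  // Prints: eb 55 fb 55 aa 57 aa 55 aa 57 af f5 af 57 bf 75
  for (uint8_t b : EncodeDubResponse(0x4151, 0x020002a5))
    std::printf("%02x ", b);
  std::printf("\n");
}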

peternewman avatar Mar 28 '18 22:03 peternewman

Correct, those are the same four fixtures in both runs. I can try and get a bigger log with more fixtures before the end of the week if need be.

ghost avatar Mar 28 '18 23:03 ghost

There are two questions here really:

  1. From your side, how come you're getting a perfect collision which generates a valid DUB response
  2. From the OLA side, when we find a bad UID, should we actually behave a bit like a collision and keep branching down either side of the bad UID? I think that may work around your issue.

Although fundamentally, the way the standard is designed, 1 shouldn't be possible, otherwise all bets are off in terms of discovery.

The full discovery run log for success and failure would help, both should start with something like "DUB 0000:00000000 - ffff:ffffffff" or a line or two above that.

peternewman avatar Mar 28 '18 23:03 peternewman

It's very weird. Say we have 11 connected fixtures. I'll have to run the discovery multiple times before all 11 fixtures are discovered. It usually goes like this...

Discovery 1: 4 Fixtures
Discovery 2: 7 Fixtures
Discovery 3: 4 Fixtures
Discovery 4: 9 Fixtures
Discovery 5: 8 Fixtures
Discovery 6: 11 Fixtures

ghost avatar Mar 29 '18 12:03 ghost

Does running a full, then incrementals improve that?

When it fails to find everything, does it log "didn't respond to MUTE, marking as bad" against some non-existent UIDs?

peternewman avatar Mar 29 '18 12:03 peternewman

I haven't tried running a full then incremental. Hopefully I can give it a shot tomorrow.

Nope, I've never seen that message. It just outputs a list of the found UIDs.

ghost avatar Mar 29 '18 13:03 ghost

Prior to that, in the olad -l 4 logging, it should have that message (not in ola_rdm_discover sorry), e.g.: https://gist.github.com/ltd9938/b502c5395b9fab2231a97a2087b02400#file-fail-olad-l-4-L211

peternewman avatar Mar 29 '18 14:03 peternewman

Ahh gotcha. I won't have access to the manufacturing plant until Monday, so I'm tied up until next week.

ghost avatar Mar 29 '18 15:03 ghost

Was able to go into the plant yesterday. Ran incremental discoveries and didn't run into any issues. However, this was only with 4 fixtures. Once we get more fixtures I'll really try and recreate it.

ghost avatar Apr 04 '18 13:04 ghost

Hi @ltd9938 did you make any progress with this?

peternewman avatar Nov 30 '18 15:11 peternewman

Hi Peter, apologies for breaking in here. I thought it better not to start a new thread because I have a very similar problem.

My newborn responders (LED fixtures) all passed the OLA Responder tests successfully (Passed 367, Not Run 59, Total 426).

I tried to patch 12 LED fixtures (same model ID, 12 sequential DIDs from 1 to 12) using the OLA admin. I have to press the "run full discovery" button several times to succeed in discovering all devices:

attempt 1: 6 devices
attempt 2: 3 devices
attempt 3: 4 devices
attempt 4: 4 devices
...

Is there an upper limit on discovery in OLA?

majcit avatar Dec 01 '18 11:12 majcit

Hi @majcit ,

There shouldn't be a limit and I've successfully discovered large numbers of responders with it such as the LabPack (lots of responders in a box https://www.goddarddesign.com/rdm-lab-pack.html and there's a bigger version too).

Firstly I assume this is repeatable, i.e. it happens each time you do it? I'd suggest using ola_rdm_discover as it's probably more repeatable, and you can explicitly try full and incremental discoveries: https://docs.openlighting.org/ola/man/man1/ola_rdm_discover.1.html

To have any chance of finding the source of this bug and potentially fixing or working around it, we'll need a lot more information please, including olad -l 4 logs of any test runs. If you could capture the raw RDM data on the line using an analyser/logic sniffer etc., that would be amazing; e.g. see https://www.openlighting.org/rdm-tools/rdm-analyzers/ .

I'm going to throw out a load of things to consider and hopefully you can do some testing your end to come up with the minimum needed to reproduce the fault, which might give me a chance of seeing it too and hence greatly increase the chance of fixing it.

For starters:

  1. What controller/interface are you using? Do you have access to another of the same, or ideally a different type? What about an RDM-compatible lighting desk?
  2. Does it happen with fewer fixtures? E.g. binary chop to six and repeat; what's the minimum number required?
  3. If it happens with fewer than twelve, or you have access to more than twelve or can change their UIDs, does it happen if they aren't sequentially numbered? As discussed above, the closer the numbers, the greater the chance of a collision generating a valid packet (although still unlikely).
  4. Does incremental discovery behave better or worse?
  5. Does the fixture pass the responder tests fine if it's tested with all the other fixtures on the line too?
  6. I assume these are all just connected daisy-chained? Have you got an RDM-compatible splitter, and does it still happen through that?
  7. Is it always that sequence of discovery successes or does it vary?
  8. I assume your responder is closed source? Would you be able to loan me some (in London), or at least the control board guts/the minimum needed to do RDM responding, if you don't have the RDM analyser/logic sniffer kit?

@ltd9938 if you can answer any of the above for your issue too would also help.

peternewman avatar Dec 01 '18 16:12 peternewman

Hi @peternewman,

Sorry for not updating my situation sooner. My problem has been solved. My software was triggering multiple discoveries at a time which was causing the complications.

All has been fixed. I'm going to close this issue, please reopen if you see the need to.

ghost avatar Dec 03 '18 13:12 ghost

Hmm, thanks for coming back @ltd9938 , that sounds like a potential bug in OLA still, as I'm not sure we should allow the second discovery until the first has completed, given it would cause issues like you've seen.

Do you have a basic sample of your code you could share with the bug so I can try and reproduce and fix it? Which API were you using C++, Python, JSON/HTTP?

@majcit are you sure you aren't having the same issue? Can you try with the CLI client to make sure it's only being run once at a time, ideally after waiting some time for the initial discovery that's run on startup to complete.

peternewman avatar Dec 03 '18 14:12 peternewman

OK Peter, my setup (12 fixtures) is successfully discovered by:

XMT-350
ENTTEC DMX USB Pro + Enttec software
ENTTEC DMX USB Mk2 + Enttec software

However many times I press "full Discovery", it discovers 12 devices every time.

With both ENTTEC interfaces + OLA 10.3 (on Raspbian), it may randomly find any number between 1 and 12; only rarely does it discover all 12 on the first try. I didn't notice any special numbers or any special order of DIDs.

I can run the Enttec sniffer and report the sniffer messages soon. For the olad log, I need to experiment a bit more; I've never used it before.

majcit avatar Dec 03 '18 15:12 majcit

Thanks @majcit .

But if you don't press any buttons and wait for say 2-3 mins, then hit full discovery once, it still generally fails to find all 12?

The Enttec software works very differently to us; there is some oddity when discovering https://www.enttec.com/products/controls/led/din-led4px/ with OLA that works fine with their system, which I haven't had a chance to get to the bottom of yet to find out where the issue lies.

Enttec sniffer logs would be excellent, either from their software or using ours: http://docs.openlighting.org/ola/man/man1/rdmpro_sniffer.1.html

Likewise as mentioned, a few tests to see if the number of devices is special, or if it's intermittent with just one or two devices.

In terms of gathering olad debug logs, see here: https://www.openlighting.org/ola/get-help/ola-faq/#How_do_I_get_olad_-l_4_logs

peternewman avatar Dec 03 '18 16:12 peternewman

I did new trials, adding fixtures one by one, and observed new facts as below (N = number of fixtures):

When N < 4 there is no problem; OLA always discovers successfully.

When N >= 4 and the DIDs are sequential, at every press of discovery OLA finds a randomly different number of devices, 1 to N. For example, 6 sequential fixtures (all sequential): 2ee109a1-2ee109a2-2ee109a3-2ee109a4-2ee109a5-2ee109a6

When N >= 4 and the DIDs are not sequential, OLA always discovers successfully, even if pressed immediately. For example, 15 non-sequential fixtures (5 groups, each group only 3 sequential): 2ee109a1-2ee109a2-2ee109a3 2ee10aa4-2ee10aa5-2ee10aa6 2ee10ba7-2ee10ba8-2ee10ba9 2ee10caa-2ee10cab-2ee10cac 2e94be67-2e94be68-2e94be69

But if you don't press any buttons and wait for say 2-3 mins, then hit full discovery once, it still generally fails to find all 12?

I tried several times; there is no difference whether I wait 1 s, 1 min or 5 min.

majcit avatar Dec 04 '18 12:12 majcit

Thanks @majcit , I'll reopen this, as that certainly sounds like a bug. @ltd9938 are you sure you also aren't seeing the same issue? It sounds VERY similar!

@majcit I assume this fails with both the Enttec Pro and the Pro Mk 2?

I think we can now concentrate on just the four sequential fixtures for the moment; some RDM sniffer and/or olad -l 4 logs will be the next step.

peternewman avatar Dec 04 '18 12:12 peternewman

@majcit I assume this fails with both the Enttec Pro and the Pro Mk 2?

Yes, it occurs with both the PRO and the PRO Mk2.

I think we can now concentrate on just the four sequential fixtures for the moment; some RDM sniffer and/or olad -l 4 logs will be the next step.

I am your man! I need to sort this out, because maybe it is a hidden bug in my devices that the ENTTEC PRO, ENTTEC PRO Mk2 and XMT-350 somehow tolerate, but OLA does not.

The attachments are sniffer logs for 4 fixtures with DID = 1, 2, 3 and 4; OLA discovery by the PRO, sniffed by the PRO Mk2.

3 fixtures, discovered all 3 successfully : FIX=3 DISC 3 (DID=1,2,3).txt

4 fixtures, discovered only 1 : FIX=4 DISC 1 (DID=3).txt

4 fixtures, discovered only 2: FIX=4 DISC 2 (DID=1,2).txt

4 fixtures, discovered all 4 successfully : FIX=4 DISC 4 (DID=1,2,3,4).txt

Are .txt files OK, or do you need the .bin files?

majcit avatar Dec 04 '18 13:12 majcit

@peternewman

I wrote a quality assurance station for our manufacturing team using Flask. I had a "Refresh Fixtures" button on the homepage that when clicked would trigger ola_rdm_discover -f -u 1.

After going through my code I realized I stupidly had the discovery start twice. After removing the second discovery initiation we haven't had an issue since (except when our fixtures aren't daisy chained correctly, which may have also played a part months ago)

ghost avatar Dec 04 '18 13:12 ghost

@majcit I assume this fails with both the Enttec Pro and the Pro Mk 2?

Yes, it occurs with both the PRO and the PRO Mk2.

I think we can now concentrate on just the four sequential fixtures for the moment; some RDM sniffer and/or olad -l 4 logs will be the next step.

I am your man! I need to sort this out, because maybe it is a hidden bug in my devices that the ENTTEC PRO, ENTTEC PRO Mk2 and XMT-350 somehow tolerate, but OLA does not.

The attachments are sniffer logs for 4 fixtures with DID = 1, 2, 3 and 4; OLA discovery by the PRO, sniffed by the PRO Mk2.

3 fixtures, discovered all 3 successfully : FIX=3 DISC 3 (DID=1,2,3).txt

Strange, that log only shows two Good Checksum lines! For 02ac:00000002 and 02ac:00000001. Are they actually your UIDs, they don't match what I assume was supposed to be the device part of the IDs listed earlier. However it does match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct.

Actually looking at the log it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent.

Other strange things here: We DUB 0000000000007FFFFFFFFFFF twice and don't get collisions either time, although that might just be a timing thing.

4 fixtures, discovered only 1 : FIX=4 DISC 1 (DID=3).txt

02ac:00000003

4 fixtures, discovered only 2: FIX=4 DISC 2 (DID=1,2).txt

So I think this is the worst case, only finding half:
Finds 02ac:00000002
Finds 02ac:00000001
Finds 02ac:00000006!
Finds 02ac:00000007!
Finds 02ac:00000007!
Finds 02ac:00000007!

The 7's are all when DUBing 000000000000-7FFFFFFFFFFF

4 fixtures, discovered all 4 successfully : FIX=4 DISC 4 (DID=1,2,3,4).txt

The sniffing can't be great, as it also shows this line:
39362263,RDM Discovery Response, , , , , , , Good Checksum , 8 ,FC FF FF FF FF FF FF BA
This still found 02ac:00000000 (twice) and 02ac:00000006!

Are .txt files OK, or do you need the .bin files?

Text is fine, and indeed easier!

We've got an EUID to UID converter here: http://rdm.openlighting.org/tools/uid-converter
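
If you want to do the conversion by hand, it's just a bitwise AND of each byte pair, since b == (b | 0xAA) & (b | 0x55). A minimal sketch (not OLA code), using the EUID bytes for 02ac:00000002 as an example:

#include <cstdint>
#include <cstdio>

// Sketch only: recover a UID from the 12 encoded EUID bytes of a DUB response
// by AND'ing each pair of bytes together.
void DecodeEuid(const uint8_t euid[12]) {
  uint8_t uid[6];
  for (int i = 0; i < 6; i++)
    uid[i] = euid[2 * i] & euid[2 * i + 1];
  std::printf("%02x%02x:%02x%02x%02x%02x\n",
              uid[0], uid[1], uid[2], uid[3], uid[4], uid[5]);
}

int main() {
  const uint8_t rx[12] = {0xaa, 0x57, 0xae, 0xfd, 0xaa, 0x55,
                          0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x57};
  DecodeEuid(rx);  // prints 02ac:00000002
}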

I'd again suggest OLA's RDM sniffer code (which should work with your Enttec), and may produce nicer output: https://www.openlighting.org/rdm-tools/rdm-analyzers/enttec-sniffer/

Other than that, it's probably a case of using the Enttec software or your XMT-350 to do a successful discovery and sniffing that to see if it also has the valid discovery of non-existent UIDs.

peternewman avatar Dec 04 '18 14:12 peternewman

@peternewman

I wrote a quality assurance station for our manufacturing team using Flask. I had a "Refresh Fixtures" button on the homepage that when clicked would trigger ola_rdm_discover -f -u 1.

After going through my code I realized I stupidly had the discovery start twice. After removing the second discovery initiation we haven't had an issue since (except when our fixtures aren't daisy chained correctly, which may have also played a part months ago)

Okay, thanks for confirming @ltd9938 , although it's odd how much it mirrors your issue. I was trying to find where I made this comment, then realised it was regarding you: https://github.com/OpenLightingProject/ola/issues/1396#issuecomment-377055812

peternewman avatar Dec 04 '18 15:12 peternewman

Strange, that log only shows two Good Checksum lines! For 02ac:00000002 and 02ac:00000001. Are they actually your UIDs, they don't match what I assume was supposed to be the device part of the IDs listed earlier. However it does match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct.

Actually looking at the log it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent.

Other strange things here: We DUB 0000000000007FFFFFFFFFFF twice and don't get collisions either time, although that might just be a timing thing.

The 4 fixtures' full UIDs are: 02ac:00000001, 02ac:00000002, 02ac:00000003 and 02ac:00000004; I briefly call them 1, 2, 3 and 4.

Regarding the strange UIDs 6 and 7: there are no such UIDs. The controller DISC_MUTEs them, but they don't appear in the final list; maybe they are a consequence of collisions. I am not sure if it is normal or not, but it also happens with the XMT-350. Here is the sniffer log for 4 fixtures, UIDs 1, 2, 3 and 4: XMT-350 FIX=4 DISC 4 (DID=1,2,3,4).txt

I'd again suggest OLA's RDM sniffer code (which should work with your Enttec), and may produce nicer output: https://www.openlighting.org/rdm-tools/rdm-analyzers/enttec-sniffer/

I tried to execute rdmpro_sniffer [ options ] <usb-device-path> but I can't determine <usb-device-path>. The example says rdmpro_sniffer -r /dev/tty.usbserial-00001014, but on my Raspberry Pi there is no such file or anything similar:

root@raspberrypi:/dev# ls
autofs           loop7               ram6     tty20  tty46      urandom
block            loop-control        ram7     tty21  tty47      vchiq
btrfs-control    mapper              ram8     tty22  tty48      vcio
bus              mem                 ram9     tty23  tty49      vc-mem
cachefiles       memory_bandwidth    random   tty24  tty5       vcs
char             mmcblk0             raw      tty25  tty50      vcs1
console          mmcblk0p1           rfkill   tty26  tty51      vcs2
cpu_dma_latency  mmcblk0p2           serial   tty27  tty52      vcs3
cuse             mqueue              serial0  tty28  tty53      vcs4
disk             net                 shm      tty29  tty54      vcs5
fb0              network_latency     snd      tty3   tty55      vcs6
fd               network_throughput  stderr   tty30  tty56      vcsa
full             null                stdin    tty31  tty57      vcsa1
fuse             ppp                 stdout   tty32  tty58      vcsa2
gpiochip0        ptmx                tty      tty33  tty59      vcsa3
gpiomem          pts                 tty0     tty34  tty6       vcsa4
hwrng            ram0                tty1     tty35  tty60      vcsa5
initctl          ram1                tty10    tty36  tty61      vcsa6
input            ram10               tty11    tty37  tty62      vcsm
kmsg             ram11               tty12    tty38  tty63      vhci
log              ram12               tty13    tty39  tty7       watchdog
loop0            ram13               tty14    tty4   tty8       watchdog0
loop1            ram14               tty15    tty40  tty9       zero
loop2            ram15               tty16    tty41  ttyAMA0
loop3            ram2                tty17    tty42  ttyprintk
loop4            ram3                tty18    tty43  ttyUSB0
loop5            ram4                tty19    tty44  uhid
loop6            ram5                tty2     tty45  uinput

majcit avatar Dec 05 '18 08:12 majcit

Strange, that log only shows two Good Checksum lines! For 02ac:00000002 and 02ac:00000001. Are they actually your UIDs, they don't match what I assume was supposed to be the device part of the IDs listed earlier. However it does match the DEVICE_LABEL get response, and indeed your comment just above, so I assume that's correct. Actually looking at the log it's already muted 02ac:00000003, so I assume it found that outside of the text log you sent. Other strange things here: We DUB 0000000000007FFFFFFFFFFF twice and don't get collisions either time, although that might just be a timing thing.

The 4 fixtures' full UIDs are: 02ac:00000001, 02ac:00000002, 02ac:00000003 and 02ac:00000004; I briefly call them 1, 2, 3 and 4.

Regarding the strange UIDs 6 and 7: there are no such UIDs. The controller DISC_MUTEs them, but they don't appear in the final list; maybe they are a consequence of collisions. I am not sure if it is normal or not, but it also happens with the XMT-350. Here is the sniffer log for 4 fixtures, UIDs 1, 2, 3 and 4: XMT-350 FIX=4 DISC 4 (DID=1,2,3,4).txt

So the XMT-350 log also finds 6, mutes it once and then continues DUBing. So I think this is down to how different things respond to a collision that generates a good checksum. I suspect OLA is being a bit too defensive and assuming the device just doesn't respond to mute properly, whereas it seems we should branch/DUB a bit more first before writing it off as a bad device. I think some olad -l 4 logs are the next step just to confirm my guesswork from the RDM captures.
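
To illustrate what I mean by branching a bit more, here's a rough sketch of the idea (hypothetical code, not OLA's actual DiscoveryAgent; the helper names AddUid, MarkBad and SendDub are made up for illustration):

#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins for the controller's actions, made up for illustration.
void AddUid(uint64_t uid)  { std::printf("found %012llx\n", (unsigned long long)uid); }
void MarkBad(uint64_t uid) { std::printf("bad uid %012llx\n", (unsigned long long)uid); }
void SendDub(uint64_t lower, uint64_t upper) {
  std::printf("DUB %012llx - %012llx\n",
              (unsigned long long)lower, (unsigned long long)upper);
}

// Sketch of the suggested behaviour: if the UID pulled out of a DUB response
// fails to mute, treat it like a collision and keep branching down either side
// of the range, instead of writing the whole branch off.
void OnMuteResult(uint64_t lower, uint64_t upper, uint64_t uid, bool acked) {
  if (acked) {
    AddUid(uid);            // a real responder acknowledged the mute
    SendDub(lower, upper);  // re-DUB the same range for any remaining devices
  } else if (lower == upper) {
    MarkBad(uid);           // nothing left to split; genuinely unresponsive
  } else {
    // Possibly a phantom UID from a collision with a valid checksum.
    uint64_t mid = lower + (upper - lower) / 2;
    SendDub(lower, mid);
    SendDub(mid + 1, upper);
  }
}

int main() {
  // e.g. a phantom UID that refused to mute while searching 02ac:00000000 - 02ac:00007fff
  OnMuteResult(0x02ac00000000ULL, 0x02ac00007fffULL, 0x02ac00000006ULL, false);
}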

I'd again suggest OLA's RDM sniffer code (which should work with your Enttec), and may produce nicer output: https://www.openlighting.org/rdm-tools/rdm-analyzers/enttec-sniffer/

I tried to execute rdmpro_sniffer [ options ] <usb-device-path> but I can't determine <usb-device-path>. The example says rdmpro_sniffer -r /dev/tty.usbserial-00001014, but on my Raspberry Pi there is no such file or anything similar:

I think you want /dev/ttyUSB0, I believe the other format is Mac style.

peternewman avatar Dec 05 '18 11:12 peternewman

I think some olad -l 4 logs are the next step just to confirm my guesswork from the RDM captures.

I succeeded in capturing the olad -l 4 log stream:

4 fixtures, discovered 2 of them: 1,2 olad log, FIX=4 DISC 2 (DID=1,2).txt

4 fixtures, discovered 2 of them: 1, 3 olad log, FIX=4 DISC 2 (DID=1,3).txt

4 fixtures, discovered 1 of them: 3 olad log, FIX=4 DISC 1 (DID=3).txt

4 fixtures, discovered all of them: 1,2,3,4 olad log, FIX=4 DISC 4 (DID=1,2,3,4).txt

majcit avatar Dec 05 '18 13:12 majcit

I did new trials, adding fixtures one by one, and observed new facts as below (N = number of fixtures):

When N < 4 there is no problem; OLA always discovers successfully.

When N >= 4 and the DIDs are sequential, at every press of discovery OLA finds a randomly different number of devices, 1 to N. For example, 6 sequential fixtures (all sequential): 2ee109a1-2ee109a2-2ee109a3-2ee109a4-2ee109a5-2ee109a6

When N >= 4 and the DIDs are not sequential, OLA always discovers successfully, even if pressed immediately. For example, 15 non-sequential fixtures (5 groups, each group only 3 sequential): 2ee109a1-2ee109a2-2ee109a3 2ee10aa4-2ee10aa5-2ee10aa6 2ee10ba7-2ee10ba8-2ee10ba9 2ee10caa-2ee10cab-2ee10cac 2e94be67-2e94be68-2e94be69

Hi @peternewman, I just observed a new fact:

The issue does not happen for other new sequential UIDs. I just did new tests with new UIDs (all sequential too); the issue does not occur for these UIDs: 02ac2ea58b8d-02ac2ea58b8e-02ac2ea58b8f-02ac2ea58b90-02ac2ea58b91-02ac2ea58b92-02ac2ea58b93-02ac2ea58b94-02ac2ea58b95-02ac2ea58b96-02ac2ea58b97-02ac2ea58b98

I executed OLA discovery several times for 4 devices; it never skipped any device and found all 4 devices successfully. Then I repeated it for 12 devices; again it never skipped any device and found all 12 devices successfully.

Here is the sniffed .txt for the new UIDs, 12 devices: FIX=12, DISC=12 (all successful).txt

I did the test with the old problem UIDs (2ee109a1-2ee109a2-2ee109a3-2ee109a4) again, on exactly the same hardware and firmware; the issue still exists for the old UIDs as before.

majcit avatar Dec 07 '18 08:12 majcit

I tried to execute rdmpro_sniffer [ options ] <usb-device-path> but I can't determine <usb-device-path>. The example says rdmpro_sniffer -r /dev/tty.usbserial-00001014, but on my Raspberry Pi there is no such file or anything similar:

I think you want /dev/ttyUSB0, I believe the other format is Mac style.

I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.

Thanks @majcit, sorry I'd not got back to you. So I've found the source of the issue: it's down to how we behave when we get a response from a phantom UID.

02ac:00000002 = aa 57 ae fd aa 55 aa 55 aa 55 aa 57 ae 57 aa ff
02ac:00000004 = aa 57 ae fd aa 55 aa 55 aa 55 ae 55 ae 57 ae fd
bitwise or gives:
aa 57 ae fd aa 55 aa 55 aa 55 ae 57 ae 57 ae ff
Which if you decode the EUID is:
02ac:00000006

We were seeing 6, failing to mute it, and giving up on that whole branch. Our test code was also broken, so although our tests passed, they didn't actually test this particular issue; fixing the test code made the tests fail, and I've then been able to fix and test the actual discovery code.
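
As a quick check of that (a standalone sketch, not OLA code, assuming the EUID encoding discussed earlier in the thread), OR'ing the two captured 16-byte responses together gives a frame that decodes to the phantom UID 02ac:00000006 and whose checksum still validates, which is why it looks like a clean response:

#include <cstdint>
#include <cstdio>

int main() {
  // The 16 data bytes (12 EUID + 4 checksum) from 02ac:00000002 and 02ac:00000004.
  const uint8_t r2[16] = {0xaa, 0x57, 0xae, 0xfd, 0xaa, 0x55, 0xaa, 0x55,
                          0xaa, 0x55, 0xaa, 0x57, 0xae, 0x57, 0xaa, 0xff};
  const uint8_t r4[16] = {0xaa, 0x57, 0xae, 0xfd, 0xaa, 0x55, 0xaa, 0x55,
                          0xaa, 0x55, 0xae, 0x55, 0xae, 0x57, 0xae, 0xfd};
  uint8_t merged[16];
  uint16_t sum = 0;
  for (int i = 0; i < 16; i++) {
    merged[i] = r2[i] | r4[i];     // both responders driving the line at once
    if (i < 12) sum += merged[i];  // checksum covers the 12 EUID bytes
  }
  std::printf("UID: ");
  for (int i = 0; i < 6; i++)
    std::printf("%02x", merged[2 * i] & merged[2 * i + 1]);  // 02ac00000006
  // The checksum carried in the merged frame still matches the merged EUID.
  uint16_t rx_sum = ((merged[12] & merged[13]) << 8) | (merged[14] & merged[15]);
  std::printf("\nchecksum 0x%04x, expected 0x%04x -> %s\n", rx_sum, sum,
              rx_sum == sum ? "looks valid" : "collision detected");
}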

If you add the changes in DiscoveryAgent*.cpp from here and recompile, your issue should be fixed: https://github.com/OpenLightingProject/ola/pull/1520

Although there's a slightly more optimised fix I'm working on too...

peternewman avatar Dec 07 '18 16:12 peternewman

I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.

Sure, I will do it as soon as I get to workplace.

So I've found the source of the issue: it's down to how we behave when we get a response from a phantom UID.

02ac:00000002 = aa 57 ae fd aa 55 aa 55 aa 55 aa 57 ae 57 aa ff
02ac:00000004 = aa 57 ae fd aa 55 aa 55 aa 55 ae 55 ae 57 ae fd
bitwise or gives:
aa 57 ae fd aa 55 aa 55 aa 55 ae 57 ae 57 ae ff
Which if you decode the EUID is:
02ac:00000006

Glad to hear the issue is traced, thanks. Actually I also simulated the collision manually by bitwise AND'ing 2 and 4, which resulted in 2 with a coincidentally valid checksum, since I assumed 0 was dominant in sinking 1 to 0. I didn't think of OR!

Although there's a slightly more optimised fix I'm working on too...

I would kindly request 2 more things; if possible please consider them for the next releases:

  1. After running discovery, a brief message saying how many devices were found would be very useful to verify the discovery is successful.
  2. An automatic patching option for ascending/descending sorted UIDs may accelerate patching for sequentially mounted devices.

majcit avatar Dec 07 '18 18:12 majcit

I'd still be curious if this works if you've got five minutes to test it. I can then update the docs to clarify too.

Sure, I will do it as soon as I get to workplace.

Thanks.

Glad to hear the issue is traced, thanks. Actually I also simulated the collision manually by bitwise AND'ing 2 and 4, which resulted in 2 with a coincidentally valid checksum, since I assumed 0 was dominant in sinking 1 to 0. I didn't think of OR!

I think it's the reverse: the pull-up wins. Certainly the OR generates the DUB reply your packet capture includes.

Although there's a slightly more optimised fix I'm working on too...

I would kindly request 2 more things; if possible please consider them for the next releases: 1. After running discovery, a brief message saying how many devices were found would be very useful to verify the discovery is successful.

Do you mean in the olad log, or the output of ola_rdm_discover?

An automatic patching option for ascending/descending sorted UIDs may accelerate patching for sequentially mounted devices.

There is already an auto-patch feature on the web UI (see the wand button). The code for this is here, but currently it only sorts by footprint: https://github.com/OpenLightingProject/ola/blob/master/javascript/ola/full/rdm_patcher.js

I'm not sure patching by UID is relevant. Aside from maybe a large architectural install, the chances of getting devices with UIDs in any logical order are fairly slim, and even in that scenario, fitting them to the building in the correct order will still be quite a hassle; identifying using RDM and addressing appropriately may be just as quick.

For both requests, you're probably better off starting new issues with a bit more detail anyway.

peternewman avatar Dec 08 '18 01:12 peternewman