pynetworktables

reconnect errors

Open virtuald opened this issue 6 years ago • 8 comments

Some kind of queue buildup or race condition during reconnects...

Reference: https://www.chiefdelphi.com/forums/showthread.php?t=164590

virtuald, Apr 08 '18 01:04

I wonder if this is an ntcore bug as well that we've inherited. Both Shuffleboard and pynetworktables2js proved to be not entirely reliable at competitions for my team this year.

(I've also seen other teams have to restart SmartDashboard whilst on the field, so definitely not just my team.)

auscompgeek, Apr 08 '18 04:04

@PeterJohnson thoughts on that possibility?

virtuald, Apr 08 '18 04:04

As I mentioned in that CD thread, waiting until we could ping the roboRIO gives us a much higher success rate. We had about a 30% chance of it working when we booted the roboRIO and vision code simultaneously; adding the delayed start raised that to around 80% in my tests. It's still not perfect and there's still some funkiness going on, but it's a heck of a lot better. I'd suggest that as a temporary workaround until a fix for this is discovered.
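A rough sketch of the delayed start, for illustration only (this is not the code from the CD thread). `wait_for_port` and `start_networktables` are hypothetical helper names, and the sketch polls TCP port 1735 instead of sending an ICMP ping, which is a swapped-in check that needs no extra privileges:

```python
import socket
import time


def wait_for_port(host, port=1735, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True if the port became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def start_networktables(server="10.25.21.2"):
    """Hypothetical startup helper: only initialize once the server answers."""
    # Imported here so the polling helper above works without pynetworktables.
    from networktables import NetworkTables

    if not wait_for_port(server):
        raise RuntimeError("NetworkTables server %s not reachable" % server)
    NetworkTables.initialize(server=server)
```

The timeout and polling interval are arbitrary defaults, not tuned values.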

andrewda, Apr 09 '18 16:04

Seems like this fix stopped working today. We're getting a connection to NetworkTables (logs show that we get all the initial data on the coprocessor), but NetworkTables.isConnected is still false and we're unable to add new values. We have a connection to the roboRIO and to port 1735 on the roboRIO.
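One way to see when the connection state actually flips is to log the connection callbacks rather than polling. A minimal sketch, assuming the `(connected, info)` callback shape that pynetworktables passes to connection listeners; `ConnectionWatcher` is a hypothetical name, not part of the library:

```python
import threading


class ConnectionWatcher:
    """Records connection state changes reported by a listener callback."""

    def __init__(self):
        self._connected = threading.Event()
        self.history = []  # (connected, info) tuples, in arrival order

    def on_change(self, connected, info):
        # Keep every notification so connect/disconnect flapping is visible.
        self.history.append((connected, info))
        if connected:
            self._connected.set()
        else:
            self._connected.clear()

    def wait_connected(self, timeout=None):
        """Block until a connect notification arrives; True if it did."""
        return self._connected.wait(timeout)
```

It would be registered with something like `NetworkTables.addConnectionListener(watcher.on_change, immediateNotify=True)` after `NetworkTables.initialize(...)`; the recorded history can then be compared against what `NetworkTables.isConnected()` reports.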

andrewda, Apr 19 '18 13:04

I haven't taken the time to dig into this yet... unfortunately I imagine it has to be difficult to reproduce (seeing as I haven't had the issue).

Have you upgraded to pynetworktables 2018.1.1 yet? That addresses some unicode handling issues that could cause a connection to fail.

virtuald, Apr 19 '18 13:04

@andrewda do you use networktables flush in your code anywhere? https://github.com/wpilibsuite/ntcore/issues/275 sounds vaguely related.

virtuald, Apr 25 '18 06:04

Apologies for not replying to your message from a week ago: I did update to 2018.1.1 at CMP and it didn't seem to make a difference.

A while ago, in an attempt to find a fix for this problem, I did try adding a flush immediately after attempting to initialize the connection, i.e. something like:

NetworkTables.initialize(server="10.25.21.2")
NetworkTables.flush()

I never removed this since my tests on 2018.0.1 (it didn't seem to make a difference at the time), so once I get access to a robot again I can try removing this flush or attempting to call it regularly as wpilibsuite/ntcore#275 suggests.

andrewda, Apr 25 '18 06:04

Actually, that would be a good thing to try -- calling flush continuously.

If you're not already calling flush more than once, I wouldn't expect that bug to affect you. If calling flush a lot of times fixes it for you, that would be very interesting to know and would narrow down the potential culprits.
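For the experiment, calling flush continuously could look roughly like this. It's only a sketch: `start_periodic_flush` is a hypothetical helper, the 0.1 s interval is an arbitrary choice, and the flush callable is passed in so the loop itself doesn't depend on pynetworktables:

```python
import threading


def start_periodic_flush(flush, interval=0.1):
    """Call flush() every `interval` seconds on a daemon thread.

    Returns a threading.Event; set it to stop the loop.
    """
    stop = threading.Event()

    def _run():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            flush()

    threading.Thread(target=_run, name="nt-flush", daemon=True).start()
    return stop
```

Usage would be something like `stop = start_periodic_flush(NetworkTables.flush)` right after `NetworkTables.initialize(...)`, and `stop.set()` to end the test.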

virtuald, Apr 25 '18 06:04