easyMesh Confusion about dest and from in messages.

I am trying to write a non-Arduino client for easyMesh that can hopefully receive messages (and maybe later on become a full easyMesh member). Progress can be found on GitHub.

Currently I am just trying to get basic sync to work, but I am a bit puzzled about what is going on. As you can see below, I get a sync request and then send back a (mostly empty) reply, with that node as the destination and my node as the sender. But from that point onward (second and third messages) the easyMesh node fills in its own node ID as dest, not my client's ID (which I would expect).

# First message received
["dest":0, "subs":[], "type":5, "from":13610488] 
# My Reply
Sending: {"dest":13610488, "from":34296, "type":6, "subs":[]}
# Second
["dest":13610488, "type":4, "msg":{"adopt":false,"num":0,"time":469349218}, "from":13610488]
Sending: {"dest":13610488, "from":34296, "type":5, "subs":[]}
# Third
["dest":13610488, "subs":[], "type":6, "from":13610488]

I tried digging through the easyMesh code, but that didn't clarify it either. Am I doing something wrong here?

Oct 19 '16 10:10 BlackEdder

I wonder if what you are seeing is part of this problem - http://www.esp8266.com/viewtopic.php?p=56059#p56059

Oct 19 '16 12:10 RudyFiero

The problem seems to be in these three places: https://github.com/Coopdis/easyMesh/blob/master/src/easyMeshSync.cpp#L191, https://github.com/Coopdis/easyMesh/blob/master/src/easyMeshSync.cpp#L153 and https://github.com/Coopdis/easyMesh/blob/master/src/easyMeshSync.cpp#L222

In each of them, I think _chipId should be replaced with conn->chipId.

@Coopdis I am happy to send a pull request if this is indeed where the problem lies.

@RudyFiero I am not sure how this could lead to the given problem, but I guess it could happen somehow if the remoteChipId is updated based on this.

Oct 19 '16 13:10 BlackEdder

I agree with @BlackEdder ... the destId value is being set with the current node's chipId ... in each of the examples linked to above ... and they should be pointing to the source chipId ... From what I can tell ... the nodesync and timesync functions take place between each node and its parent node only ... these syncing functions do not take place across connections. So in this case setting it to conn->chipId should work ... but if these sync functions expand past just between the node and its parent (or node-parent-node) then we should be using something like uint32_t destId = (uint32_t)root["from"];

this change has been added to https://github.com/sfranzyshen/easyMesh/tree/devel
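
The addressing rule discussed here (reply to the sender, not to yourself) can be sketched in isolation. This is only an illustration: SyncHeader and makeReply are hypothetical names, the real easyMesh code builds JSON messages, and the fields simply mirror the dest, from and type keys seen in the log at the top of the thread.

```cpp
#include <cstdint>

// Hypothetical, simplified message header. The field names mirror the
// JSON keys in the logged messages; this is not the easyMesh data structure.
struct SyncHeader {
    uint32_t dest;  // node the message is addressed to
    uint32_t from;  // node that sent the message
    int type;       // numeric message type, as seen in the log
};

// Build a reply header addressed to whoever sent the request.
// Using request.from (rather than the local chip ID) as the destination
// is the behaviour proposed in this thread.
constexpr SyncHeader makeReply(const SyncHeader& request,
                               uint32_t ownChipId,
                               int replyType) {
    return SyncHeader{request.from, ownChipId, replyType};
}
```

For the first logged request (from 13610488, type 5), makeReply(request, 34296, 6) gives dest 13610488 and from 34296, which matches the reply the client sends.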

Oct 20 '16 22:10 sfranzyshen

this does appear to correct the problems related to nodes sending themselves messages ...

Oct 24 '16 01:10 sfranzyshen

I have been running the code you uploaded today. I have not seen the self sending of messages. But I still have the crashing with the rollover of the timer.

I did have things go nuts on me at one time. Tons of messages and I don't think it was a rollover issue. In fact it started happening just now. And it goes on for a couple of minutes and then seems to calm down. At least it did this time. Other times I did a reset to the modules.

Other than the rollover problem I do have some concerns. The mesh seems to break up too easily. On the LCDs I have connected to the modules I show the WiFi channel. Sometimes a connected module will do a scanning of other channels, repeatedly. And while this is happening any messages going through that module, or to that module, get lost. So the concern that I have with this mesh approach is the lack of reliability. (excluding the rollover problem)

I'm not sure that new scans are the only issue. It seems that nodes can break away, and then the system does a realignment, goes through a procedure and fixes itself. But during the breakup and the fixing, much of the network is dead and messages are lost.

On top of this protocol there needs to be a way to confirm packet reception. But I still think there need to be fewer breakdowns in the first place. Much of the time, though, it all seems to work well.

One more thing. I have seven modules running now.

Oct 24 '16 03:10 RudyFiero

Update: I am mistaken ... the repeat_flag is being set false (0) ... so this doesn't, or shouldn't, make a difference ... I too am coming to the conclusion that this whole protocol (code) is fun for the demo ... but it is becoming obvious just how inefficient and useless it is in practice ... this whole thing needs to have the basic building blocks redesigned and tightened up ... I'm stepping back even further on this ... and starting over ...

@RudyFiero

"Sometimes a connected module will do a scanning of other channels, repeatedly."

If a STA doesn't find an AP to connect to on the first round ... it starts a timer that calls the scanTimerCallback() function after SCAN_INTERVAL (10000) ... the problem is the timer never gets turned off again ... so even if a connection is made ... it continues to perform scans every SCAN_INTERVAL ... forever ... to make matters worse, wifi scans appear to slow down the nodesync & timesync messages to the point that they reach the NODE_TIMEOUT (3000000 //uSecs) and cause the AP to drop the STA TCP connection ... that in turn causes the STA to disconnect from wifi and that starts the process all over again ... so I have added disarming the timer within the scanTimerCallback() function, which should stop the unnecessary scans ... change your scanTimerCallback() in the easyMeshSTA.cpp file to this ...

//***********************************************************************
void ICACHE_FLASH_ATTR easyMesh::scanTimerCallback( void *arg ) {
    // stop the scan timer first, so it does not keep firing every SCAN_INTERVAL
    os_timer_disarm(&staticThis->_scanTimer);
    staticThis->startStationScan();

    // this function can be totally eliminated!
}

I believe getNodeTime() is responsible for the wdt resets, with this line: uint32_t ret = system_get_time() + timeAdjuster; If system_get_time() + timeAdjuster ... exceeds the range of uint32_t ... boom
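
Note that uint32_t arithmetic in C++ wraps around modulo 2^32 rather than trapping, so the addition by itself cannot crash. If the rollover really is the trigger, code that compares or subtracts timestamps is one plausible place to look; that is an assumption, not something confirmed in the easyMesh source. A minimal sketch of a wrap-safe timeout check follows; elapsedMicros and hasTimedOut are hypothetical helpers, not easyMesh functions.

```cpp
#include <cstdint>

// Elapsed microseconds between two 32-bit timestamps.
// Unsigned subtraction is performed modulo 2^32, so the result stays
// correct even if the counter wrapped once between 'earlier' and 'now'.
constexpr uint32_t elapsedMicros(uint32_t now, uint32_t earlier) {
    return now - earlier;
}

// True if more than timeoutMicros have passed since lastSeen.
// Comparing the elapsed interval (instead of comparing absolute
// timestamps) is what keeps this check valid across a rollover.
constexpr bool hasTimedOut(uint32_t now, uint32_t lastSeen,
                           uint32_t timeoutMicros) {
    return elapsedMicros(now, lastSeen) > timeoutMicros;
}
```

With the NODE_TIMEOUT value of 3000000 quoted above, hasTimedOut(now, lastSeen, 3000000) gives the right answer when now has wrapped past zero, provided the real gap is shorter than 2^32 microseconds (roughly 71.6 minutes).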

Oct 24 '16 04:10 sfranzyshen

"I did have things go nuts on me at one time. Tons of messages and I don't think it was a rollover issue. In fact it started happening just now. And it goes on for a couple of minutes and then seems to calm down. At least it did this time. Other times I did a reset to the modules."

Slightly OT, but looking at the code I am still unclear on how reliable routing is in larger meshes. It seems that with 3+ AP nodes there is a possibility of a triangular route forming (i.e. A wants to send to D, but is not directly connected. Instead the other nodes start sending to each other: A->B->C->A). This could cause lots of messages (the same message going around and around). From the code I couldn't find anything stopping this, but it might be that the mesh layout prevents this from happening.
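
For illustration only, one common way to stop a message from circulating forever is for every node to remember which messages it has already handled and drop repeats. The sketch below assumes each message carries an origin node ID plus a per-origin sequence number; the messages logged in this thread have no such sequence field, so both that field and the SeenCache class are hypothetical and not part of easyMesh.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <set>
#include <utility>

// Bounded cache of (origin, sequence) pairs that were already handled.
// A node would forward a message only when firstSighting() returns true.
class SeenCache {
    typedef std::pair<uint32_t, uint32_t> Key;

public:
    explicit SeenCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true the first time a pair is seen, false on repeats.
    // When the cache is full, the oldest remembered pair is forgotten.
    bool firstSighting(uint32_t origin, uint32_t sequence) {
        const Key key(origin, sequence);
        if (seen_.count(key) != 0) {
            return false;  // duplicate: drop instead of forwarding
        }
        if (order_.size() == capacity_) {
            seen_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(key);
        seen_.insert(key);
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<Key> order_;  // insertion order, oldest at the front
    std::set<Key> seen_;     // fast membership test
};
```

In the A->B->C->A example, A would get false from firstSighting() when its own message comes back from C, and would drop it instead of sending it on to B again. The bounded capacity matters on a memory-constrained module, at the cost of forgetting very old messages.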

Oct 24 '16 05:10 BlackEdder

I display the node count on the LCD. I have usually had five modules but now running with seven. But the node count has gone above ten. I see it after the fact. I write the node count to a character location on the screen and I don't clear any previous trailing digits that happen when the node count gets to ten or above.

I had it running all night and all the modules were alive. Sometimes wdt resets do not bring them back. I did see it start to go nuts but it had returned to normal again when I left for work.

I'm going to try and display the connections information on the LCDs. I'll try and figure that out when I get home tonight.

Oct 24 '16 13:10 RudyFiero
