thingsboard-edge

Losing connection cloud <-> edge after a few hours --- RPC Error: NO_ACTIVE_CONNECTION

Open AdrienAdB opened this issue 2 years ago • 13 comments

Describe the bug

Self-hosted TB cloud seems to lose its connection with the Edge after some time (hours). The issue applies only to RPC calls from the cloud server. Error from the TB Cloud audit log: "RPC Error: NO_ACTIVE_CONNECTION". Unassigning and reassigning the Edge device seems to restart the connection; RPC calls work after that.

Your Server Environment

  • own setup
    • cloud
    • ThingsBoard Version 3.3.4.1 Community
    • TB Edge 3.3.4.1 Community
    • Ubuntu server

To Reproduce

Steps to reproduce the behavior:

  1. Make an RPC call from the cloud. It works.
  2. Wait a few hours. For me the problem appears overnight, by the next morning.
  3. Try the same RPC call from the TB cloud dashboard (audit log: RPC Error: NO_ACTIVE_CONNECTION). I can confirm the same RPC call works fine from the TB Edge dashboard.
  4. Unassign the Edge device.
  5. Assign the Edge device.
  6. The RPC call succeeds.
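
To narrow down when the connection drops, the RPC call from step 1 could be repeated periodically through the cloud's REST API instead of the dashboard. The following is a minimal sketch and not part of the original report. It assumes the standard ThingsBoard REST endpoints `/api/auth/login` and `/api/plugins/rpc/twoway/{deviceId}`; the base URL, credentials, device ID, and the RPC method name `getValue` are placeholders to replace with your own.

```python
import json
import urllib.error
import urllib.request


def build_login_request(base_url, username, password):
    """Build the login request; the response JSON is expected to carry a 'token' field."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_rpc_request(base_url, jwt_token, device_id, method, params, timeout_ms=5000):
    """Build a two-way RPC request addressed to one device (device_id is a placeholder UUID)."""
    body = json.dumps(
        {"method": method, "params": params, "timeout": timeout_ms}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/plugins/rpc/twoway/{device_id}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Authorization": f"Bearer {jwt_token}",
        },
        method="POST",
    )


def send(request):
    """Send a request and return (HTTP status, body text), including for error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def probe_once(base_url, username, password, device_id, method="getValue"):
    """Log in, issue one RPC call, and return its (status, body)."""
    status, body = send(build_login_request(base_url, username, password))
    if status != 200:
        raise RuntimeError(f"login failed with HTTP {status}")
    token = json.loads(body)["token"]
    return send(build_rpc_request(base_url, token, device_id, method, {}))
```

Calling `probe_once(...)` from a scheduler (for example once every 15 minutes) and logging the returned status with a timestamp would show roughly how long after the last successful call the cloud starts rejecting RPC requests.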

Screenshots

Screenshot 2022-05-18 at 10 56 17

AdrienAdB avatar May 18 '22 04:05 AdrienAdB

Hi @AdrienAdB, this is most probably related to the active status of the device on the cloud. It's a known bug that is targeted to be fixed in the next release. Because the device is connected to the edge, its status on the cloud is not updated properly and the RPC call fails. I'm going to review it in detail later and give you feedback on the exact cause.

volodymyr-babak avatar May 18 '22 11:05 volodymyr-babak

Thanks @volodymyr-babak.

AdrienAdB avatar May 18 '22 11:05 AdrienAdB

Hello @AdrienAdB I'm trying to reproduce this problem and fix it. Could you please provide your device protocol? Are you using MQTT? Do you send some data overnight from edge to cloud?

volodymyr-babak avatar Jun 09 '22 16:06 volodymyr-babak

Hello,

  • HTTP from device to edge.
  • I use the default configuration from edge to cloud (not sure about the protocol). My configuration is below.
  • No data coming in overnight.
  • TB-Edge is behind NAT; it has access to the internet (tb-cloud), but no incoming ports are forwarded to its instance.
  • TB-Cloud has ports 1883, 7070, 80, and 443 open.

Overall the setup works really well; TB-Edge makes remote operation very fast. The only issue is this overnight disconnection.

I will try to send extra data overnight, something every 10 minutes, and see if the problem persists. That could be an easy workaround until the issue is resolved.

# /etc/tb-edge/conf/tb-edge.conf 
#
# Copyright © 2016-2022 The Thingsboard Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

export JAVA_OPTS="$JAVA_OPTS -Dplatform=deb -Dinstall.data_dir=/usr/share/tb-edge/data"
export JAVA_OPTS="$JAVA_OPTS -Xlog:gc*,heap*,age*,safepoint=debug:file=/var/log/tb-edge/gc.log:time,uptime,level,tags:filecount=10,filesize=10M"
export JAVA_OPTS="$JAVA_OPTS -XX:+IgnoreUnrecognizedVMOptions -XX:+HeapDumpOnOutOfMemoryError"
export JAVA_OPTS="$JAVA_OPTS -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -XX:+UseCondCardMark"
export JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:MaxTenuringThreshold=10"
export LOG_FILENAME=tb-edge.out
export LOADER_PATH=/usr/share/tb-edge/conf,/usr/share/tb-edge/extensions
export SQL_DATA_FOLDER=/usr/share/tb-edge/data/sql

# UNCOMMENT NEXT LINES AND PUT YOUR CLOUD CONNECTION SETTINGS:
export CLOUD_ROUTING_KEY=xxxxxx-xxxxxxxx
export CLOUD_ROUTING_SECRET=xxxxxx

# UNCOMMENT NEXT LINES IF EDGE CONNECTS TO CE 'DEMO.THINGSBOARD.IO' SERVER:
export CLOUD_RPC_HOST=xxxxxx

# UNCOMMENT NEXT LINES IF YOU CHANGED DEFAULT CLOUD RPC HOST/PORT SETTINGS:
# export CLOUD_RPC_HOST=xxxxxx
# export CLOUD_RPC_PORT=7070

# UNCOMMENT NEXT LINES IF YOU ARE RUNNING EDGE ON THE SAME MACHINE WHERE THINGSBOARD SERVER IS RUNNING:
# export HTTP_BIND_PORT=18080
# export MQTT_BIND_PORT=11883
# export COAP_BIND_PORT=15683

# UNCOMMENT NEXT LINES IF YOU HAVE CHANGED DEFAULT POSTGRESQL DATASOURCE SETTINGS:
# export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/tb_edge
export SPRING_DATASOURCE_USERNAME=postgres
export SPRING_DATASOURCE_PASSWORD=xxxxxx

AdrienAdB avatar Jun 12 '22 14:06 AdrienAdB

The device is now sending a "keepAlive" attribute every minute. I'll let you know tomorrow.
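
For readers who want to try the same workaround, here is a minimal sketch of a device-side loop that posts a "keepAlive" attribute to the edge over HTTP. It is not the reporter's actual device code. It assumes the ThingsBoard HTTP device API path `/api/v1/{accessToken}/attributes`; the edge URL, the access token, and the choice of a millisecond timestamp as the attribute value are illustrative assumptions.

```python
import json
import time
import urllib.request


def build_keepalive_request(edge_url, access_token, now=None):
    """Build a POST that publishes a client-side 'keepAlive' attribute.

    The attribute value is the current time in milliseconds (an arbitrary choice).
    """
    ts_ms = int((time.time() if now is None else now) * 1000)
    body = json.dumps({"keepAlive": ts_ms}).encode("utf-8")
    return urllib.request.Request(
        f"{edge_url}/api/v1/{access_token}/attributes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _post(request):
    """Default sender: perform the HTTP call and return the status code."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status


def run_keepalive(edge_url, access_token, interval_s=60, iterations=None,
                  send=None, sleep=time.sleep):
    """Publish the attribute every interval_s seconds.

    iterations=None loops forever; send and sleep can be swapped out for testing.
    Returns the number of requests sent.
    """
    if send is None:
        send = _post
    count = 0
    while iterations is None or count < iterations:
        send(build_keepalive_request(edge_url, access_token))
        count += 1
        sleep(interval_s)
    return count
```

A call such as `run_keepalive("http://localhost:8080", "DEVICE_ACCESS_TOKEN")` would publish once per minute, where both arguments are placeholders for the edge's HTTP address and the device's access token.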

AdrienAdB avatar Jun 13 '22 01:06 AdrienAdB

Hi, sending the "keepAlive" attribute every minute didn't fix the issue.

AdrienAdB avatar Jun 15 '22 10:06 AdrienAdB

Hello @AdrienAdB

Thanks for the updates. A pull request that should fix this issue has been created: https://github.com/thingsboard/thingsboard-pe/pull/897

It should be available in the next release. I'm going to validate this use case separately and let you know the results before the release. Note that the fix will work only if the device sends "keepAlive" events to keep the session active on the cloud.

volodymyr-babak avatar Jun 20 '22 15:06 volodymyr-babak

Do you know of a fix for this?

truongvanhuy2000 avatar Mar 19 '23 05:03 truongvanhuy2000

@truongvanhuy2000

Please provide additional details on your issue:

  1. Are you seeing the 'RPC Error: NO_ACTIVE_CONNECTION' error in the logs?
  2. Do you actively send any data from the edge to the cloud? Or are there pauses in sending data?

Please provide any additional data so the issue can be reproduced and fixed.

volodymyr-babak avatar Mar 21 '23 11:03 volodymyr-babak

Hi @volodymyr-babak, I'm on TB Edge 3.4.4 and just got exactly the same problem. I'm using MQTT between the TB Edge and the devices. Here's what I get in the Audit Logs of the device:

image

After the "unassign -> assign" the problem is gone.

AndreMaz avatar May 04 '23 09:05 AndreMaz

@AndreMaz

Can you kindly check if there's an RPC_CALL event logged immediately before or after you notice the NO_ACTIVE_CONNECTION error? This will help us ascertain whether the RPC_CALL request is being sent to the edge, or if it's not leaving the cloud.

image

Also, it would be insightful to determine if there are any RPC_CALL cloud events being logged on the edge. If the RPC_CALL is being sent from the cloud but is not being received at the edge, it may signify network issues or problems with the edge's ability to process the RPC_CALL.
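
The reasoning behind these two checks can be written down as a small decision function. This is only an illustrative sketch: the enum and function names are hypothetical, and the two boolean inputs stand for what you observe manually in the cloud-side and edge-side event logs.

```python
from enum import Enum


class RpcPath(Enum):
    NOT_LEAVING_CLOUD = "request was never sent from the cloud to the edge"
    LOST_BEFORE_EDGE = "sent by the cloud, but not received or processed by the edge"
    REACHED_EDGE = "request reached the edge"


def triage(cloud_logged_rpc_call: bool, edge_logged_rpc_call: bool) -> RpcPath:
    """Classify where an RPC call stalls from the two manual observations.

    cloud_logged_rpc_call: an RPC_CALL event appears on the cloud around the
        time of the NO_ACTIVE_CONNECTION error.
    edge_logged_rpc_call: an RPC_CALL cloud event appears on the edge.
    """
    if edge_logged_rpc_call:
        # The edge saw the request, so the cloud-to-edge link delivered it.
        return RpcPath.REACHED_EDGE
    if cloud_logged_rpc_call:
        # Sent but never seen on the edge: points to network issues or
        # to the edge failing to process the RPC_CALL.
        return RpcPath.LOST_BEFORE_EDGE
    # Nothing logged on either side: the request did not leave the cloud.
    return RpcPath.NOT_LEAVING_CLOUD
```

For example, `triage(True, False)` returns `RpcPath.LOST_BEFORE_EDGE`, the case where the network path or the edge's processing is the likely suspect.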

Your observations on these points will be very valuable for us to pinpoint the issue and help you further.

I look forward to your response. If you have any additional questions or need further clarification, please don't hesitate to ask.

volodymyr-babak avatar May 12 '23 07:05 volodymyr-babak

Hi, after the new update to ThingsBoard, the RPC Call Request at Cloud is showing No Active Connection. From the Edge it can send the RPC request.

Kindly share any hints, thanks.

image

akseerali avatar May 23 '23 15:05 akseerali

The problem is resolved. Please refer to this issue for more details.

akseerali avatar May 29 '23 16:05 akseerali