OpenWISP-Geographic-Monitoring

Mysql::Error: Lost connection to MySQL server

Open marino-mrc opened this issue 9 years ago • 13 comments

Hi, I have about 100 APs monitored with OWGM on an Ubuntu VM (4 GB RAM, dual core). The backgroundrb daemon crashes at random. The errors are:

"Mysql::Error: MySQL server has gone away: SELECT alerts.* FROM alerts WHERE alerts.sent = 0 AND alerts.access_point_id = 102"
...
/home/ubuntu/.rvm/rubies/ruby-1.8.7-p374/lib/ruby/1.8/open3.rb:59:in `fork': Cannot allocate memory - fork(2) (Errno::ENOMEM)
...
/home/ubuntu/.rvm/gems/ruby-1.8.7-p374/gems/activerecord-3.0.9/lib/active_record/connection_adapters/mysql_adapter.rb:614:in `real_connect': Can't create UNIX socket (24) (Mysql::Error)

It looks like a memory problem (too much is used), but 4 GB of RAM for 100 APs? How is this possible? Is there some tuning to do on the MySQL server? Could you please give me some advice? Thank you.

marino-mrc • Jan 11 '16 07:01

Which revision are you using?

Are you able to understand which operation is eating up so much memory?
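One way to start answering that question is to sample the daemon's resident memory and open file descriptors while the tasks run. The sketch below is only an illustration and is not part of OWGM: the helper names `rss_kb` and `open_fd_count` are made up here, it assumes Linux with /proc mounted and a reasonably modern Ruby, and it assumes the "(24)" in the MySQL error is the OS errno, which on Linux is EMFILE (too many open files).

```ruby
# Resident memory of a process in kB, read from /proc/<pid>/status.
# Returns nil when /proc is unavailable or the value is missing.
def rss_kb(pid)
  path = "/proc/#{pid}/status"
  return nil unless File.readable?(path)
  match = File.read(path).match(/^VmRSS:\s+(\d+)\s+kB/)
  match ? match[1].to_i : nil
end

# Number of file descriptors the process currently holds open.
# Reading another user's process requires root. Returns nil when unavailable.
def open_fd_count(pid)
  dir = "/proc/#{pid}/fd"
  return nil unless File.directory?(dir) && File.readable?(dir)
  Dir.entries(dir).reject { |entry| entry == "." || entry == ".." }.size
end

# Hypothetical usage: replace Process.pid with the pid of the backgroundrb daemon.
pid = Process.pid

# Note: this is the limit of the *current* process, which matches the daemon's
# only if both were started with the same ulimit settings.
soft_limit, _hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)

puts "pid=#{pid} rss_kb=#{rss_kb(pid).inspect} " \
     "open_fds=#{open_fd_count(pid).inspect} nofile_soft_limit=#{soft_limit}"
```

Running this every few seconds against the daemon's pid would show whether memory or descriptors climb steadily or jump when a particular task starts.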

nemesifier • Jan 11 '16 09:01

I'm using OWGM at the latest commit (master branch). I think the problem is related to the "clean activity" background process. Perhaps it uses too much memory? Should I tune MySQL for this task?

marino-mrc • Jan 11 '16 10:01

Tell me the exact commit hash please, check with git log -1. Also check more carefully the name of the background task.

There was a recent change to address a similar issue: https://github.com/openwisp/OpenWISP-Geographic-Monitoring/commit/429df76509835fa8e34d04fd0be575e663c560f1

We didn't need to do much tuning of MySQL; we just ensured the task runs more often, so it has less data to process on each run.

But currently I'm not sure if it's the same issue or not, please send me precise information about commit hash and exact name of background task which is causing the problem.

nemesifier • Jan 11 '16 10:01

ubuntu@ow-service:~/OpenWISP-Geographic-Monitoring$ git log -2
commit fea1fb3416c43e2319e7cd55039b477b721ebe8c
Author: nemesisdesign <...>
Date:   Fri Jan 8 10:26:52 2016 +0100

    Avoid exception in AccessPoint.build_property_set_if_group_name_empty

commit 429df76509835fa8e34d04fd0be575e663c560f1
Author: nemesisdesign <...>
Date:   Mon Dec 28 10:07:57 2015 +0100

    Run housekeeping often to avoid blocking #49

    - moved ActivityHistory cleanup to a separate worker that runs every day at 2:30 am
    - run rest of housekeeping tasks every day at 1:30 am

ubuntu@ow-service:~/OpenWISP-Geographic-Monitoring$

My config/backgroundrb.yml is here: http://pastebin.com/KB630iry. After I pulled the commit that addresses the similar issue, the problem starts at:

[Sun Jan 10 00:35:14 +0100 2016] Mysql::Error: Lost connection to MySQL server during query: SELECT alerts.* FROM alerts WHERE alerts.sent = 0 AND alerts.access_point_id = 99

So I think the problem is related to the task that starts at 00:30.

marino-mrc • Jan 11 '16 10:01

Default hours are 1:30 and 2:30 AM, there's no 00:30. Did you mean 1:30?

nemesifier • Jan 11 '16 10:01

No. There are two possibilities: 1) the error is not related to the task, or 2) the timestamps in log/background_debug_11012.log are one hour behind.
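The second possibility can be checked with a little arithmetic. The sketch below is only an illustration: it takes the timestamp from the log line quoted earlier and the default 1:30 housekeeping start mentioned in this thread (the schedule actually configured in the pastebin may differ), and the helper name `seconds_after_start` is made up here.

```ruby
require 'time'

# Format of the timestamps in the backgroundrb debug log, e.g.
# "Sun Jan 10 00:35:14 +0100 2016".
LOG_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Seconds between a scheduled start and a logged error.
# Positive: the error came after the start; negative: before it.
# skew_seconds is added to the logged time to model a log clock that lags.
def seconds_after_start(logged, scheduled, skew_seconds = 0)
  ((logged + skew_seconds) - scheduled).round
end

logged    = Time.strptime("Sun Jan 10 00:35:14 +0100 2016", LOG_FORMAT)
scheduled = Time.new(2016, 1, 10, 1, 30, 0, "+01:00") # assumed default housekeeping start

puts seconds_after_start(logged, scheduled)       # => -3286 (error logged 54 min 46 s before the start)
puts seconds_after_start(logged, scheduled, 3600) # => 314   (5 min 14 s after the start, if the log lags by 1 hour)
```

If the log clock really is one hour behind, the error falls about five minutes into the 1:30 run, which would be consistent with the housekeeping task being involved. It does not prove it.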

marino-mrc • Jan 11 '16 11:01

Both are possible, I cannot help very much in this case.

What I can suggest is to run the code in housekeeping and clean_activityhistory manually in the rails console and see what happens.

To launch the console:

RAILS_ENV=production bundle exec rails console

nemesifier • Jan 11 '16 11:01

How can I launch workers from the console? I cannot instantiate an object of type MonitoringWorker...

marino-mrc • Jan 11 '16 12:01

Do not try to launch the MonitoringWorker. Instead, copy the code of each task and execute it in the console, one at a time.
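When pasting each task into the console, it can help to wrap it in a small helper that reports how long it took and how much the console's resident memory grew. This is a minimal sketch, not OWGM code: `measure` and `current_rss_kb` are names invented for this example, the memory reading assumes Linux with /proc, and the block at the end is a stand-in for the pasted task code.

```ruby
require 'benchmark'

# Resident memory of the current process in kB (Linux /proc only), or nil.
def current_rss_kb
  path = "/proc/self/status"
  return nil unless File.readable?(path)
  line = File.readlines(path).find { |l| l.start_with?("VmRSS:") }
  line ? line.split[1].to_i : nil
end

# Runs the block, prints elapsed time and RSS growth, returns the block's value.
def measure(label)
  before  = current_rss_kb
  result  = nil
  seconds = Benchmark.realtime { result = yield }
  after   = current_rss_kb
  delta   = (before && after) ? after - before : nil
  puts format("%s: %.2f s, rss delta: %s kB", label, seconds, delta.inspect)
  result
end

# Stand-in workload; in the Rails console the block would hold the pasted task code.
measure("example step") { Array.new(100_000) { |i| i.to_s }.size }
```

A step whose RSS delta is far larger than the others is a good candidate for the operation that exhausts memory.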

nemesifier • Jan 11 '16 12:01

OK, I can work on this, but I have another problem that is probably related. I may have some kind of database corruption in OWGM, because I currently see the same AP belonging to two groups: mygroup and NoGroup. Is this possible? Can I fix this without losing the history?

marino-mrc • Jan 14 '16 10:01

This should not be possible, are you sure it's not 2 access points with a similar name belonging to different groups?
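The two cases can be told apart mechanically once the listed rows are exported. The sketch below uses made-up sample rows in the form [ap_id, name, group]; the values and helper names are hypothetical and say nothing about the actual OWGM schema.

```ruby
# Hypothetical rows as they might appear in the list view: [ap_id, name, group].
rows = [
  [102, "ap-roof", "mygroup"],
  [102, "ap-roof", "NoGroup"],
  [103, "ap-hall", "mygroup"],
  [104, "ap-hall", "NoGroup"]
]

# IDs that are listed under more than one group (one AP shown twice).
def ids_in_many_groups(rows)
  rows.group_by { |row| row[0] }
      .select { |_id, rs| rs.map { |r| r[2] }.uniq.size > 1 }
      .keys
end

# Names shared by more than one ID (distinct APs that merely look alike).
def names_with_many_ids(rows)
  rows.group_by { |row| row[1] }
      .select { |_name, rs| rs.map { |r| r[0] }.uniq.size > 1 }
      .keys
end

p ids_in_many_groups(rows)   # => [102]
p names_with_many_ids(rows)  # => ["ap-hall"]
```

An ID in the first list points at how the view builds its rows; a name in the second list is simply two access points with the same name.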

nemesifier • Jan 14 '16 12:01

It's the same AP. The "data-ap-id" attribute is the same in the HTML. Could you give me some advice on how to solve this problem? I need to preserve the history-related tables.

marino-mrc • Jan 15 '16 07:01

Can someone please help me with the duplicate AP? The database seems to be fine because it contains no duplicate AP, but in the view I see two APs with the same name, the same MAC and the same data-ap-id HTML attribute. It seems that the same AP is associated with two different groups ("my group" and "No group"). Is this possible?

marino-mrc • Jan 25 '16 09:01