nautobot-app-golden-config icon indicating copy to clipboard operation
nautobot-app-golden-config copied to clipboard

MySQL -> Failed to Get Lock

Open jvanderaa opened this issue 2 years ago • 2 comments

Environment

  • Python version: 3.10
  • Nautobot version: 1.4.1
  • nautobot-golden-config version: 1.2

Steps to Reproduce

  1. Move from Postgres to MySQL since I am still early in the data in this instance
  2. Try to do config backup which worked on postgres
  3. Failure below

Expected Behavior

Config backed up

Observed Behavior

Error:

  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot_golden_config/nornir_plays/config_backup.py", line 133, in config_backup
    nr_with_processors.run(
  File "/opt/nautobot/lib/python3.10/site-packages/nornir/core/__init__.py", line 139, in run
    result = self.runner.run(task, run_on)
  File "/opt/nautobot/lib/python3.10/site-packages/nornir/plugins/runners/__init__.py", line 43, in run
    worker_result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/nautobot/lib/python3.10/site-packages/nornir/core/task.py", line 136, in start
    self.processors.task_instance_completed(self, host, self.results)
  File "/opt/nautobot/lib/python3.10/site-packages/nornir/core/processor.py", line 85, in task_instance_completed
    p.task_instance_completed(task, host, result)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot_golden_config/nornir_plays/processor.py", line 34, in task_instance_completed
    self.logger.log_failure(task.host.data["obj"], f"{task.name} failed: {result.exception}")
  File "/opt/nautobot/lib/python3.10/site-packages/nornir_nautobot/utils/logger.py", line 49, in log_failure
    self.nautobot_job.log_failure(obj, message)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/jobs.py", line 575, in log_failure
    self._log(obj, message, level_choice=LogLevelChoices.LOG_FAILURE)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/jobs.py", line 517, in _log
    self.job_result.log(
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/models/jobs.py", line 760, in log
    log.save(using=JOB_LOGS)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 776, in save_base
    updated = self._save_table(
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 881, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 919, in _do_insert
    return manager._insert(
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/cacheops/transaction.py", line 97, in execute
    result = self._no_monkey.execute(self, sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/jobs.py", line 1172, in _run_job
    output = job.run(data=data, commit=commit)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot_golden_config/jobs.py", line 51, in inner
    return method(obj, data, commit)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot_golden_config/jobs.py", line 189, in run
    config_backup(self, data)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot_golden_config/nornir_plays/config_backup.py", line 144, in config_backup
    logger.log_failure(None, err)
  File "/opt/nautobot/lib/python3.10/site-packages/nornir_nautobot/utils/logger.py", line 49, in log_failure
    self.nautobot_job.log_failure(obj, message)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/jobs.py", line 575, in log_failure
    self._log(obj, message, level_choice=LogLevelChoices.LOG_FAILURE)
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/jobs.py", line 517, in _log
    self.job_result.log(
  File "/opt/nautobot/lib/python3.10/site-packages/nautobot/extras/models/jobs.py", line 760, in log
    log.save(using=JOB_LOGS)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 739, in save
    self.save_base(using=using, force_insert=force_insert,
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 776, in save_base
    updated = self._save_table(
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 881, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/base.py", line 919, in _do_insert
    return manager._insert(
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/query.py", line 1270, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
    cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/cacheops/transaction.py", line 97, in execute
    result = self._no_monkey.execute(self, sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 79, in _execute
    with self.db.wrap_database_errors:
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

jvanderaa avatar Aug 24 '22 23:08 jvanderaa

Can you test with @jeffkala? This is likely since we are not releasing the sql connection correctly, but looking for a good place to disconnect and not finding a good test env.

itdependsnetworks avatar Aug 24 '22 23:08 itdependsnetworks

I have the same problem. Is there any solution for this?

milan2655 avatar Oct 17 '22 13:10 milan2655

@jeffkala

jvanderaa avatar Nov 13 '22 20:11 jvanderaa

I have the same problem. Is there any solution for this?

An exception occurred: OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')

abdikanipd avatar Jan 14 '23 05:01 abdikanipd

I can't duplicate this, can I get a more fully step by step to reproduce?

jeffkala avatar Feb 16 '23 23:02 jeffkala

from public chats

I'm trying to set up Nautobot in a not quite production yet environment.  I have data entered for one of my devices.  I've written a Jinja tempalate and a GraphQL query, and am now trying to use Golden Config to attempt a config generation.
I'm using MySQL as the database.  Nautobot seems to be able to pull device data.  In addition to displaying stuff, GraphQL aimed at the API is able to pull all the data I'm looking for.
But when I go to Job Results / Golden Configuration / Generate Intended Configurations and tell it to generate a configuration for my test device, I'm getting a database locking error.
The error that shows up is this:
Traceback (most recent call last):
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/opt/nautobot/lib/python3.10/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/opt/nautobot/lib/python3.10/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
The sequence of events in the log is this:
2023-05-10 17:36:22.377530	run	Default	—	
Starting intended job.
2023-05-10 17:36:22.383246	run	Default	—	
Pull Jinja template repos.
2023-05-10 17:36:23.109212	main	Success	—	
Repository successfully refreshed
2023-05-10 17:37:13.134948	run	Failure	—	
An exception occurred: OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
When I run "SHOW ENGINE INNODB STATUS" on the MySQL server during the query, this appears to be the relevant content (ignore the timestamps; this is from a different run, but the results are consistent):
------------
TRANSACTIONS
------------
Trx id counter 27427
Purge done for trx's n:o < 27422 undo n:o < 0 state: running but idle
History list length 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421423040477560, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040481600, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040480792, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040476752, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040475944, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040475136, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040474328, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040473520, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421423040472712, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 27426, ACTIVE 45 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MySQL thread id 27, OS thread handle 139947888539200, query id 2021 localhost nautobot update
INSERT INTO `extras_joblogentry` (`id`, `job_result_id`, `log_level`, `grouping`, `message`, `created`, `log_object`, `absolute_url`) VALUES ('d9a82ad39abc4bf2b078b555fd6e3c7f', 'ba1a5a8035b1423dae37b32872217fb2', 'default', 'run', 'Pull Intended config repos.', '2023-05-09 23:46:44.650444', NULL, NULL)
------- TRX HAS BEEN WAITING 45 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 312 page no 4 n bits 136 index PRIMARY of table `nautobot`.`extras_jobresult` trx id 27426 lock mode S locks rec but not gap waiting
Record lock, heap no 63 PHYSICAL RECORD: n_fields 15; compact format; info bits 64
 0: len 30; hex 626131613561383033356231343233646165333762333238373232313766; asc ba1a5a8035b1423dae37b32872217f; (total 32 bytes);
 1: len 6; hex 000000006b20; asc     k ;;
 2: len 7; hex 010000018a0269; asc       i;;
 3: len 30; hex 706c7567696e732f6e6175746f626f745f676f6c64656e5f636f6e666967; asc plugins/nautobot_golden_config; (total 47 bytes);
 4: len 8; hex 99b0137bab0d1410; asc    {    ;;
 5: SQL NULL;
 6: len 7; hex 72756e6e696e67; asc running;;
 7: len 19; hex 00010012000b0006000c11006f757470757400; asc             output ;;
 8: len 30; hex 643437326335653730323133346465356263376334346238646164363335; asc d472c5e702134de5bc7c44b8dad635; (total 32 bytes);
 9: len 4; hex 80000010; asc     ;;
 10: len 30; hex 393030633233623032333239346531623931363466663361356135636532; asc 900c23b023294e1b9164ff3a5a5ce2; (total 32 bytes);
 11: len 5; hex 0000000400; asc      ;;
 12: SQL NULL;
 13: len 30; hex 306530313433303330653932343765366237323135373764353837613036; asc 0e0143030e9247e6b721577d587a06; (total 32 bytes);
 14: SQL NULL;
------------------
---TRANSACTION 27424, ACTIVE 46 sec
9 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 3
MySQL thread id 26, OS thread handle 139947555231296, query id 2020 localhost nautobot
Is there something obvious I'm doing wrong here, or anything I should be looking at to debug this further?

jeffkala avatar May 10 '23 18:05 jeffkala

Screen Shot 2023-05-10 at 11 55 45 AM Screen Shot 2023-05-10 at 11 54 48 AM

jeffkala avatar May 10 '23 21:05 jeffkala

lock appears to be occurring here. https://github.com/nautobot/nautobot-plugin-golden-config/blob/03e0b0c1c0bf5bd3fdf218ae17f76bb122cd10f5/nautobot_golden_config/jobs.py#L153

jeffkala avatar May 10 '23 21:05 jeffkala

I've narrowed this down thanks to @scgibbard who had reproduced this issue and provided access to the enviornment.

This has something to do specifically with ensure_git_repository call within this for loop.

def get_refreshed_repos(job_obj, repo_type, data=None):
    """Small wrapper to pull latest branch, and return a GitRepo plugin specific object."""
    devices = get_job_filter(data)
    dynamic_groups = DynamicGroup.objects.exclude(golden_config_setting__isnull=True)
    repository_records = set()
    # Iterate through DynamicGroups then apply the DG's filter to the devices filtered by job.
    for group in dynamic_groups:
        repo = getattr(group.golden_config_setting, repo_type, None)
        if repo and devices.filter(group.generate_query()).exists():
            repository_records.add(repo.id)

    repositories = []
    for repository_record in repository_records:
        repo = GitRepository.objects.get(id=repository_record)
        ensure_git_repository(repo, job_obj.job_result). # This is the issue.
        git_repo = GitRepo(repo)
        repositories.append(git_repo)

    return repositories

jeffkala avatar May 11 '23 19:05 jeffkala

specifically this save() call is causing the problem. https://github.com/nautobot/nautobot/blob/0710c6f7672f7a76b9bcd6dc2efc05c7eefbe0b5/nautobot/extras/datasources/git.py#LL301C6-L301C6

jeffkala avatar May 11 '23 20:05 jeffkala

specifically this save() call is causing the problem. https://github.com/nautobot/nautobot/blob/0710c6f7672f7a76b9bcd6dc2efc05c7eefbe0b5/nautobot/extras/datasources/git.py#LL301C6-L301C6

Commenting this line out does fix the problem, and all logging/job functions still complete as expected. From some chats with core team it doesn't seem this line is even needed.

jeffkala avatar May 11 '23 20:05 jeffkala

@jeffkala Confirmed this worked for me as well, ty!!

gt732 avatar Jun 30 '23 14:06 gt732

Affected me as well on a docker compose install of nautobot 1.6.2, golden configuration 1.5.0, python 3.9, mysql 8.1.0 I can confirm that uncommenting that job_result.save() line allowed the job to proceed

gioccher avatar Sep 12 '23 17:09 gioccher

I just ran into this same issue when attempting to use use Intended Config with:

  • python 3.11
  • Golden Configuration 1.6.3
  • Nautobot 1.6.5
  • MySQL 8.1.0

Are there plans to roll this change into ltm-1.6 ?

nrnvgh avatar Nov 27 '23 20:11 nrnvgh

I just ran into this same issue when attempting to use use Intended Config with:

  • python 3.11
  • Golden Configuration 1.6.3
  • Nautobot 1.6.5
  • MySQL 8.1.0

Are there plans to roll this change into ltm-1.6 ?

@nrnvgh A workaround was merged into ltm-1.6 with #695. A new LTM release with the fix should be out soon

DistantVoyager avatar Jan 11 '24 15:01 DistantVoyager

Fixed in #695 with release v1.6.4

DistantVoyager avatar Jan 11 '24 18:01 DistantVoyager