frontera icon indicating copy to clipboard operation
frontera copied to clipboard

WIP: Check if table already exists before creating it

Open lopuhin opened this issue 9 years ago • 5 comments

If we have SQLALCHEMYBACKEND_DROP_ALL_TABLES = False, then without this fix we will fail to start db and strategy workers with an existing database, because the tables are already there, but we try to create them.

Potentially, creating the table without checking for existance could cause troubles even with SQLALCHEMYBACKEND_DROP_ALL_TABLES = True, when several workers would try to create required tables concurrently, but I never tried to reproduce this.

lopuhin avatar Jun 18 '16 10:06 lopuhin

This check is not enough to prevent errors during concurrent table creation, at least with postgres. Although this looks more like a sqlalchemy bug:

db-worker-incoming-1_1  | Traceback (most recent call last):
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
db-worker-incoming-1_1  |     context)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
db-worker-incoming-1_1  |     cursor.execute(statement, parameters)
db-worker-incoming-1_1  | psycopg2.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
db-worker-incoming-1_1  | DETAIL:  Key (typname, typnamespace)=(queue_id_seq, 2200) already exists.
db-worker-incoming-1_1  | 
db-worker-incoming-1_1  | 
db-worker-incoming-1_1  | The above exception was the direct cause of the following exception:
db-worker-incoming-1_1  | 
db-worker-incoming-1_1  | Traceback (most recent call last):
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/runpy.py", line 170, in _run_module_as_main
db-worker-incoming-1_1  |     "__main__", mod_spec)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/runpy.py", line 85, in _run_code
db-worker-incoming-1_1  |     exec(code, run_globals)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/worker/db.py", line 280, in <module>
db-worker-incoming-1_1  |     worker = DBWorker(settings, args.no_batches, args.no_incoming)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/worker/db.py", line 74, in __init__
db-worker-incoming-1_1  |     self._manager = FrontierManager.from_settings(settings, db_worker=True)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 282, in from_settings
db-worker-incoming-1_1  |     strategy_worker=strategy_worker)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 243, in __init__
db-worker-incoming-1_1  |     strategy_worker=strategy_worker)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 24, in __init__
db-worker-incoming-1_1  |     self._backend = self._load_backend(backend, db_worker, strategy_worker)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/core/manager.py", line 54, in _load_backend
db-worker-incoming-1_1  |     return cls.db_worker(self)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/frontera/contrib/backends/sqlalchemy/__init__.py", line 162, in db_worker
db-worker-incoming-1_1  |     queue_m.__table__.create(bind=b.engine, checkfirst=True)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/schema.py", line 725, in create
db-worker-incoming-1_1  |     checkfirst=checkfirst)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1856, in _run_visitor
db-worker-incoming-1_1  |     conn._run_visitor(visitorcallable, element, **kwargs)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1481, in _run_visitor
db-worker-incoming-1_1  |     **kwargs).traverse_single(element)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
db-worker-incoming-1_1  |     return meth(obj, **kw)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 764, in visit_table
db-worker-incoming-1_1  |     include_foreign_key_constraints=include_foreign_key_constraints
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 914, in execute
db-worker-incoming-1_1  |     return meth(self, multiparams, params)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
db-worker-incoming-1_1  |     return connection._execute_ddl(self, multiparams, params)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 968, in _execute_ddl
db-worker-incoming-1_1  |     compiled
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
db-worker-incoming-1_1  |     context)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
db-worker-incoming-1_1  |     exc_info
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
db-worker-incoming-1_1  |     reraise(type(exception), exception, tb=exc_tb, cause=cause)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 185, in reraise
db-worker-incoming-1_1  |     raise value.with_traceback(tb)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
db-worker-incoming-1_1  |     context)
db-worker-incoming-1_1  |   File "/usr/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
db-worker-incoming-1_1  |     cursor.execute(statement, parameters)
db-worker-incoming-1_1  | sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "pg_type_typname_nsp_index"
db-worker-incoming-1_1  | DETAIL:  Key (typname, typnamespace)=(queue_id_seq, 2200) already exists.
db-worker-incoming-1_1  |  [SQL: '\nCREATE TABLE queue (\n\tid SERIAL NOT NULL, \n\tpartition_id INTEGER, \n\tscore FLOAT, \n\turl VARCHAR(1024) NOT NULL, \n\tfingerprint VARCHAR(40) NOT NULL, \n\thost_crc32 BIGINT NOT NULL, \n\tmeta BYTEA, \n\theaders BYTEA, \n\tcookies BYTEA, \n\tmethod VARCHAR(6), \n\tcreated_at BIGINT, \n\tdepth SMALLINT, \n\tPRIMARY KEY (id)\n)\n\n']

lopuhin avatar Jun 18 '16 19:06 lopuhin

@lopuhin concurrent table creation is something we should avoid doing. Such a behavior isn't expected by both DB servers and clients. I recommend to redesign your application to avoid doing this.

Concurrent truncation of the table should work fine, btw.

sibiryakov avatar Jun 20 '16 07:06 sibiryakov

@sibiryakov thanks, I'll try to avoid concurrent table creation, that makes sense!

lopuhin avatar Jun 20 '16 07:06 lopuhin

I would like to test that behavior. What if we'll be testing our backends with SQLALCHEMYBACKEND_DROP_ALL_TABLES enabled?

sibiryakov avatar Jun 20 '16 11:06 sibiryakov

Yep, I'll add the tests (found them now).

lopuhin avatar Jun 20 '16 11:06 lopuhin