pyani
Use `alembic` to manage SQLite schema changes?
Summary:
Use `alembic` or similar to manage database migration.
Description:
In discussions with users, modifications to the SQLite database schema have caused problems: updating a v0.3 installation can leave it incompatible with an existing database, resulting in errors.
We could potentially avoid this by carefully versioning the SQLite/SQLAlchemy schema and using `alembic` or similar to aid migration between versions in most cases.
If users maintain large analyses in their databases, breaking backwards-compatibility would discourage them from trusting the package, so we should avoid it.
Having used this before with Flask applications, I'm confident we could usually achieve seamless in-place upgrades of users' existing local databases.
Can you share any of these discussions with users? That would give me a more concrete idea of the problem this solves. I imagine it's the same issue I had between the `fastANI` branch and others, though.
You're the user with the most experience of these incompatibilities, for sure ;)
We have talked about using `alembic` for this before, and it's good practice in any case.
I don't have much concrete, but here's a related screenshot as I helped Angelika understand her issue with the update:
I've been playing with this, and there are a few things I don't seem to have figured out, yet.
- revision naming, and specifying a specific revision from the command line, vs just saying `alembic upgrade head`
- where migration files need to exist in a directory hierarchy; do they go in `multidb`?
- allowing multiple databases; I think I am halfway towards this, but something seems incorrect
- how much of the Q&As that talk about Flask is going to be relevant? I'm seeing lots of Flask stuff, despite not searching for it
BUT, I think I have a basic migration script written for the old/new versions of the database (pre-/post- fastANI addition).
I have set up the basic requirements for a new `versiondb` subcommand, and also managed to get `alembic` working correctly locally. I'm now working on tying the functionality into `pyani` in a sensible way.
I was going to add a tag in the repo, prior to the merge of `fastANI`, to capture the database changes, but it looks like tags are directly related to releases, and doing so might be confusing, since the database is only a factor in version 3, and all of the existing releases are version 2.
Actually, I have found an issue with the `downgrade()` function I am testing, which should remove the columns that are added to support `fastANI`. The function I am running is written in accordance with examples I've seen:
```python
def downgrade_old_db():
    with op.batch_alter_table("comparisons") as batch_op:
        batch_op.drop_column('kmersize')
        batch_op.drop_column('minmatch')
```
but nothing happens—note the lack of a 'Running downgrade ...' message here:
```
(pyani_dev) ! alembic downgrade base
INFO [alembic.env] Migrating database old_db
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
```
Many sources on the internet claim that this is because `sqlite` does not support the `DROP COLUMN` functionality of `ALTER TABLE`, but the documentation for this indicates otherwise. It appears this change was made sometime prior to July 2021, though I have not been able to pinpoint exactly when.

I have, however, confirmed that I can drop columns within my `sqlite` installation, so I'm puzzled as to why `alembic` is not managing to do so.
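For what it's worth, native `DROP COLUMN` support landed in SQLite 3.35.0 (released March 2021), which fits the "prior to July 2021" timeframe. One thing worth ruling out is a version mismatch: the `sqlite3` command-line shell and the SQLite library linked into Python (which is what SQLAlchemy/alembic actually use) can be different builds. A quick check, as a sketch (`supports_drop_column` is a helper name of my own, not an existing API):

```python
import sqlite3

def supports_drop_column(version_info=sqlite3.sqlite_version_info):
    """True if the linked SQLite supports ALTER TABLE ... DROP COLUMN.

    Native DROP COLUMN support was added in SQLite 3.35.0 (March 2021).
    """
    return version_info >= (3, 35, 0)

# The sqlite3 CLI and the library compiled into Python can be different
# builds, so check the one that SQLAlchemy/alembic will actually use:
print(sqlite3.sqlite_version, supports_drop_column())
```

If the Python-side library predates 3.35.0, dropping columns will work in the shell but not through alembic, which would explain exactly this symptom.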
```
sqlite> .schema comparisons
CREATE TABLE comparisons (
    comparison_id INTEGER NOT NULL,
    query_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    aln_length INTEGER,
    sim_errs INTEGER,
    identity FLOAT,
    cov_query FLOAT,
    cov_subject FLOAT,
    program VARCHAR,
    version VARCHAR,
    fragsize INTEGER,
    maxmatch BOOLEAN, kmersize INTEGER, minmatch FLOAT,
    PRIMARY KEY (comparison_id),
    UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
    FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
    FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
sqlite> alter table comparisons drop kmersize;
sqlite> .schema comparisons
CREATE TABLE comparisons (
    comparison_id INTEGER NOT NULL,
    query_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    aln_length INTEGER,
    sim_errs INTEGER,
    identity FLOAT,
    cov_query FLOAT,
    cov_subject FLOAT,
    program VARCHAR,
    version VARCHAR,
    fragsize INTEGER,
    maxmatch BOOLEAN, minmatch FLOAT,
    PRIMARY KEY (comparison_id),
    UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
    FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
    FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
sqlite>
```
> it looks like tags are directly related to releases, and doing so might be confusing, since the database is only a factor in version 3, and all of the existing releases are version 2.
IIRC, `git` tags are just labels (though really, only lightweight labels are just labels - annotated tags are proper `git` objects; both point to specific commits) - but GitHub chooses to interpret tags in a specific way. Each tag is understood to be a release point, but the release point only becomes an actual release if Release Notes are added. GitHub's approach to this is not especially general, IMO.
> Actually, I have found an issue with the `downgrade()` function I am testing, which should remove the columns that are added to support `fastANI`. The function I am running is written in accordance with examples I've seen:
>
> ```python
> def downgrade_old_db():
>     with op.batch_alter_table("comparisons") as batch_op:
>         batch_op.drop_column('kmersize')
>         batch_op.drop_column('minmatch')
> ```
>
> but nothing happens—note the lack of a 'Running downgrade ...' message here:
>
> ```
> (pyani_dev) ! alembic downgrade base
> INFO [alembic.env] Migrating database old_db
> INFO [alembic.runtime.migration] Context impl SQLiteImpl.
> INFO [alembic.runtime.migration] Will assume non-transactional DDL.
> ```
Does `alembic` provide no error message or other useful information (and appear to have "worked" without `DROP`ping the column)?
A quick search turned up at least one workaround someone implemented, but we shouldn't have to do that.
I've seen two mentions of a `render_as_batch` option which, I think, allows `alembic` to batch copy the table, modify it accordingly, `DROP` the old one, and rename the new one:
Miguel Grinberg has a blog post outlining a possible solution.
I expect you'll have seen and tried all of these, mind.
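For illustration, the "move and copy" strategy that batch mode performs can be sketched in plain `sqlite3` (table and column names are borrowed from the schema above; this is the strategy, not alembic's actual code):

```python
import sqlite3

# Sketch of the "move and copy" strategy batch mode uses on dialects with
# limited ALTER TABLE support: build a new table without the unwanted
# column, copy the rows across, drop the old table, and rename the new one.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE comparisons (
        comparison_id INTEGER PRIMARY KEY,
        identity FLOAT,
        kmersize INTEGER
    );
    INSERT INTO comparisons VALUES (1, 0.97, 16);

    CREATE TABLE comparisons_new (
        comparison_id INTEGER PRIMARY KEY,
        identity FLOAT
    );
    INSERT INTO comparisons_new
        SELECT comparison_id, identity FROM comparisons;
    DROP TABLE comparisons;
    ALTER TABLE comparisons_new RENAME TO comparisons;
    """
)
cols = [row[1] for row in conn.execute("PRAGMA table_info(comparisons)")]
print(cols)  # the kmersize column is gone; the data survives
```

This works on any SQLite version, which is presumably why batch mode exists at all.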
> Does alembic provide no error message or other useful information (and appear to have "worked" without DROPping the column)?
By default it does not produce any error message, and I have tried including the `--raiseerr` optional flag to get a full traceback, but it still doesn't seem to output anything.
I was attempting to verify that the issue was with drop table, and not just downgrading specifically, by creating a second revision so I had more things to step through. However, now I am stuck in a loop where `alembic` claims the database is not up-to-date, but the `fastani` columns exist. The issue seems to be that it's not populating the `alembic_version` table now, so the upgrade fails partway through, but I don't know why. I think I saw something about this the other day, but haven't managed to find it again.

I think this issue started when I manually deleted columns, to prove I could, though I've since recreated the database. Perhaps something is just not hooked up quite right....
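One way to see what state alembic believes the database is in is to query the `alembic_version` table directly; it holds a single row with the current revision id. A sketch (the in-memory database and the stamped revision id, re-used from this thread, are purely illustrative):

```python
import sqlite3

# alembic records the database's current revision in a one-row table named
# alembic_version; if an upgrade dies partway through, this row may be
# missing or stale, which matches the "database is not up-to-date" symptom.
conn = sqlite3.connect(":memory:")  # stand-in for the real pyani database
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.execute("INSERT INTO alembic_version VALUES ('92f7f6b1626e')")

row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
print(row)  # no row at all would mean the database was never stamped
```

Comparing this value against the revision ids in `alembic/versions/` shows whether the stamp and the actual schema have drifted apart.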
I seem to have hacked my way through a series of arcane errors I could find little information about to something that works.
I now have two migration scripts that I am able to use to both upgrade and downgrade a database reliably. The differences from what I had before are that I did not use the `multidb` template this time, as it did not seem necessary after further reflection, and I have had to switch from performing the batch operations within a context:
```python
with op.batch_alter_table("comparisons") as batch_op:
    batch_op.add_column(sa.Column('kmersize', sa.Integer))
    batch_op.add_column(sa.Column('minmatch', sa.Float))
```
to doing so outside of one:
```python
op.add_column('comparisons', sa.Column('kmersize', sa.Integer))
op.add_column('comparisons', sa.Column('minmatch', sa.Float))
```
This change was necessary because I got the following error, and nothing else I tried both prevents the error and still performs the changes. I have not been able to figure out why this caused an error when, for instance, I successfully used the former format yesterday.
```
Traceback (most recent call last):
  File "/Users/baileythegreen/Software/miniconda3/bin/alembic", line 8, in <module>
    sys.exit(main())
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/config.py", line 588, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/config.py", line 582, in main
    self.run_cmd(cfg, options)
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/config.py", line 559, in run_cmd
    fn(
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/command.py", line 320, in upgrade
    script.run_env()
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/script/base.py", line 563, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/baileythegreen/Software/pyani/scratch/alembic/env.py", line 72, in <module>
    run_migrations_online()
  File "/Users/baileythegreen/Software/pyani/scratch/alembic/env.py", line 66, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/runtime/migration.py", line 620, in run_migrations
    step.migration_fn(**kw)
  File "/Users/baileythegreen/Software/pyani/scratch/alembic/versions/65538af5a5e1_add_nonsense_column.py", line 21, in upgrade
    batch_op.add_column(sa.Column('nonsense', sa.Integer))
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/operations/base.py", line 374, in batch_alter_table
    impl.flush()
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/operations/batch.py", line 101, in flush
    should_recreate = self._should_recreate()
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/operations/batch.py", line 94, in _should_recreate
    return self.operations.impl.requires_recreate_in_batch(self)
  File "/Users/baileythegreen/Software/miniconda3/lib/python3.8/site-packages/alembic/ddl/sqlite.py", line 56, in requires_recreate_in_batch
    and col.server_default.persisted
AttributeError: 'NoneType' object has no attribute 'persisted'
```
I have no idea how this manages to get around the issue I saw before when nothing seemed to happen on downgrade.
Does this seem like the right way to incorporate `alembic` with `pyani`? I know less about the nuances of packaging things in Python.
https://github.com/sqlalchemy/alembic/issues/463#issuecomment-441887412
I'm not familiar enough with `alembic` to know whether it's the "correct" way to proceed. It looks to have been part of `alembic` since 1.2.0, but isn't explicitly documented or in the tutorial AFAICT. So I presume it's stable, but maybe a niche operation that doesn't get stress-tested a lot.
What I get from the thread linked above seems sensible. If it does what we need, doesn't break anything else, and doesn't impose any outrageous requirements, that's good enough for me.
I meant more from a Python packaging standpoint. All this seems to do is create two empty `__init__.py` files—one in the `alembic` (or other) directory specified as the location for `alembic`'s stuff; the other in the `versions` subdirectory of the former. This seems unlikely to break anything, but may or may not be necessary.
I think the files that need to be included/given to users, and the correct structure for them, is:

- `alembic.ini`
- `alembic/`
  - `__init__.py`
  - `env.py`
  - `README` # contains a single line stating what alembic template was used; may not be useful
  - `script.py.mako` # this may not be necessary; it is a template for creating new migration files
  - `versions/`
    - `__init__.py`
    - `<revision_file>`
    - `<revision_file>`
I have a subcommand working that will run upgrades and downgrades, and capture the output/errors from those. The main questions now are:

- How should these files be included/given to the user?
  - Including them at the top level seems sensible to me, but for users who install via `conda`, I don't know if something else would be better.
- Is there a way to set the `sqlalchemy.url` variable programmatically?
  - This is currently the only change that has to be made in the `alembic.ini` file in order for things to work, but it needs to match whatever database the user is using; for distribution we could use the default database name, but I would like to find a way to pull it from the command line arguments, or something. Ideally, without using `sed`, or a vimscript, to edit it.
> All this seems to do is create two empty `__init__.py` files—one in the `alembic` (or other) directory specified as the location for `alembic`'s stuff; the other in the `versions` subdirectory of the former.
That doesn't seem unreasonable but, as I understand it, the `__init__.py` may no longer be necessary with namespace packages. As `pyani` isn't supported for Python versions that don't support namespace packaging, we'd likely be OK without it.
> How should these files be included/given to the user? Including them at the top level seems sensible to me, but for users who install via `conda`, I don't know if something else would be better.
I imagined that there would be a way to present only the `pyani upgrade`/`pyani downgrade` command to the user and have it work. Is that not possible?
> Is there a way to set the `sqlalchemy.url` variable programmatically?
I don't really know what this refers to, or should point to. I would have assumed the up/downgrade could run with a default `--dbpath` argument like the other commands, or take a passed path.
Is this issue relevant? https://github.com/sqlalchemy/alembic/issues/606
For `alembic` to work, it needs a config file: `alembic.ini`, which contains the database name/driver info in the form of an `sqlalchemy.url` variable, and the location of the migration scripts as another variable. The `sqlalchemy.url` variable is pulled from the config file when `alembic`'s `env.py` is run; I am not yet sure where the script location variable is retrieved. There does not seem to be a general way to pass these values inside an `alembic` command so that I can just pull them from our own command line options. It is possible to give `alembic` a database name this way, but I believe this only works in cases where the config file uses the `multidb` template, and allows you to selectively upgrade/downgrade a single database.
> I imagined that there would be a way to present only the `pyani upgrade`/`pyani downgrade` command to the user and have it work. Is that not possible?
Right now, `pyani versiondb [--upgrade <revision>]` and `pyani versiondb --downgrade <revision>` work, but the location of the database and the driver used, as well as the location of the migration scripts, still have to be specified somewhere.
This is what I mean by specifying the driver & database location:
Valid SQLite URL forms are:

```
sqlite:///:memory: (or, sqlite://)
sqlite:///relative/path/to/file.db
sqlite:////absolute/path/to/file.db
```
The `sqlite` part should be fine, but the path is obviously something we can't predict, and the only way I currently know to get the value of something like `--dbpath` to the right bit of code is through an environment variable, which I would have to set, because I don't think `alembic`'s `env.py` will have access to our namespace variables.
> Is this issue relevant? sqlalchemy/alembic#606
This looks like they're using an environmental variable, so maybe. If you think that's an acceptable way forward. I still need to do some digging to see where script location stuff is used.
Examples of my current `alembic.ini` and `env.py`. The only modifications I have made are to the `sqlalchemy.url` line(s) in the first one.

`alembic.ini`:
```ini
# A generic, single database configuration.

[alembic]
# path to migration scripts
script_location = alembic

# template used to generate migration files
# file_template = %%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .

# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python-dateutil library that can be
# installed by adding `alembic[tz]` to the pip requirements
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =

# max length of characters to apply to the
# "slug" field
# truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator"
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions

# version path separator; As mentioned above, this is the character used to split
# version_locations. Valid values are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
version_path_separator = os  # default: use os.pathsep

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

# sqlalchemy.url = sqlite:////Users/baileythegreen/Software/pyani/scratch/alembic/old_db
sqlalchemy.url = sqlite:///pyani/scratch/alembic/old_db

[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console
qualname =

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
```
and `env.py`:
```python
from logging.config import fileConfig

from sqlalchemy import engine_from_config
from sqlalchemy import pool

from alembic import context

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config

# Interpret the config file for Python logging.
# This line sets up loggers basically.
fileConfig(config.config_file_name)

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = None

# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.


def run_migrations_offline():
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()


def run_migrations_online():
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata
        )

        with context.begin_transaction():
            context.run_migrations()


if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
```
> This looks like they're using an environmental variable, so maybe. If you think that's an acceptable way forward. I still need to do some digging to see where script location stuff is used.
Environment variables are fair game. `os.environ["PYANI_DB_PATH"]` could be set pretty straightforwardly post-CLI parsing, I think.
When populating, `pathlib`'s `Path` should allow identification of the absolute path of whatever the CLI picks up for `--dbpath` on the current system, so that you don't need to worry about relative paths in the `sqlalchemy` URL.
Okay, good to know. I'll see if I can get that working. I have been waiting to push stuff because technically right now it won't run; without those values, it's going to throw an error.
I'm looking forward to seeing how you manage the tests for this, too.
That's what I'll be looking at after I try this stuff with environmental variables. I have ideas for what I'll do with tests.
The issue described here has been resolved (and I felt really silly once it was). Skip to here if you want to bypass irrelevant questions/discussion.
I have thus far failed to get the environment variable plan working. I have managed to set them, and I have the syntax for retrieving them correct. However, I am unable to get their values assigned to the `alembic` config values, and I don't know why.

The `env.py` file seems to be a black box; I can't get `print()` statements to work, which makes it difficult to debug what's happening. If I don't have an `alembic.ini` file present in the directory, I get an error related to that; if I have one, but comment out the lines setting the two variables I need to modify, I get a different error; and whatever I supply there is not overwritten when I reset it in `env.py`.
An example of how I'm trying to reset them, where `$DATABASE` is the value of the `--dbpath` variable:
```python
dbpath = os.environ.get('DATABASE')
url = "sqlite:////Users/baileythegreen/Software/pyani/" + dbpath
config.set_main_option("sqlalchemy.url", url)
```
From what I understand, this is the correct API method for what I want to do (eliminate the `.ini` file), but I have yet to get it to work.
It's hard to comment on the first part without seeing the error message or practically taking a look.
In the second part, this line:

```python
url = "sqlite:////Users/baileythegreen/Software/pyani/" + dbpath
```

looks suspect, to me. However, it's not clear what's in the `DATABASE` environment variable that gets read into `dbpath`.
The required format, described in an earlier post, is essentially:

```
sqlite:////absolute/path/to/file.db
```
It's not clear to me that your string, concatenated with `dbpath`, satisfies that. It looks like the hard-coded path to your `pyani` directory might also cause issues.
Have you tried:

- using `args.dbpath` from the CLI, getting the absolute path with `args.dbpath.resolve()`
- setting a suitably-named environment variable with the result of `args.dbpath.resolve()` (and I'd suggest having a `PYANI_` prefix so that related environment variables are effectively namespaced)
- setting `url = f"sqlite:///{os.environ.get('PYANI_DBPATH')}"`

and what was/is the result?
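A minimal sketch of those three steps together (the `PYANI_DBPATH` variable name and the example path are assumptions of mine, not anything pyani currently defines):

```python
import os
from pathlib import Path

# CLI side: resolve whatever --dbpath provided to an absolute path, then
# export it under a PYANI_-prefixed environment variable.
dbpath = Path(".pyani/pyanidb")  # stand-in for args.dbpath
os.environ["PYANI_DBPATH"] = str(dbpath.resolve())

# env.py side: three slashes plus an absolute path yields the
# sqlite:////absolute/path form required for absolute locations.
url = f"sqlite:///{os.environ.get('PYANI_DBPATH')}"
print(url)
```

Because the path is resolved before it goes into the environment, the URL no longer depends on whatever directory `alembic` happens to run from.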
FEATURE REQUEST

`alembic` does upgrades/downgrades in-place. It would be useful, I think, to avoid borking an existing database with these updates, so we could back up the database before we start. For instance: `pyanidb` could be copied to `pyanidb.YYYY-MM-DD_HH-MM-SS.bak` before the in-place modification starts.
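That backup step could be sketched as follows (`backup_database` is a hypothetical helper of our own, not an existing pyani function):

```python
import shutil
import time
from pathlib import Path

def backup_database(dbpath: Path) -> Path:
    """Copy the database aside before an in-place migration touches it."""
    stamp = time.strftime("%Y-%m-%d_%H-%M-%S")
    target = dbpath.with_name(f"{dbpath.name}.{stamp}.bak")
    shutil.copy2(dbpath, target)  # copy2 also preserves file metadata
    return target
```

Run just before invoking `alembic`, this would leave something like `pyanidb.2022-01-31_12-00-00.bak` next to the original, so a failed migration is recoverable.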
To resummarise where this is (will potentially be expanded as I think of other things to add):

(Branch with code found here.)

- The issues with editing `env.py` not working have been resolved; the problem was that an incorrect `env.py` was edited.
- The `pyani versiondb` command creates an environment variable that is used to set the value for the database name, overriding the value specified in the `alembic.ini` file. [^1]
- An environment variable is also used for the script location, but this is set to `alembic`, rather than being user-determined. [^2]
- The `--package` option creates two `__init__.py` files; one in the directory specified to `alembic` (default `alembic/`), one in the subdirectory `alembic/versions/`. [^3]
- A step has been added to create a backup of the database being migrated, as described in the previous comment.
- Additional options (`--name`, `--config`) have been created to allow for specifying a specific database name (in cases of a `multidb` `alembic` setup) or a config file not in the default location (`./alembic.ini`).

[^1]: This prevents the need to predict/dictate the database name.
[^2]: This allows us to provide such a directory containing the necessary migration scripts as part of a `pyani` install.
[^3]: I have decided not to use this option as it creates an issue for devs, wherein running `import alembic` in the parent directory of a project that contains an `alembic/__init__.py` file attempts to import what it sees as a local module named `alembic`, rather than the actual database version control package.
CLI help information for `pyani versiondb`:
```
! pyani versiondb -h
usage: pyani versiondb [-h] [-l LOGFILE] [-v] [--debug] [--disable_tqdm] [--version] [--citation]
                       [--dbpath DBPATH]
                       (--upgrade [VERSION] | --downgrade [VERSION] | --dry-run [START:END])
                       [--alembic_exe ALEMBIC_EXE] [-n NAME] [-c FILE]

One of --upgrade, --downgrade, or --dry-run must be specified.

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        logfile location (default: None)
  -v, --verbose         report verbose progress to log (default: False)
  --debug               report debug messages to log (default: False)
  --disable_tqdm        Turn off tqdm progress bar (default: False)
  --version
  --citation
  --dbpath DBPATH       path to pyani database (default: .pyani/pyanidb)
  --upgrade [VERSION]   update an existing database to a newer schema; if no argument is given,
                        'head' will be used (default: None)
  --downgrade [VERSION]
                        revert an existing database to a older schema; if no argument is given,
                        'base' will be used (default: None)
  --dry-run [START:END]
                        produce the SQL that would be run in migrations, without altering the
                        database; start and end versions must be specified (default: None)
  --alembic_exe ALEMBIC_EXE
                        path to alembic executable (default: alembic)
  -n NAME, --name NAME  used to specify an individual database in a multidb setup (default: None)
  -c FILE, --config FILE
                        used to specify a config file for alembic (default: None)
```
I have chased down the issue underlying something weird I was seeing, which somewhat derailed my writing tests for `pyani versiondb`.
Weird thing I noticed
If I took a copy of an old database (pre-`fastANI` changes to the ORM), my code was able to upgrade/downgrade this without issue; but if I took a new one (post-`fastANI` changes to the ORM), I get the following traceback:
```
Traceback (most recent call last):
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
    self.dialect.do_execute(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: error in table comparisons after drop column: no such column: kmersize

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/bin/alembic", line 8, in <module>
    sys.exit(main())
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/config.py", line 588, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/config.py", line 582, in main
    self.run_cmd(cfg, options)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/config.py", line 559, in run_cmd
    fn(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/command.py", line 366, in downgrade
    script.run_env()
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/script/base.py", line 563, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "alembic/env.py", line 104, in <module>
    run_migrations_online()
  File "alembic/env.py", line 79, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/runtime/migration.py", line 620, in run_migrations
    step.migration_fn(**kw)
  File "/Users/baileythegreen/Software/pyani/alembic/versions/92f7f6b1626e_add_fastani_columns.py", line 28, in downgrade
    op.drop_column("comparisons", "kmersize")
  File "<string>", line 8, in drop_column
  File "<string>", line 3, in drop_column
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/operations/ops.py", line 2189, in drop_column
    return operations.invoke(op)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/operations/base.py", line 392, in invoke
    return fn(self, operation)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/operations/toimpl.py", line 89, in drop_column
    operations.impl.drop_column(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/ddl/impl.py", line 333, in drop_column
    self._exec(base.DropColumn(table_name, column, schema=schema))
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/alembic/ddl/impl.py", line 197, in _exec
    return conn.execute(construct, multiparams)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 80, in _execute_on_connection
    return connection._execute_ddl(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1381, in _execute_ddl
    ret = self._execute_context(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
    util.raise_(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
    self.dialect.do_execute(
  File "/Users/baileythegreen/Software/miniconda3/envs/pyani_dev/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) error in table comparisons after drop column: no such column: kmersize
[SQL: ALTER TABLE comparisons DROP COLUMN kmersize]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```
This puzzled me, because the column is clearly present in the database.
Upon further inspection, it seems to be a difference between the unique constraints specified in the updated ORM and those produced by my migration script when updating the old database.
Schema for `comparisons` table in old database:

```sql
sqlite> .schema comparisons
CREATE TABLE comparisons (
    comparison_id INTEGER NOT NULL,
    query_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    aln_length INTEGER,
    sim_errs INTEGER,
    identity FLOAT,
    cov_query FLOAT,
    cov_subject FLOAT,
    program VARCHAR,
    version VARCHAR,
    fragsize INTEGER,
    maxmatch BOOLEAN,
    PRIMARY KEY (comparison_id),
    UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
    FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
    FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
```
Schema for `comparisons` table in old database following upgrade:

```sql
sqlite> .schema comparisons
CREATE TABLE comparisons (
    comparison_id INTEGER NOT NULL,
    query_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    aln_length INTEGER,
    sim_errs INTEGER,
    identity FLOAT,
    cov_query FLOAT,
    cov_subject FLOAT,
    program VARCHAR,
    version VARCHAR,
    fragsize INTEGER,
    maxmatch BOOLEAN, kmersize INTEGER, minmatch FLOAT, nonsense INTEGER,
    PRIMARY KEY (comparison_id),
    UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch),
    FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
    FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
```
Schema for `comparisons` table in new database:

```sql
CREATE TABLE comparisons (
    comparison_id INTEGER NOT NULL,
    query_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    aln_length INTEGER,
    sim_errs INTEGER,
    identity FLOAT,
    cov_query FLOAT,
    cov_subject FLOAT,
    program VARCHAR,
    version VARCHAR,
    fragsize INTEGER,
    maxmatch BOOLEAN,
    kmersize INTEGER,
    minmatch FLOAT,
    PRIMARY KEY (comparison_id),
    UNIQUE (query_id, subject_id, program, version, fragsize, maxmatch, kmersize, minmatch),
    FOREIGN KEY(query_id) REFERENCES genomes (genome_id),
    FOREIGN KEY(subject_id) REFERENCES genomes (genome_id)
);
```
*sigh*
Apparently dropping unique constraints is one of the things SQLite's `ALTER TABLE` doesn't support, so this migration just got a lot harder.
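For anyone following along: because `ALTER TABLE` can't touch a `UNIQUE` constraint in SQLite, the usual workaround is to rebuild the table (create a replacement with the desired constraint, copy the rows over, drop the original, rename) — which is exactly the "move and copy" dance alembic's `batch_alter_table()` automates. A minimal stdlib-only sketch, using a toy cut-down `comparisons` table with illustrative values:

```python
import sqlite3

# SQLite's ALTER TABLE cannot drop or change a UNIQUE constraint directly.
# The workaround: rebuild the table with the new constraint and copy rows over.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE comparisons (
        comparison_id INTEGER NOT NULL,
        fragsize INTEGER,
        kmersize INTEGER,
        PRIMARY KEY (comparison_id),
        UNIQUE (fragsize, kmersize)
    );
    INSERT INTO comparisons VALUES (1, 1020, 16), (2, 3000, 21);

    -- Rebuild with a constraint that no longer mentions kmersize.
    CREATE TABLE _comparisons_new (
        comparison_id INTEGER NOT NULL,
        fragsize INTEGER,
        PRIMARY KEY (comparison_id),
        UNIQUE (fragsize)
    );
    INSERT INTO _comparisons_new (comparison_id, fragsize)
        SELECT comparison_id, fragsize FROM comparisons;
    DROP TABLE comparisons;
    ALTER TABLE _comparisons_new RENAME TO comparisons;
    """
)
# The rebuilt table keeps the data but has the new schema.
new_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'comparisons'"
).fetchone()[0]
n_rows = conn.execute("SELECT COUNT(*) FROM comparisons").fetchone()[0]
print(new_sql)
```

In a real migration, `with op.batch_alter_table("comparisons") as batch_op: ...` generates this rebuild for you; the sketch just shows what happens under the hood.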
I have also been thinking about the utility of the downgrade option, and have been led to question it. If someone has a new database that contains any `fastANI` runs, removal of the `kmersize` and `minmatch` columns might mean some entries are no longer unique, which is problematic. This could be handled by not supporting downgrading at all; by throwing a warning about it; or by removing any `fastANI` entries (and notifying the user).
We're creating a backup, so none of these would be catastrophic, but one may be preferable.
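One way to pick between those options at runtime would be to check, before dropping the columns, whether any rows would actually collide under the old uniqueness rule. A hedged sketch using the stdlib `sqlite3` module — the cut-down table and the particular values are illustrative, not pyani's real data:

```python
import sqlite3

# Before a downgrade drops kmersize/minmatch, find rows that would become
# duplicates under the old unique constraint
# (query_id, subject_id, program, version, fragsize, maxmatch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE comparisons (
        query_id INTEGER, subject_id INTEGER, program VARCHAR,
        version VARCHAR, fragsize INTEGER, maxmatch BOOLEAN,
        kmersize INTEGER, minmatch FLOAT
    );
    -- Two fastANI runs that differ only in kmersize: unique now, colliding later.
    INSERT INTO comparisons VALUES (1, 2, 'fastANI', '1.32', 3000, 0, 16, 0.2);
    INSERT INTO comparisons VALUES (1, 2, 'fastANI', '1.32', 3000, 0, 21, 0.2);
    INSERT INTO comparisons VALUES (1, 2, 'nucmer', '3.1', 1020, 0, NULL, NULL);
    """
)
# Group by the columns the old constraint covers; any group larger than one
# would violate uniqueness after the downgrade.
collisions = conn.execute(
    """
    SELECT query_id, subject_id, program, version, fragsize, maxmatch, COUNT(*)
    FROM comparisons
    GROUP BY query_id, subject_id, program, version, fragsize, maxmatch
    HAVING COUNT(*) > 1
    """
).fetchall()
print(f"{len(collisions)} group(s) would become non-unique after downgrade")
```

If `collisions` is non-empty, the migration could refuse to run, warn, or delete the offending `fastANI` rows, per whichever policy we settle on.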
The above issue (re: constraints) seems to be solved by using the contextual version of the `batch_alter_table()` code mentioned earlier in this thread (https://github.com/widdowquinn/pyani/issues/378#issuecomment-1032081759). And the issue mentioned in that comment seems to have disappeared? I still don't know what that was about.
tl;dr: I have the database successfully upgrading and downgrading, new and old databases alike.
Now I just need to decide whether warnings are needed for downgrading, et cetera, and to finish my tests.
> I have also been thinking about the utility of the downgrade option, and have been led to question it. If someone has a new database that contains any `fastANI` runs, removal of the `kmersize` and `minmatch` columns might mean some entries are non-unique, which is problematic. This could be handled by just not supporting downgrading; or we could throw a warning about this; or remove any `fastANI` entries (and notify the user). We're creating a backup, so none of these would be catastrophic, but one may be preferable.
My first thought is that we should aim to preserve the necessary metadata for a run, which includes, for `fastANI`, the `kmersize`/`minmatch` columns. If we cannot do so, we should loudly (with a warning) flag the problem to the user and remove those runs from the database, on the grounds that back-transfer of `fastANI` runs is not compatible with the old schema.
I think we are justified in this, as we back up, and do not destroy, the original late-format database.
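Since the safety argument rests on the backup being taken first, it may be worth noting that SQLite has an online backup API that is safe to use even while the database is open (unlike a naive file copy). A minimal sketch — the paths and toy table are illustrative, not pyani's actual layout or backup mechanism:

```python
import sqlite3
import tempfile
from pathlib import Path

# Illustrative paths; pyani's real database location will differ.
workdir = Path(tempfile.mkdtemp())
db_path = workdir / "pyanidb"
backup_path = workdir / "pyanidb.bak"

# A toy database standing in for a real pyani one.
src = sqlite3.connect(db_path)
src.execute("CREATE TABLE comparisons (comparison_id INTEGER PRIMARY KEY)")
src.execute("INSERT INTO comparisons VALUES (1)")
src.commit()

# Copy the database page-by-page with the online backup API before migrating,
# so a destructive downgrade can always be rolled back by hand.
dst = sqlite3.connect(backup_path)
src.backup(dst)
dst.close()
src.close()

# Confirm the backup is a complete, independent copy.
restored = sqlite3.connect(backup_path).execute(
    "SELECT COUNT(*) FROM comparisons"
).fetchone()[0]
```

`Connection.backup()` is available in the stdlib from Python 3.7, which fits pyani's supported versions.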
I would see downgrading a database from `fastANI` support back to the pre-`fastANI` schema as an unusual operation, but we might expect more up/downgrading with future use.