Pyrseas icon indicating copy to clipboard operation
Pyrseas copied to clipboard

crash in yamltodb when table is compared to a view

Open tbussmann opened this issue 8 years ago • 9 comments

given two databases t and v each containing a simple table or view both with the same name. dbtoyaml would produce the following yaml: Database t:

schema public:
  description: standard public schema
  owner: postgres
  privileges:
  - PUBLIC:
    - all
  - postgres:
    - all
  table test:
    columns:
    - i:
        type: integer
    owner: myuser

Database v:

schema public:
  description: standard public schema
  owner: postgres
  privileges:
  - PUBLIC:
    - all
  - postgres:
    - all
  view test:
    definition: " SELECT generate_series.generate_series\n   FROM generate_series(1,\
      \ 10) generate_series(generate_series);"
    owner: myuser

if you compare v to t or t to v yamltodb will crash:

$ dbtoyaml v | yamltodb t
Traceback (most recent call last):
  File "/usr/local/bin/yamltodb", line 9, in <module>
    load_entry_point('Pyrseas==0.8.dev0', 'console_scripts', 'yamltodb')()
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/yamltodb.py", line 47, in main
    stmts = db.diff_map(inmap)
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/database.py", line 363, in diff_map
    stmts.append(self.db.tables.diff_map(self.ndb.tables))
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/dbobject/table.py", line 989, in diff_map
    stmts.append(table.diff_map(intables[(sch, tbl)]))
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/dbobject/table.py", line 419, in diff_map
    raise KeyError("Table '%s' has no columns" % intable.name)
KeyError: "Table 'test' has no columns"

or

$ dbtoyaml t | yamltodb v
Traceback (most recent call last):
  File "/usr/local/bin/yamltodb", line 9, in <module>
    load_entry_point('Pyrseas==0.8.dev0', 'console_scripts', 'yamltodb')()
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/yamltodb.py", line 47, in main
    stmts = db.diff_map(inmap)
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/database.py", line 363, in diff_map
    stmts.append(self.db.tables.diff_map(self.ndb.tables))
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/dbobject/table.py", line 989, in diff_map
    stmts.append(table.diff_map(intables[(sch, tbl)]))
  File "/Library/Python/2.7/site-packages/Pyrseas-0.8.dev0-py2.7.egg/pyrseas/dbobject/table.py", line 563, in diff_map
    if self.definition != inview.definition:
AttributeError: 'Table' object has no attribute 'definition'

it seems yamltodb it is not detecting that test is of different type in the yaml compared to the database.

I've tested with 0.7.3 and the current master with same result.

tbussmann avatar May 16 '17 19:05 tbussmann

Confirmed on master (and also on in our in-progress deptrack branch). In master and r0.7, the problem is in the loop in pyrseas/dbobject/table.py ClassDict.diff_map after the comment # check database tables, sequences and views. The loop ought to differentiate between instances of Table and View at a minimum, before calling table.diff_map which due to polymorphism may end up in Table.diff_map or View.diff_map. In deptrack, the problem is in Database.diff_map in the loop where we call old.alter(new) but we may have to differentiate the proper instances even earlier.

jmafc avatar May 16 '17 20:05 jmafc

@tbussmann Since Postgres doesn't allow creation of a view test if a table with that name already exists, and vice versa, it seems that the only thing that yamltodb could do when dealing with this scenario would be to put out a nice error message. Agree?

jmafc avatar May 18 '17 13:05 jmafc

great you could already verify the root of the issue!

it seems that the only thing that yamltodb could do when dealing with this scenario would be to put out a nice error message. Agree?

actually, I'd rather expect yamltodb to emit a DROP TABLE test....; for the superfluous table and and CREATE VIEW test...; for the missing view (or vice versa, respectively)

tbussmann avatar May 18 '17 13:05 tbussmann

The problem with dropping the table is that we don't know for sure if it's superfluous, and it could have data and that would get lost. While I doubt that anyone runs yamltodb --update in a production environment without testing first, I'm uncomfortable at taking automatic actions when there's such a conflict. Dropping a view and creating a table is less dangerous since the former doesn't have data, it does depend on one or more other objects. @dvarrazzo @rhunwicks What do you guys think?

jmafc avatar May 18 '17 18:05 jmafc

@jmafc my take is that no-one in their right mind is going to run yamltodb on any database of value without doing a dry run first to see what's going to change.

On the other hand, I can see many use cases where I would want to replace an existing table with a view as part of a design refactor. For example, updating the data model while maintaining compatibility with an existing client application.

So I agree with @tbussmann that DROP TABLE followed by CREATE VIEW is the preferred behaviour.

rhunwicks avatar Sep 17 '17 14:09 rhunwicks

@rhunwicks I can see that when refactoring you'd want to retain a table's structure by creating a view. However, in such cases, it seems probable that you'd like to retain the table data first, e.g., (a) CREATE TABLE newtable AS SELECT * FROM oldtable, (b) DROP TABLE oldtable and (c) CREATE VIEW oldtable AS SELECT * FROM newtable. This is a feature that Pyrseas currently does not support, although it would be nice to do so eventually (see issue #138).

I don't know how people tend to use Pyrseas in that situation. Personally, because step (a) is usually more complex, I will create the newtable first by copying the table.oldtable.yaml and editing it, then using yamltodb to generate the (a) statement, pasting it into a conversion script, adding whatever insert/update statements are necessary to keep the desired data in newtable, adding step (b) to the conversion script, and then working on creating the view. However, since at the same application code changes are being made, the conversion script is what gets used and tested more, with yamltodb then serving only as a sanity check that the development/staging/production database matches the repository version.

jmafc avatar Sep 17 '17 19:09 jmafc

@jmafc I agree. My normal approach would be to update schema.yml to create the new tables (often by changing the tables in a dev database directly and then using dbtoyaml, then use yamltodb to extract the SQL necessary to make the actual change to the production database. That change, together with the SQL necessary to populate the new tables from the old ones goes into a Migration script that is run to actually apply the change. I would normally do that as one Migration and the update to replace the old table(s) with a view(s) as a separate Migration, so that we can sanity check the data in the new tables before we drop the old ones. As you suggest, the Migration scripts are used to actually apply the changes to the database with yamltodb being used to check that they worked correctly.

I think that you always need a Migration approach because there are some common changes that always need associated INSERT or UPDATE statements. For example, adding a new non-null column to an existing table. Unless the value for the existing rows is trivial, the easiest way is normally to add the column as nullable, run an UPDATE to set the value based on other existing data, and then ALTER the column to be NOT NULL. This is easy to achieve if you have a Migration framework and much harder with a pure diff approach.

rhunwicks avatar Sep 18 '17 06:09 rhunwicks

@tbussmann Please read https://pyrseas.wordpress.com/2018/09/12/the-future-of-pyrseas-revisited/ . I'm afraid this is an issue that will be closed without action unless you or someone can make a convincing case for what the most appropriate behavior may be for handling both cases: having a database table named X and the YAML input has a view (regular or materialized) named X, and having a database view or matview named X and the YAML input has a table named X. At a minimum, this would add a layer of complexity since the coincidence of names is insufficient to take categoric action in the database (i.e., the respective columns would have to be cross-checked as to name and datatype). As I said before, I think this only merits catching the exception and putting an explanatory error message.

jmafc avatar Sep 14 '18 01:09 jmafc

I've accidently discovered a more pernicious situation in my own work. Let's say you have a table ts with primary key name ts_pkey. You want to create a new table t with primary key t_pkey and a view ts that selects from t. Because you're using multiple files, your metadata directory still has around the definition for table.ts.yaml. yamltodb gets very confused because it reads both the view ts from the catalogs but the YAML metadata has a table object by that name. The solution is to remove the table.ts.yaml file but I think this issue requires a second look based on this scenario.

jmafc avatar Nov 20 '18 01:11 jmafc