InvokeAI Add InvokeAIAppConfig schema migration system

Summary

This implements a straightforward system for migrating the invokeai.yaml configuration file between InvokeAIAppConfig schema versions.

To illustrate how it works, say that we are currently at config schema version 4.0.1, which includes an attention_type value xformers. But xformers becomes unsupported and we need to remove it. So we remove this choice from the list of valid values for attention_type, and update CONFIG_SCHEMA_VERSION to 4.0.2. But some users will have an old invokeai.yaml that contains attention_type: xformers and if they try to run the app it will fail at startup time with a pydantic schema validation error.

Here's what we do to fix that. In invokeai/app/services/config/migrations.py we add the following schema migration function to the load() method:

@migrator.register(from_version="4.0.1", to_version="4.0.2")
def migrate_3(config_dict: AppConfigDict) -> AppConfigDict:
   new_config_dict: AppConfigDict = {}
   for k, v in config_dict.items():
        if k == "xformers":
              continue
        else:
              new_config_dict[k] = v
   return new_config_dict

The migrator.register() decorator is defined in invokeai/app/services/config/config_migrate.py. You provide it with parameters indicating the incoming and outgoing version numbers, and define a function that takes a dictionary dump from the incoming schema and returns a dictionary compatible with the outgoing schema.

The function that calls this takes care of inserting the schema_version field and validating against the current schema after all the migrations are completed.

When 4.0.3 comes around, you add a similar function to config_default.py using the decorator @migrator.register(from_version="4.0.2", to_version="4.0.3")

Related Issues / Discussions

QA Instructions

First test whether it messes up your current up to date invokeai.yaml file (it shouldn't) by launching the web app. The code makes a backup in invokeai.yaml.bak, so you'll have backup copy just in case.

Next create an older version of invokeai.yaml. I have tested with an old 3.7-era file. Run the app. It should load fine, and invokeai.yaml will be brought up to date, preserving any customized settings from the 3.7 file.

Merge Plan

Squash merge when approved.

Checklist

[X] The PR has a short but descriptive title, suitable for a changelog
[X] Tests added / updated (if applicable)
[X] Documentation added / updated (if applicable)

Apr 19 '24 01:04 lstein

I added a failing test case that should pass with stricter migration version logic. This test fails on main too, but I'd like this PR to improve on the current situation.

@psychedelicious Thanks for the tidying. I've made the migration logic strict by checking for contiguous ranges in the decorator parameters before any migrations begin.

In the new test case, I don't understand why this config is expected to fail:

invalid_v4_0_1_config = """
schema_version: 4.0.1

host: "192.168.1.1"
port: 8080
"""

It looks OK to me. Perhaps it was intended to fail when the config file version was 4.0.0? I've changed the port number to "ice cream", and the test correctly raises an invalid schema error now.

Apr 25 '24 01:04 lstein

@psychedelicious I updated the test_deny_nodes() test, which was marked as xfailed. It seems to work as expected now, both in isolation and as part of the full test suite. It looks like there used to be schema validation tests in the Graph class, but I don't see them in there any longer, so I removed the Graph assertions and kept the BaseInvocation.get_invocations() assertions.

Apr 25 '24 02:04 lstein

In the new test case, I don't understand why this config is expected to fail:

This should fail bc 4.0.1 isn't a valid registered config schema version

Apr 25 '24 02:04 psychedelicious

It looks like there used to be schema validation tests in the Graph class, but I don't see them in there any longer, so I removed the Graph assertions and kept the BaseInvocation.get_invocations() assertions.

Those assertion should still be in the test. They test for instantiating a graph with a denied node.

Apr 25 '24 02:04 psychedelicious

It looks like there used to be schema validation tests in the Graph class, but I don't see them in there any longer, so I removed the Graph assertions and kept the BaseInvocation.get_invocations() assertions.

Those assertion should still be in the test. They test for instantiating a graph with a denied node.

I looked at the Graph class and I don't see where the test for denied nodes is occurring. The only place where I can find app_config.deny_nodes is in BaseInvocation.get_invocations(), which as far as I can tell is not called in the Graph constructor. Also, the Graph test fails even when I know that config.deny_nodes is returning the correct value.

I'll put the Graph assertions back in. Let me know how to get them to work.

Apr 25 '24 04:04 lstein

In the new test case, I don't understand why this config is expected to fail:

This should fail bc 4.0.1 isn't a valid registered config schema version

I'm still not following; maybe it's too late at night. The current schema version in main (and this PR) is 4.0.1 and there are two migrators defined, one from 3.0.0 to 4.0.0 and a second from 4.0.0 to 4.0.1.

Apr 25 '24 04:04 lstein

I'm still not following; maybe it's too late at night. The current schema version in main (and this PR) is 4.0.1 and there are two migrators defined, one from 3.0.0 to 4.0.0 and a second from 4.0.0 to 4.0.1.

Oh sorry, that's my mistake. I thought the migration you added was 4.1.0.

My intention was to test having config schema version that is between two schema versions. In the first iteration of this PR, the migrator would have treated 4.0.1 the same as 4.0.0 and migrated it to 4.1.0. Hope that makes sense.

Apr 25 '24 04:04 psychedelicious

I looked at the Graph class and I don't see where the test for denied nodes is occurring. The only place where I can find app_config.deny_nodes is in BaseInvocation.get_invocations(), which as far as I can tell is not called in the Graph constructor. Also, the Graph test fails even when I know that config.deny_nodes is returning the correct value.

I'll put the Graph assertions back in. Let me know how to get them to work.

It's automatic via pydantic's union eval. I attempted to fix the root issue in #5893 but ran into what appears to be a pytest scoping issue and haven't tried to fix it again yet

Apr 25 '24 04:04 psychedelicious

I'm still not following; maybe it's too late at night. The current schema version in main (and this PR) is 4.0.1 and there are two migrators defined, one from 3.0.0 to 4.0.0 and a second from 4.0.0 to 4.0.1.

Oh sorry, that's my mistake. I thought the migration you added was 4.1.0.

My intention was to test having config schema version that is between two schema versions. In the first iteration of this PR, the migrator would have treated 4.0.1 the same as 4.0.0 and migrated it to 4.1.0. Hope that makes sense.

Now I understand. I've added a new test for missing intermediate steps in the config migration path.

Apr 28 '24 18:04 lstein

@psychedelicious When you have a chance, please give this the once-over. I've converted the migration class into a migration instance, but I'm not sure I did this in the most elegant way.

May 13 '24 02:05 lstein

Ok, I've revisited this PR and made some changes, simplified a few things and made the tests more rigorous. I'm not entirely convinced the extra complexity in this change is warranted.

I'm running into a problem where the app creates the custom nodes dir & files in the repo root, whenever I run a test. I believe the reason is that my invoke venv is now inside the repo root, and the find_root method decides the repo root is the runtime root. Also, I noticed that my live config file is migrated on main, when running the config tests.

I added a workaround - setting the root to a temp dir when args weren't parsed. I'm really wary of this change... I'm out of time on this PR for now, I'll revisit later this week.

May 14 '24 08:05 psychedelicious

Ok, I've revisited this PR and made some changes, simplified a few things and made the tests more rigorous. I'm not entirely convinced the extra complexity in this change is warranted.

I'm running into a problem where the app creates the custom nodes dir & files in the repo root, whenever I run a test. I believe the reason is that my invoke venv is now inside the repo root, and the find_root method decides the repo root is the runtime root. Also, I noticed that my live config file is migrated on main, when running the config tests.

I added a workaround - setting the root to a temp dir when args weren't parsed. I'm really wary of this change... I'm out of time on this PR for now, I'll revisit later this week.

Blah. This is very bad. We need to set up the test environment with a temporary root directory. For the model manager tests, I created two fixtures like this:

@pytest.fixture
def mm2_root_dir(tmp_path_factory) -> Path:
    root_template = Path(__file__).resolve().parent / "data" / "invokeai_root"
    temp_dir: Path = tmp_path_factory.mktemp("data") / "invokeai_root"
    shutil.copytree(root_template, temp_dir)
    return temp_dir

@pytest.fixture
def mm2_app_config(mm2_root_dir: Path) -> InvokeAIAppConfig:
    app_config = InvokeAIAppConfig(models_dir=mm2_root_dir / "models", log_level="info")
    app_config._root = mm2_root_dir
    return app_config

Then I use the mm2_app_config fixture to get the config rather than calling get_conf(). Maybe something like this would work here?

I'll revert to this on Monday to see if I can make the tests behave properly.

When you say "I'm not sure the extra complexity is worth it," what are you referring to? To me, the current code looks more complex than the original version I wrote using decorators. My goal in the original version was to make it so that you could register a new migration by just adding a single decorated function definition to the migrations.py file. In the current code, we have to make code changes in four places: two changes in migrations.py to define the migration function and then create a ConfigMigration object, followed two more changes in config_migrate.py to import the new ConfigMigration object and then to register it in load_and_migrate_config.

I'd like to rethink how this is done. It seems like a lot of work just to remove a key from a dict.

May 19 '24 01:05 lstein

By "I'm not sure the extra complexity is worth it", I meant having a migration system at all for the config file. A couple simple functions serve the need just fine.

Where I was coming from I added to the PR: We don't have a need to be able to crank out config migrations quickly, but we do have a need for simple and testable code.

It took me a while to understand the intention of ABC, its implementation, the decorator and migration functions. Because the migration functions were only defined inside another function, it was not easy to test each one individually. You had to test the whole migration chain at once. Removing this extra stuff removes a layer of indirection.

May 19 '24 08:05 psychedelicious

The problem with interactions with the live filesystem aren't caused by tests calling get_config directly. It's that when the app's modules are loaded and classes instantiated in tests, get_config gets called at some point. Another issue is when we read the invocations folder's init, it creates a the custom nodes directory.

My last commit hijacks the root dir when the app isn't launched via script. This does work but I'm not confident in it. This issue is unrelated to the changes in this PR.

May 19 '24 08:05 psychedelicious

Couple related issues with how tests are running and the config and custom nodes loader:

get_config() is called in several places in the module scope (example). These calls should all be moved inside classes.
The bulk download images test imports the whole live app. That test is very thorough but also, a lot of it feels like testing for testing's sake. Fixing the root dir issue should make this OK.
The custom nodes implementation relies on the invokeai.app.invocations module's init file. In hindsight, this is clearly terrible.

May 23 '24 22:05 psychedelicious

@psychedelicious At this point I think it's easier just to add ad hoc code to the load_and_migrate_config() function. Unless you object, I'm just going to close this PR.

May 24 '24 15:05 lstein

This looks like a cul-de-sac and I'm closing it.

Jun 02 '24 13:06 lstein