
Server/CLI: Incorrect `replace_metadata` API behaviour

Open scriptonist opened this issue 2 years ago • 10 comments

Version Information

Server Version: v2.10.1
CLI Version (for CLI related issue): v2.10.1

Environment

OSS/Cloud/EE

What is the current behaviour?

Let's take a look at the following scenario.

Note: for simplicity, in the examples below I'm using the hasura CLI to apply metadata, which uses the replace_metadata API under the hood.

I have a new HGE instance running. I'm applying the following metadata to the server.

{
  "version": 3,
  "sources": [
    {
      "name": "default",
      "kind": "postgres",
      "tables": [
        {
          "table": {
            "name": "t1",
            "schema": "public"
          }
        },
        {
          "table": {
            "name": "t2",
            "schema": "public"
          }
        }
      ],
      "configuration": {
        "connection_info": {
          "database_url": {
            "from_env": "HASURA_GRAPHQL_DATABASE_URL"
          },
          "isolation_level": "read-committed",
          "pool_settings": {
            "connection_lifetime": 600,
            "idle_timeout": 180,
            "max_connections": 50,
            "retries": 1
          },
          "use_prepared_statements": true
        }
      }
    }
  ]
}
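
For reference, hasura metadata apply boils down to a single replace_metadata call against the /v1/metadata endpoint. A minimal sketch of the equivalent request, assuming the server runs on localhost:8080, an admin secret is configured, and the document above is saved as metadata.json:

# metadata.json holds the metadata document shown above
$ curl -s -X POST http://localhost:8080/v1/metadata \
    -H 'Content-Type: application/json' \
    -H 'X-Hasura-Admin-Secret: <admin-secret>' \
    -d "{\"type\": \"replace_metadata\", \"args\": $(cat metadata.json)}"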

As expected, our server now has inconsistencies (shown below) because tables t1 and t2 do not exist yet. So, I go ahead and apply migrations which create these tables.

$ hasura md ic list
  NAME  TYPE   DESCRIPTION                         REASON

  t2    table  {"name":"t2","schema":"public"}...  Inconsistent object: no such
                                                   table/view exists in source:
                                                   "t2"
  t1    table  {"name":"t1","schema":"public"}...  Inconsistent object: no such
                                                   table/view exists in source:
                                                   "t1"
$ hasura migrate apply --all-databases
INFO migrations applied on database: default
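
For context, the migrations in this scenario contain nothing but plain CREATE TABLE statements. A hypothetical up migration is sketched below; the file path and column definitions are assumptions, only the table names t1 and t2 come from the metadata above:

$ cat migrations/default/1660000000000_create_tables/up.sql
CREATE TABLE public.t1 (id serial PRIMARY KEY);
CREATE TABLE public.t2 (id serial PRIMARY KEY);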

Now I try to apply the metadata again. Pause for a second here and think about what the expected result of this operation might be.

$ hasura metadata apply 
WARN Metadata is inconsistent
WARN Use 'hasura metadata ic list' command to list inconsistent objects
INFO Metadata applied

$ hasura md ic list
  NAME  TYPE   DESCRIPTION                         REASON

  t2    table  {"name":"t2","schema":"public"}...  Inconsistent object: no such
                                                   table/view exists in source:
                                                   "t2"
  t1    table  {"name":"t1","schema":"public"}...  Inconsistent object: no such
                                                   table/view exists in source:
                                                   "t1"

But, as we see above, the metadata is still inconsistent.

What is the expected behaviour?

With respect to the example above: after resolving the inconsistencies, i.e. creating the relevant tables, hasura metadata apply should not result in inconsistencies.
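
For comparison, an explicit reload does resolve them today, because it forces the server to rebuild its schema cache against the now-existing tables. A sketch of that workaround, assuming the scenario above:

$ hasura metadata reload     # force a schema cache rebuild on the server
$ hasura md ic list          # with t1 and t2 now present, this should report no inconsistencies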

How to reproduce the issue?

Refer to the description above.

Keywords

replace_metadata, hasura metadata apply

scriptonist avatar Aug 17 '22 12:08 scriptonist

@manasag pretty sure this belongs to the product platform team, either the CLI or the metadata handling on the server.

ajohnson1200 avatar Aug 18 '22 23:08 ajohnson1200

This bug is accepted and has been added to the Platform team backlog.

manasag avatar Aug 23 '22 10:08 manasag

I've been digging into this and it seems quite clear that this is caused by a failure to invalidate the relevant parts of the schema cache upon replacing the metadata.

This seems intentional. I'm happy to change the behavior so that the schema cache is invalidated upon replacing the metadata, but before I do that, it'd be good to know that we are happy with this change, and that there isn't a good reason why we should keep it the way it is.

@manasag, what do you think?

@abooij, do you know why the replace_metadata endpoint doesn't force a reload of existing source data?

SamirTalwar avatar Sep 15 '22 19:09 SamirTalwar

I've been digging into this and it seems quite clear that this is caused by a failure to invalidate the relevant parts of the schema cache upon replacing the metadata.

This seems intentional. I'm happy to change the behavior so that the schema cache is invalidated upon replacing the metadata, but before I do that, it'd be good to know that we are happy with this change, and that there isn't a good reason why we should keep it the way it is.

@manasag, what do you think?

@abooij, do you know why the replace_metadata endpoint doesn't force a reload of existing source data?

The inconsistencies raised by migrations normally get resolved after the final hasura metadata apply. So what causes this to normally work as expected, but fail in the case reported in this issue?

Depending on the answer, I'm inclined to say that this is an oversight. I was not involved in the original development of replace_metadata - @SamirTalwar have you done any archeological work in this regard to dig up the design decisions involved?

abooij avatar Sep 15 '22 20:09 abooij

At some point there was an idea to have the Console be fully powered by the replace_metadata API, rather than using the incremental metadata API. In that case, it would be important not to invalidate the schema cache fully every time.

@0x777 might also have some context here.

abooij avatar Sep 15 '22 20:09 abooij

I tried digging into the thought process behind replace_metadata but didn't find too much; I think tacit knowledge is going to get us further than Git history in this case.

SamirTalwar avatar Sep 16 '22 08:09 SamirTalwar

The inconsistencies raised by migrations normally get resolved after the final hasura metadata apply. So what causes this to normally work as expected, but fail in the case reported in this issue?

In this case, the inconsistencies are already present, which I think is different from the case you're describing, in which the migration introduces an inconsistency. Do I have this right?

Are you saying the migration step should automatically reload the schema, or is it more nuanced than that?

SamirTalwar avatar Sep 16 '22 08:09 SamirTalwar

At some point there was an idea to have the Console be fully powered by the replace_metadata API, rather than using the incremental metadata API. In that case, it would be important not to invalidate the schema cache fully every time.

That's a good point. However, between the two options, I think ignoring the previous state and 'applying the metadata from scratch' seems less surprising than our current behaviour. @rikinsk What are your thoughts?

The inconsistencies raised by migrations normally get resolved after the final hasura metadata apply

Interesting. I have not considered this at all. So why isn't the schema cache being rebuilt after the last pg_run_sql, which would've resolved the inconsistencies?

queryModifiesSchema :: RQLQuery -> Bool
queryModifiesSchema = \case
  RQInsert _ -> False
  RQSelect _ -> False
  RQUpdate _ -> False
  RQDelete _ -> False
  RQCount _ -> False
  RQRunSql q -> Postgres.isSchemaCacheBuildRequiredRunSQL q
  RQCitusRunSql q -> Postgres.isSchemaCacheBuildRequiredRunSQL q
  RQCockroachRunSql q -> Postgres.isSchemaCacheBuildRequiredRunSQL q
  RQMssqlRunSql q -> MSSQL.isSchemaCacheBuildRequiredRunSQL q
  RQMysqlRunSql _ -> False
  RQBigqueryRunSql _ -> False
  RQBigqueryDatabaseInspection _ -> False
  RQBulk l -> any queryModifiesSchema l


-- Postgres.isSchemaCacheBuildRequiredRunSQL, as defined in the Postgres backend:
isSchemaCacheBuildRequiredRunSQL :: RunSQL -> Bool
isSchemaCacheBuildRequiredRunSQL RunSQL {..} =
  case rTxAccessMode of
    Q.ReadOnly -> False
    Q.ReadWrite -> fromMaybe (containsDDLKeyword rSql) rCheckMetadataConsistency
  where
    containsDDLKeyword =
      TDFA.match
        $$( quoteRegex
              TDFA.defaultCompOpt
                { TDFA.caseSensitive = False,
                  TDFA.multiline = True,
                  TDFA.lastStarGreedy = True
                }
              TDFA.defaultExecOpt
                { TDFA.captureGroups = False
                }
              "\\balter\\b|\\bdrop\\b|\\breplace\\b|\\bcreate function\\b|\\bcomment on\\b"
          )

We seem to return False when the SQL statement is something along the lines of create table (..). If I remember correctly, this design was meant to prevent a consistent object becoming inconsistent. Hence, we only look for the keywords alter, replace, etc., i.e. statements which could potentially change the behaviour of a metadata object (create function exists, I think, because you can create overloaded functions, which causes function objects in metadata to become inconsistent). However, we never really accounted for the transition of a metadata object from inconsistent to consistent. Maybe we should take a look at this again.
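
Note that run_sql already exposes an escape hatch here: the check_metadata_consistency flag overrides the keyword detection above (the rCheckMetadataConsistency override in the snippet). A sketch of forcing a schema cache rebuild for a create table statement, assuming the v2/query endpoint, a local server and an admin secret:

$ curl -s -X POST http://localhost:8080/v2/query \
    -H 'Content-Type: application/json' \
    -H 'X-Hasura-Admin-Secret: <admin-secret>' \
    -d '{
      "type": "run_sql",
      "args": {
        "source": "default",
        "sql": "CREATE TABLE public.t1 (id serial PRIMARY KEY);",
        "check_metadata_consistency": true
      }
    }'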

0x777 avatar Sep 19 '22 11:09 0x777

If I remember correctly, this design was meant to prevent a consistent object becoming inconsistent.

In what kind of scenario would a create table cause this to happen?

abooij avatar Sep 19 '22 12:09 abooij

I think there are two issues here.

The first is the one pointed out by @scriptonist. I think that it can be summarized as follows: replace_metadata should have the effect of also reloading the metadata. Exporting the metadata, and then immediately replacing the metadata with the exported metadata (i.e. a no-op) should have the effect of reloading the metadata.

replace_metadata . export_metadata === reload_metadata

This is a departure from the current behavior, but IMO a good one, as I think it will be less surprising for the average user.
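
Spelled out as API calls, the proposed equivalence would look roughly like the sketch below, assuming the /v1/metadata endpoint, a local server and an admin secret:

# Export the current metadata...
$ curl -s -X POST http://localhost:8080/v1/metadata \
    -H 'X-Hasura-Admin-Secret: <admin-secret>' \
    -d '{"type": "export_metadata", "args": {}}' > exported.json

# ...and immediately replace the metadata with what was just exported (a no-op today).
$ curl -s -X POST http://localhost:8080/v1/metadata \
    -H 'X-Hasura-Admin-Secret: <admin-secret>' \
    -d "{\"type\": \"replace_metadata\", \"args\": $(cat exported.json)}"

# Under the proposed behaviour, that round trip would be equivalent to:
$ curl -s -X POST http://localhost:8080/v1/metadata \
    -H 'X-Hasura-Admin-Secret: <admin-secret>' \
    -d '{"type": "reload_metadata", "args": {}}'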


The second issue is the one highlighted by @0x777: a CREATE TABLE query, run through the run_sql command, can have the effect of making currently-inconsistent metadata consistent. This means that we probably need to invalidate the cache for CREATE alongside all the other keywords we recognize as "DDL". If we had this behavior, @scriptonist would not need to reload (or reapply) the metadata at all.

I think this is important, but there's nothing preventing @scriptonist from modifying the table using psql or another migration framework instead of hasura migrate. I therefore don't consider it a fix to this issue, though I do think it's worth fixing.

(@scriptonist, please correct me if I've misinterpreted you in any way.)

@0x777, @manasag, what do you think? Do we fix both of these? Only one of them?

SamirTalwar avatar Sep 19 '22 14:09 SamirTalwar

replace_metadata should have the effect of also reloading the metadata

This means that we probably need to invalidate the cache for CREATE alongside all the other keywords we recognize as "DDL". If we had this behavior, @scriptonist would not need to reload (or reapply) the metadata at all.

I think this is important, but there's nothing preventing @scriptonist from modifying the table using psql or another migration framework instead of hasura migrate. I therefore don't consider it a fix to this issue, though I do think it's worth fixing

@SamirTalwar I agree with all of these. Let's change the replace_metadata behaviour in one PR, and then create an issue for 'create table' turning inconsistent objects into consistent ones and add it to our backlog.

0x777 avatar Sep 21 '22 22:09 0x777

If I remember correctly, this design was meant to prevent a consistent object becoming inconsistent.

In what kind of scenario would a create table cause this to happen?

It wouldn't, and hence we don't rebuild the schema cache when we see create table-like statements. Note that there is no regular expression match for create or create table in isSchemaCacheBuildRequiredRunSQL.

0x777 avatar Sep 21 '22 22:09 0x777

@scriptonist, I have resolved this in https://github.com/hasura/graphql-engine/commit/93e8803d3e1cc4650229f3764743697abefc2665. Happy to close this out now?

SamirTalwar avatar Sep 28 '22 14:09 SamirTalwar