graphql-engine icon indicating copy to clipboard operation
graphql-engine copied to clipboard

hdb_catalog schema Inconsistent between Instances

Open DomVinyard opened this issue 2 years ago • 3 comments

Version Information

Server Version: v2.13.0-cloud.1

Environment

Cloud

What is the current behaviour?

I have two Hasura instances which differ in how they have constructed the hdb_catalog in the database.

If I run this sql in both:

SELECT * FROM information_schema.tables 
WHERE table_schema = 'hdb_catalog'

I see different results. The first is what we have (rightly or wrongly) considered to be the 'correct' behaviour (the behaviour we have coded against) image

The second instance does not surface these tables, instead (for the same query) giving us:

image

I'm not sure which is the intended behaviour. The reason this is an issue is because we have a script which can return the number of unprocessed events at any one time:

post(HASURA_QUERY_ENDPOINT, {
      headers: {"X-Hasura-Admin-Secret": HASURA_ADMIN_SECRET},
      body: JSON.stringify({
         type: "run_sql",
         args: { sql: "select count(*) from hdb_catalog.hdb_scheduled_events where status = 'scheduled'" },
      }),
});

This obviously throws an error in the second example as the table does not exist. I would like to understand:

  1. Which of these two results is intended/expected?
  2. How/where are events stored in the second example?
  3. How can I go about making these consistent?

Thanks in advance

(Related: https://stackoverflow.com/q/70010557/3001761)

DomVinyard avatar Oct 16 '22 08:10 DomVinyard

@DomVinyard Actually, what you are seeing are two different database schemas: metadata and main database. You should see distinct values in "table_catalog" column in the above sql query.

Hasura Metadata is a representation of the configuration of the Hasura Server. From actions to databases and their tables, to permissions, event triggers and API limits, everything regarding the config of the Hasura Server is represented in Metadata.

Please let me know if you need more information.

adas98012 avatar Oct 17 '22 22:10 adas98012

Hi @adas98012, thank you so much for your help.

what you are seeing are two different database schemas: metadata and main database

I think I'm even more confused than before! Why the same sql command would return two distinct schemas - the hbd_catalog schema was generated by Hasura, but in one of the examples it is the main database? what does that mean?

You should see distinct values in "table_catalog" column in the above sql query.

The table_catalogue column seems to contain the name of the environment, staging_myapp in the first example and production_myapp in the second

image

vs

image

Please let me know if you need more information

I'd love to know how you would change the behaviour in the second example so it works the same as the first example. Should I just create these tables manually? I guess I'm still no closer to understanding my initial three questions

  1. Which of these two results is intended/expected (for the query below)?
SELECT * FROM information_schema.tables 
WHERE table_schema = 'hdb_catalog'
  1. How/where are events stored in the second example? (if this is the 'main database' then where is the metadata database)
  2. How can I go about making these consistent?

The current Hasura Cloud runs on version 3 config

I don't see anything In those docs about how hbd_catalog is constructed and used - is it undocumented for a reason, aka should I not be interfacing with it directly? If not is there a recommended way to query for scheduled events?

DomVinyard avatar Oct 18 '22 10:10 DomVinyard

A bit more context:

I can see from the docs:

scheduled events also have two tables managed by Hasura: hdb_catalog.hdb_scheduled_events: Table that stores all the scheduled events. hdb_catalog.hdb_scheduled_event_invocation_logs: Table that stores all the HTTP requests and their responses of the scheduled events invocations.

In all of my environments -- local and cloud -- this is true. Except for one instance (example #2 above) - in this instance the table does not exist. If I create an event and then run:

select count(*) from hdb_catalog.hdb_scheduled_events where status = 'scheduled'" 

I get an error. However, the event is still showing in the Events Tab, so it is definitely being stored somewhere.

DomVinyard avatar Oct 18 '22 17:10 DomVinyard

Hi @DomVinyard

How to get count of events

The short answer to your question:

is it undocumented for a reason, aka should I not be interfacing with it directly? If not is there a recommended way to query for scheduled events?

Is that yes, these tables are considered implementation details of hasura functionality and not intended to be used directly. Their schema may change when upgrading to newer versions of hasura.

The intended way to interact with these tables is to use the corresponding APIs. Unfortunately, an API allowing you to do exactly what you want (get a count of pending events) is not currently documented. Most likely it exists and is either undocumented because it is considered unstable, or we are simply missing documentation. We'll inquire into this and provide an update here.

Why are there multiple versions of the hdb_catalog schema

The answer to this question is a bit complicated so I'll have to provide some context on older versions of hasura. In hasura 1.3, the hdb_catalog schema is used to store metadata (configuration information), and a bunch of other things including event data.

Hasura 2.0 introduced the ability to connect multiple databases, which posed the question: if we can connect to multiple databases, where do we store the metadata required for startup?

The answer is that we have a dedicated metadata database. Access to this database is required for startup. On hasura cloud, this database is hosted by us and cannot be accessed directly by the users.

Users who are self-hosting and do not explicitly specify a metadata database using HASURA_GRAPHQL_METADATA_DATABASE_URL, and instead specify a single database using the legacy HASURA_GRAPHQL_DATABASE_URL will be able to continue using that database for metadata.

This explains why you may be seeing a different schema in a self-hosted dev environment, and on hasura cloud. If you are seeing these tables on a 2.x.x cloud environment, it may be because the database was at one point used with a 1.3 project, or with a self-hosted instance.

Event triggers

Hasura's event triggers work by using postgres triggers to build a queue of events, and then asynchronously processing those events.

For performance reasons, building this queue still happens on the user's database, which is why projects that use event triggers will still have a different version of the hdb_catalog schema.

Your two different schemas

Your first screenshot shows a metadata database hdb_catalog schema. This may either be a dedicated metadata database, or a project using a single database for both the metadata and the user's data.

Your second screenshot shows a schema of a project that has a dedicated metadata database (could be a cloud project) and has the tables used by the event processing loop.

I hope this helps clarify things.

BenoitRanque avatar Oct 19 '22 00:10 BenoitRanque

@BenoitRanque amazing, thank you - that really clears things up 🙏

If you are seeing these tables on a 2.x.x cloud environment, it may be because the database was at one point used with a 1.3 project

Yup, that's exactly it. We created our staging environment a little while ago with the old schema. Now we're ready to build a production environment the upgrade to 2.x means we are no longer able to query for event counts, I understand that this was undocumented so no there's absolutely blame, however to us it's still a regression and we need to find an urgent workaround.

an API allowing you to do exactly what you want (get a count of pending events) is not currently documented [...] We'll inquire into this and provide an update here.

That would be amazing. Without this we'll need to rethink our events strategy so really appreciate your advice

DomVinyard avatar Oct 19 '22 07:10 DomVinyard

@DomVinyard Thank you for your prompt update. We have created a separate feature request for an API and intimated the engineering team. You should be able to track its progress.

adas98012 avatar Oct 19 '22 18:10 adas98012

@DomVinyard We would like to share with you an internal API to get count of scheduled events. This one is not documented yet, so here is its usage example (as I verified in Postman):

POST <your-cloud-app-name>.hasura.app/v1/metadata 

Header:
Content-Type: application/json
X-Hasura-Role: admin
x-hasura-admin-secret: <your-admin-secret>

Body:
{
    "type": "get_scheduled_events",
    "args": {
        "type": "cron",
        "trigger_name": "<your-trigger-name>",
        "limit": 0,
        "offset": 0,
        "get_rows_count": true
    }
}

Here is a sample response:

{
    "count": 300,
    "events": []
}

Please let us know if the info suffice. Hope, this will give you the intended outcome and helps your deployment.

adas98012 avatar Oct 26 '22 21:10 adas98012

Hi @adas98012 that's really helpful thank you, how can I query for the one off events, something like type: "one-off"?

DomVinyard avatar Nov 23 '22 13:11 DomVinyard

@DomVinyard

You can try the same POST query as above, with a small variation in the Body :

{
    "type": "get_scheduled_events",
    "args": {
        "type": "one_off",
        "limit": 0,
        "offset": 0,
        "get_rows_count": true
    }
}

Here is a sample response :

{
    "count": 1,
    "events": []
}

adas98012 avatar Nov 24 '22 00:11 adas98012

Great, and I've yet to test this yet but I assume we can filter by comment too

{
    "type": "get_scheduled_events",
    "args": {
        "type": "one_off",
        "limit": 0,
        "offset": 0,
        "get_rows_count": true,
        "comment": "event_type_a"
    }
}

Between this and delete_scheduled_event we now have everything we need to manage our events directly within hasura, thank you

DomVinyard avatar Nov 28 '22 12:11 DomVinyard

@DomVinyard Thanks for the update. I don't think our API supports filtering by comment. We are happy to learn that you have all your event management cases covered otherwise.

adas98012 avatar Nov 28 '22 17:11 adas98012

OK, that's fine, we can fetch all events and filter on our side, there's nothing that should prevent us from doing that for the meantime, right? - thanks @adas98012

DomVinyard avatar Nov 30 '22 15:11 DomVinyard

I don't think our API supports filtering by comment

Any chance you could clarify whether this is definitely the case, also whether it is a part of your roadmap

DomVinyard avatar Nov 30 '22 17:11 DomVinyard