sea-orm-cli generate entity PoolTimedOut error with large schema
First off, thanks for all the hard work!
Description
When trying to use sea-orm-cli generate entity on a large Postgres 9.6.22 database (>400 tables of varying shapes, sizes, constraints, etc.), I encounter the following error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: PoolTimedOut', /Users/will.wang/dev-projects/sea-schema/src/postgres/discovery/executor/real.rs:30:55
Similar to: https://github.com/SeaQL/sea-orm/issues/555
Steps to Reproduce
Unfortunately, I can't provide exact details around the schema. I think any sufficiently large database should work. I believe this occurs because the connection times out in between executing the many introspection queries.
Expected Behavior
Entities are automatically generated
Actual Behavior
Connection fails, panic occurs
Reproduces How Often
Always
Versions
sea-orm-cli: 0.8.1 (built locally on master at https://github.com/SeaQL/sea-schema/commit/00429f3bdbfe56af1afcee94b069a9619cea4263)
postgres: 9.6.22
os: MacOS Monterey 12.3.1
Additional Information
I think the default connect_timeout of 30 seconds for the underlying sqlx::PoolOptions is too short. When I changed this locally to 120 seconds, I was able to avoid this issue. It would be great if we could configure a connect_timeout as an argument to sea-orm-cli generate entity, similar to max_connections.
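For context, here is a minimal sketch of what raising that timeout looks like on the sqlx side. It assumes the sqlx 0.5/0.6-era API (newer sqlx versions rename connect_timeout to acquire_timeout), and the connect helper below is illustrative rather than sea-orm-cli's actual code:

use std::time::Duration;
use sqlx::postgres::PgPoolOptions;

// Illustrative pool setup with a longer timeout than the 30-second default;
// the 120-second value mirrors the local change described above.
async fn connect(url: &str) -> Result<sqlx::PgPool, sqlx::Error> {
    PgPoolOptions::new()
        .max_connections(1)                         // the CLI defaults to a single connection
        .connect_timeout(Duration::from_secs(120))  // default is 30 seconds
        .connect(url)
        .await
}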
Hey @001wwang, thanks for the investigation! Increasing the connect_timeout seems like a good temporary fix. However, I'd like to know why it takes so long to fetch the info for all the db tables. Does anyone have a large sample schema for testing?
No problem, thanks for the prompt reply! I'll see if I can produce a large sample schema for recreating this issue later.
How did you solve it? I also have this problem now
Sorry, this issue slipped my mind. Perhaps you could provide more details about your schema to help @billy1624 reproduce this locally.
For my fix, I modified the sea-orm-cli codebase locally to use a timeout of 120 seconds, then built and ran it locally. Disclaimer: it's been a few months since then, so I can't guarantee this still works or that it's the same problem you're facing.
I encountered this error for tables with partitions (Postgres). If a table has a lot of partitions, the generator treats them as separate tables and tries to generate entities for all of them. Deleting those partitions solved the problem for me.
This is actually expected, and JOOQ has the same behavior. In JOOQ you can solve it by providing an exclude regex like
<excludes>.*_p[0-9_]+</excludes>
so that those tables are skipped.
In an ideal scenario you should not have partitions in your target/seed schema, but a regex option could be added to the CLI to help with these kinds of situations in general. In other cases, a connect_timeout option would be useful as well.
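For illustration only, a filter using that exclude pattern could look like the sketch below; the regex crate usage and the keep_table helper are hypothetical, not an existing sea-orm-cli option:

use regex::Regex;

// Hypothetical table filter mirroring the JOOQ-style exclude regex above.
fn keep_table(name: &str, exclude: &Regex) -> bool {
    !exclude.is_match(name)
}

fn main() {
    let exclude = Regex::new(r".*_p[0-9_]+").unwrap();
    let tables = ["orders", "orders_p2023_01", "users"];
    let kept: Vec<&str> = tables
        .iter()
        .copied()
        .filter(|t| keep_table(t, &exclude))
        .collect();
    assert_eq!(kept, ["orders", "users"]); // partition-like names are skipped
}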
Hey everyone, I created 3,000 tables on my local Postgres but still not able to reproduce the error. Could someone guide me how to reproduce it?
CREATE TABLE IF NOT EXISTS t_793593555976192 (col1 integer, col2 integer, col3 integer);
... 3,000 create table statements
I'll go ahead and close this issue given the above comments. I haven't looked into this any time recently, and it seems like it might be working at this point. Thanks all!
Just for anyone else facing the issue: this seems to happen intermittently on slow, weak, or high-packet-loss networks. Ensure your connection to the database is low latency to make it work.
I am having the same issue with partitioned tables as well. +1 for regex exclusion.
I tried writing a script to pull them one table at a time, and it still hits the same issue. This is against a local Docker container.
# List all base tables in the public schema
data=$(psql "${DATABASE_URL}" -AXqtc "SELECT table_name FROM information_schema.tables WHERE table_schema='public' AND table_type='BASE TABLE';")
# Drop partition tables and the migrations table from the list
data=$(echo "$data" | rg -v -e "p_hash_p" -e "schema_migrations")
# Generate entities one table at a time
for TABLE in $data
do
  echo "$TABLE"
  sea-orm-cli generate entity -o entity/src --lib --tables "$TABLE"
done
Edit:
I realize the issue here is not which tables I'm trying to gather, but rather that the schema discovery query itself takes a long time. Would it be possible to take the approved table list (given to the CLI above) and use it to filter the discovery step?
Ah, I think I tracked it down.
In sea-schema, we are doing future::join_all() on a potentially very large list. The futures all end up rushing to the executor to get a connection to execute their query, but the default CLI config only has a single connection, so they're all waiting for that one connection to become available and the pool times out for some of the later ones. The fix was to set --max-connections 50 to allow higher throughput and avoid the timeout. A lower number probably would've worked too, but that's what did it for me.
@jared-mackey thank you for pointing it out. It seems like we should not join_all more than the pool size in SeaSchema.
I imagine we can chunk the vector by the pool size.
for chunk in vec.chunks(pool_size) {
    // `op` builds at most `pool_size` introspection futures from the chunk
    join_all(op(chunk)).await?;
}
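A more complete, self-contained sketch of the chunking idea (fetch_table stands in for the real per-table introspection query; the names and error type are illustrative, not sea-schema's actual API):

use futures::future::join_all;

// Pretend per-table query; the real one would go through the pool's executor.
async fn fetch_table(name: &str) -> Result<String, std::io::Error> {
    Ok(format!("table info for {name}"))
}

// Run at most `pool_size` queries concurrently so none of them starve
// waiting for a free connection and hit PoolTimedOut.
async fn discover(tables: &[&str], pool_size: usize) -> Result<Vec<String>, std::io::Error> {
    let mut results = Vec::with_capacity(tables.len());
    for chunk in tables.chunks(pool_size) {
        let batch = join_all(chunk.iter().map(|t| fetch_table(t))).await;
        for item in batch {
            results.push(item?);
        }
    }
    Ok(results)
}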
@tyt2y3 Seems like a good approach.
A couple of thoughts on the issue after looking into it a bit.
There is the join_all problem that should be solved. However, it seems to be exacerbated by the following:
- Tables set to ignore via the CLI argument are still pulled in by the discovery process; they're just not generated in the output. There is probably a good reason for this, but I thought I'd mention it.
- And to tie it back to #1582 a bit: in my case all the partitions had the same schema, and I don't care about anything other than the parent table. But I do believe there are cases where partitions can differ from the parent schema, so maybe we cannot optimize those out. It would take some research to see whether there is a flag or something that tells us when a partition matches the parent; if it does, I don't believe we should pull it at all (see the catalog-query sketch below).
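A hedged sketch of one way to skip partition children during discovery, assuming Postgres 10+ declarative partitioning; the query and constant name below are illustrative and not part of sea-schema (and inheritance-based partitions on 9.6 would not be detected this way):

// Illustrative catalog query: list tables in `public` while excluding
// partition children via the `relispartition` flag (Postgres 10+).
const NON_PARTITION_TABLES: &str = r#"
    SELECT c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public'
      AND c.relkind IN ('r', 'p')  -- ordinary and partitioned (parent) tables
      AND NOT c.relispartition     -- skip the individual partitions
"#;

// e.g. sqlx::query(NON_PARTITION_TABLES).fetch_all(&pool).await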