
Current master (29.01.2022) broke auth_query functionality compared with the 1.2 release

Open alexdyukov opened this issue 3 years ago • 11 comments

Hi there!

I tried to dockerize odyssey and found some random segfaults under load (2k tps) on the 1.2 release. pgbench works great on the 1.2 release. To reproduce it on the current master I made Docker files like this: Dockerfile_centos7.txt Dockerfile_ubuntu.txt entrypoint.sh.txt and got this error from pgbench: 20220129_1547_odyssey_error

Is the auth_query functionality gone, or are the configs not backward compatible?

alexdyukov avatar Jan 29 '22 20:01 alexdyukov

Please add pool_routing "internal" to the auth route.

database "auth" {
        user "auth" {
                authentication "none"
                storage "upstream"
                storage_db "postgres"
                storage_user "postgres"
                storage_password "postgres"
                pool_routing "internal"
                pool "transaction"
                pool_size 3
                pool_timeout 0
                pool_ttl 30
                pool_discard no
                pool_cancel no
                pool_rollback no
        }
}

Maybe we need to dynamically create an internal route for auth queries if the user has not configured one, or at least fix the docs about auth query...
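To make the rule above concrete, here is a minimal sketch of a config sanity check: every route that sets auth_query should point at a database/user pair that has its own route with pool_routing "internal". The dictionary layout and the function name `missing_internal_routes` are hypothetical (this is not odyssey's parser), the auth_query string is only a placeholder, and the check assumes that only an explicitly named route counts. Whether a `default` route would also satisfy the lookup is not established here.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of a config: (database, user) -> route options.
Routes = Dict[Tuple[str, str], Dict[str, str]]


def missing_internal_routes(routes: Routes) -> List[Tuple[str, str]]:
    """Return (auth_query_db, auth_query_user) pairs that lack an internal route."""
    missing: List[Tuple[str, str]] = []
    for options in routes.values():
        if "auth_query" not in options:
            continue  # this route does not use auth_query
        target = (options.get("auth_query_db", ""), options.get("auth_query_user", ""))
        route = routes.get(target)
        # Assumption: only an explicitly named route with pool_routing "internal" counts.
        if route is None or route.get("pool_routing") != "internal":
            if target not in missing:
                missing.append(target)
    return missing


# A client-facing route that authenticates through database "auth", user "auth".
broken: Routes = {
    ("default", "default"): {
        "authentication": "md5",
        "auth_query": "SELECT usename, passwd FROM pg_shadow WHERE usename=$1",  # placeholder
        "auth_query_db": "auth",
        "auth_query_user": "auth",
    },
}

# The same config plus the internal auth route suggested above.
fixed: Routes = dict(broken)
fixed[("auth", "auth")] = {"authentication": "none", "pool_routing": "internal"}

print(missing_internal_routes(broken))  # [('auth', 'auth')]
print(missing_internal_routes(fixed))   # []
```

A check like this could run before deploying a config migrated from an older release, so that a missing auth route shows up as a clear message instead of a runtime authentication failure.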

reshke avatar Jan 30 '22 05:01 reshke

Thanks, @reshke. It works. But the current master still segfaults. Are there any instructions for getting a core dump from my container?

P.S. You can reproduce it with the configs above: PGPASSWORD=secret pgbench -h odyssey-proxy -p 5432 -U nonpostgres -c 300 -j 100 -t 1000 pgbench. Of course, the auth checker and nonpostgres are different users. 20220130_0958_odyssey_segfault

P.S.2. The last line is the current master from 29.01.2022; the others are release 1.2.

alexdyukov avatar Jan 30 '22 07:01 alexdyukov

Thanks, your test case was helpful. There was a problem when the number of simultaneous auth queries is much bigger than the (auth route) pool size. This should be fixed by https://github.com/yandex/odyssey/pull/413; could you please test it as well? Looks like it should help.

reshke avatar Jan 30 '22 18:01 reshke

@reshke Now it doesn't segfault, but in some cases pool_size on auth queries causes "failed to make auth_query" errors in the logs, as in the screenshot. When I disable it with pool_size = 0 it works well. It's much better than the original error, but it is still an issue with auth_query. Should I create a new one? photo_2022-02-01_13-40-01

To reproduce it:

  1. run a docker container with the current master, using Dockerfile_ubuntu and entrypoint.sh from the message above
  2. set a low pool_size so that the auth_query pool is much smaller than the number of pgbench connections
  3. run pgbench with a high number of connections (-c)
  4. profit

alexdyukov avatar Feb 01 '22 10:02 alexdyukov

> When I disable it with pool_size = 0 it works well.

It seems to me that in this case everything works as expected, doesn't it? pool_size regulates the number of server connections towards auth_db that can be taken simultaneously. With a large number of parallel authentications (pool_size << number of clients trying to auth), there will be errors anyway, I think.

reshke avatar Feb 02 '22 06:02 reshke

@reshke Nope, the feature is documented differently. https://github.com/yandex/odyssey/blob/master/documentation/configuration.md#pool_size-integer says "Clients are put in a wait queue, when all servers are busy." and:

pool_timeout integer

Server pool wait timeout.
Time to wait in milliseconds for an available server. Disconnect client on timeout reach.
Set to zero to disable.

I set pool_timeout = 0 and expect everything to work as I described :)

Btw, this feature works as expected for my main storage, but not for auth.
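The behavior I expect from the quoted documentation can be sketched with a toy model. This is not odyssey's code: `run_clients` and its parameters are hypothetical names, a semaphore stands in for the server pool, and a sleep stands in for the auth query. With pool_timeout set to zero a client simply waits until a server is free, so no client should be rejected just because pool_size is much smaller than the number of clients.

```python
import threading
import time


def run_clients(clients, pool_size, pool_timeout_ms, query_seconds):
    """Toy model: `clients` threads share `pool_size` server connections.

    pool_timeout_ms == 0 disables the timeout, so a client waits until a
    server is free. Returns (served, timed_out).
    """
    if pool_size < 1:
        raise ValueError("this sketch only models a bounded pool")
    servers = threading.Semaphore(pool_size)
    lock = threading.Lock()
    counts = {"served": 0, "timed_out": 0}

    def client():
        timeout = None if pool_timeout_ms == 0 else pool_timeout_ms / 1000.0
        if not servers.acquire(timeout=timeout):
            with lock:
                counts["timed_out"] += 1  # "Disconnect client on timeout reach"
            return
        try:
            time.sleep(query_seconds)  # stand-in for running the auth query
        finally:
            servers.release()
        with lock:
            counts["served"] += 1

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["served"], counts["timed_out"]


# pool_timeout 0: every client eventually gets a server, none are rejected.
print(run_clients(clients=30, pool_size=3, pool_timeout_ms=0, query_seconds=0.01))
# A short non-zero timeout: clients that cannot get a server in time are dropped.
print(run_clients(clients=30, pool_size=3, pool_timeout_ms=10, query_seconds=0.2))
```

In this model, errors under load appear only in the second case, which is why "failed to make auth_query" with pool_timeout = 0 looks like a bug to me and not like expected behavior.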

alexdyukov avatar Feb 02 '22 07:02 alexdyukov

I seem to have the same problem with the 1.2 -> 1.3 transition:

ERROR: odyssey: c9af96b00a8d9: failed to make auth query (SQLSTATE 28000)
    database default {
      user default {
        authentication  "md5"
        auth_query      "SELECT username, password FROM odyssey.get_auth($1)"
        auth_query_db   "odyssey"
        auth_query_user "odyssey"
        password        "md5xxx"

        storage "postgresql"

        pool          "session"
        pool_size     0
        pool_timeout  0
        pool_ttl      60
        pool_cancel   no
        pool_rollback yes

        client_fwd_error no
      }
    }

Antiarchitect avatar Jun 27 '22 14:06 Antiarchitect

Having the same issue after migrating from 1.1 -> 1.3:

# ADMIN CONSOLE
storage "local" {
  type "local"
}

database "console" {
  user default {
    authentication "none"
    pool "session"
    storage "local"
  }
}

# DEFAULT CONNECTION
storage "postgres_server" {
  type "remote"
  host "xxx.xxx.xxx.xxx"
  port 5432
  tls "disable"
}

database default {
  user default {
    authentication "md5"
    auth_query "SELECT usename, passwd FROM pg_shadow WHERE usename='%u'"
    auth_query_db "postgres"
    auth_query_user "postgres"

    storage "postgres_server"
    pool "session"
    client_fwd_error yes
    client_max 400
  }
}

unix_socket_dir "/tmp"
unix_socket_mode "0644"
log_format "%p %t %l [%i %s] (%c) %m\n"
log_to_stdout yes
log_debug no
log_config yes
log_session yes
log_query no
log_stats yes
stats_interval 60

listen {
  host "*"
  port 6432
  tls "disable"
}

pySilver avatar Jun 29 '22 22:06 pySilver

> Having the same issue after migrating from 1.1 -> 1.3:

Same for me, "auth_query" doesn't work at all in 1.3

jurim76 avatar Sep 16 '22 07:09 jurim76

Same error

MikeVL avatar Sep 26 '22 17:09 MikeVL

Defining database "postgres" with internal routing works for me

storage "testdb" {
  type "remote"
  host "testdb"
  port 5432
}

database "postgres" {
  user "pgbouncer" {
    authentication "none"
    pool_routing "internal"
    storage "testdb"
    pool "session"
  }
}

database "test" {
  user default {
    authentication "md5"
    auth_query "SELECT uname,phash FROM pgbouncer.user_lookup($1)"
    auth_query_user "pgbouncer"
    auth_query_db "postgres"
    storage "testdb"
    ...
  }
}

jurim76 avatar Sep 26 '22 19:09 jurim76