babelfish_extensions icon indicating copy to clipboard operation
babelfish_extensions copied to clipboard

[Bug]: USE [dbname] runs for very long time

Open vijay750 opened this issue 3 months ago • 8 comments

What happened?

Every connection causes a call to the tsql "USE [dbname]" - See tdslogin.c These use a lot of cpu, and sometimes, when many clients connect, these show as heavily impacting performance when viewed on Performance Insights.

Edited - it is login related, not ORM related. Below is not needed. Some ORMs automatically issue a t-sql USE [dbname] at the beginning of their session. Sometimes, these show up as using cpu heavily on performance insights. A sample row from pg_activity_stat is below. Note the wait type - client - and the very high running time datid|datname |pid |leader_pid|usesysid |usename |application_name|client_addr|client_hostname|client_port|backend_start |xact_start |query_start |state_change |wait_event_type|wait_event|state |backend_xid|backend_xmin|query_id|query |backend_type | -----+------------+-----+----------+---------+---------------------+----------------+-----------+---------------+-----------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+---------------+----------+------+-----------+------------+--------+--------------------+--------------+ 20500|babelfish_db|26523| |274410165|providermastermb-user| |20.1.1.73 | | 44224|2025-09-22 14:29:03.182 +0530|2025-09-22 14:29:03.223 +0530|2025-09-22 14:29:03.247 +0530|2025-09-22 14:29:03.247 +0530| | |active| |40250470 | |USE [providermaster]|client backend|

Version

BABEL_5_X_DEV (Default)

Extension

None

Which flavor of Linux are you using when you see the bug?

No response

Relevant log output


Code of Conduct

  • [x] I agree to follow this project's Code of Conduct.

vijay750 avatar Sep 23 '25 01:09 vijay750

@vijay750 Thank you for reporting issue. Can you please also provide concise repro so that we can take a look into this? Additionally can you also provide observed runtime vs expectation?

Deepesh125 avatar Oct 08 '25 09:10 Deepesh125

Hi @Deepesh125 if you work for AWS, I have a working test case which shows this cpu spike on an instance in my account. We have tracked down the origin to babelfish code - when a connection is made to the bbf port via tds, as part of the initialisation, a USE statement is issued. There is a clear difference in cpu use for the babelfish port vs direct PG.

vijay750 avatar Nov 03 '25 15:11 vijay750

It is easily reproducible - set up an aurora cluster with babelfish, and contrast the cpu usage as reported by performance insights for the bbf port vs the PG port. I've used 100 connections, but even 50 reproduces the issue.

Code to repro, using python3. from threading import local from concurrent.futures import ThreadPoolExecutor import pyodbc import psycopg2 For the bbf port:

` host='test-bbf-cluster-ccvogfvw9jim.ap-south-1.rds.amazonaws.com' port='1433' user='youruser' password='yourpass' database='' connectionString = f'DRIVER={{ODBC Driver 18 for SQL Server}};SERVER={host};DATABASE={database};UID={user};PWD={password};PORT={port};TrustServerCertificate=YES'

def init(thr_local): thr_local.conn = pyodbc.connect(connectionString) thr_local.cur = thr_local.conn.cursor()

def conn_runner(local): local.cur.execute('select 1')

with ThreadPoolExecutor(max_workers=100, initializer=init, initargs=(thr_local,)) as rexe: futs = [] [futs.append(rexe.submit(conn_runner, thr_local)) for _ in range(100)] wait(futs) `

For the PG port - is it much faster, and there is no cpu spike. ` def pgconnect(): conn = psycopg2.connect(host='test-cluster-ccvogfvw9jim.ap-south-1.rds.amazonaws.com', port='5433', user='youruser', password='yourpasswd') cur = conn.cursor() cur.execute('select 1')

with ThreadPoolExecutor(max_workers=100) as pgexe: futs = [] [futs.append(pgexe.submit(pgconnect)) for _ in range(100)] wait(futs) `

vijay750 avatar Nov 04 '25 10:11 vijay750

  1. @vijay750: There is no concept of USE <database> in PostgreSQL. So it is moot to compare with it.

  2. If you are an AWS customer, please have AWS support work with the service team as a support case.

suprio-amzn avatar Nov 09 '25 19:11 suprio-amzn

@suprio-amzn you have COMPLETELY missed the point of the bug report. It was precisely to say that there is a big difference in connection overhead when connecting on bbf port vs pg port. This is DUE TO the bbf code calling "USE" - the connection code has NO control over this. I have done the following:

  1. Provided a test case
  2. Pointed out the probable code causing this (unnecessary table lock) I suggest you look at all the comments I have provided on THIS SAME bug report before replying.

vijay750 avatar Nov 11 '25 02:11 vijay750

Hi @vijay750 , we have verified this and created a JIRA BABEL-6184 to investigate further.

It's just the mention of direct comparison with PG in use context, which can be confusing.

But we get your point and there is a case for investigation. Thanks.

jsudrik avatar Nov 11 '25 04:11 jsudrik

@jsudrik a plausible clause is that in the call stack to implement the USE statement, a call is made to get_authid_user_ext_physical_name() in file session.c. In this, there is the following line: bbf_authid_user_ext_rel = table_open(get_authid_user_ext_oid(), RowExclusiveLock); The RowExclusiveLock seems excessively strict for this operation, and may cause a lot of cpu spinning in lock contention.

vijay750 avatar Nov 12 '25 12:11 vijay750

@vijay750 Could you please mention how many babelfish users and logins are present in the server where you reproduced the issue ?

tanscorpio7 avatar Nov 16 '25 10:11 tanscorpio7

@tanscorpio7 I have added a way to reproduce this issue on a fresh db, with only one user: https://github.com/babelfish-for-postgresql/babelfish_extensions/issues/4118#issuecomment-3485228703

vijay750 avatar Dec 04 '25 12:12 vijay750