Bulk requests stuck in QUEUED state

Open cfgamboa opened this issue 9 months ago • 4 comments

Hello,

FYI, there were stage requests stuck in the QUEUED state, as shown below:

---------- REQUESTS ----------
     STATUS |           COUNT
  CANCELLED |               2
  COMPLETED |             350
     QUEUED |          123854
    STARTED |             860
---------- TARGETS -----------
      STATE |           COUNT
  CANCELLED |             202
  COMPLETED |          193471
    CREATED |          273326
     FAILED |               1

There was no evidence of these staging requests in the PoolManager:

[dccore01] (PoolManager@dccore01Domain) admin > rc ls 
00003AAB7819ACB34083A74B4050CF4619E5@internal-net-external-net-world-net-*/* m=0 r=0 [dc280_12] [Waiting for stage: dc280_12 02.08 05:45:52] {0,}

The logs show the following:

04 Feb 2025 10:30:19 [pool-6-thread-634] [Frontend-dcfrontend02 BulkRequestStatus] Uncaught exception in thread pool-6-thread-634
com.google.common.util.concurrent.UncheckedExecutionException: org.springframework.dao.DataAccessResourceFailureException: PreparedStatementCallback; SQL [SELECT bulk_request.*, request_arguments.arguments as arguments FROM bulk_request LEFT OUTER JOIN request_arguments ON bulk_request.id = request_arguments.rid WHERE uid = ? ORDER BY arrived_at ASC LIMIT 1]; FATAL: terminating connection due to administrator command; nested exception is org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)
	at com.google.common.cache.LocalCache.get(LocalCache.java:4011)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
	at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore.get(JdbcBulkRequestStore.java:826)
	at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore.valid(JdbcBulkRequestStore.java:910)
	at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore.getKey(JdbcBulkRequestStore.java:353)
	at org.dcache.services.bulk.BulkService.lambda$messageArrived$4(BulkService.java:243)
	at org.dcache.util.CDCExecutorServiceDecorator$WrappedRunnable.run(CDCExecutorServiceDecorator.java:130)
	at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.springframework.dao.DataAccessResourceFailureException: PreparedStatementCallback; SQL [SELECT bulk_request.*, request_arguments.arguments as arguments FROM bulk_request LEFT OUTER JOIN request_arguments ON bulk_request.id = request_arguments.rid WHERE uid = ? ORDER BY arrived_at ASC LIMIT 1]; FATAL: terminating connection due to administrator command; nested exception is org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
	at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:107)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:70)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:79)
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:79)
	at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1541)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:667)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:744)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:757)
	at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:810)
	at org.dcache.services.bulk.store.jdbc.JdbcBulkDaoUtils.get(JdbcBulkDaoUtils.java:171)
	at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestDao.get(JdbcBulkRequestDao.java:156)
	at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore$RequestLoader.load(JdbcBulkRequestStore.java:146)
	at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore$RequestLoader.load(JdbcBulkRequestStore.java:142)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3570)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2312)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079)
	... 12 common frames omitted
Caused by: org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:167)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:119)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
	at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:722)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:651)
	... 24 common frames omitted
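The trace above nests three layers: the PostgreSQL driver error, wrapped by Spring's `DataAccessResourceFailureException`, wrapped in turn by Guava's `UncheckedExecutionException` thrown from `LoadingCache.get`. The minimal sketch below shows how to walk such a "Caused by" chain to the innermost driver error and read its SQLSTATE. It is an illustration only: it uses plain JDK stand-in exceptions instead of the real Guava, Spring and PgJDBC classes (an assumption made so it runs without those dependencies), and `RootCauseDemo`, `rootCause` and `buildChain` are hypothetical names. `57P01` is PostgreSQL's `admin_shutdown` SQLSTATE, the code reported with "terminating connection due to administrator command".

```java
import java.sql.SQLException;

/**
 * Hypothetical sketch: rebuild a "Caused by" chain shaped like the
 * bulk-service trace using JDK stand-in exceptions, then walk it to
 * the innermost cause and print that cause's SQLSTATE.
 */
class RootCauseDemo {

    /** Follows getCause() until a throwable has no further cause. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Builds a three-level chain; every class here is a stand-in. */
    static Throwable buildChain() {
        // Innermost: stands in for org.postgresql.util.PSQLException.
        // 57P01 is PostgreSQL's admin_shutdown SQLSTATE.
        SQLException driver = new SQLException(
                "FATAL: terminating connection due to administrator command", "57P01");
        // Middle: stands in for Spring's DataAccessResourceFailureException.
        RuntimeException dao = new RuntimeException("PreparedStatementCallback; ...", driver);
        // Outer: stands in for Guava's UncheckedExecutionException.
        return new RuntimeException("cache load failed", dao);
    }

    public static void main(String[] args) {
        Throwable root = rootCause(buildChain());
        String state = (root instanceof SQLException)
                ? ((SQLException) root).getSQLState()
                : null;
        System.out.println(root.getClass().getSimpleName() + " SQLSTATE=" + state);
    }
}
```

Running `main` prints `SQLException SQLSTATE=57P01`. In code that already depends on Guava, `com.google.common.base.Throwables.getRootCause` performs the same walk.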

Restarting the bulk service allowed the queued requests to resume:

---------- REQUESTS ----------
     STATUS |           COUNT
  CANCELLED |               2
  COMPLETED |            1647
    STARTED |            1337

---------- TARGETS -----------
      STATE |           COUNT
  CANCELLED |             240
  COMPLETED |          198071
     FAILED |             861
    RUNNING |          150441

Carlos

cfgamboa, Feb 09 '25 17:02