There is an unknown problem with transaction support for reactive projects

Open ht-coder-wu opened this issue 2 years ago • 13 comments

Description

A simple demo project using Spring Boot + WebFlux + Spring Data R2DBC. Under stress testing, some requests hang for a long time until they time out. I'm not sure which component causes the hang, but I ran some experiments at https://github.com/ht-coder-wu/demo. I think:

  1. there must be some problems with connection pool or transactions
  2. request is suspended for some unknown reasons in the concurrent scenario for a long time

Version

  • spring-boot-starter-parent 2.7.8
  • r2dbc-mysql 0.8.2.RELEASE
  • r2dbc-pool 0.9.2.RELEASE
  • jdk 8

Reproduction (occasional)

application.properties looks like this:

    spring.application.name=demo
    spring.r2dbc.url=r2dbcs:mysql://localhost:3306/test
    spring.r2dbc.username=root
    spring.r2dbc.password=root
    spring.r2dbc.pool.enabled=true
    server.port=8080
    logging.level.org.springframework.r2dbc=info
    spring.r2dbc.pool.initial-size=100
    spring.r2dbc.pool.max-size=500

I did four sets of tests: image

Their differences are:

  • testA: declarative transaction + DB request
  • testB: declarative transaction + no DB request
  • testC: no transaction + no DB request
  • testD: no transaction + DB request

I use scripts for concurrency testing:

iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testA

image

4 requests timed out (because I set a 100 s timeout in a WebFilter that responds with status 400). Without restarting the service, I then stress tested testD:

request is suspended for some unknown reasons in the concurrent scenario for a long time

iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testD

image

It looks like there is no problem, but some transactions are left hanging in the DB:

image

If I restart the service after stress testing testA and then test testD directly:

iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testD

image

request is suspended for some unknown reasons in the concurrent scenario for a long time

I think there must be some problem with the connection pool or transactions. It looks like testA leaves some connections in the pool with auto-commit no longer enabled, so testD, which does not run in a transaction, ends up with transactions hanging in the DB.

there must be some problems with connection pool or transactions

Restart the service and stress test testB directly:

iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testB

image

request is suspended for some unknown reasons in the concurrent scenario for a long time

Restart the service and stress test testC directly:

iwtbafpdeMacBook-Pro:~ iwtbafp$ ab -n 100000 -c 100 -s 120 http://localhost:8080/testC

image

I have tried this scenario several times, and it seems to be all right (no transaction, no DB).

Summary

  • testA (transaction & DB) followed by testD: transactions hang in the DB until I restart the service. There must be some problem with the connection pool or transactions (both declarative and programmatic).

  • testC (no transaction & no DB): it seems to be all right.

  • testB (transaction & no DB): requests hang until timeout.

  • testD (no transaction & DB): requests hang until timeout (occasionally).

I'm not sure what the problem is, but it does exist. I hope to get some help, thanks.

ht-coder-wu avatar Feb 28 '23 10:02 ht-coder-wu

@ht-coder-wu Please check the following: while ab is hung, open another terminal and run curl http://localhost:8080/test* (any of A, B, C, D) as a test. I think the curl result should be OK. One possible problem is that the local port range in your sysctl config is too small, which would cause ab to hang. You could cat /proc/sys/net/ipv4/ip_local_port_range to check it, adjust it to 1024 65535, and run your stress test again.
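
The port-range check suggested above can be sketched in Java, the language of the demo project. This is a minimal, hypothetical helper (PortRangeCheck and ephemeralPortCount are names invented for this illustration). It assumes a Linux host where /proc/sys/net/ipv4/ip_local_port_range holds two whitespace-separated numbers; on other systems, such as macOS, the file does not exist and the sketch only reports that.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Spring, R2DBC, or ab.
class PortRangeCheck {

    // Parses "<low> <high>" (the format of ip_local_port_range) and returns
    // how many local ports the inclusive range contains.
    static int ephemeralPortCount(String content) {
        String[] parts = content.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<low> <high>', got: " + content);
        }
        int low = Integer.parseInt(parts[0]);
        int high = Integer.parseInt(parts[1]);
        if (low > high) {
            throw new IllegalArgumentException("low bound exceeds high bound: " + content);
        }
        return high - low + 1;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("/proc/sys/net/ipv4/ip_local_port_range");
        if (!Files.exists(file)) {
            // The file is Linux-specific; other systems expose the range differently.
            System.out.println(file + " not found on this system");
            return;
        }
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println("Local ports available: " + ephemeralPortCount(content));
    }
}
```

Since ab opens a new connection per request unless -k (keep-alive) is given, a narrow range could in principle be exhausted by closed sockets still waiting in TIME_WAIT during a 100000-request run. That is one possible explanation for a hanging client, not a confirmed cause.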

QuantumXiecao avatar Mar 01 '23 03:03 QuantumXiecao

@QuantumXiecao Thank you for your suggestion. I had overlooked the port range, but after widening it as you suggested, ab still hung as before. Indeed, while ab was hung I could curl the test endpoints and the result was OK, so I think the problem is occasional. image

you can see the change: image

and the ab results like this: image

4 requests hung (100 s) until timeout as before. Then, without restarting the service, I ran ab against testD, and the problem was still there. image

ht-coder-wu avatar Mar 01 '23 07:03 ht-coder-wu

@ht-coder-wu Please try ab -n 50000 -c 100 -s 120 http://localhost:8080/testC (that is, -n reduced from 100000 to 50000) and show us the results. Many thanks!

QuantumXiecao avatar Mar 02 '23 03:03 QuantumXiecao

@QuantumXiecao Result for 50000 requests: image Result for 100000 requests: image

I had overlooked the influence of the port range at first, so I ran ab with 100000 requests against testC again.

Comparing these results, increasing or decreasing the number of requests affects the final result: the connect phase takes more time. What's your opinion?

Then I stress tested testA and testD (50000 requests) without restarting the service, and nothing changed. image vs image image

By the way, it's hard for us to control traffic in the production environment. If the root cause is hardware resources, we would want that to be more visible instead of requests just hanging...

ht-coder-wu avatar Mar 02 '23 07:03 ht-coder-wu

@ht-coder-wu I am trying to get to the bottom of this. Do you confirm the problem goes away if that custom filter is not registered anymore?

snicoll avatar Dec 12 '23 18:12 snicoll

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

spring-projects-issues avatar Dec 19 '23 18:12 spring-projects-issues

If you are referring to ContextFilter, I confirm that the problem still exists, because this issue initially occurred in the production environment.

ht-coder-wu avatar Dec 21 '23 08:12 ht-coder-wu

@ht-coder-wu thanks for following-up but that wasn't my question. I am asking if the problem does not occur if ContextFilter is not registered.

snicoll avatar Dec 21 '23 09:12 snicoll

Sorry, it's my fault. I forgot to mention that ContextFilter was not registered in my production environment in the first place. To make the issue clearer, I commented out the registration code like this:

image

Then I ran testA and testD.

image image

Some requests timed out and some transactions hung as before...

image

ht-coder-wu avatar Dec 22 '23 03:12 ht-coder-wu

I'm not sure whether this issue comes from mixing transactional and non-transactional DB requests in the code. Since transactions are managed by Spring, the session's auto-commit is set to false before a transactional request. Connections are then taken from the connection pool for reuse, which could cause DB requests made without a transaction to hang.
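
The hypothesis above can be illustrated with a toy model in plain Java. FakeConnection, FakePool, and the method names are hypothetical and invented for this sketch; it does not reproduce how r2dbc-pool, r2dbc-mysql, or Spring's transaction management actually behave. It only shows why a connection returned to a pool with auto-commit still disabled would leave later non-transactional work uncommitted, assuming such a leak can happen at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: none of these types exist in r2dbc-pool or Spring.
class AutoCommitLeakDemo {

    static class FakeConnection {
        boolean autoCommit = true;
        final List<String> pending = new ArrayList<>();    // executed but not committed
        final List<String> committed = new ArrayList<>();

        void execute(String sql) {
            if (autoCommit) {
                committed.add(sql);
            } else {
                pending.add(sql);
            }
        }
    }

    static class FakePool {
        private final Deque<FakeConnection> idle = new ArrayDeque<>();

        FakePool(int size) {
            for (int i = 0; i < size; i++) {
                idle.push(new FakeConnection());
            }
        }

        FakeConnection acquire() { return idle.pop(); }

        void release(FakeConnection c) { idle.push(c); }
    }

    // Assumed failure mode: a transactional request is cut short (for example
    // by a timeout) after auto-commit was disabled, and the connection goes
    // back to the pool without commit, rollback, or restoring auto-commit.
    static void interruptedTransactionalRequest(FakePool pool) {
        FakeConnection c = pool.acquire();
        c.autoCommit = false;
        c.execute("UPDATE demo SET v = 1");
        pool.release(c);
    }

    // A request outside any transaction. Returns how many statements are
    // left uncommitted on the connection it happened to borrow.
    static int plainRequest(FakePool pool) {
        FakeConnection c = pool.acquire();
        c.execute("SELECT v FROM demo");
        int stillPending = c.pending.size();
        pool.release(c);
        return stillPending;
    }

    public static void main(String[] args) {
        FakePool pool = new FakePool(1);
        System.out.println("pending on a clean connection: " + plainRequest(pool));
        interruptedTransactionalRequest(pool);
        System.out.println("pending after the leaked state: " + plainRequest(pool));
    }
}
```

In this model the second plain request inherits an open transaction it never started, which would match the transactions seen hanging in the DB after testA followed by testD. Whether the real pool can actually hand out a connection in that state is exactly what the reproducer would need to confirm.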

ht-coder-wu avatar Feb 01 '24 08:02 ht-coder-wu

So it's just because the usage is incorrect, right?

ht-coder-wu avatar Feb 02 '24 02:02 ht-coder-wu

Can you please:

  • Provide a docker-compose.yml with the version/configuration of MySQL to make the reproducer usable on my side
  • Upgrade to Spring Boot 3.2.2 to ensure this is still reproducible
  • Make sure that the code in the reproducer matches what you have in production (for ContextFilter that does not seem to be the case, since it is enabled in your repro and not in production per your comments).
  • Check if you observe the same issue without transactions

sdeleuze avatar Feb 13 '24 14:02 sdeleuze

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

spring-projects-issues avatar Feb 20 '24 14:02 spring-projects-issues