pyroscope-java

Wall profiler event doesn't like remote JDB session made via Cloudflare WARP

Open luaps opened this issue 1 year ago • 4 comments

Hi Pyroscope JAVA,

I have the following setup:

  • AWS EKS cluster with two namespaces dev and test
  • Both namespaces consist of a bunch of Java Spring Boot applications
  • Applications in the dev namespace use the Pyroscope agent with PYROSCOPE_PROFILER_EVENT set to wall
  • The connection from local computers to the EKS PODs' network is done via Cloudflare WARP (WireGuard)
    • The connection can be made directly to the POD's IP or via kubectl port-forward

What I have observed is that

  • Java Debugger attachment from inside the POD i.e. kubectl exec + jdb -attach localhost:4001 works like a charm
  • It also works from namespace to namespace or from POD to POD, meaning that it can jump from one EKS node to another
  • What doesn't work is Java Debugger attachment from the local computers to the applications in the dev namespace when PYROSCOPE_PROFILER_EVENT is set to wall. If I set it to cpu or itimer, it does work.

The below error is shown:

jdb -attach IP_ADDRESS:4001
java.io.IOException: handshake failed - connection prematurally closed
	at jdk.jdi/com.sun.tools.jdi.SocketTransportService.handshake(SocketTransportService.java:137)
	at jdk.jdi/com.sun.tools.jdi.SocketTransportService.attach(SocketTransportService.java:271)
	at jdk.jdi/com.sun.tools.jdi.GenericAttachingConnector.attach(GenericAttachingConnector.java:119)
	at jdk.jdi/com.sun.tools.jdi.SocketAttachingConnector.attach(SocketAttachingConnector.java:83)
	at jdk.jdi/com.sun.tools.example.debug.tty.VMConnection.attachTarget(VMConnection.java:557)
	at jdk.jdi/com.sun.tools.example.debug.tty.VMConnection.open(VMConnection.java:367)
	at jdk.jdi/com.sun.tools.example.debug.tty.Env.init(Env.java:63)
	at jdk.jdi/com.sun.tools.example.debug.tty.TTY.main(TTY.java:1113)
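
To check the handshake step independently of jdb, a raw probe can be run from the local computer. The sketch below relies only on the JDWP handshake itself: the debugger sends the 14 ASCII bytes `JDWP-Handshake` and the VM is expected to echo them back. The function name `probe_jdwp` and its return strings are illustrative assumptions, not part of any tool.

```python
import socket

# The 14 ASCII bytes that both sides exchange at the start of a JDWP session.
JDWP_HANDSHAKE = b"JDWP-Handshake"


def probe_jdwp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a raw JDWP handshake and describe what happened.

    Hypothetical helper for diagnosis only; it does not start a debug session.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(JDWP_HANDSHAKE)
        received = b""
        while len(received) < len(JDWP_HANDSHAKE):
            chunk = sock.recv(len(JDWP_HANDSHAKE) - len(received))
            if not chunk:
                # The peer closed the socket before echoing the full handshake.
                return f"closed early after {len(received)} bytes"
            received += chunk
    if received == JDWP_HANDSHAKE:
        return "ok"
    return f"unexpected reply: {received!r}"
```

Calling `probe_jdwp("IP_ADDRESS", 4001)` from the local computer and from inside the POD would show whether the VM closes the socket before replying only on the WARP path. Timeouts and connection resets surface as the usual `OSError` subclasses.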

I'm wondering if you folks have any ideas about the potential issue here; I'd appreciate any feedback on this. I found a similar issue in async-profiler, but I'm not sure if it's related: https://github.com/async-profiler/async-profiler/issues/769

Java Version

openjdk 17.0.13 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-17.0.13.11.1 (build 17.0.13+11-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.13.11.1 (build 17.0.13+11-LTS, mixed mode, sharing)

Pyroscope agent version is v0.14.0

EKS nodes run on Amazon Linux 2 x86_64

Best regards, Pawel

luaps avatar Dec 06 '24 12:12 luaps

I agree, it looks like the syscall interruption issue in the JDK that you've linked.

The fix from async-profiler should have been included in the recent pyroscope-java 0.15.x releases. Please give it a try with https://github.com/grafana/pyroscope-java/releases/tag/v0.15.2 and let us know if it helps.
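
For context, a syscall interruption issue in general means that a blocking call such as a socket read returns early with EINTR because a signal arrived on the thread, and the caller treats that as a failure instead of retrying. Wall-clock sampling signals threads even while they are blocked, which makes this more likely than with cpu. The sketch below shows only the generic retry pattern; it is not the JDK's or async-profiler's code, and `read_retrying` and `FlakyReader` are hypothetical names used for illustration.

```python
import errno


def read_retrying(read_once, max_retries: int = 100):
    """Call read_once() and retry whenever a signal interrupts it (EINTR).

    Python's own socket calls have retried on EINTR automatically since 3.5,
    so this mirrors what lower-level code has to do by hand.
    """
    for _ in range(max_retries):
        try:
            return read_once()
        except InterruptedError:
            # The call was interrupted by a signal, not closed by the peer.
            continue
    raise OSError(errno.EINTR, "interrupted too many times")


class FlakyReader:
    """Hypothetical stand-in for a read that is interrupted a few times."""

    def __init__(self, failures: int, payload: bytes):
        self.failures = failures
        self.payload = payload

    def __call__(self) -> bytes:
        if self.failures > 0:
            self.failures -= 1
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return self.payload
```

A caller without such a loop would report the first interruption as an error, which can look like a connection that was closed prematurely.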

korniltsev avatar Dec 09 '24 02:12 korniltsev

Unfortunately it doesn't work with v0.15.2 either.

luaps avatar Dec 09 '24 09:12 luaps

Any ideas here? Should I ask on async-profiler?

luaps avatar Jan 15 '25 21:01 luaps

I would double-check whether the genuine async-profiler has the same issue before asking them.

korniltsev avatar Jan 16 '25 04:01 korniltsev