dcache
dcache copied to clipboard
NPE on xrootd-tpc with proxy delegation not available
Hi,
I found some NullPointerException errors raised on dCache pools after upgrading dCache from 6.2 to 7.2.17. It concerns xrootd-tpc tests run by JUNO from CNAF or junoeos01.ihep.ac.cn endpoint when authentication failed (proxy delegation required but not available / Authentication to xfer-archive-03.cr.cnaf.infn.it:1094 failed; all protocols have been tried.)
Jun 14 14:06:29 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:29 (pool-juno-disk-li366a) [] Unable to complete gsi authentication to xfer-archive-03.cr.cnaf.infn.it:1094, channel 0a7fa5f2, stream 3, session B60C0000DAD600001B000000B90C0000: org.dcache.xrootd.core.XrootdException: proxy delegation required but not available..
Jun 14 14:06:29 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:29 (pool-juno-disk-li366a) [] Exception caught on channel 0a7fa5f2: org.dcache.xrootd.core.XrootdException: Authentication to xfer-archive-03.cr.cnaf.infn.it:1094 failed; all protocols have been tried..
Jun 14 14:06:29 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:29 (pool-juno-disk-li366a) [] Exception caught on channel 0a7fa5f2: org.dcache.xrootd.core.XrootdException: Invalid request; user not authenticated
Jun 14 14:06:29 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: .
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:30 (pool-juno-disk-li366a) [] Unable to complete gsi authentication to junoeos01.ihep.ac.cn:1094, channel 8e3f6691, stream 4, session C6C827003329000092010000F8805700: org.dcache.xrootd.core.XrootdException: proxy delegation required but not available..
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:30 (pool-juno-disk-li366a) [] Exception caught on channel 8e3f6691: null
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:30 (pool-juno-disk-li366a) [] Uncaught exception in thread xrootd-tpc-client-3
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: java.lang.NullPointerException: null
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.plugins.authn.gsi.GSIClientAuthenticationHandler.doOnAuthenticationResponse(GSIClientAuthenticationHandler.java:96)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.responseReceived(AbstractClientRequestHandler.java:441)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.channelRead(AbstractClientRequestHandler.java:77)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.doOnAuthenticationResponse(AbstractClientRequestHandler.java:151)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.responseReceived(AbstractClientRequestHandler.java:441)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.channelRead(AbstractClientRequestHandler.java:77)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.util.CDCThreadFactory.lambda$newThread$0(CDCThreadFactory.java:42)
Jun 14 14:06:30 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at java.base/java.lang.Thread.run(Thread.java:834)
Jun 14 14:06:32 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:32 (pool-juno-disk-li366a) [door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzmZdmg xrootd-ccdcacli447 PoolAcceptFile 0000810D976C3E554F58BC03B02B7A413B39] Transfer failed in post-processing: File size mismatch (expected=52428800, actual=0)
Jun 14 14:06:32 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:32 (pool-juno-disk-li366a) [door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzmZdmg xrootd-ccdcacli447 PoolAcceptFile 0000810D976C3E554F58BC03B02B7A413B39] Failed to read file size: java.nio.file.NoSuchFileException: /data/pool-juno-disk-li366a/pool/data/0000810D976C3E554F58BC03B02B7A413B39
Jun 14 14:06:33 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:33 (pool-juno-disk-li366a) [door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzmZdmg xrootd-ccdcacli447 PoolAcceptFile 0000F438CD52F653459E9212E2ED07292F58] Transfer failed in post-processing: File size mismatch (expected=52428800, actual=0)
Jun 14 14:06:33 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:33 (pool-juno-disk-li366a) [door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzmZdmg xrootd-ccdcacli447 PoolAcceptFile 0000F438CD52F653459E9212E2ED07292F58] Failed to read file size: java.nio.file.NoSuchFileException: /data/pool-juno-disk-li366a/pool/data/0000F438CD52F653459E9212E2ED07292F58
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:37 (pool-juno-disk-li366a) [] Unable to complete gsi authentication to eos.jinr.ru:1094, channel 9493c659, stream 5, session EC016E00770503002A010000F8056E00: org.dcache.xrootd.core.XrootdException: proxy delegation required but not available..
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:37 (pool-juno-disk-li366a) [] Exception caught on channel 9493c659: null
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:37 (pool-juno-disk-li366a) [] Uncaught exception in thread xrootd-tpc-client-4
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: java.lang.NullPointerException: null
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.plugins.authn.gsi.GSIClientAuthenticationHandler.doOnAuthenticationResponse(GSIClientAuthenticationHandler.java:96)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.responseReceived(AbstractClientRequestHandler.java:441)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.channelRead(AbstractClientRequestHandler.java:77)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.doOnAuthenticationResponse(AbstractClientRequestHandler.java:151)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.responseReceived(AbstractClientRequestHandler.java:441)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.xrootd.tpc.AbstractClientRequestHandler.channelRead(AbstractClientRequestHandler.java:77)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at org.dcache.util.CDCThreadFactory.lambda$newThread$0(CDCThreadFactory.java:42)
Jun 14 14:06:37 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: at java.base/java.lang.Thread.run(Thread.java:834)
Jun 14 14:06:40 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:40 (pool-juno-disk-li366a) [door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzmZdmg xrootd-ccdcacli447 PoolAcceptFile 0000FB17008617F840499781B3D5D5F968FD] Transfer failed in post-processing: File size mismatch (expected=52428800, actual=0)
Jun 14 14:06:40 ccdcacli366 dcache@ccdcacli366-pool-juno-disk-li366a-Domain[89958]: 14 Jun 2022 14:06:40 (pool-juno-disk-li366a) [door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzmZdmg xrootd-ccdcacli447 PoolAcceptFile 0000FB17008617F840499781B3D5D5F968FD] Failed to read file size: java.nio.file.NoSuchFileException: /data/pool-juno-disk-li366a/pool/data/0000FB17008617F840499781B3D5D5F968FD
In Xrootd door log :
level=INFO ts=2022-06-14T14:06:31.711+0200 event=org.dcache.xrootd.request session=door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzw3kUg request=protocol response=ok
level=INFO ts=2022-06-14T14:06:31.712+0200 event=org.dcache.xrootd.request session=door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzw3kUg request=login username=root capver=4 pid=30063 response=ok sessionId=699575719A251E346A7A3B16264F8323 sec=&P=gsi,v:10400,c:ssl,ca:14e86c33
level=ERROR ts=2022-06-14T14:06:31.714+0200 event=org.dcache.xrootd.request session=door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzw3kUg request=endsess sessionId=699575719A251E346A7A3B16264F8323 response=error error.code=NotAuhorized error.msg="Authentication required"
level=INFO ts=2022-06-14T14:06:31.715+0200 event=org.dcache.xrootd.connection.end session=door:xrootd-ccdcacli447@xrootd-ccdcacli447Domain:AAXhZzw3kUg
Cheers, Adrien
Thanks for reporting this Adrien. Evidently there is a path when the request handler could be null (as in this failed authentication) which is not accounted for. Will look into fixing.
Hi Adrien
I wonder if you could provide me a little more information about how this test is run (the xrdcp or gfal command-line).
I am having difficulty trying to reproduce the NullPointerException. For instance, when I do:
[arossi@fndcatemp1 ~]$ xrdcp5x --tpc only xroot://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/fermilab/users/arossi/volatile/cert-test xroot://fndcatemp2.fnal.gov:1094//pnfs/fs/usr/fermilab/users/arossi/volatile/cert-test-2
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Authentication to fndcatemp2.fnal.gov:1095 failed; all protocols have been tried.
(that is, forgetting to delegate the proxy), I get
16 Jun 2022 11:44:12 [dcatest04-7-15] [door:xrootd-1094-fndcatemp2@xrootd-1094-fndcatemp2Domain:AAXhk1jkncg xrootd-1094-fndcatemp2 PoolAcceptFile 00001EC5A4A6B83243A3B4A459F59F568F79] Unable to parse the boolean system property 'java.net.preferIPv6Addresses':system - using the default value: false
16 Jun 2022 11:44:12 [xrootd-net-0] [] Third-party client started for fndcatemp2.fnal.gov:1095, channel 2731367a. stream 1.
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] Unable to complete gsi authentication to fndcatemp2.fnal.gov:1095, channel 2731367a, stream 1, session 9E1A1D0E0F438944E9FFF118A1632061: org.dcache.xrootd.core.XrootdException: proxy delegation required but not available..
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] Exception caught on channel 2731367a: org.dcache.xrootd.core.XrootdException: Authentication to fndcatemp2.fnal.gov:1095 failed; all protocols have been tried..
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] Xrootd-Error-Response: [session ?][connection [id: 0xc9443eb2, L:/131.225.13.6:33143 - R:/131.225.240.89:35930]][request 3016 kXR_sync](error 3010, kXR_NotAuthorized, Authentication to fndcatemp2.fnal.gov:1095 failed; all protocols have been tried.).
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] shutDown, doing endsession
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] Exception caught on channel 2731367a: org.dcache.xrootd.core.XrootdException: Authentication required.
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] Xrootd-Error-Response: [session ?][connection [id: 0xc9443eb2, L:/131.225.13.6:33143 - R:/131.225.240.89:35930]][request 3016 kXR_sync](error 3010, kXR_NotAuthorized, Authentication required).
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] shutDown, doing endsession
16 Jun 2022 11:44:12 [xrootd-tpc-client-0] [] Third-party client stopped, for fndcatemp2.fnal.gov:1095, channel 2731367a, stream 1.
16 Jun 2022 11:44:15 [xrootd-net-0] [] Xrootd-Error-Response: [session ?][connection [id: 0xc9443eb2, L:/131.225.13.6:33143 - R:/131.225.240.89:35930]][request 3017 kXR_stat](error 3010, kXR_NotAuthorized, Authentication required).
on the pool, without any exception.
Thanks! Al
Hi Albert,
Those are daily TPC tests run by JUNO. I will ask them if they can provide more information and logs from the client side.
Adrien
Hi Albert,
This is the answer I got :
I am now using gfal-python api to do the auto-testing, it has same functionality with command,
gfal-copy -v root://ccxrootdegee.in2p3.fr:1094//pnfs/in2p3.fr/data/juno/dirac/juno/user/z/zhangxt/cronTest/IN2P3-DCACHE-cronBackup.txt root://junoeos01.ihep.ac.cn:1094//eos/juno/user/zhangxt/cronTest/IN2P3-DCACHE-to-IHEP-JUNOEOS-cronCopy.txt-<timestamp>
and,
gfal-copy -v root://ccxrootdegee.in2p3.fr:1094//pnfs/in2p3.fr/data/juno/dirac/juno/user/z/zhangxt/cronTest/IN2P3-DCACHE-cronBackup.txt root://eos.jinr.ru:1094//eos/juno/dirac/juno/user/z/zhangxt/cronTest/IN2P3-DCACHE-to-JINR-EOS-cronCopy.txt-<timestamp>
Gfal will automatically try to transfer by the order of "TPC mode -> local copy mode". The Gfal2 version must be more than v2.18.0 and here is my Gfal2 version in auto-tests:
$ gfal-copy --version
gfal2-util version 1.7.1 (gfal2 2.20.5)
dcap-2.20.5
file-2.20.5
gridftp-2.20.5
http-2.20.5
lfc-2.20.5
rfio-2.20.5
sftp-2.20.5
srm-2.20.5
xrootd-2.20.5
The error on client side :
[3]Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3006] Invalid request; user not authenticated\x00
[6]Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3012]
[10]Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3010] Authentication required\x00
[14]Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3012]
I asked him if he can provide any debug log.
Cheers, Adrien
Hello again. Sorry for the lag here, but I was off all last week.
Yes, debug log would be useful. But what I am unsure of is how this test is actually supposed to behave. My first impression was that this particular test was supposed to fail because no proxy was delegated, and that in addition to the failure you were seeing a Null Pointer Exception (which is a bug).
But now I am not sure. Is this test supposed to succeed? In which case there is something systemic which is wrong.
From 6.2 to 7.2.17 -- would you be able to tell us what was the last version of 6.2 used? I take it that you did not experience this NPE with 6.2, so it would be useful to know the intervening commits that may have affected it.
Thanks again, Al
Hi Adrien,
Rereading the explanantion concerning gfal: "local copy mode" means effectively no TPC? That is: the client first copies the source locally then writes the local file to destination (two hop)? In other words, the equivalent of:
xrdcp --tpc first root://ccxrootdegee.in2p3.fr:1094//pnfs/in2p3.fr/data/juno/dirac/juno/user/z/zhangxt/cronTest/IN2P3-DCACHE-cronBackup.txt root://junoeos01.ihep.ac.cn:1094//eos/juno/user/zhangxt/cronTest/IN2P3-DCACHE-to-IHEP-JUNOEOS-cronCopy.txt-<timestamp>
etc.?
OK, yes, this seems to be the issue.
Here is what I see using xrdcp. If you do:
[arossi@fndcatemp1 ~]$ xrdcp xroot://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/fermilab/users/arossi/volatile/cert-test xroot://fndcatemp2.fnal.gov:1094//pnfs/fs/usr/fermilab/users/arossi/volatile/cert-test-`suffix`
[1B/1B][100%][==================================================][1B/s]
you get a two-party two-hop copy as I described above.
However, issuing this with the command:
[arossi@fndcatemp1 ~]$ xrdcp --tpc first xroot://fndcatemp2.fnal.gov:1095//pnfs/fs/usr/fermilab/users/arossi/volatile/cert-test xroot://fndcatemp2.fnal.gov:1094//pnfs/fs/usr/fermilab/users/arossi/volatile/cert-test-4
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Authentication to fndcatemp2.fnal.gov:1095 failed; all protocols have been tried.
produces the failure. On the pool:
27 Jun 2022 08:10:09 [xrootd-net-2] [] Third-party client started for fndcatemp2.fnal.gov:1095, channel de54a742. stream 2.
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] Unable to complete gsi authentication to fndcatemp2.fnal.gov:1095, channel de54a742, stream 2, session AC7625DBE88AF2D2E7FBE668ADF07647: org.dcache.xrootd.core.XrootdException: proxy delegation required but not available..
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] Exception caught on channel de54a742: org.dcache.xrootd.core.XrootdException: Authentication to fndcatemp2.fnal.gov:1095 failed; all protocols have been tried..
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] Xrootd-Error-Response: [session ?][connection [id: 0x5a0a7bcc, L:/131.225.13.5:33125 - R:/131.225.240.89:45886]][request 3016 kXR_sync](error 3010, kXR_NotAuthorized, Authentication to fndcatemp2.fnal.gov:1095 failed; all protocols have been tried.).
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] shutDown, doing endsession
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] Exception caught on channel de54a742: org.dcache.xrootd.core.XrootdException: Authentication required.
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] Xrootd-Error-Response: [session ?][connection [id: 0x5a0a7bcc, L:/131.225.13.5:33125 - R:/131.225.240.89:45886]][request 3016 kXR_sync](error 3010, kXR_NotAuthorized, Authentication required).
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] shutDown, doing endsession
27 Jun 2022 08:10:09 [xrootd-tpc-client-1] [] Third-party client stopped, for fndcatemp2.fnal.gov:1095, channel de54a742, stream 2.
27 Jun 2022 08:10:11 [xrootd-net-2] [] Xrootd-Error-Response: [session ?][connection [id: 0x5a0a7bcc, L:/131.225.13.5:33125 - R:/131.225.240.89:45886]][request 3017 kXR_stat](error 3010, kXR_NotAuthorized, Authentication required).
27 Jun 2022 08:10:11 [post-transfer-1] [door:xrootd-1094-fndcatemp2@xrootd-1094-fndcatemp2Domain:AAXibaOse8A xrootd-1094-fndcatemp2 PoolAcceptFile 0000C7E4B8BC229543A9AA8EE01E5154161C] Unable to generate checksum of sparse file: java.nio.channels.ClosedChannelException
27 Jun 2022 08:10:11 [post-transfer-1] [door:xrootd-1094-fndcatemp2@xrootd-1094-fndcatemp2Domain:AAXibaOse8A xrootd-1094-fndcatemp2 PoolAcceptFile 0000C7E4B8BC229543A9AA8EE01E5154161C] remove entry 0000C7E4B8BC229543A9AA8EE01E5154161C: Transfer failed and replica is empty
That is, the third-party client has no proxy to use. Notice, however: no Null Pointer Exception here, so there is still something slightly different about this wrt to what your test does.
Apart from my inability to reproduce the NPE, what I do gather is that you have not used the gfal version of --tpc first in the past.
The behavior of the client (either gfal or xrdcp) for this "fallback" is different from simply issuing a two-party two-hop:
- 2-party local copy –– this will just work if you have a credential (each endpoint authenticates OK);
- --tcp first –– checks to see if both ends support tpc. If they do, it will try tpc; otherwise, it will fall back to local copy. But it will not try local copy if tpc fails. In other words, fallback only occurs if tpc cannot be attempted based on the server endpoints.
Now, because both ends DO support tpc, tpc is attempted. But because your endpoints require GSI, this means you must delegate. If you don't, it will fail.
There are two possible solutions to this:
- client: don't use fallback to local copy, but use delegation (--tpc delegate only)
- servers: allow them to accept anonymous as well as GSI connections.
Unfortunately, the second option is only available in dCache with version 8.1 forward.
see slide 15 of https://indico.desy.de/event/33984/contributions/122056/attachments/74984/96139/project-update-v1.pdf
Hello Albert,
The previous version of dCache was 6.2.43 and we upgraded to 7.2.17. The NPE error was not raised before the upgrade so yes I guess a commit between those 2 versions may be responsible of this.
In fact, I'm not looking for a solution to make it works because JUNO already decided to use HTTP protocol instead of xrootd protocol for TPC transfers. They had many difficulties to make XrootD TPC works with their EOS instances. But they still got some xrootd tests running on their side to test their different SE. So is this test supposed to succeed? I don't think so :) I just wanted to report this NPE error I observed on pools when this kind of test happened. It's not easy to debug on my side because I don't have any access to the other JUNO endpoint nor the JUNO test instance.
Hi Adrien,
Thanks for the version number. It could indeed be due to one of the xroot patches applied between 6.2.43 and 6.2.46/7.2.17, but unless I can reproduce the NPE, I can't really say for sure what is going on.
On the other hand, it seems ... at least from the message you posted from JUNO ... that they switched to using gfal-python ("I am now using gfal-python api"), so that is another change that might affect the result.
I will keep my eye out for that NPE in our logs and try to do a little more head-scratching.
Thanks again, Al