Race condition in WebDAV relayed upload
Observed the following stack-trace:
15 Dec 2020 12:03:23 (WebDAV-uct2-webdav) [door:WebDAV-uct2-webdav@uct2-webdavDomain:AAW2hIM/jgA] Internal server error
java.lang.RuntimeException: createNew method on: class org.dcache.webdav.DcacheDirectoryResource returned a null resource. Must return a reference to the newly created or modified resource
at io.milton.http.http11.PutHandler.processCreate(PutHandler.java:237)
at io.milton.http.http11.PutHandler.process(PutHandler.java:207)
at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:54)
at io.milton.http.FilterChain.process(FilterChain.java:46)
at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:301)
at io.milton.http.FilterChain.process(FilterChain.java:46)
at io.milton.http.HttpManager.process(HttpManager.java:158)
at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:80)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.dcache.http.AuthenticationHandler.access$101(AuthenticationHandler.java:63)
at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:160)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:65)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
at java.lang.Thread.run(Thread.java:748)
The problem is that DcacheResourceFactory#createFile uses DcacheResourceFactory#getResource to build the DcacheResource object that createFile returns to the caller (and ultimately, to Milton).
Although getResource is only called after a successful upload (relayed by the door), this method makes an independent query to PnfsManager to discover the necessary information about the file. There are failure modes in getResource where it returns null instead of a valid DcacheResource object.
The most likely explanation is that the namespace entry was deleted during the upload. If this happens, the PnfsHandler#getFileAttributes call (inside getResource) throws FileNotFoundCacheException, which causes getResource to return null.
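To make the suspected sequence concrete, here is a minimal sketch of the failure mode. It does not use the real dCache classes: `RaceSketch`, `Namespace`, `Resource`, `FileNotFound` and the `duringUpload` hook are all hypothetical stand-ins, and the method bodies are assumptions based only on the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the race; not the real dCache code.
class RaceSketch {
    /** Stand-in for FileNotFoundCacheException. */
    static class FileNotFound extends Exception {
        FileNotFound(String path) { super("No such file: " + path); }
    }

    /** Stand-in for the namespace behind PnfsManager: path -> file size. */
    static class Namespace {
        private final Map<String, Long> entries = new HashMap<>();
        void create(String path, long size) { entries.put(path, size); }
        void delete(String path) { entries.remove(path); }
        long getFileAttributes(String path) throws FileNotFound {
            Long size = entries.get(path);
            if (size == null) {
                throw new FileNotFound(path);
            }
            return size;
        }
    }

    /** Stand-in for DcacheResource. */
    static class Resource {
        final String path;
        final long size;
        Resource(String path, long size) { this.path = path; this.size = size; }
    }

    /** Models the described getResource: a missing entry is turned into null. */
    static Resource getResource(Namespace ns, String path) {
        try {
            return new Resource(path, ns.getFileAttributes(path));
        } catch (FileNotFound e) {
            return null;
        }
    }

    /** Models the described createFile: relay the upload, then look the file up again. */
    static Resource createFile(Namespace ns, String path, long size, Runnable duringUpload) {
        ns.create(path, size);   // the transfer creates the namespace entry
        duringUpload.run();      // window in which another client may act
        return getResource(ns, path);
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace();
        Resource undisturbed = createFile(ns, "/data/a", 10, () -> {});
        Resource raced = createFile(ns, "/data/b", 10, () -> ns.delete("/data/b"));
        System.out.println("undisturbed upload returned a resource: " + (undisturbed != null));
        System.out.println("raced upload returned a resource: " + (raced != null));
    }
}
```

In this model the second upload completes, yet `createFile` hands back null, which is the condition Milton's PutHandler rejects in the stack trace.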
Note that the transfer will have already queried PnfsManager, so the door should have enough information to build the DcacheResource object without querying PnfsManager again. (If not, then the WriteTransfer may be updated to acquire the missing information.)
In any case, I believe the correct solution here would be to ensure that a successful transfer always returns a non-null DcacheResource. This may be achieved by avoiding an unnecessary query to PnfsManager.
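A rough sketch of the proposed direction follows, under the assumption that the transfer can expose the attributes it already holds. The `WriteTransfer` class here is a hypothetical stand-in (not dCache's real class), as are `NonNullResourceSketch`, `Namespace`, `Attributes`, `Resource` and the `duringUpload` hook.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of the proposed fix; not the real dCache code.
class NonNullResourceSketch {
    /** Stand-in for the file attributes the transfer already knows. */
    static class Attributes {
        final long size;
        Attributes(long size) { this.size = size; }
    }

    /** Stand-in for DcacheResource. */
    static class Resource {
        final String path;
        final Attributes attributes;
        Resource(String path, Attributes attributes) {
            this.path = path;
            this.attributes = Objects.requireNonNull(attributes);
        }
    }

    /** Stand-in for the namespace behind PnfsManager. */
    static class Namespace {
        private final Map<String, Attributes> entries = new HashMap<>();
        void create(String path, Attributes attributes) { entries.put(path, attributes); }
        void delete(String path) { entries.remove(path); }
        boolean contains(String path) { return entries.containsKey(path); }
    }

    /** Stand-in for the write transfer; it keeps what it learnt during the upload. */
    static class WriteTransfer {
        private final String path;
        private Attributes attributes;
        WriteTransfer(String path) { this.path = path; }

        void relay(Namespace ns, long size, Runnable duringUpload) {
            attributes = new Attributes(size);
            ns.create(path, attributes);
            duringUpload.run();   // a concurrent delete may happen here
        }

        Attributes getAttributes() { return attributes; }
    }

    /** Builds the resource from the transfer's own state, with no second namespace query. */
    static Resource createFile(Namespace ns, String path, long size, Runnable duringUpload) {
        WriteTransfer transfer = new WriteTransfer(path);
        transfer.relay(ns, size, duringUpload);
        return new Resource(path, transfer.getAttributes());
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace();
        Resource raced = createFile(ns, "/data/b", 10, () -> ns.delete("/data/b"));
        System.out.println("resource returned: " + (raced != null));
        System.out.println("entry still in namespace: " + ns.contains("/data/b"));
    }
}
```

With this shape, a concurrent delete no longer affects what `createFile` returns for a successful transfer. Whether the returned resource should still be reported once the entry is gone is a separate question that this sketch does not address.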