dcache
dcache copied to clipboard
Uncaught exception in thread RemoteTransferManager-0java.lang.NullPointerException: null
Hello,
It was brought to my attention the following error reported by the RTM cell.
The dCache version dcache-7.2.19-1.noarch.
While it is not clear to me if the error can be reproducible I created this issue to start tracing this type of error.
[dcblcore03] (RemoteTransferManager@srm-dcblsrm02Domain) admin > show pinboard 1000
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Cell message monitoring set to false
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Cell classification set to null
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] HikariPool-3 - Starting...
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] HikariPool-3 - Start completed.
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.validateTables unknown - will be ignored
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.autoCreateSchema unknown - will be ignored
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.autoCreateColumns unknown - will be ignored
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.validateConstraints unknown - will be ignored
12 Aug 2022 01:12:50 [RemoteTransferManager-0] [door:WebDAV2-dcbldoor04-2@webdav2-dcbldoor04_httpsDomain:AAXmBFVpwQg WebDAV2-dcbldoor04-2 TransferStatusQuery] Uncaught exception in thread RemoteTransferManager-0java.lang.NullPointerException: null
at diskCacheV111.services.TransferManagerHandler.dnGetstate(TransferManagerHandler.java)
at diskCacheV111.services.TransferManagerHandler.appendInfo(TransferManagerHandler.java:831)
at diskCacheV111.services.TransferManager.messageArrived(TransferManager.java:247)
at jdk.internal.reflect.GeneratedMethodAccessor580.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:286)
at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:188)
at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:302)
at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:848)
at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.run(CellNucleus.java:1274)
at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at dmg.cells.nucleus.CellNucleus.lambda$wrapLoggingContext$2(CellNucleus.java:727)
at java.base/java.lang.Thread.run(Thread.java:829)
There is also another occurrence
31 Aug 2022 16:16:50 [RemoteTransferManager-10] [door:WebDAV2-dcbldoor03-2@webdav2-dcbldoor03_httpsDomain:AAXnjy1bg2g WebDAV2-dcbldoor03-2 TransferStatusQuery] Uncaught exception in thread RemoteTransferManager-10java.lang.NullPointerException: null
at diskCacheV111.services.TransferManagerHandler.dnGetstate(TransferManagerHandler.java)
at diskCacheV111.services.TransferManagerHandler.appendInfo(TransferManagerHandler.java:831)
at diskCacheV111.services.TransferManager.messageArrived(TransferManager.java:247)
at jdk.internal.reflect.GeneratedMethodAccessor580.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:286)
at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:188)
at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:302)
at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:848)
at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.run(CellNucleus.java:1274)
at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at dmg.cells.nucleus.CellNucleus.lambda$wrapLoggingContext$2(CellNucleus.java:727)
at java.base/java.lang.Thread.run(Thread.java:829)
Thanks for reporting this @cfgamboa ,
Could you check whether you have transfermanagers.enable.db
set to true
?
@paulmillar here
transfermanagers.enable.db=true
Thanks.
My suspicion is that the problem comes from the persistency framework (DataNucleus) rather than from dCache code directly: either because of a bug in DataNucleus or we're somehow using it wrong within the dCache code.
Would it be possible to switch off the database (transfermanagers.enable.db = false
) on one of the RTMs and see if the messages then going away?
Thank you for your feedback. I think changing this parameter would require to restart the RTM?
Carlos,
what are the effects of it? Are transfers failing?
(and yes change of any parameters on layout or dcache.conf files require restart of the domain affected)
@paulmillar yes, the null pointer is in bytecode enhanced code
public static int dnGetstate(TransferManagerHandler objPC) {
if (objPC.dnFlags > 0 && objPC.dnStateManager != null && !objPC.dnStateManager.isLoaded((Persistable)objPC, 9))
return objPC.dnStateManager.getIntField((Persistable)objPC, 9, objPC.state);
if (objPC.dnIsDetached() && !((BitSet)objPC.dnDetachedState[2]).get(9))
throw new JDODetachedFieldAccessException("You have just attempted to access field \"state\" yet this field was not detached when you detached the object. Either dont access this field, or detach it when detaching the object.");
return objPC.state;
}
It will take me some time to figure out what is wrong. @cfgamboa do you know anyway to reproduce the problem?
I do not see this exception on our dCache installations at Fermilab. We use :
transfermanagers.enable.db=true
@DmitryLitvintsev the error has not recurred since then. There are not failed transfer observations correlated to that error.
@DmitryLitvintsev The code-path corresponds to the monitoring that the WebDAV door does to discover how far a transfer has progressed, in order to send FTS progress information.
My guess is that the error is propagated back to the door (in some form or other). If the door cannot find information about a transfer (from the RTM) then it returns only a subset of information back to FTS.
IIRC, FTS is somewhat forgiving with a transfer providing limited information, so the chances are that FTS will not kill the transfer it will proceed OK.
Further, I suspect that this is caused by a race condition. Perhaps between the first query (from the first progress marker) and the pool sending back confirmation that the pool has received the transfer-initialising message.
FYI, this discussion is now moved to dCache rt support system
older duplicate, now tracked on RT