dcache icon indicating copy to clipboard operation
dcache copied to clipboard

Uncaught exception in thread RemoteTransferManager-0java.lang.NullPointerException: null

Open cfgamboa opened this issue 2 years ago • 10 comments

Hello,

It was brought to my attention the following error reported by the RTM cell.

The dCache version dcache-7.2.19-1.noarch.

While it is not clear to me if the error can be reproducible I created this issue to start tracing this type of error.

[dcblcore03] (RemoteTransferManager@srm-dcblsrm02Domain) admin > show pinboard 1000
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Cell message monitoring set to false
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Cell classification set to null
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] HikariPool-3 - Starting...
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] HikariPool-3 - Start completed.
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.validateTables unknown - will be ignored
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.autoCreateSchema unknown - will be ignored
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.autoCreateColumns unknown - will be ignored
10 Aug 2022 16:14:57 [RemoteTransferManager-0] [] Property datanucleus.validateConstraints unknown - will be ignored
12 Aug 2022 01:12:50 [RemoteTransferManager-0] [door:WebDAV2-dcbldoor04-2@webdav2-dcbldoor04_httpsDomain:AAXmBFVpwQg WebDAV2-dcbldoor04-2 TransferStatusQuery] Uncaught exception in thread RemoteTransferManager-0java.lang.NullPointerException: null
    at diskCacheV111.services.TransferManagerHandler.dnGetstate(TransferManagerHandler.java)
    at diskCacheV111.services.TransferManagerHandler.appendInfo(TransferManagerHandler.java:831)
    at diskCacheV111.services.TransferManager.messageArrived(TransferManager.java:247)
    at jdk.internal.reflect.GeneratedMethodAccessor580.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:286)
    at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:188)
    at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:302)
    at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:848)
    at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.run(CellNucleus.java:1274)
    at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at dmg.cells.nucleus.CellNucleus.lambda$wrapLoggingContext$2(CellNucleus.java:727)
    at java.base/java.lang.Thread.run(Thread.java:829)

cfgamboa avatar Sep 06 '22 18:09 cfgamboa

There is also another occurrence

31 Aug 2022 16:16:50 [RemoteTransferManager-10] [door:WebDAV2-dcbldoor03-2@webdav2-dcbldoor03_httpsDomain:AAXnjy1bg2g WebDAV2-dcbldoor03-2 TransferStatusQuery] Uncaught exception in thread RemoteTransferManager-10java.lang.NullPointerException: null
	at diskCacheV111.services.TransferManagerHandler.dnGetstate(TransferManagerHandler.java)
	at diskCacheV111.services.TransferManagerHandler.appendInfo(TransferManagerHandler.java:831)
	at diskCacheV111.services.TransferManager.messageArrived(TransferManager.java:247)
	at jdk.internal.reflect.GeneratedMethodAccessor580.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:286)
	at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:188)
	at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:302)
	at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:848)
	at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.run(CellNucleus.java:1274)
	at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at dmg.cells.nucleus.CellNucleus.lambda$wrapLoggingContext$2(CellNucleus.java:727)
	at java.base/java.lang.Thread.run(Thread.java:829)

cfgamboa avatar Sep 07 '22 13:09 cfgamboa

Thanks for reporting this @cfgamboa ,

Could you check whether you have transfermanagers.enable.db set to true ?

paulmillar avatar Sep 07 '22 13:09 paulmillar

@paulmillar here

transfermanagers.enable.db=true

cfgamboa avatar Sep 07 '22 13:09 cfgamboa

Thanks.

My suspicion is that the problem comes from the persistency framework (DataNucleus) rather than from dCache code directly: either because of a bug in DataNucleus or we're somehow using it wrong within the dCache code.

Would it be possible to switch off the database (transfermanagers.enable.db = false) on one of the RTMs and see if the messages then going away?

paulmillar avatar Sep 08 '22 07:09 paulmillar

Thank you for your feedback. I think changing this parameter would require to restart the RTM?

cfgamboa avatar Sep 08 '22 14:09 cfgamboa

Carlos,

what are the effects of it? Are transfers failing?

(and yes change of any parameters on layout or dcache.conf files require restart of the domain affected)

DmitryLitvintsev avatar Sep 08 '22 17:09 DmitryLitvintsev

@paulmillar yes, the null pointer is in bytecode enhanced code

  public static int dnGetstate(TransferManagerHandler objPC) {
    if (objPC.dnFlags > 0 && objPC.dnStateManager != null && !objPC.dnStateManager.isLoaded((Persistable)objPC, 9))
      return objPC.dnStateManager.getIntField((Persistable)objPC, 9, objPC.state); 
    if (objPC.dnIsDetached() && !((BitSet)objPC.dnDetachedState[2]).get(9))
      throw new JDODetachedFieldAccessException("You have just attempted to access field \"state\" yet this field was not detached when you detached the object. Either dont access this field, or detach it when detaching the object."); 
    return objPC.state;
  }

It will take me some time to figure out what is wrong. @cfgamboa do you know anyway to reproduce the problem?

I do not see this exception on our dCache installations at Fermilab. We use :

transfermanagers.enable.db=true

DmitryLitvintsev avatar Sep 08 '22 17:09 DmitryLitvintsev

@DmitryLitvintsev the error has not recurred since then. There are not failed transfer observations correlated to that error.

cfgamboa avatar Sep 08 '22 19:09 cfgamboa

@DmitryLitvintsev The code-path corresponds to the monitoring that the WebDAV door does to discover how far a transfer has progressed, in order to send FTS progress information.

My guess is that the error is propagated back to the door (in some form or other). If the door cannot find information about a transfer (from the RTM) then it returns only a subset of information back to FTS.

IIRC, FTS is somewhat forgiving with a transfer providing limited information, so the chances are that FTS will not kill the transfer it will proceed OK.

Further, I suspect that this is caused by a race condition. Perhaps between the first query (from the first progress marker) and the pool sending back confirmation that the pool has received the transfer-initialising message.

paulmillar avatar Sep 09 '22 07:09 paulmillar

FYI, this discussion is now moved to dCache rt support system

cfgamboa avatar Sep 22 '22 19:09 cfgamboa

older duplicate, now tracked on RT

lemora avatar Oct 17 '22 07:10 lemora