dcache icon indicating copy to clipboard operation
dcache copied to clipboard

7.2 (and trunk) - NPE on removal via WebDAV and token

Open DmitryLitvintsev opened this issue 2 years ago • 6 comments

when running this command:

curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -X DELETE  https://fndcaitb4.fnal.gov:2880${spath}/junk.data

I get this error

curl: (22) NSS: client certificate not found (nickname not specified)

The file actually gets removed and I see a NPE in the door pinboard:

16 Sep 2022 15:27:42 [jetty-59] [door:WebDAV-fndcaitb4-1@webdavDomain:AAXo0TGqBGA] DELETE :: fndcaitb4.fnal.gov:2880///mu2e/scratch/users/litvinse/junk.data start
16 Sep 2022 15:27:42 [jetty-59] [door:WebDAV-fndcaitb4-1@webdavDomain:AAXo0TGqBGA] Internal server errorjava.lang.NullPointerException: null
	at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:77)
	at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:18)
	at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
	at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:945)
	at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:905)
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:993)
	at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:649)
	at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:403)
	at org.springframework.kafka.core.KafkaTemplate.sendDefault(KafkaTemplate.java:382)
	at org.dcache.webdav.DcacheResourceFactory.sendRemoveInfoToBilling(DcacheResourceFactory.java:1098)
	at org.dcache.webdav.DcacheResourceFactory.deleteFile(DcacheResourceFactory.java:1083)
	at org.dcache.webdav.DcacheFileResource.delete(DcacheFileResource.java:190)
	at io.milton.http.DeleteHelperImpl.delete(DeleteHelperImpl.java:126)
	at io.milton.http.http11.DeleteHandler.processExistingResource(DeleteHandler.java:98)
	at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:196)
	at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:131)
	at io.milton.http.http11.DeleteHandler.processResource(DeleteHandler.java:83)
	at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
	at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
	at io.milton.http.http11.DeleteHandler.process(DeleteHandler.java:72)
	at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
	at io.milton.http.FilterChain.process(FilterChain.java:46)
	at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
	at io.milton.http.FilterChain.process(FilterChain.java:46)
	at io.milton.http.HttpManager.process(HttpManager.java:158)
	at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
	at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:156)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
	at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:153)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:645)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:392)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:829)

16 Sep 2022 15:27:42 [jetty-59] [door:WebDAV-fndcaitb4-1@webdavDomain:AAXo0TGqBGA] DELETE :: fndcaitb4.fnal.gov:2880///mu2e/scratch/users/litvinse/junk.data finished 36ms, Status:HTTP/1.1 500, Length:23

More details:

  1. get token:
[litvinse@fnisd1 scitokens]$ htgettoken -a fermicloud543.fnal.gov -i mu2e
Attempting to get token from https://fermicloud543.fnal.gov:8200 ... failed
Attempting kerberos auth with https://fermicloud543.fnal.gov:8200 ... succeeded
Storing vault token in /tmp/vt_u8637
Attempting to get token from https://fermicloud543.fnal.gov:8200 ... succeeded
Storing bearer token in /run/user/8637/bt_u8637

[litvinse@fnisd1 scitokens]$ spath=`httokendecode  | jq ".scope" | sed -e "s/\"//g"  | awk '{ print $NF}' | cut -d ":" -f2`
[litvinse@fnisd1 scitokens]$ echo $spath
/mu2e/scratch/users/litvinse

[litvinse@fnisd1 scitokens]$ httokendecode
{
  "wlcg.ver": "1.0",
  "aud": "https://wlcg.cern.ch/jwt/v1/any",
  "sub": "408c3486-b1ce-4d3d-ab3f-74717f80a534",
  "nbf": 1663358665,
  "scope": "storage.read:/mu2e compute.create compute.read storage.create:/mu2e/scratch/users/litvinse compute.cancel compute.modify storage.modify:/mu2e/scratch/users/litvinse",
  "iss": "https://cilogon.org/mu2e",
  "exp": 1663369470,
  "iat": 1663358670,
  "wlcg.groups": [
    "/mu2e"
  ],
  "jti": "https://cilogon.org/oauth2/3cbad440f5235fc44c38e465d3f86c6e?type=accessToken&ts=1663358669990&version=v2.0&lifetime=10800000"
}
  1. Upload a file:
[litvinse@fnisd1 scitokens]$ curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -Tjunk https://fndcaitb4.fnal.gov:2880${spath}/junk.data
curl: (22) NSS: client certificate not found (nickname not specified)
[litvinse@fnisd1 scitokens]$  TOKEN=`cat /run/user/8637/bt_u8637`
[litvinse@fnisd1 scitokens]$ curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -Tjunk https://fndcaitb4.fnal.gov:2880${spath}/junk.data
837787900 bytes uploaded
  1. check that it now exists:
[root@fndcaitb2 litvinse]# ls -altr
total 3272611
drwxrwxr-x 100    17840 9914       512 Aug 16 15:09 ..
drwxrwxr-x   6 litvinse 9914       512 Sep 16 15:26 .
-rw-rw-r--   1 litvinse 9914 837787900 Sep 16 15:26 junk.data

  1. try to remove
[litvinse@fnisd1 scitokens]$  curl -f -k -L -H "Authorization: Bearer ${TOKEN}" -X DELETE  https://fndcaitb4.fnal.gov:2880${spath}/junk.data
curl: (22) NSS: client certificate not found (nickname not specified)
  1. observe that file does not exist:
[root@fndcaitb2 litvinse]# ls -altr
total 2454459
drwxrwxr-x 100    17840 9914       512 Aug 16 15:09 ..
drwxrwxr-x   5 litvinse 9914       512 Sep 16 15:27 .
  1. using gfal client fails as well:
[litvinse@fnisd1 scitokens]$ export BEARER_TOKEN=`cat /run/user/8637/bt_u8637`
[litvinse@fnisd1 scitokens]$  gfal-copy -K adler32 junk  https://fndcaitb4.fnal.gov:2880${spath}/junk.data
Copying file:///home/litvinse/scitokens/junk   [DONE]  after 33s 

                                                                                                         
[litvinse@fnisd1 scitokens]$  gfal-rm  https://fndcaitb4.fnal.gov:2880${spath}/junk.data
https://fndcaitb4.fnal.gov:2880/mu2e/scratch/users/litvinse/junk.data	FAILED
gfal-rm error: 5 (Input/output error) - DavPosix::unlink  HTTP 500 : Unexpected server error: 500  with url https://fndcaitb4.fnal.gov:2880/mu2e/scratch/users/litvinse/junk.data
[litvinse@fnisd1 scitokens]$ 

There is NPE on sever side in both cases. File is ultimately removed but as you can see client sees a fail.

DmitryLitvintsev avatar Sep 16 '22 20:09 DmitryLitvintsev

As stack trace indicates this seems to have to do with Kafka. So I did:

webdav.enable.kafka=false

and I see no NPE and delete succeeds.

@mksahakyan may be you can take a look?

DmitryLitvintsev avatar Sep 16 '22 20:09 DmitryLitvintsev

@paulmillar and @mksahakyan I have looked at the code in

DoorRequestMessageSerializer.java

And I modified it like so:

        JSONArray subject = new JSONArray();
        // data.getSubject().getPrincipals().forEach(s -> subject.put(s));
        data.getSubject().getPrincipals()
            .stream()
            .filter(s -> !(s instanceof JwtJtiPrincipal))
            .forEach(subject::put);

so that I am filtering out JwtJtiPrincipal.... And I get no NPE. All works as expected.

So, something in that Principal leads to NPE.

	Principal: JwtJtiPrincipal[https://cilogon.org/oauth2/3cbad440f5235fc44c38e465d3f86c6e?type=accessToken&ts=1663358669990&version=v2.0&lifetime=10800000@mu2e]

DmitryLitvintsev avatar Sep 16 '22 22:09 DmitryLitvintsev

As I mentioned in #6722, the line numbers in the stack-trace simply don't match what dCache code is doing: not in the 7.2 branch and not in master branch. The line numbers match (more or less) how the method worked in a much older (and unsupported) version of dCache.

The only explanation that I can think of (with the evidence so far) is that you're not running the version of dCache you think you are.

There are (at least) two possible explanations:

  1. you have a rogue dcache-core.jar file installed in the /usr/share/dcache/classes, /usr/share/dcache/plugins or /usr/local/share/dcache/plugins directory, which dCache is using instead of the dcache-core.jar from the RPM.
  2. you have installed a plugin that (inadvertently) contains classes (such as DoorRequestMessageSerializer) from some ancient version of dCache.

paulmillar avatar Sep 23 '22 07:09 paulmillar

Paul,

I build RPM from master

mvn clean package -am -pl packages/fhs -P rpm

and I deployed it using

rpm -Uvh --force ...

I have been building and deploying dcache this way for many years - I do not remember anything along you suggested fault lines happening.

Please note, that the stack trace looks different from #6772

DmitryLitvintsev avatar Sep 23 '22 13:09 DmitryLitvintsev

I just re-installed having removed everything in classes directory:

2405  2022-09-23 08:54:26 cd /usr/share/dcache/classes/
 2406  2022-09-23 08:54:27 ls -al 
 2407  2022-09-23 08:54:35 dcache stop 
 2408  2022-09-23 08:55:27 rm -rf * 
 2409  2022-09-23 08:55:37 cd /opt/
 2413  2022-09-23 08:56:35 rpm -Uvh -force dcache-8.2.0.7d79a22-1.noarch.rpm

I get the same stacktrace on remove:

23 Sep 2022 08:59:45 (WebDAV-fndcaitb4-1) [door:WebDAV-fndcaitb4-1@webdavDomain:AAXpWJcZTYg] Internal server error
java.lang.NullPointerException: null
        at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:77)
        at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:18)
        at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:945)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:905)
        at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:993)
        at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:649)
        at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:403)
        at org.springframework.kafka.core.KafkaTemplate.sendDefault(KafkaTemplate.java:382)
        at org.dcache.webdav.DcacheResourceFactory.sendRemoveInfoToBilling(DcacheResourceFactory.java:1102)
        at org.dcache.webdav.DcacheResourceFactory.deleteFile(DcacheResourceFactory.java:1087)
        at org.dcache.webdav.DcacheFileResource.delete(DcacheFileResource.java:190)
        at io.milton.http.DeleteHelperImpl.delete(DeleteHelperImpl.java:126)
        at io.milton.http.http11.DeleteHandler.processExistingResource(DeleteHandler.java:98)
        at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:196)
        at io.milton.http.ResourceHandlerHelper.processResource(ResourceHandlerHelper.java:131)
        at io.milton.http.http11.DeleteHandler.processResource(DeleteHandler.java:83)
        at io.milton.http.ResourceHandlerHelper.process(ResourceHandlerHelper.java:127)
        at org.dcache.webdav.DcacheResourceHandlerHelper.process(DcacheResourceHandlerHelper.java:42)
        at io.milton.http.http11.DeleteHandler.process(DeleteHandler.java:72)
        at org.dcache.webdav.DcacheStandardFilter.process(DcacheStandardFilter.java:50)
        at io.milton.http.FilterChain.process(FilterChain.java:46)
        at org.dcache.webdav.transfer.CopyFilter.process(CopyFilter.java:276)
        at io.milton.http.FilterChain.process(FilterChain.java:46)
        at io.milton.http.HttpManager.process(HttpManager.java:158)
        at org.dcache.webdav.MiltonHandler.handle(MiltonHandler.java:77)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.dcache.http.AuthenticationHandler.access$001(AuthenticationHandler.java:55)
        at org.dcache.http.AuthenticationHandler.lambda$handle$0(AuthenticationHandler.java:156)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Subject.java:361)
        at org.dcache.http.AuthenticationHandler.handle(AuthenticationHandler.java:153)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.dcache.http.AbstractLoggingHandler.handle(AbstractLoggingHandler.java:110)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
        at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
        at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.base/java.lang.Thread.run(Thread.java:829)

DmitryLitvintsev avatar Sep 23 '22 14:09 DmitryLitvintsev

Hi Dmitry,

Could you try downloading one of the 7.2 RPMs from the dcache.org site, install it, and see if you can reproduce this problem?

Cheers, Paul.

paulmillar avatar Sep 23 '22 18:09 paulmillar

the error is coming from JSONObject class

public String toString() {
    try {
        return this.toString(0);
    } catch (Exception var2) {
        return null;
    }
}

i am trying to debug is now to get the exception

mksahakyan avatar Sep 27 '22 12:09 mksahakyan

The patch fixing the issue is on the RBT

mksahakyan avatar Sep 27 '22 15:09 mksahakyan

@mksahakyan Where does this JSONObject code come from?

paulmillar avatar Sep 28 '22 08:09 paulmillar

JSONObject (org.json) is comming from line 77 in DoorRequestMessageSerializer

return o.toString().getBytes(UTF_8); then at some point in JSONObject on line 185 happens the following:

static final Writer writeValue(Writer writer, Object value, int indentFactor, int indent) throws JSONException, IOException { > if (value != null && !value.equals((Object)null)) {

and then is OpScopedPrincipal 's equal() is called.

It is worth noticing that, "org.json" version is old 2014

mksahakyan avatar Sep 28 '22 09:09 mksahakyan

Sorry, I was a little unclear.

You posted a code snippet; specifically,

public String toString() {
    try {
        return this.toString(0);
    } catch (Exception var2) {
        return null;
    }
}

It isn't dCache code, so from which jar file does this code come?

paulmillar avatar Sep 28 '22 09:09 paulmillar

from the jar json-20141113.jar

 <dependency>
                <groupId>org.json</groupId>
                <artifactId>json</artifactId>
                <version>20141113</version>
   </dependency>

mksahakyan avatar Sep 28 '22 09:09 mksahakyan

https://github.com/stleary/JSON-java/blob/master/src/main/java/org/json/JSONObject.java#L2365

kofemann avatar Sep 28 '22 10:09 kofemann

Ta:

https://github.com/stleary/JSON-java/pull/690

paulmillar avatar Sep 28 '22 10:09 paulmillar

What I still don't understand is why the stack-trace contains this entry:

	at org.dcache.notification.DoorRequestMessageSerializer.serialize(DoorRequestMessageSerializer.java:18)

The method serialize isn't defined at line 18, it's the class definition.

paulmillar avatar Sep 28 '22 10:09 paulmillar

Here is the link to RB fix from @mksahakyan

Patch: https://rb.dcache.org/r/13690/

paulmillar avatar Sep 28 '22 15:09 paulmillar

Thank you @mksahakyan for fixing this issue

DmitryLitvintsev avatar Oct 04 '22 17:10 DmitryLitvintsev