dcache icon indicating copy to clipboard operation
dcache copied to clipboard

Bulk: malformed JSON hangs bulk service

Open onnozweers opened this issue 1 year ago • 2 comments
trafficstars

Dear dCache devs,

My colleague Natalie found out, that a malformed JSON hangs the bulk service.

The logs show the full request and the error:

20 Aug 2024 10:22:17 (Frontend-hedgehog14) [] Failed to parse JSON "{ "activity": "PIN", "arguments": {"lifetime": "10",  "lifetimeUnit":"MINUTES"}, "expand_directories": "ALL", "target": ["/users/anatolid/tape//1GBfile"]}": Unexpected character (' ' (code 160)): was expecting double-quote to start field name
 at [Source: (String)"{ "activity": "PIN", "arguments": {"lifetime": "10",  "lifetimeUnit":"MINUTES"}, "expand_directories": "ALL", "target": ["/users/anatolid/tape//1GBfile"]}"; line: 1, column: 55]
20 Aug 2024 10:22:17 (bulk) [] ConcurrentRequestProcessor exiting...

Not sure the bad character ("code 160", a non-breaking space?) will get through after pasting in this issue. Here's a hexdump of the character:

[onno@ui ~]# echo ' ' | hexdump
0000000 a0c2 000a                              
0000003

A hexdump of a normal space:

[onno@ui ~]# echo ' ' | hexdump
0000000 0a20                                   
0000002

So, after submitting the request, we see that the request and all following requests are stuck as "QUEUED":

[root@hedgehog14 ~]# ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request ls"
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
43           | 2024/08/07-15:49:47 | 2024/08/07-15:49:47 |  39147:35932 |  COMPLETED | cfdee0cc-bd8e-4407-95d2-762e456dca6a
44           | 2024/08/07-15:51:05 | 2024/08/07-15:51:05 |  39147:35932 |  COMPLETED | 1ca2167b-9f45-4bad-afff-7eee33f106e2
45           | 2024/08/07-15:59:30 | 2024/08/07-15:59:30 |  39147:35932 |  COMPLETED | f3701757-8dfc-4603-bc26-509062682ad0
46           | 2024/08/07-16:00:07 | 2024/08/07-16:00:07 |  39147:35932 |  COMPLETED | 95c8d078-d8ed-4640-a740-432da584f640
47           | 2024/08/07-16:00:38 | 2024/08/07-16:00:38 |  39147:35932 |  COMPLETED | fffa7038-ecd4-44f2-9182-cc33ea1a7051
48           | 2024/08/07-16:10:23 | 2024/08/07-16:10:23 |  39147:35932 |  COMPLETED | 630ea987-d0ed-4d98-8ecf-c1eb6246a95b
49           | 2024/08/20-10:22:17 | 2024/08/20-10:22:17 |  39147:35932 |     QUEUED | 5d3d8437-2c7c-47e1-9a93-c30aa5e8fc83
50           | 2024/08/20-10:25:35 | 2024/08/20-10:25:35 |  39147:35932 |     QUEUED | d35f92cf-18fb-4241-996d-b1a8ecddee12
51           | 2024/08/20-10:34:59 | 2024/08/20-10:34:59 |  39147:35932 |     QUEUED | 1b47ed2a-665d-4f7b-ad91-2eb90564bebb
52           | 2024/08/20-10:38:23 | 2024/08/20-10:38:23 |  39147:35932 |     QUEUED | 2491428e-a71b-4263-9bd9-2d592d01090a
53           | 2024/08/20-14:02:17 | 2024/08/20-14:02:17 |  31029:31631 |     QUEUED | 991f10cd-d370-4d67-bc56-aa6209a5cb92

Listing the request fails because of a JSON error (both with API and with admin commands).

[root@hedgehog14 /var/log]# ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request info 5d3d8437-2c7c-47e1-9a93-c30aa5e8fc83"
(3) com.google.common.util.concurrent.UncheckedExecutionException: org.json.JSONException: Expected a ':' after a key at 3 [character 4 line 1] from request info

A database query shows some more info:

# psql -U ******** bulk -c 'select * from request_arguments;'
 rid |              arguments              
-----+-------------------------------------
  39 | lifetime:60,lifetimeUnit:MINUTES
  40 | lifetime:60,lifetimeUnit:MINUTES
  41 | lifetime:60,lifetimeUnit:MINUTES
  42 | lifetime:60,lifetimeUnit:MINUTES
  43 | lifetime:60,lifetimeUnit:MINUTES
  44 | lifetime:60,lifetimeUnit:MINUTES
  45 | lifetime:60,lifetimeUnit:MINUTES
  46 | lifetime:60,lifetimeUnit:MINUTES
  47 | lifetime:60,lifetimeUnit:MINUTES
  48 | lifetime:60,lifetimeUnit:MINUTES
  49 |  "lifetimeUnit":MINUTES,lifetime:10
  50 | lifetime:300
  51 | lifetime:300
  52 | lifetime:300
(14 rows)

I was unable to remove the request with admins commands. The problem can be fixed by deleting the request directly from the database:

# psql -U ******** bulk -c "delete from bulk_request where id = 49;"

And then restarting the domain containing the bulk service.

So, we have a situation where any user with a crafted API request can hang up the bulk service.

It would be nice if JSON errors don't hang up the bulk service but instead are returned as failed.

Cheers, Onno

onnozweers avatar Aug 21 '24 11:08 onnozweers