dcache
dcache copied to clipboard
Bulk: malformed JSON hangs bulk service
Dear dCache devs,
My colleague Natalie found out, that a malformed JSON hangs the bulk service.
The logs show the full request and the error:
20 Aug 2024 10:22:17 (Frontend-hedgehog14) [] Failed to parse JSON "{ "activity": "PIN", "arguments": {"lifetime": "10", "lifetimeUnit":"MINUTES"}, "expand_directories": "ALL", "target": ["/users/anatolid/tape//1GBfile"]}": Unexpected character (' ' (code 160)): was expecting double-quote to start field name
at [Source: (String)"{ "activity": "PIN", "arguments": {"lifetime": "10", "lifetimeUnit":"MINUTES"}, "expand_directories": "ALL", "target": ["/users/anatolid/tape//1GBfile"]}"; line: 1, column: 55]
20 Aug 2024 10:22:17 (bulk) [] ConcurrentRequestProcessor exiting...
Not sure the bad character ("code 160", a non-breaking space?) will get through after pasting in this issue. Here's a hexdump of the character:
[onno@ui ~]# echo ' ' | hexdump
0000000 a0c2 000a
0000003
A hexdump of a normal space:
[onno@ui ~]# echo ' ' | hexdump
0000000 0a20
0000002
So, after submitting the request, we see that the request and all following requests are stuck as "QUEUED":
[root@hedgehog14 ~]# ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request ls"
ID | ARRIVED | MODIFIED | OWNER | STATUS | UID
43 | 2024/08/07-15:49:47 | 2024/08/07-15:49:47 | 39147:35932 | COMPLETED | cfdee0cc-bd8e-4407-95d2-762e456dca6a
44 | 2024/08/07-15:51:05 | 2024/08/07-15:51:05 | 39147:35932 | COMPLETED | 1ca2167b-9f45-4bad-afff-7eee33f106e2
45 | 2024/08/07-15:59:30 | 2024/08/07-15:59:30 | 39147:35932 | COMPLETED | f3701757-8dfc-4603-bc26-509062682ad0
46 | 2024/08/07-16:00:07 | 2024/08/07-16:00:07 | 39147:35932 | COMPLETED | 95c8d078-d8ed-4640-a740-432da584f640
47 | 2024/08/07-16:00:38 | 2024/08/07-16:00:38 | 39147:35932 | COMPLETED | fffa7038-ecd4-44f2-9182-cc33ea1a7051
48 | 2024/08/07-16:10:23 | 2024/08/07-16:10:23 | 39147:35932 | COMPLETED | 630ea987-d0ed-4d98-8ecf-c1eb6246a95b
49 | 2024/08/20-10:22:17 | 2024/08/20-10:22:17 | 39147:35932 | QUEUED | 5d3d8437-2c7c-47e1-9a93-c30aa5e8fc83
50 | 2024/08/20-10:25:35 | 2024/08/20-10:25:35 | 39147:35932 | QUEUED | d35f92cf-18fb-4241-996d-b1a8ecddee12
51 | 2024/08/20-10:34:59 | 2024/08/20-10:34:59 | 39147:35932 | QUEUED | 1b47ed2a-665d-4f7b-ad91-2eb90564bebb
52 | 2024/08/20-10:38:23 | 2024/08/20-10:38:23 | 39147:35932 | QUEUED | 2491428e-a71b-4263-9bd9-2d592d01090a
53 | 2024/08/20-14:02:17 | 2024/08/20-14:02:17 | 31029:31631 | QUEUED | 991f10cd-d370-4d67-bc56-aa6209a5cb92
Listing the request fails because of a JSON error (both with API and with admin commands).
[root@hedgehog14 /var/log]# ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request info 5d3d8437-2c7c-47e1-9a93-c30aa5e8fc83"
(3) com.google.common.util.concurrent.UncheckedExecutionException: org.json.JSONException: Expected a ':' after a key at 3 [character 4 line 1] from request info
A database query shows some more info:
# psql -U ******** bulk -c 'select * from request_arguments;'
rid | arguments
-----+-------------------------------------
39 | lifetime:60,lifetimeUnit:MINUTES
40 | lifetime:60,lifetimeUnit:MINUTES
41 | lifetime:60,lifetimeUnit:MINUTES
42 | lifetime:60,lifetimeUnit:MINUTES
43 | lifetime:60,lifetimeUnit:MINUTES
44 | lifetime:60,lifetimeUnit:MINUTES
45 | lifetime:60,lifetimeUnit:MINUTES
46 | lifetime:60,lifetimeUnit:MINUTES
47 | lifetime:60,lifetimeUnit:MINUTES
48 | lifetime:60,lifetimeUnit:MINUTES
49 | "lifetimeUnit":MINUTES,lifetime:10
50 | lifetime:300
51 | lifetime:300
52 | lifetime:300
(14 rows)
I was unable to remove the request with admins commands. The problem can be fixed by deleting the request directly from the database:
# psql -U ******** bulk -c "delete from bulk_request where id = 49;"
And then restarting the domain containing the bulk service.
So, we have a situation where any user with a crafted API request can hang up the bulk service.
It would be nice if JSON errors don't hang up the bulk service but instead are returned as failed.
Cheers, Onno