List Objects took too long and blocking
Description
Get Bucket (List Objects) on a bucket with a large number of objects takes a long time (~90 seconds) and blocks other operations.
Details
The issue was caught when testing with goofys: goofys tries to fetch the "file" list while serving other file operations at the same time. The system appears to be stuck on the long Get Bucket request and cannot serve other requests (such as HEAD):
...
[HEAD] test test/1375000 0 0 2017-06-27 14:17:46.157886 +0900 1498540666157944 200 10329
[HEAD] test test/1812500 0 0 2017-06-27 14:17:55.351098 +0900 1498540675351158 200 9191
[HEAD] test test/1437500 0 0 2017-06-27 14:18:03.513853 +0900 1498540683513912 200 8161
[HEAD] test test/1312500 0 0 2017-06-27 14:18:03.516416 +0900 1498540683516472 200 1
[HEAD] test test/1500000 0 0 2017-06-27 14:18:05.559687 +0900 1498540685559748 200 2042
...
[GET] test test/1437500 0 14451 2017-06-27 14:18:21.951445 +0900 1498540701951475 200 2 miss
[GET] test test/625000 0 43079 2017-06-27 14:18:21.954458 +0900 1498540701954516 200 2 miss
[GET] test test/250000 0 13481 2017-06-27 14:18:21.957682 +0900 1498540701957717 200 2 miss
[BUCKET-GET] test 0 0 2017-06-27 14:18:22.803471 +0900 1498540702803522 500 90003
...
The call down to leo_object_storage does not limit the number of keys returned, which is probably the cause of the long operation:
https://github.com/leo-project/leofs/blob/develop/apps/leo_storage/src/leo_storage_handler_object.erl#L968
Related issue: https://github.com/leo-project/leofs/issues/758. It might be good to make the problem covered by #758 more generic and manage the issue in one place (closing this issue).
Passing Max Keys down to the backend DB for prefix_search should help the situation.
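To illustrate the idea, below is a minimal, self-contained sketch of a capped prefix scan. An ordered_set ETS table stands in for the backend KVS, and all names here (prefix_scan, scan/3, demo/0) are made up for this sketch rather than taken from leo_backend_db; the point is only that the scan stops after MaxKeys matches instead of materializing every key under the prefix.

-module(prefix_scan).
-export([scan/3, demo/0]).

%% Return at most MaxKeys keys starting with Prefix from an ordered_set
%% ETS table (a stand-in for the backend KVS).
scan(Tab, Prefix, MaxKeys) ->
    %% ets:next/2 jumps to the first key greater than Prefix, so the scan
    %% starts at the beginning of the prefix range, not at the table head.
    next(Tab, ets:next(Tab, Prefix), Prefix, MaxKeys, []).

next(_Tab, '$end_of_table', _Prefix, _N, Acc) ->
    lists:reverse(Acc);
next(_Tab, _Key, _Prefix, 0, Acc) ->
    lists:reverse(Acc);                %% MaxKeys reached: stop early
next(Tab, Key, Prefix, N, Acc) ->
    case binary:longest_common_prefix([Prefix, Key]) =:= byte_size(Prefix) of
        true  -> next(Tab, ets:next(Tab, Key), Prefix, N - 1, [Key | Acc]);
        false -> lists:reverse(Acc)    %% past the prefix range: stop
    end.

demo() ->
    Tab = ets:new(kvs, [ordered_set]),
    ets:insert(Tab, [{<<"test/1">>}, {<<"test/2">>}, {<<"test/3">>}, {<<"zzz/1">>}]),
    scan(Tab, <<"test/">>, 2).         %% -> [<<"test/1">>, <<"test/2">>]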
A previous attempt, https://github.com/leo-project/leofs/pull/802, failed: the results from multiple metadata servers are only concatenated, while the upper layer assumes the list is sorted.
Quote from the previous discussion:
We are doing an unnecessarily long listing when max_keys is not used, and it causes problems when listing a bucket on a system with tons of objects.
The problem is also due to the code at leo_object_storage_api:fetch_by_key/3 https://github.com/leo-project/leo_object_storage/blob/develop/src/leo_object_storage_api.erl#L285
The sub-list results coming from the multiple metadata servers are not sorted; when we use the sublist to crop the combined list, the marker for the subsequent request would be incorrect.
===== Illustration =====
Metadata1: obj0, obj13   <- sorted
Metadata2: obj1, obj2    <- sorted

Combined List: [obj0, obj13, obj1, obj2]  <- not sorted
Cropped List:  [obj0, obj13]              <- not sorted
Correct List:  [obj0, obj1]
The cropped list is incorrect and therefore the next marker is incorrect
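The same effect can be reproduced in an Erlang shell (the keys are made up for illustration):

1> Combined = lists:flatten([[<<"obj0">>, <<"obj13">>], [<<"obj1">>, <<"obj2">>]]).
[<<"obj0">>,<<"obj13">>,<<"obj1">>,<<"obj2">>]
2> lists:sublist(Combined, 2).             % cropped without sorting: wrong
[<<"obj0">>,<<"obj13">>]
3> lists:sublist(lists:sort(Combined), 2). % sorted first: correct
[<<"obj0">>,<<"obj1">>]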
To solve the problem, we can first sort the combined list:
Reply = lists:sort(lists:reverse(lists:flatten(Ret))),
Or
Reply = lists:sublist(ordsets:from_list(lists:flatten(List)), MaxKeys)
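(Note: ordsets:from_list/1 both sorts and removes duplicates, so the second variant also de-duplicates keys; either way, the full combined list is still built and sorted before cropping.)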
Or merge the per-server results so that order is maintained (an n-way merge, though), as sketched below.
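A minimal sketch of such a merge, assuming each per-server sub-list is already sorted (module and function names are made up for this sketch, not taken from the LeoFS code base). It stops as soon as MaxKeys keys have been taken, so the full combined list is never built:

-module(nway_merge).
-export([merge/2]).

%% Merge N sorted lists, keeping at most MaxKeys smallest elements.
merge(Lists, MaxKeys) ->
    merge_(Lists, MaxKeys, []).

merge_(_Lists, 0, Acc) ->
    lists:reverse(Acc);                 %% MaxKeys reached
merge_(Lists0, N, Acc) ->
    case [L || L <- Lists0, L =/= []] of
        []    -> lists:reverse(Acc);    %% all sub-lists exhausted
        Lists ->
            %% Every sub-list is sorted, so the overall smallest remaining
            %% key is the smallest of the heads.
            Min = lists:min([hd(L) || L <- Lists]),
            merge_(drop_head(Min, Lists), N - 1, [Min | Acc])
    end.

%% Remove one occurrence of Min from the first sub-list it heads.
drop_head(Min, [[Min | T] | Rest]) -> [T | Rest];
drop_head(Min, [L | Rest])         -> [L | drop_head(Min, Rest)].

With the data from the illustration above, nway_merge:merge([[<<"obj0">>, <<"obj13">>], [<<"obj1">>, <<"obj2">>]], 2) returns [<<"obj0">>, <<"obj1">>], i.e. the correct list and hence the correct next marker. lists:min/1 over the heads makes this O(MaxKeys x N); a heap of heads would scale better, but it shows the idea.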
This problem did not show up when max_key was undefined (i.e., not passed to leo_object_storage_api).
Link to Issue https://github.com/leo-project/leofs/issues/823 (Issue about the max_key approach)