neofs-node icon indicating copy to clipboard operation
neofs-node copied to clipboard

SN forwards a request despite of `access to object operation denied` error

Open aprasolova opened this issue 2 years ago • 7 comments

In a container, where HEAD is denied, I execute HEAD on an object D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1. In storage node logs I see that the node gets access to object operation denied, and despite of it forwards the request into the container.

  1. Is it necessary to forward the request? I believe local response is exhaustive.
  2. In the last line of the log, there is a 2049 status code, which corresponds to the object not found error. Is it ok? Anyways, in a CLI I get 2048 status code, but the log message confuses me.
2022-08-19T12:52:17.211Z	debug	get/get.go:87	serving request...	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false}
2022-08-19T12:52:17.211Z	debug	get/local.go:25	local get failed	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "error": "status: code = 2049"}
2022-08-19T12:52:17.211Z	debug	get/get.go:108	operation finished with error	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "error": "status: code = 2049"}
2022-08-19T12:52:17.211Z	debug	get/container.go:18	trying to execute in container...	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "netmap lookup depth": 0}
2022-08-19T12:52:17.211Z	debug	get/container.go:46	process epoch	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "number": 7}
2022-08-19T12:52:17.211Z	debug	get/remote.go:14	processing node...	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false}
2022-08-19T12:52:17.213Z	debug	get/remote.go:34	remote call failed	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "error": "read object header from NeoFS: status: code = 2048 message = access to object operation denied"}
2022-08-19T12:52:17.213Z	debug	get/remote.go:14	processing node...	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false}
2022-08-19T12:52:17.215Z	debug	get/remote.go:34	remote call failed	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "error": "read object header from NeoFS: status: code = 2048 message = access to object operation denied"}
2022-08-19T12:52:17.215Z	debug	get/remote.go:14	processing node...	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false}
2022-08-19T12:52:17.219Z	debug	get/remote.go:34	remote call failed	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "error": "read object header from NeoFS: status: code = 2048 message = access to object operation denied"}
2022-08-19T12:52:17.219Z	debug	get/container.go:63	no more nodes, abort placement iteration	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false}
2022-08-19T12:52:17.219Z	debug	get/get.go:108	operation finished with error	{"component": "Object.Get service", "request": "HEAD", "address": "D6w3YA6k9FaGnM4vrv2UoyyRjGwcjLAiSDYTfcHCyBac/GhPM1dEK27HvkqdoW8L9nyJda7ZGMYBHGWhJFWMq9Eu1", "raw": true, "local": false, "with session": true, "with bearer": false, "error": "status: code = 2049"}

This test fails

Your Environment

NeoFS v0.31.0

aprasolova avatar Aug 19 '22 13:08 aprasolova

Firstly node tries to read the object locally not matter what, so 2049 (404) is pretty normal. Further we just see the behavior "do until 1st success". In current implementation operation isn't aborted on ACCESS_DENIED error's encounter, and in general I think it's OK. But for some scenarios we actually want to abort the execution on such an error, so I suggest to think about providing an option to tune the behavior for request.

cthulhu-rider avatar Aug 22 '22 11:08 cthulhu-rider

@cthulhu-rider, we discussed that a little with @aprasolova and thought that it is pretty strange behavior: if a node looks for the object locally and finds out that it should not return that according to the ACL rules why does it try to ask other nodes for that object? What could change if other nodes will respond successfully? Should a node return object that it thinks it must not?

Also, all appearances that a node, in that case, has all the needed info to answer 2048, not 2049.

carpawell avatar Aug 22 '22 11:08 carpawell

Maybe I haven't caught the issue theme correctly. In described scenario object isn't stored locally, yes? So how can

node looks for the object locally and finds out that it should not return that according to the ACL rules

?

cthulhu-rider avatar Aug 22 '22 13:08 cthulhu-rider

object isn't stored locally, yes?

@cthulhu-rider, no, an object IS stored locally and a Node is not able to get it because of ACL. And the logic is: "could not get it locally (does not matter why), then ask other nodes". Not clear why. If the local engine returns 2048 (access denied), RPCs to other nodes seem redundant.

carpawell avatar Sep 14 '22 11:09 carpawell

behavior continues to amaze for reading ops https://rest.fs.neo.org/HXSaMJXk2g8C14ht8HSi7BBaiYZ1HeWh2xnWPGQCg4H6/1487-1712598182/index.html#suites/7d0d2bff4573ced7a233aef01d2141a2/d40aabea2b91708f/

i see following possible options:

  1. abort survey on any status response (except Not Found one ofc) and forward it
  2. if all nodes failed, return multi-status response
  3. extend 2: return simple status when it's received from all nodes

1 finishes faster, but may intro inconsistency when nodes respond with different statuses (buggy, malicious, etc.). At the same time, this is ok for specific networks keeping node versions in sync. It's also stands for the best UX for the "good" case (like described in the issue)

2 is the most clear but also costly. 3 lightens response to the original client as the number of nodes in the container increases

1 is easier to implement since 2-3 responses have neven been supported before and 'd require protocol extensions

in total, i suggest to:

  • change behavior to 1
  • consider an option to enable 3 mode. Future default behavior (best to me), request flag (good to me) or node option (worse to me)

@roman-khimov @carpawell

cthulhu-rider avatar Apr 09 '24 08:04 cthulhu-rider

abort survey on any status response

Dangerous for open networks, easy to DoS.

return multi-status response

Clients will freak out on this.

return simple status when it's received from all nodes

Makes sense even for 2/3+ replies.

But at the same time can ACL checks be completely done locally? Because

If the local engine returns 2048 (access denied), RPCs to other nodes seem redundant.

this is a correct thing to do in general, SN is being asked for something and if it can check ACLs locally then any problems with them can be immediately returned to the client. If client is not satisfied (bad node) he can try another SN.

roman-khimov avatar Apr 09 '24 10:04 roman-khimov

Clients will freak out on this. ... If client is not satisfied (bad node) he can try another SN.

this is what really freaks the clients. Everyone expects responsive and transparent behavior. I meant returning multi-status when majority averaging a single status cannot be done. E.g. N container nodes returned N different statuses. Server got them, returned them, so client is not burderned to get them on his own

with this, the behavior of container nodes (one of the core NeoFS features) becomes even more transparent to the client

But at the same time can ACL checks be completely done locally?

they are done locally. If object holder sees access violation, it aborts request immediately. The behavior discussed concerns transit nodes w/o an object, and they behave different

Makes sense even for 2/3+ replies.

such a threshold is not defined anywhere, now it sounds like fiction. Proposal is welcome


this topic may also affect case when most nodes, for example, refuse to service the request while "last" node responds. Now this will be a success because GET makes best effort. At the same time, this differs great from "all OK" case

cthulhu-rider avatar Apr 09 '24 10:04 cthulhu-rider