lakeFS
Extract permission node creation #8201
Related to #8201
Change Description
Automating the documentation for endpoint permissions requires decoupling the permission objects from the endpoint code. This PR extracts the permission objects to a separate file. Since the permission code is sensitive, I opted for a straightforward refactor as a first step. This also makes it possible to test the next changes in the permission code separately from the endpoint code. The functions are named after the logAction operation names when available (e.g. copy_objects -> CopyObjectsPermissions).
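For illustration only, an extracted helper might look roughly like this (the function name and the node it builds are made up for this example, not the exact code from the PR):
// Illustrative sketch only - not the actual extracted code. A node that was
// previously built inline in an endpoint handler moves into a small helper,
// which the handler calls when authorizing the request.
func getObjectPermissions(repository, path string) permissions.Node {
    return permissions.Node{
        Permission: permissions.Permission{
            Action:   permissions.ReadObjectAction,
            Resource: permissions.ObjectArn(repository, path),
        },
    }
}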
Testing Details
When refactoring, I used the IDE's "Extract Method" feature for consistency and to minimize the room for error. I also ran the tests locally (excluding Hadoop-related tests, which I had an issue running locally). Let me know if you think additional coverage is required.
@arielshaqed thanks for reviewing!
My initial thought was doing something similar to the draft here. The TL;DR version is along the lines of mapping each operation name to a permissions object that has the resource template (e.g. "group/{groupID}") in the resource field. The doc-generation part will use this object as-is, and when used in the endpoint flow it will call a "fill" function that substitutes the placeholders with the actual values.
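For the sake of concreteness, a rough sketch of that draft idea (the map, the fill helper, and the delete_group entry are illustrative assumptions, not the actual draft):
// Rough sketch of the template idea; the map, the fill helper and the
// "delete_group" example are assumptions for illustration, not actual lakeFS code.
var permissionTemplates = map[string]permissions.Permission{
    "delete_group": {
        Action:   permissions.DeleteGroupAction,
        Resource: "group/{groupID}",
    },
}

// fill substitutes the {placeholders} in a template resource with the actual
// request values before the node is passed to authorization.
func fill(p permissions.Permission, values map[string]string) permissions.Permission {
    for k, v := range values {
        p.Resource = strings.ReplaceAll(p.Resource, "{"+k+"}", v)
    }
    return p
}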
If I understand your suggestion correctly, the doc-gen code is supposed to call the function that MakeGroupPermissions (etc.) returns, which means it needs to know how many arguments that function expects. It's not impossible to solve, but I think the template solution is easier in this case.
Unrelated question: there's an automatic check that fails this PR for not closing an issue. I'd rather split the code into multiple PRs (I also want to extract the log_action strings to enums before making the map object, which I don't think belongs here). In that case, this PR won't close the original issue. Should I create another issue for this PR to close? Thanks
Hi @arielshaqed thanks for the review.
- Honestly, I didn't realize that the function names in the controller should match the operationIds, so as I mentioned, I named the functions after the logAction strings. I corrected that now.
- Regarding how this PR helps: my next step is to construct the operationId -> permission-node map, where the resource parts of the nodes are templates. The doc-gen flow can then use these nodes as-is, and the controller flow will fill the templates with the request data.
I intend to replace the code in the operationPermissions functions to use and fill the nodes from the map. I did not include that here because I thought it would make the PR even harder to review and verify. Additionally, if the current operationPermissions functions are correct, it will be easy to verify that the new versions (that use the map) return the same results.
Part of my assumption was that this PR would be hard to review and verify automatically, so I tried to make the changes as straightforward as possible (extracting code from each controller function to a helper function). I am open to suggestions on how to make such PRs more reviewable in the future. I am also curious to see how you plan to automate the verification that the pre-PR and post-PR code do the same thing; I didn't think of a good way to do so myself.
Thanks!
Hi @tkalir , Apologies for another delay - I've been unwell. Better now, so will reply by Sunday.
I will write a script to extract permissions used for each call. I think that may give us confidence we're doing the right thing here.
Here is a more concrete proposal. The chief driving force here is reliability: reduce the risk of human error when extracting the correct permissions by automating it a bit. So I wrote a dumb script that just manages to extract the permissions. Because this is a forklift (we replace one thing with another, but we do not plan on doing the same replacement again), it is allowed for the script to fail and say that it failed in some cases. This is better than making a silent error.
After wasting some time and burning the planet using Gemini Code (pro tip: it doesn't work), I hacked together the attached Python script and ran it on controller.go. It manages to extract permissions for most of the API functions - we can do the remaining 4 manually. Because of reasonably strict error checking in the script, I feel that the automation here is trustworthy.
I think next up we can extract the permissions to a more usable object. For instance, if we had an interface Allower with a method Allowed(params PermissionParams, r *http.Request, w http.ResponseWriter), then we could have a map element "create_presign_multipart_upload" -> AllowByPath(...).
An implementation of AllowByPath would be something like this:
type AllowByPath struct {
    Controller *Controller // needed for AuthorizeCallback
    Action     string
}

func (a *AllowByPath) Allowed(params PermissionParams, r *http.Request, w http.ResponseWriter) bool {
    return a.Controller.AuthorizeCallback(w, r, permissions.Node{
        Permission: permissions.Permission{
            Action:   a.Action,
            Resource: permissions.ObjectArn(params.Repository, params.Path),
        },
    })
}
Once we've done that, we can generate documentation by adding another method to Allower. And we can order the documentation correctly by ordering all the function names output in permissions.out.
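For example, the doc-generation method could be an extra method on the interface (the name Describe and its output format are assumptions, not an agreed API):
// Hypothetical extension of Allower for doc generation; Describe and its output
// format are assumptions, not an agreed-upon API.
type Allower interface {
    Allowed(params PermissionParams, r *http.Request, w http.ResponseWriter) bool
    // Describe returns the required permission in a documentation-friendly form,
    // e.g. "fs:WriteObject on the object ARN of {repository}/{path}".
    Describe() string
}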
WDYT?
Thanks, this looks like a solid idea.
My main concern with this approach is how to verify the code's correctness after the change. In the last PR, I relied on the IDE's refactoring tool to reduce the chance of human error when extracting the nodes' code into functions. With this approach, the IDE won't be as much help. While this approach does result in far fewer new functions, I think the new objects plus the changes in the calling code still result in many changes that would be hard to check manually.
One option for mitigating that could be verifying that Authorize() receives the same input when the same endpoints are called with the same parameters. To try this, I called LinkPhysicalAddress() with a fake Controller object whose Authorize() logs the permission object, and it worked. So what I could do is:
- create a fake controller that logs Authorize()'s input to a file
- call all endpoint handlers
- write the new code
- call the endpoint handlers again
- check that the outputs are the same
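To illustrate the first step, a minimal sketch of the recording fake (the type name and the JSON output are hypothetical):
// Hypothetical recording fake; the type name and JSON output are assumptions,
// not existing code.
type recordingAuthorizer struct {
    enc *json.Encoder // writes each requested permission node to a file
}

// Authorize logs the node that the endpoint asked to check and then allows the
// call, so the handler keeps running.
func (a *recordingAuthorizer) Authorize(w http.ResponseWriter, r *http.Request, node permissions.Node) bool {
    _ = a.enc.Encode(node)
    return true
}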
I suspect the test code would have to change a little for each endpoint, so the test code won't be small, but it is doable. It might be overkill, but since this touches security code across all handlers, it might be justified. What do you think? Is there a simpler way to test this change?
Since we have reproducible automation, I suggest we:
- Review the Python script (I wrote it, so either you review it or I find a third person).
- Sample 10 of its outputs.
- Go on from there.
(Of course change the script if you need to tweak its output).
I do believe the script works as intended, but my concern is also with the boilerplate that the new code will require, which the script won't help verify. But we can postpone this discussion until after I write the code. (Or, if you prefer, I can start by doing only the flows involving AllowByPath to make the review more manageable.)
Hi @arielshaqed ,
Working on the code, I noticed that some endpoints pass a callback to AuthorizeCallback, which will need to be passed to the Allowers. It can certainly work, but it made me wonder if we can simplify the design.
This suggestion assumes that the only real difference between Allowers is the Node object they use; let me know if you think that might change. Otherwise, I suggest storing objects like these in the map:
type PermissionDescriptor interface {
    Permission(params PermissionParams) permissions.Node
    // <doc generation function>
}
type PathPermissionDescriptor struct {
    Action string
}

func (p *PathPermissionDescriptor) Permission(params PermissionParams) permissions.Node {
    return permissions.Node{
        Permission: permissions.Permission{
            Action:   p.Action,
            Resource: permissions.ObjectArn(params.Repository, params.Path),
        },
    }
}
Then, instead of calling authorize() directly, we can wrap it in a function that receives the operationId and all the relevant params, and have it handle the map lookup and call authorize(). This way there's less chance that changes to the flow will ripple to every endpoint.
What do you think?
Hi @tkalir ,
Really glad we're back on this!
Sorry, I am not sure about this part of your suggestion; I am probably missing your point in all of this.
return permissions.Node{
    Permission: permissions.Permission{
        Action:   p.Action,
        Resource: permissions.ObjectArn(params.Repository, params.Path),
    },
}
There are many more types of Nodes in actual use! I mean, many Nodes refer to ARNs other than object ARNs, and we sometimes use "AND" Nodes.
I think you mean more than what I read -- and I would be glad if you could help me understand.
Hi, I'll explain a bit more.
This suggestion is similar to your last one: having all objects in the map share an interface (PermissionDescriptor), with the same number of implementations. Meaning, whenever the permission node has the same ARN structure with a different action, they share an implementation. Cases where AND is used may each require a different implementation.
Compared to the Allowers design, it's just about splitting a bit differently what goes in the interface implementation and what goes in the calling code. Instead of having each implementation call AuthorizeCallback with a different node object, I have each implementation return a different node, which the calling code passes to Authorize().
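In other words, the calling side would look roughly like this (the descriptor map and variable names are placeholders):
// Rough sketch of the calling side; the map and variable names are placeholders.
desc := permissionDescriptors[operationID]
if !c.authorize(w, r, desc.Permission(params)) {
    return // assuming authorize writes the error response itself
}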
Some more examples of descriptors:
type BranchPermissionDescriptor struct {
    Action string
}

func (b *BranchPermissionDescriptor) Permission(params PermissionParams) permissions.Node {
    return permissions.Node{
        Permission: permissions.Permission{
            Action:   b.Action,
            Resource: permissions.BranchArn(params.Repository, params.Branch),
        },
    }
}
An example using AND:
type CreateRepoPermissionDescriptor struct{}

func (c *CreateRepoPermissionDescriptor) Permission(params PermissionParams) permissions.Node {
    return permissions.Node{
        Type: permissions.NodeTypeAnd,
        Nodes: []permissions.Node{
            {
                Permission: permissions.Permission{
                    Action:   permissions.CreateRepositoryAction,
                    Resource: permissions.RepoArn(params.Repository),
                },
            },
            {
                Permission: permissions.Permission{
                    Action:   permissions.AttachStorageNamespaceAction,
                    Resource: permissions.StorageNamespace(params.StorageNamespace),
                },
            },
        },
    }
}
(PermissionDescriptor seems too long; if we use this design we can go with "OpPermission", "PermissionSpec", or something similar.) Does that make sense?
Thanks for the explanation - this really makes a lot of sense! So most of the logic I thought you needed to put in the PermissionDescriptor, you actually put into the Permission factory (constructor) functions, right?
You've obviously put a great deal of understanding into our requirements, even those I did not phrase formally. And I think this proposal offers a great path forward.
I am asking for some clarifications, to be sure that I understand and to try to avoid some blockers. I really think we can manage all of these, either this way or another way.
IIUC this means PermissionParams will have many more fields, for branches, source objects (for copy), repositories, users, groups, tags, and all that. That sounds great; I had not thought of it like that. If I now understand correctly, my only request is that the PermissionParams fields be pointers to strings rather than strings: I would like the code in controller.go to be very clear about the difference between "I don't have a parameter value to give here" (nil) and "I have a parameter value to give here, and it is the empty string" (a pointer to an empty string). I realize it's probably OK because empty strings probably don't make sense for any of the parameters. But... maybe they do? Maybe an empty path component could make sense in the future? If we do it with pointers we will never have to worry about it.
Creating a pointer to a string can be unpleasant in Go, particularly when you have a literal or a computed string. We use swag.String(...) which does that for us; you could use that in your upcoming code too.
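For example (the field names below are placeholders):
// Sketch only: PermissionParams fields as *string, so nil ("no value supplied")
// is distinct from a pointer to "" (an explicit empty value). Field names are placeholders.
type PermissionParams struct {
    Repository *string
    Branch     *string
    Path       *string
}

// Building the params with swag.String, which returns a *string for a literal
// or computed string.
func pathParams(repository, path string) PermissionParams {
    return PermissionParams{
        Repository: swag.String(repository),
        Path:       swag.String(path),
    }
}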
Hi, I was sick for a few days so it got delayed. Thanks for the feedback!
Regarding where the logic is located, to make sure we are on the same page, my suggestion is to replace Authorize() with a function like this:
func (c *Controller) authorizeReq(w http.ResponseWriter, r *http.Request, params PermissionParams, operationId string, opts *AuthorizeOpts) bool {
    desc := GetPermissionDescriptor(operationId)
    callback := writeError
    if opts != nil && opts.callback != nil {
        callback = opts.callback
    }
    return c.authorizeCallback(w, r, desc.Permission(params), callback)
}

// currently a custom callback is needed only in ~2 endpoints, but having opts reduces
// the chance of needing to change the function's signature in the future
GetPermissionDescriptor just looks up the PermissionDescriptor object in the map, using the operationId as the key.
The map is populated with (for the examples from my last message):
"CreatePresignMultipartUpload" -> &BranchPermissionDescriptor{Action: permissions.WriteObjectAction}
"CreateRepo" -> &CreateRepoPermissionDescriptor{}
So not a lot of logic overall.
Some additional points:
- As you said, PermissionParams will use multiple fields for tags, branches, etc. I think that currently there won't be a use for having nil values there, but future-proofing won't harm us, so I'll do as you suggested.
- Which keys to use in the map: I suggest using the operationId names, like createPresignMultipartUpload. When generating the docs we will need to fetch info from the swagger (like the endpoint URL for the operation), and the swagger uses operationIds. There are a few possibilities: (a) using hard-coded strings as keys in the map; (b) generating operationId constants from the swagger and using them as keys; (c) generating Operation objects from the swagger, which will include the operationIds and endpoint URLs (so parsing the swagger in the doc-gen part won't be necessary). If we want to create an operation -> logAction mapping, the logAction names can be added as custom fields in the swagger, for the cases where getting the logAction from the operationId is not trivial. In this option we use <operation>.id as the key.
A or B will be easier, as this issue doesn't necessitate dealing with the logAction names, but tell me what you think.
Glad to hear you're better!
I'm happy with any of the options. Personally I do not see much benefit in consts / string enums / int enums compared to just copying the operation ID. I mean, it basically means writing stuff like
func GetObject(...params) {
    something.Permissions(GetObjectActionName, ...)
    // ...
}
rather than
func GetObject(...params) {
    something.Permissions("get-object", ...)
    // ...
}
The minor advantage to the first is that it catches some errors at compile-time rather than at run-time. But:
- If you write "geet-object" with an error, then any test of the code will fail. And I really hope we test our changes by running them at least once 🙀
- A much more common error for me is to copy-paste and end up with "put-object" instead of "get-object". And then I'd be as likely to write PutObjectActionName.
So no strong feelings here.
Thanks!
I think another benefit of having constants is that it makes it easier to find where in the code the string is used (for this flow's purpose). I mean, the operationId might appear in multiple places in the codebase, but the constant appears only in this flow. However, since you don't have a preference and this task is quite big as it is, I think I'll go with the simpler solution.
I think my next step will be doing a PR for migrating the Path-using endpoints to this method, and then we can talk testing with concrete examples.