headscale
headscale copied to clipboard
✨ feat: implements APIs for managing ACLs
Description
Headscale currently lacks the APIs to manage the ACLs. The only way possible is presently to load the ACLs via file and changes to the policy require reloading the headscale process. This also makes it difficult to integrate your headscale via APIs with no ACL management.
This commit introduces two APIs allowing you to set the policy.
APIs
Set ACL Policy
PUT /api/v1/policy
Payload (Not JSON, so convert to string before invoking the API.)
{"policy": "{// This is a comment\n\"groups\":{\"g1\":[\"u1\",\"u2\"]},\"acls\":[]}"}
Get ACL Policy
GET /api/v1/policy
Response
{"policy": {"groups": {}, "tags": {}}}
CLI
→ go run cmd/headscale/headscale.go --config ./config.yaml policy --help
Manage the Headscale ACL Policy
Usage:
headscale policy [command]
Available Commands:
get Print the current ACL Policy
set Updates the ACL Policy
Flags:
-h, --help help for policy
Global Flags:
-c, --config string config file (default is /etc/headscale/config.yaml)
--force Disable prompts and forces the execution
-o, --output string Output format. Empty for human-readable, 'json', 'json-line' or 'yaml'
Use "headscale policy [command] --help" for more information about a command.
Testing
https://github.com/juanfont/headscale/assets/5305744/e3f68195-c78d-4cf3-8483-fb1418251137
- [x] read the CONTRIBUTING guidelines
- [x] raised a GitHub issue or discussed it on the projects chat beforehand
- [x] added unit tests
- [ ] added integration tests
- [x] updated documentation if needed
- [ ] updated CHANGELOG.md
Resolves
- https://github.com/juanfont/headscale/issues/582
@evenh @kradalby Please take a look. 🙇🏼
I might have some free time this evening to do a review
I've attached a screen recording to the PR in case someone wishes to see how the APIs work.
Overall it looks good - I've left some inline comments. One open question is whether to support reading the current ACL via the new APIs/CLI commands even if the policy is backed by a file (for convenience)?
Enabled it. 👍🏼 Now it will work for both modes.
I think this looks reasonable, some comments and questions and to larger things:
I am tempted to use this opportunity to change the general phrasing from ACL to Policy through out the code, particularly since we are changing the config, adding commands etc.
I think the base format for the policy should be HuJSON, it is a bit more flexible format that allows preserving comments. This probably means using string/byte for the database and not the JSON type, but we dont really use it (or will use SQL to look up in the JSON) so that should not be a problem. As part of this, I am willing to discuss dropping the YAML support to just streamline what we support.
Thanks for the review. This sounds like a good idea. I wrote this implementation to quickly demonstrate the APIs and commands that we can use as a reference to discuss the feature further.
Keeping the base format as HuJSON
should be the way to proceed and we can store it as a bytearray
or string in the db since we don't intend to query it any further. Although I'm not entirely sure how the API request and response will look over RPC since I do not want to model the policy structure in the proto definition.
I am also okay with keeping the default mode as file
.
Let me update this PR with the suggestions and put it our for a second round of review.
Hi @pallabpain. Have you had a chance to update the PR yet?
Hi @pallabpain. Have you had a chance to update the PR yet?
I haven't had the chance to update the PR, yet. I have some untested changes locally that I should be able to work on this weekend. I may change the request-response structure based on the newer comments.
Maybe, we should create 2 new tables to store groups and tags. And sometimes, we just want to update groups or tags. So /api/v1/acl/groups
, /api/v1/acl/tags
and /api/v1/acl/all
may be more useful.
I'm thinking about how to update ACL groups automatically, e.g. AD/LDAP or extract from OIDC (OIDC may be a problem because it can only get groups when users authenticate)
@pallabpain there has been quite some changes, do you think that you might have time to pick this up again?
@pallabpain there has been quite some changes, do you think that you might have time to pick this up again?
Sure. I'll review the recent changes and update the pull request. I'll try and get a working version ready soon.
Hi, all this pull request is still active? I ask because I plan to create ui editor in package https://github.com/tale/headplane for update acl, and that api endpoint will be great opportunity.
I'm open for the example config to be db, but the default should still be file.
Personally I strongly disagree.
- Many people still use docker. (I know I know, it's not officially supported... still!) Manipulating files inside it, especially if the container is not running is a very hard task.
- IMHO it is more secure to store ACL data inside a
db
. A separate file can be easier hacked / modified / viewed. After all, ACL is the "main security" config of HeadScale, telling who has access to what ! Please keep in mind: The main reason for using VPNs is security. Offering an easy-to-view / edit file to the open is against it.- Eventually more people will use this feature using a web-UI, so it would make much easier to setup with that already.
- For those who like to do things manually (by typing text files, etc) -> changing the config to
file
will not be a big task to do, but for those who are not used to do tasks like this: it would be an "impossible nightmare".Anyway: is the database password protected by default, so only HS can access it?
- or can other programs access it too?
As someone who develops a web UI for Headscale I don't think that the default should be db mode, atleast not yet. I don't understand the security standpoint here, if someone is able to get into a machine where the policy file is stored they can view your config which will contain much more sensitive information (ie. database credentials, secret keys, OIDC credentials, etc).
On the web UI end, I'd expect API requests to the ACL endpoints to just return status 403 or something if file mode is enabled. It's very easy to handle the status code and inform users on a web UI that they need to change their config.
Maybe, we should create 2 new tables to store groups and tags. And sometimes, we just want to update groups or tags. So
/api/v1/acl/groups
,/api/v1/acl/tags
and/api/v1/acl/all
may be more useful.I'm thinking about how to update ACL groups automatically, e.g. AD/LDAP or extract from OIDC (OIDC may be a problem because it can only get groups when users authenticate)
Sounds like a good idea to implement. However, if we get and set the entire policy file contents, it keeps things simple instead of exposing APIs to do specific things on the policy. This shall also keep headscale
in-line with the Tailscale ACL APIs and someone migrating from headscale to Tailscale SaaS can do it without making a lot of changes.
I would prefer keeping it simple and doing the whole thing now. Further extensions can be discussed later.
I think this looks reasonable, some comments and questions and to larger things:
I am tempted to use this opportunity to change the general phrasing from ACL to Policy through out the code, particularly since we are changing the config, adding commands etc.
I think the base format for the policy should be HuJSON, it is a bit more flexible format that allows preserving comments. This probably means using string/byte for the database and not the JSON type, but we dont really use it (or will use SQL to look up in the JSON) so that should not be a problem. As part of this, I am willing to discuss dropping the YAML support to just streamline what we support.
@kradalby Revised the PR. I've tried to keep the implementation simple in the first iteration. While I've updated the naming convention from ACL to Policy in some places, I've left some code untouched as of now. The default format will be HuJSON
and the API will also support the same.
Please take a look.
I think this is more or less there, a couple of small things.
I would also like to see a CLI integration test to validate the roundtrip:
- Load new ACL
- Read it out and compare
- Check that changes have a applied (if possible since we dont have hotreloading)
Thanks for the review. I'll implement the suggestions and write the tests.
Great, I think it would be great to try to get this into 0.23.0, I am looking to release a beta soonish so let me know if looking at this work before that is feasible, if not, there is no problem pushing it for later.
Great, I think it would be great to try to get this into 0.23.0, I am looking to release a beta soonish so let me know if looking at this work before that is feasible, if not, there is no problem pushing it for later.
@kradalby I've addressed the comments. I should be able to write the tests this week. But when is v0.23.0 planned for? Is there a due date or a cut-off date that you're looking at?
Please take a look at this comment as well: https://github.com/juanfont/headscale/pull/1792#discussion_r1649686735
Sounds good, no hard cutoff, I'm on holiday for two weeks, so staying a bit offline, if you make the test I'll pop in and review and we can do an alpha.
I'm back, and I would love to cut a beta this week, do you think it would be feasible to push some tests before midweek~ so we have time for a review cycle? I'll do a re-review of the code now.
No problem, I'm available to take a stab at the integration tests now if you have not started. Maybe I could get it in shape to merge today.
@pallabpain I ended up working on another bug today, let me know if you will not have time to look at the integration tests and I'll have a look next week.
@pallabpain I ended up working on another bug today, let me know if you will not have time to look at the integration tests and I'll have a look next week.
I managed to find some time to look into the integration tests, tonight. While I understood how I'd go about implementing the tests, I may have to spend some time to figure out the right way to configure the headscale instance with the right policy mode. 😅
Sounds good, have a look at the hsic.With options, there you can configure things, use environment variables instead of file.
Sounds good, have a look at the hsic.With options, there you can configure things, use environment variables instead of file.
@kradalby I have implemented a simple test that sets a policy, reads it back and then verifies the content. I had to expose WriteFile
via the ControlServer
interface to create a file for the headscale policy set --file FILE
command.
Please take a look.
I think that looks great, what would be great is a test in the ACLs that do something like:
- Start with two nodes and no ACL, using database mode
- have them ping eachother
- apply a ACL with the command that makes them unable to connect
- verify they can no longer connect
I can have a look at that tomorrow if you do not have time to take a stab at it, let me know.
I've added the test as described, but I found a bug I dont know if is new or old,
when a new policy is set, the nodes reload, but one of the nodes gets a packet filter that is wide open ":", the node belonging to user1, which should get an empty filter. I will continue investigating.
Ok, I found the bug, it likely hasnt been discovered since most people never reloaded ACLs without restarting the server.
If you set the packetfilter in tailcfg to an empty slice, it will marshall to nil
, and that will cause it to be ignored, so the client will continue to use the previous filter:
https://github.com/tailscale/tailscale/blob/main/tailcfg/tailcfg.go#L1863-L1865
I wonder if there is a way to get around this without copy and maintain a copy of the whole tailcfg 🤔 .
Ok, I found the bug, it likely hasnt been discovered since most people never reloaded ACLs without restarting the server.
If you set the packetfilter in tailcfg to an empty slice, it will marshall to
nil
, and that will cause it to be ignored, so the client will continue to use the previous filter: https://github.com/tailscale/tailscale/blob/main/tailcfg/tailcfg.go#L1863-L1865I wonder if there is a way to get around this without copy and maintain a copy of the whole tailcfg 🤔 .
Thanks for adding the tests and resolving the issue. I can clean up the commits once the tests pass.
Thanks for adding the tests and resolving the issue. I can clean up the commits once the tests pass.
Thank you, I think the last commit should resolve it, not sure why, but we can leave it for now.