servant-hmac-auth Introduce HmacAuthed with secured request's content

This PR introduces the new Servant combinator HmacAuthed. It is essentially the same as HmacAuth but adds the following:

Takes the request's body into account when generating the HMAC signature (HmacAuth ignores it and as a consequence doesn't guarantee the integrity of the request)
Ignore the () parameter introduced by HmacAuth used to represent the user.

Discussions

To achieve this, an entirely new module is introduced with a specific client hmacClient and runner runHmacClient. The implementation guidelines behind the module were:

An API as close to the original Servant API as possible. That's the reason why these hmacClient and runHmacClient are different from the old versions. It also means that to switch to and from the unsecured equivalent (client and runClientM), the user only need to rename the function call and add the required parameter.
Keep the old versions working for smoother transitions.
An API no longer using the experimental Servant authentication mechanism. The experimental aspect aside, it also wasn't suited to our needs: we only guarantee the identify of a single user and don't distinguish user from a set of users (though that would be technically possible, with each user having its own secret key), we also need to access the request body for authentication which poses its "consumption problem" in the regular request handling workflow...

Pros

Simpler API to use (though that is very subjective, at the very least the API is closer to the Servant API)
The integrity of the request's content is guaranteed by the signature.
Request hashing is only performed when authentication is required (as opposed to the previous workaround)

Caveats

As the new API hashes the request's body content to generate the HMAC signature, it needs to make to copy of it to give to the client / server backend. This copy lives in memory. This would be very expensive for very large requests and streaming. Streaming is also affected by the fact that the different body chunks are only transferred as soon as they have all been inspected for signature. This HMAC authentication implementation is therefore not really suited for streaming. Plus, care needs to be taken and protection put in place to prevent accidental (or voluntary and malicious) DoS attacks with very large requests sent to APIs protected by this library. A possible workaround would be to limit the size of the requests for the protected endpoints.

Breaking changes

During implementation, the existing API was kept as-is after all... almost. The forced addition of the Host and Accept-Encoding HTTP headers was removed as it was deemed unnecessary, even harmful for the test cases, for the new implementation in the shared module: the Host may be set by the client and we shouldn't change it and use the existing value for signature. If it's not set, then it should be ok to extract the value from the target URL. As for the Accept-Encoding HTTP header, it doesn't look like a good idea to tell the remote side that we can accept a Gzip encoding if the client didn't say so and if we actually don't (both the client and this library). Perhaps we should drop it off the whitelist of HTTP headers as well: if any intermediate reverse proxy accepts additional encodings, both the client and server shouldn't care about that and see the unharmed body content, no matter what happens in-between. If the client actually supports the Gzip encoding and the library doesn't, that actually doesn't matter as well: this library doesn't need to know that and the Gzip middleware should ensure that the library only sees and authenticates the decoded content with the new workflow.

Questions

As it is, the old version of the API (the one that doesn't take the request's content into account) is still functional but not very useful: it only guarantees the caller's identity and integrity of the requested URL (including the query string). There is no integrity check on the content. Though it can be kept in the library, it doesn't seem very useful in terms of security. It doesn't suffer the issue of the very large request / streaming though. What should be done about it?
Should we consider the possibility of distinguishing multiple users by their secret key? There are multiple ways of achieving this. But it was not a requirement for this library. Shall we continue with that? If we intend on adding support to that, it might be easier to take it on immediately.
See comment on the headers in the Breaking changes section. Can we still reproduce the reverse-proxy issue without the Accept-Encoding header?

Further actions required

[ ] Discuss this PR.
[x] Update the documentation
[ ] Decide on the old API's future

Mar 10 '22 18:03 michivi

As discussed with @arbus and @JolandaNava , this PR has been changed in the following way: The client and server APIs have been changed to identify the user and use their secret key for authentication. As opposed to the first proposed version, the user is now provided to the server handler after the given context has identified and authenticated the user. The client API will only need to be provided with the user while the given context will select their secret key and add the necessary information in the HTTP request so that the server is able to identify the user and use their secret key as well.

Mar 14 '22 18:03 michivi

If that's ok with everyone, we can merge this as experimental and create a new version out of it. There should be no real issue as the previous implementation is still available. If there's any feedback, we can still have PRs changing the experimental implementations. /Cc @JolandaNava @arbus @alexterz

Apr 15 '22 09:04 michivi

@kahlil29 What do you think we should do about this PR? There should be no breaking changes as the previous API is still here. The new proposed API is still experimental, though functional in our project for now.

Apr 21 '22 07:04 michivi

Hey @michivi Given that there are no breaking changes, I am in favor of merging and cutting a new release, then we go ahead and test it out and figure out next steps. I feel like the "older" (existing) API would always need to be kept, since it's being used in multiple projects, internally at Holmusk as well as by other external parties like @ocharles (CircuitHub).

Apr 21 '22 08:04 kahlil29

From my perspective, I wouldn't worry about maintaining an API just for us - we'll adapt our code quick enough!

Apr 21 '22 08:04 ocharles

@ocharles Thanks for that! Any other thoughts/opinions on this PR are also appreciated and welcome 👍🏻 😄

Apr 21 '22 08:04 kahlil29

servant-hmac-auth servant-hmac-auth copied to clipboard

Introduce HmacAuthed with secured request's content

Discussions

Pros

Caveats

Breaking changes

Questions

Further actions required

servant-hmac-auth
servant-hmac-auth copied to clipboard