Create a hash from the key and log it with each HTTP request or webhook endpoint call (for Event Grid)
Currently, if the client sends a request with an invalid or incorrect key, we do not have a way to check in the logs whether the key is valid. Adding a hash would be useful for CRI investigations.
Scenario: The customer has set the authorization level to function for their httptrigger (function), so any request must include the key either in the query string or the x-functions-key HTTP header. In the frontend, we do not log the key, so we do not have a way to check if the provided key is correct.
The customer says that they were sending the correct key (they shared a screenshot of Application Insights showing the key in the query string). In the frontend, we can see the request coming in. In the Functions logs, we can see that the customer was getting a 401 Unauthorized error due to authorizationlevel = "Anonymous". Please correct me if I am wrong, but it looks like the frontend or the Functions Host checks whether the key matches what’s in storage and, if it does not, removes the key from the request and forwards it as anonymous, which results in a 401. With the current logs, we cannot determine whether the provided key was correct.
Having a hash that gets generated from the key could help investigate issues like this. This way we could compare the hash for successful vs failed requests.
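To make the proposal concrete, here is a minimal sketch of what a loggable key fingerprint could look like. It is written in Python purely for illustration and is not the Functions Host implementation. The function name `key_fingerprint`, the salt value, and the 12-character truncation are all assumptions. A keyed hash (HMAC-SHA256) is used so that the logged value cannot be matched against a precomputed table of raw key hashes.

```python
import hashlib
import hmac


def key_fingerprint(key: str, salt: bytes, length: int = 12) -> str:
    """Return a short, non-reversible fingerprint of a key, suitable for logs.

    Hypothetical helper: the name, salt handling, and truncation length are
    illustrative assumptions, not part of any existing API.
    """
    digest = hmac.new(salt, key.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    salt = b"per-app-logging-salt"  # assumption: a secret kept out of the logs

    succeeded = key_fingerprint("the-real-key", salt)
    failed = key_fingerprint("the-real-key ", salt)  # trailing space, a typical copy/paste error

    # The same key always yields the same fingerprint, so requests can be compared.
    print(succeeded == key_fingerprint("the-real-key", salt))
    # Any difference in the key yields a different fingerprint.
    print(succeeded == failed)
```

With something like this, only the fingerprint would be written next to each request in the logs, and an investigation would compare the fingerprint on a failed request with the one on a known successful request.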
cc: @alrod
@Francisco-Gamino can you expand on how you'd expect to use this information to aid in investigation? Would it be purely comparing the hash with previous ones? Can you share more details?
@fabiocav -- I have added more details to the issue description. Please let me know if we need additional information. Thanks.
Thanks! The hash will help if you have successful requests to the same endpoint if you know they would be using the same key. It's valid for a function to have multiple keys, and therefore, multiple possible successful hashes, so having the hash differ from another request doesn't immediately indicate the key was incorrect without user confirmation that they were supposed to be the same. I'll follow up offline to discuss.
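The multiple-keys caveat can be illustrated with a short sketch (Python, for illustration only; `fingerprint` and `classify` are hypothetical helpers, and plain truncated SHA-256 is an assumption). A hash that differs from a previous successful request is not proof of a bad key. The comparison only becomes conclusive when it is made against the fingerprints of every key that is valid for the function.

```python
import hashlib
from typing import Iterable


def fingerprint(key: str) -> str:
    """Truncated SHA-256 of a key (illustrative assumption, not a real API)."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def classify(request_fp: str, valid_keys: Iterable[str]) -> str:
    """Compare a logged fingerprint against all keys known to be valid."""
    valid_fps = {fingerprint(k) for k in valid_keys}
    if request_fp in valid_fps:
        return "matches a valid key"
    return "matches no valid key"


if __name__ == "__main__":
    # Assumption: a function with two valid keys.
    valid_keys = ["default-key", "partner-key"]

    earlier_success = fingerprint("default-key")
    later_request = fingerprint("partner-key")

    # The two fingerprints differ, yet both requests used a valid key.
    print(earlier_success != later_request)
    print(classify(later_request, valid_keys))
    print(classify(fingerprint("stale-key"), valid_keys))
```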
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.
Having a hash in the logs would help correlate requests when authentication fails. With this information, we will know which key was used for the request.
It would be helpful to have hashes logged even if we can call http endpoints with different successful keys.
Adding @cjaliaga given that he is investigating an issue where it seems that the client was using the incorrect function key.
Hi Team, is there any news about this implementation? I have worked with the customer that raised this issue handled by @cjaliaga.
Hello @fabiocav -- Could you please consider fixing this issue? High level, we need to have some kind of record in the logs on what key was used for authentication. Having a hash or something like this could be helpful for CRI investigation where authentication failed. Right now, we do not have a way to determine what key was used and whether that key was valid. Thank you.
/cc @cjaliaga @FinVamp1
Wanted to note here that I just received and resolved a CRI with almost the exact same details as the one that resulted in opening this issue (synced w/ @Francisco-Gamino offline to compare).
The difference in mine is that we know there was a slot swap a few hours before the incident I received, so my resolution/suggestion also included the mitigation in #7710 in case that was the source of error.
@Francisco-Gamino would be good to again sync on this. Last time we discussed, the value of having this in place, in practice, was questionable.