covid-alert-server
covid-alert-server copied to clipboard
BVC-55: Race condition in bruce force attack pattern against claim OTC key path.
General Information Function: claimKey() Location: covid-shield-server/pkg/server/keyclaim.go Path: /claim-key
Logic of claimKey() Check how many attempts the IP has left based on the current number of failure associated with the IP
https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L115
If the IP has 0 attempt left → return HTTP 429 Too Many Requests <Initially, a user has 8 attempts available>
https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L119
Reads the data from the HTTP request received
https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L127
Tries to claim the one time code read from the data read in previous step
https://github.com/cds-snc/covid-shield-server/blob/7d0a103f6b5ca2b66a5598a21c2ce3f9cbf9b3b7/pkg/persistence/queries.go#L65
Verify that the application public key isn't already in the encryption_keys table
Verify that the one time code is valid <In this use case, this is the last step because the one time code isn't valid>
If the one time code isn't valid
The number of failure associated with the IP is increased.
https://github.com/cds-snc/covid-shield-server/blob/b10ebdf7a7f926f24526ad1d2b485389c5536e29/pkg/persistence/queries.go#L384
Return HTTP 401 unauthorized
If the one time code is valid
https://github.com/cds-snc/covid-shield-server/blob/4518d8d8bca083e450a459e8a3bcac8ef5be83e0/pkg/server/keyclaim.go#L190
return the server public key to send keys to the server using /upload
Race Condition Description
The race condition takes advantage of the execution time in between step 1 and step 5a. The number of failure in between those step is assume to stay the constant. Since more than a thread can be executing claimKey() at the same time, this assumption leaves a possibility that the value won't be updated before another thread accesses the database to verify if the same IP can try to get a valid one time key step 2. Since there are 2 database queries before updating the number of failures, it leaves enough time for a user to send more than a request simultaneously. This would give the user more chances to guess a valid one time code.
Proof of concept Two methods were use to successfully exploit the race condition. The POC was written in python. Multi processing is used instead of multi threading in order to avoid the global interpreter lock (GIL). In other languages, a multi threaded solution should work.
First method (Most reliable) Current setup: 7 processes Do sequentially 7 requests to /claim-key <available attempt - 1> Create a number of processes Synchronize all the processes Send a request to /claim-key with each process created at step 2
Second Method (Best results) Current setup: 25 processes Create a number of processes. Synchronize all the processes Send a request to /claim-key with each process created at step 1
Marking this as low pri, and will discuss a proper fix for after launch, based on presence of AWS WAF rate limit rules as well.
Fixing this will require some locks that may potentially introduce other instability into the ingress flow so pausing for now.
Looked into this a bit, I have written down some thoughts here.
- [ ] Double Check assumptions
Implement leaky bucket once redis is available