streamalert
Feature: HTTP Endpoint Support
StreamAlert will also support receiving data via an HTTP endpoint. This is for service providers or appliances that can send logs to an HTTP endpoint. Examples: Akamai, OneLogin: https://support.onelogin.com/hc/en-us/articles/215214143-Streaming-Real-Time-OneLogin-Event-Data-to-your-SIEM-Solution
I'm interested in this in order to provide https://canary.tools/ and https://canarytokens.org/generate with a webhook that will result in StreamAlert being notified of the canaries being triggered. This is especially important because I want StreamAlert to contain more detailed information about where this canarytoken was placed and therefore what reaction I should take if it is triggered.
@jacknagz mentioned that API Gateway could be used for this, and I've found this is correct by following the rough guidance in https://medium.com/@tombray/using-amazon-api-gateway-as-a-proxy-for-kinesis-6242ce132e3d
Here is a simple demonstration of the end result for what I've set up with API Gateway:
# Get shard iterator to read kinesis stream
$ aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type LATEST --stream-name test_prod_stream_alert_kinesis
{
"ShardIterator": "AAAAAAAAAAF2OY...="
}
# Send data to the API Gateway
$ curl -H "Content-Type: application/json" -X POST -d '{"test":"testdata"}' https://REDACTED.execute-api.us-east-1.amazonaws.com/Prod
{"SequenceNumber":"495...8","ShardId":"shardId-000000000000"}
# Read the latest item from the kinesis stream to a file
$ aws kinesis get-records --shard-iterator "AAAAAAAAAAF2OY...=" > get-records.json
# Extract the record from the file
$ cat get-records.json | jq -r '.Records[0].Data' | base64 --decode | jq '.'
{
"test": "testdata"
}
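If you would rather script this check than chain jq and base64, the same decoding can be done in Python. A minimal sketch, assuming the get-records output has the Records[].Data shape shown above; decode_records is a hypothetical helper and the sample response is hand-built, not real stream output.

```python
import base64
import json


def decode_records(get_records_output: dict) -> list:
    """Return each Kinesis record's Data field, base64-decoded and parsed as JSON."""
    decoded = []
    for record in get_records_output.get("Records", []):
        raw = base64.b64decode(record["Data"])
        decoded.append(json.loads(raw))
    return decoded


if __name__ == "__main__":
    # Hand-built stand-in for `aws kinesis get-records` output.
    sample = {
        "Records": [
            {"Data": base64.b64encode(b'{"test":"testdata"}').decode("ascii")}
        ]
    }
    print(decode_records(sample))  # [{'test': 'testdata'}]
```

To read the saved file instead, load get-records.json with json.load and pass the result to decode_records.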
To set this up, I created an API Gateway with an integration to Kinesis.
The role for this simply needs:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"kinesis:PutRecord"
],
"Resource": "*"
}
]
}
Then you need to set up the body mapping template:
I used:
{
"Data": "$util.base64Encode("$input.json('$')")",
"PartitionKey": "0",
"StreamName": "test_prod_stream_alert_kinesis"
}
Note that I should change that partition key to a random value. I should also provide more information within the Data element so the record has a schema that StreamAlert can more easily identify.
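For reference, this is roughly what the mapping template hands to Kinesis PutRecord, written as Python with the partition key randomized as suggested above. It's a sketch: build_put_record_body is a hypothetical helper, uuid4 is just one way to get a random key, and json.dumps spacing may differ from what $input.json('$') emits (the decoded content is the same). In the VTL template itself, $context.requestId might serve as a per-request partition key in place of "0".

```python
import base64
import json
import uuid
from typing import Optional


def build_put_record_body(payload: dict, stream_name: str,
                          partition_key: Optional[str] = None) -> dict:
    """Build the JSON body the mapping template produces for Kinesis PutRecord.

    If no partition key is given, a random UUID is used so records spread
    across shards instead of always hashing to the shard that owns "0".
    """
    if partition_key is None:
        partition_key = str(uuid.uuid4())
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"Data": data, "PartitionKey": partition_key, "StreamName": stream_name}


body = build_put_record_body({"test": "testdata"}, "test_prod_stream_alert_kinesis")
print(body["StreamName"], len(body["PartitionKey"]))  # test_prod_stream_alert_kinesis 36
```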
Next I deployed it so it can be invoked.
I plan on first documenting this setup as CLI commands, making sure canary tokens really can hit this, and writing a rule for them to be picked up. Then we can look into integrating this into StreamAlert in such a way that it can be stood up automatically via configuration.
Note that there is no authentication on this webhook, as canary tokens don't supply anything. I do plan on making the URL at least randomized so it can't be easily found.
The first step is setting up the IAM role that API Gateway will use:
cat << EOF > assume_role.json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "apigateway.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
aws iam create-role --role-name StreamWriter --assume-role-policy-document file://assume_role.json --description "Allows API Gateway to write to Kinesis"
aws iam attach-role-policy --role-name StreamWriter --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
cat << EOF > AllowPutRecord.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kinesis:PutRecord"
],
"Resource": "*"
}
]
}
EOF
aws iam put-role-policy --role-name StreamWriter --policy-name AllowPutRecord --policy-document file://AllowPutRecord.json
Creating the webhook looks like this:
aws apigateway create-rest-api --name StreamWriter --description "Webhook that writes to the Kinesis Stream for StreamAlert" --endpoint-configuration types=REGIONAL
{
"id": "API_ID",
"name": "StreamWriter",
"description": "Webhook that writes to the Kinesis Stream for StreamAlert",
"createdDate": 1519762634,
"apiKeySource": "HEADER",
"endpointConfiguration": {
"types": [
"REGIONAL"
]
}
}
# Need to get the resource id
aws apigateway get-resources --rest-api-id API_ID
{
"items": [
{
"id": "RESOURCE_ID",
"path": "/"
}
]
}
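Rather than copying RESOURCE_ID out by hand, the root resource's id can be picked out of the get-resources output. A small sketch; root_resource_id is a hypothetical helper and the sample mirrors the output above. With the AWS CLI alone, adding --query "items[?path=='/'].id" --output text to get-resources should do the same.

```python
def root_resource_id(get_resources_output: dict) -> str:
    """Return the id of the resource whose path is "/" (the API's root)."""
    for item in get_resources_output.get("items", []):
        if item.get("path") == "/":
            return item["id"]
    raise ValueError("no root resource found")


sample = {"items": [{"id": "RESOURCE_ID", "path": "/"}]}
print(root_resource_id(sample))  # RESOURCE_ID
```

On live output, pipe the get-resources JSON into a script and call root_resource_id(json.load(sys.stdin)).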
aws apigateway put-method --rest-api-id API_ID --resource-id RESOURCE_ID --http-method POST --authorization-type NONE --request-parameters {}
cat << EOF > requestTemplate.json
{
"application/json": "{\n \"Data\": \"\$util.base64Encode(\"\$input.json('$')\")\",\n \"PartitionKey\": \"0\",\n \"StreamName\": \"test_prod_stream_alert_kinesis\"\n}"
}
EOF
# The partition key should be a random value. This template uses the Velocity Template Language.
# http://velocity.apache.org/engine/devel/vtl-reference.html
# For my needs these webhooks will be triggered infrequently enough that I'm not concerned
# about randomizing the partition key.
aws apigateway put-integration \
--rest-api-id API_ID \
--resource-id RESOURCE_ID \
--http-method POST \
--integration-http-method POST \
--type AWS \
--uri "arn:aws:apigateway:us-east-1:kinesis:action/PutRecord" \
--credentials "arn:aws:iam::ACCOUNT_ID:role/StreamWriter" \
--request-templates file://requestTemplate.json \
--passthrough-behavior NEVER
aws apigateway put-method-response --rest-api-id API_ID --resource-id RESOURCE_ID --http-method POST --status-code 200 --response-models '{"application/json": "Empty"}'
aws apigateway put-integration-response --rest-api-id API_ID --resource-id RESOURCE_ID --http-method POST --status-code 200 --response-templates '{"application/json":""}'
# Create the deployment last, once the method and integration responses exist
aws apigateway create-deployment --rest-api-id API_ID --stage-name deployed
Calling this looks like:
curl -H "Content-Type: application/json" -X POST -d '{"test":"test1"}' https://API_ID.execute-api.us-east-1.amazonaws.com/deployed
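The same request can be built with Python's standard library, which is handy when scripting tests of the endpoint. A sketch, assuming the invoke URL has the form https://API_ID.execute-api.REGION.amazonaws.com/STAGE as in the curl command above; build_webhook_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_webhook_request(api_id: str, region: str, stage: str,
                          payload: dict, path: str = "") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to an API Gateway invoke URL."""
    url = f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}"
    if path:
        url += "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request("API_ID", "us-east-1", "deployed", {"test": "test1"})
print(req.get_method(), req.full_url)
# POST https://API_ID.execute-api.us-east-1.amazonaws.com/deployed
# To actually send it: urllib.request.urlopen(req).read()
```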
I created a web token canary token, since those trigger the fastest, and it immediately resulted in a new record in my Kinesis stream:
aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type LATEST --stream-name test_prod_stream_alert_kinesis
{
"ShardIterator": "AAAAAAAAAAF0...="
}
aws kinesis get-records --shard-iterator "AAAAAAAAAAF0...=" > get-records.json
cat get-records.json | jq -r '.Records[0].Data' | base64 --decode | jq '.'
{
"manage_url": "http://canarytokens.org/manage?token=5p...5",
"memo": "You hit my webhook",
"additional_data": {
"src_ip": "6.6.6.666",
"useragent": "Mozilla/5.0....",
"referer": null,
"location": null
},
"channel": "HTTP",
"time": "2018-02-27 23:35:26"
}
Things still to do:
- Add the schema for this.
- Create a StreamAlert rule that can trigger from this.
Nice to have:
- Add more info to the record that is written to Kinesis so it can be better identified by StreamAlert. This will involve changing the requestTemplate.json above.
- Set a randomized partition key.
- Use a resource so this webhook won't be hit at random, as it may be possible to brute-force API Gateway subdomains. For example, I want the webhook URL to be https://ffffffffff.execute-api.us-east-1.amazonaws.com/deployed/mysecrethook. It would also be good to use a regex in the resource name or have query parameters end up in the record, so you could create a webhook like: https://ffffffffff.execute-api.us-east-1.amazonaws.com/deployed/mysecrethook?laptop=bobs_macbook&file=/home/bob/canary.html
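If query parameters like those in the second example URL end up in the record, splitting them out is straightforward. A sketch using the example URL above; webhook_context is a hypothetical helper, and it keeps only the first value of any repeated parameter.

```python
from urllib.parse import parse_qs, urlparse


def webhook_context(url: str) -> dict:
    """Split a webhook URL into its path and single-valued query parameters."""
    parsed = urlparse(url)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return {"path": parsed.path, "params": params}


ctx = webhook_context(
    "https://ffffffffff.execute-api.us-east-1.amazonaws.com/deployed/mysecrethook"
    "?laptop=bobs_macbook&file=/home/bob/canary.html"
)
print(ctx["params"])  # {'laptop': 'bobs_macbook', 'file': '/home/bob/canary.html'}
```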
Ultimately, we then need to decide whether to do a better write-up of how to set this up or to incorporate it directly into StreamAlert (which would be better, but harder).
This is super cool. We use Canary as well. @jacknagz - thoughts on impl?
Using a resource name (i.e., a path such as /secrethook) is easy, but you MUST create the resource (and its method and integration responses) before making the deployment, or else you get {"message": "Internal server error"} and the CloudWatch Logs will show "No match for output mapping and no default output mapping configured". (I'm making note of that error here for posterity, as it took me a long time to figure out.)
This is how you use a resource:
aws apigateway create-resource --rest-api-id XXX --parent-id YYY --path-part mysecrethook
# create-resource returns the new resource's id (ZZZ here, a placeholder). Set up put-method and put-integration on ZZZ as before, then:
aws apigateway put-method-response --rest-api-id XXX --resource-id ZZZ --http-method POST --status-code 200 --response-models '{"application/json": "Empty"}'
aws apigateway put-integration-response --rest-api-id XXX --resource-id ZZZ --http-method POST --status-code 200 --response-templates '{"application/json":""}'
# Then, only after you've done the above, create the deployment
aws apigateway create-deployment --rest-api-id XXX --stage-name deployed
Then you can hit /mysecrethook with curl as follows:
curl -H "Content-Type: application/json" -X POST -d '{"test":"test1"}' https://XXX.execute-api.us-east-1.amazonaws.com/deployed/mysecrethook
There isn't a ton of value in doing this, as it's just security through obscurity. Still, it's nice to have for services that don't use any other authentication in their requests, and since the only effect of this being "abused" is that you would get alerts you didn't care about, security through obscurity is acceptable here. You could always add stronger security if you wanted.
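To go with the earlier plan of randomizing the URL, a hard-to-guess path part can be generated rather than chosen by hand. A sketch using Python's secrets module; the hook- prefix and 16-byte length are arbitrary choices, not API Gateway requirements. The result would be passed as --path-part to create-resource.

```python
import secrets


def random_path_part(prefix: str = "hook-", nbytes: int = 16) -> str:
    """Return a hard-to-guess path part: the prefix plus 2 * nbytes hex characters."""
    return prefix + secrets.token_hex(nbytes)


print(random_path_part())
```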
Getting a requestTemplate.json to work was pretty tricky due to the quote escaping needed and the lack of functionality (or my lack of knowledge) of VTL: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html
I set my requestTemplate.json to:
cat << EOF > requestTemplate.json
{
"application/json": "{\n \"Data\": \"\$util.base64Encode(\"{\"\"url\"\": \"\"\$context.path\"\", \"\"sourceIp\"\":\"\"\$context.identity.sourceIp\"\", \"\"userAgent\"\":\"\"\$context.identity.userAgent\"\", \"\"requestTime\"\":\"\"\$context.requestTime\"\", \"\"querystring\"\":\"\"\$util.urlDecode(\$input.params().querystring)\"\",\"\"detail\"\":\$input.json('$')}\")\",\n \"PartitionKey\": \"0\",\n \"StreamName\": \"test_prod_stream_alert_kinesis\"\n}"
}
EOF
That nightmare of escaping and lack of white-space can be deciphered as:
{
"Data": "$util.base64Encode("{
'url': '$context.path',
'sourceIp':'$context.identity.sourceIp',
'userAgent':'$context.identity.userAgent',
'requestTime':'$context.requestTime',
'querystring':'$util.urlDecode($input.params().querystring)',
'detail':$input.json('$')
}")",
"PartitionKey": "0",
"StreamName": "test_prod_stream_alert_kinesis"
}
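Much of that escaping comes from writing the template as a JSON string inside a shell heredoc. One way to drop those two layers is to keep the template as plain multi-line text and let json.dumps produce requestTemplate.json. A sketch, using the same template content (plus the webhookApiId field); the doubled "" quotes are kept because they are the VTL-level escape inside the $util.base64Encode("...") string argument, which the working template above relies on. render_request_template is a hypothetical helper.

```python
import json

# VTL template as plain text. The "" pairs are VTL-level escapes for a literal
# quote inside the $util.base64Encode("...") string argument.
TEMPLATE = """{
  "Data": "$util.base64Encode("{""webhookApiId"": ""$context.apiId"", ""url"": ""$context.path"", ""sourceIp"": ""$context.identity.sourceIp"", ""userAgent"": ""$context.identity.userAgent"", ""requestTime"": ""$context.requestTime"", ""querystring"": ""$util.urlDecode($input.params().querystring)"", ""detail"": $input.json('$')}")",
  "PartitionKey": "0",
  "StreamName": "test_prod_stream_alert_kinesis"
}"""


def render_request_template(template: str = TEMPLATE) -> str:
    """Return requestTemplate.json contents; json.dumps handles all JSON escaping."""
    return json.dumps({"application/json": template}, indent=2)


if __name__ == "__main__":
    print(render_request_template())
```

Running it as python3 make_template.py > requestTemplate.json (script name hypothetical) would replace the heredoc, and no shell \$ escaping is needed either.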
Here is the webhook I gave to a canary token:
https://XXX.execute-api.us-east-1.amazonaws.com/deployed/mysecrethook?device=bobs_laptop&location=secret.txt
You can see I've added the query string device=bobs_laptop&location=secret.txt, as I might want to use something like that in the webhooks I provide to services so I know what each one was for.
When the canary is triggered, I end up with the following record data in my kinesis stream:
{
"url": "/deployed/mysecrethook",
"sourceIp": "52.18.63.80",
"userAgent": "python-requests/2.7.0 CPython/2.7.12 Linux/3.13.0-61-generic",
"requestTime": "01/Mar/2018:04:05:37 +0000",
"querystring": "{device=bobs_laptop, location=secret.txt}",
"detail": {
"manage_url": "http://canarytokens.org/manage?token=XXX&auth=YYY",
"memo": "My StreamAlert test",
"additional_data": {
"src_ip": "6.6.6.666",
"useragent": "Mozilla/5.0 ...",
"referer": null,
"location": null
},
"channel": "HTTP",
"time": "2018-03-01 04:05:37"
}
}
You can see I'm getting some relevant info about who called this webhook and what the webhook is, along with the data that was sent to the webhook inside the detail element.
Some comments:
- "url": "/deployed/mysecrethook": I don't seem to be able to find a way of getting the whole URL.
- sourceIp and userAgent: These are fine.
- "requestTime": "01/Mar/2018:04:05:37 +0000": I don't have the ability to change this to a better format.
- "querystring": "{device=bobs_laptop, location=secret.txt}": I'm upset that this isn't all JSON, but it doesn't look like I have the ability to do anything better.
- "detail": This is just a blob of exactly what canary tools sent.
It works!
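Since querystring arrives as a Java-style map rendering rather than JSON, one option is to parse it after the fact, for example in a rule. A rough sketch; parse_map_string is a hypothetical helper, and it assumes keys never contain "=" and neither keys nor values contain ", ", which holds for the example but not for arbitrary query strings. On the API Gateway side, a VTL #foreach over $input.params().querystring.keySet() may be another way to emit real JSON, though that isn't tested here.

```python
def parse_map_string(text: str) -> dict:
    """Parse a string like "{device=bobs_laptop, location=secret.txt}" into a dict.

    Splits pairs on ", " and each pair on its first "=", so values may contain
    "=" but keys may not, and neither may contain ", ".
    """
    inner = text.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    result = {}
    for pair in inner.split(", "):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        result[key] = value
    return result


print(parse_map_string("{device=bobs_laptop, location=secret.txt}"))
# {'device': 'bobs_laptop', 'location': 'secret.txt'}
```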
I updated my requestTemplate.json to:
cat << EOF > requestTemplate.json
{
"application/json": "{\n \"Data\": \"\$util.base64Encode(\"{\"\"webhookApiId\"\": \"\"\$context.apiId\"\", \"\"url\"\": \"\"\$context.path\"\", \"\"sourceIp\"\":\"\"\$context.identity.sourceIp\"\", \"\"userAgent\"\":\"\"\$context.identity.userAgent\"\", \"\"requestTime\"\":\"\"\$context.requestTime\"\", \"\"querystring\"\":\"\"\$util.urlDecode(\$input.params().querystring)\"\",\"\"detail\"\":\$input.json('$')}\")\",\n \"PartitionKey\": \"0\",\n \"StreamName\": \"test_prod_stream_alert_kinesis\"\n}"
}
EOF
Then I added the following to my logs.json:
"webhook": {
"schema": {
"webhookApiId": "string",
"url": "string",
"sourceIp": "string",
"userAgent": "string",
"requestTime": "string",
"querystring": "string",
"detail": {}
},
"parser": "json"
}
and updated my sources.json to include webhook as a log in my Kinesis stream.
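Before deploying, a record pulled off the stream can be compared against the webhook schema as a quick local sanity check. This is not how StreamAlert classifies logs internally; matches_webhook_schema is a hypothetical helper that only checks key names and basic types, assuming "string" means a str and {} means any JSON object.

```python
# Mirrors the "webhook" schema added to logs.json above.
WEBHOOK_SCHEMA = {
    "webhookApiId": "string",
    "url": "string",
    "sourceIp": "string",
    "userAgent": "string",
    "requestTime": "string",
    "querystring": "string",
    "detail": {},
}


def matches_webhook_schema(record: dict, schema: dict = WEBHOOK_SCHEMA) -> bool:
    """Return True if record has exactly the schema's keys with matching basic types."""
    if set(record) != set(schema):
        return False
    for key, expected in schema.items():
        if expected == "string" and not isinstance(record[key], str):
            return False
        if isinstance(expected, dict) and not isinstance(record[key], dict):
            return False
    return True
```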
Then I made a rule webhook.py:
"""Alert on webhook being called."""
from stream_alert.rule_processor.rules_engine import StreamRules

rule = StreamRules.rule


@rule(logs=['webhook'],
      matchers=[],
      outputs=['slack:alerts'])
def webhook(rec):
    return True
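The rule above fires on every webhook record. If other services post to the same endpoint, a rule could narrow it to canarytokens.org callbacks. A sketch of that check as a plain function, so it can be tested without StreamAlert installed; is_canarytoken_callback is hypothetical, and keying on the manage_url host is an assumption based only on the sample payloads above. In webhook.py the rule body could then become return is_canarytoken_callback(rec).

```python
from urllib.parse import urlparse


def is_canarytoken_callback(rec: dict) -> bool:
    """Return True if a webhook record looks like a canarytokens.org callback.

    Based only on the sample payloads: detail.manage_url points at canarytokens.org.
    """
    detail = rec.get("detail") or {}
    if not isinstance(detail, dict):
        return False
    manage_url = detail.get("manage_url") or ""
    return urlparse(manage_url).hostname == "canarytokens.org"
```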
PR #615 collects the comments in this ticket into a single, more coherent guide, along with changes to logs.json for the webhook schema and a sample rule that will fire any time a webhook is triggered.
hey @0xdabbad00, I'll take some more time to read through this, but it looks great so far. Can you adjust the IAM policy for API Gateway to allow sending only to specific streams instead of *?
Yes. The API Gateway is configured such that it will only write to the stream you configure it for, but once we decide how to integrate this into StreamAlert, I'll tighten those permissions in a PR. This would be better if integrated into the terraform config of StreamAlert so we know the name of the Stream it will be sending the records to and can automatically set the policy accordingly.
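A sketch of the tightened policy, scoping kinesis:PutRecord to a single stream ARN instead of *. Kinesis stream ARNs take the form arn:aws:kinesis:REGION:ACCOUNT_ID:stream/STREAM_NAME; the values below are placeholders, and stream_writer_policy is a hypothetical helper. The printed JSON could be saved as AllowPutRecord.json and applied with put-role-policy as before.

```python
import json


def stream_writer_policy(region: str, account_id: str, stream_name: str) -> dict:
    """IAM policy allowing kinesis:PutRecord only on the named stream."""
    stream_arn = f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kinesis:PutRecord"],
                "Resource": stream_arn,
            }
        ],
    }


print(json.dumps(
    stream_writer_policy("us-east-1", "ACCOUNT_ID", "test_prod_stream_alert_kinesis"),
    indent=2,
))
```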
I corrected IAM policy, fixed a step I had forgotten, swapped the ordering of another step I discovered needed to be swapped, and fixed the formatting.
We tried integrating Facebook's Certificate Transparency into this webhook flow: https://developers.facebook.com/docs/certificate-transparency/certificates-webhook
Unfortunately, the webhook I created only handles POST requests, and Facebook first sends a GET request with a parameter that must be echoed back. I assume this is done to avoid sending unsolicited requests to someone. The parameter is sent in a query string such as ?hub.mode=subscribe&hub.challenge=123456&hub.verify_token=token_you_provide, so the GET needs to respond with 123456 as the body. I think cases like this are going to be one-offs and should probably be handled independently of StreamAlert, but I wanted to mention it as something to consider.
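The handshake itself is simple enough to model locally: read hub.challenge from the query string and return it as the body. A sketch in Python for checking what the GET endpoint should return; challenge_response is hypothetical, and comparing hub.verify_token against the token you registered is an extra check that the API Gateway setup described next does not perform.

```python
from typing import Optional
from urllib.parse import parse_qs


def challenge_response(query: str, expected_token: str) -> Optional[str]:
    """Return the body to send for a subscription check, or None to reject it."""
    params = parse_qs(query.lstrip("?"))
    mode = params.get("hub.mode", [None])[0]
    token = params.get("hub.verify_token", [None])[0]
    challenge = params.get("hub.challenge", [None])[0]
    if mode == "subscribe" and token == expected_token and challenge is not None:
        return challenge
    return None


print(challenge_response(
    "?hub.mode=subscribe&hub.challenge=123456&hub.verify_token=token_you_provide",
    "token_you_provide",
))  # 123456
```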
For those following along, the challenge response to Facebook can be set up as follows. I perform most of the same steps as before, except this time for a GET request, and I echo back the challenge that is sent as a query parameter.
aws apigateway put-method --rest-api-id REST_API_ID --resource-id RESOURCE_ID --http-method GET --authorization-type NONE
# Use the same requestTemplate.json as described previously
aws apigateway put-integration \
--rest-api-id REST_API_ID \
--resource-id RESOURCE_ID \
--http-method GET \
--integration-http-method POST \
--type AWS \
--uri "arn:aws:apigateway:REGION:kinesis:action/PutRecord" \
--credentials "arn:aws:iam::ACCOUNT_ID:role/StreamWriter" \
--request-templates file://requestTemplate.json \
--passthrough-behavior NEVER
# This `put-integration-response` is the key part of this that will echo back the challenge to Facebook.
cat << EOF > response-template.json
{"text/plain":"\$input.params().get('querystring').get('hub.challenge')"}
EOF
aws apigateway put-method-response --rest-api-id REST_API_ID --resource-id RESOURCE_ID --http-method GET --status-code 200 --response-models '{"application/json": "Empty"}'
aws apigateway put-integration-response --rest-api-id REST_API_ID --resource-id RESOURCE_ID --http-method GET --status-code 200 --response-templates file://response-template.json
aws apigateway create-deployment --rest-api-id REST_API_ID --stage-name deployed