lambda-promtail crashing with a panic
Describe the bug
lambda-promtail crashes with a panic when trying to pull AWS load balancer logs from S3.
I am using a slightly modified version of the lambda-promtail CloudFormation template to deploy the AWS resources (it's missing the S3 permissions compared to the Terraform template), which pushes logs to a Promtail instance behind basic authentication (which then forwards the logs to our Loki instance).
To Reproduce
Steps to reproduce the behavior:
- Create a private ECR repository, pull public.ecr.aws/grafana/lambda-promtail:2.5.0-amd64, and push it to the private ECR repository.
- Use the lambda-promtail CloudFormation template to create the AWS resources.
- Create S3 Event Notification to run lambda-promtail function on all object create events in the AWS load balancer log S3 bucket.
- The lambda-promtail function fails with the log below
Expected behavior
lambda-promtail gets the logs from S3 and sends them to Promtail/Loki.
Environment:
- Infrastructure: AWS Lambda using the public.ecr.aws/grafana/lambda-promtail:2.5.0-amd64 container image
- Deployment tool: AWS CloudFormation
Screenshots, Promtail config, or terminal output
Logs from the Lambda function:
2022-06-22T12:58:07.336+12:00 | START RequestId: 7ed4d51b-2459-48b2-8336-16742c131614 Version: $LATEST
2022-06-22T12:58:07.336+12:00 | write address: https://{REDACTED}/aws-lb
2022-06-22T12:58:07.336+12:00 | keep stream: false
2022-06-22T12:58:07.630+12:00 | 2022-06-22 00:58:07.630385 I | calling the handler function resulted in a panic, the process should exit
2022-06-22T12:58:07.632+12:00 | END RequestId: 7ed4d51b-2459-48b2-8336-16742c131614
2022-06-22T12:58:07.632+12:00 | REPORT RequestId: 7ed4d51b-2459-48b2-8336-16742c131614 Duration: 292.89 ms Billed Duration: 2235 ms Memory Size: 128 MB Max Memory Used: 33 MB Init Duration: 1941.68 ms
2022-06-22T12:58:07.632+12:00 | Unknown application error occurred
Assigning @cstyan to have a look when they get some time; I believe they wrote this code.
@elliotdobson can you try a newer version of the lambda-promtail image? We don't really version lambda-promtail the way we do Loki releases, so just because there's a 2.5.0 tag doesn't mean it's a stable release like Loki 2.5.0; there could have been a bug in the 2.5 version. The ECR repo with images is here: https://gallery.ecr.aws/grafana/lambda-promtail
On top of that, looking at when the S3 feature for lambda-promtail was merged and the branch that Loki 2.5.0 is based off of, the S3 support hadn't been merged yet. So again, I think upgrading your image version will help.
The last suspicious thing is that your Lambda logging states there's a panic but doesn't include the Go stack trace from it.
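For reference, a wrapper like the following would make the stack trace show up in CloudWatch Logs even when the runtime only reports that a panic occurred. This is a minimal sketch using the standard aws-lambda-go handler signature, not lambda-promtail's actual code:

package main

import (
    "context"
    "fmt"
    "runtime/debug"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
)

// handler is a hypothetical stand-in for the real S3-event processing.
func handler(ctx context.Context, ev events.S3Event) error {
    return nil
}

// withRecovery recovers from panics in the wrapped handler and prints the
// panic value plus the Go stack trace so they appear in CloudWatch Logs.
func withRecovery(h func(context.Context, events.S3Event) error) func(context.Context, events.S3Event) error {
    return func(ctx context.Context, ev events.S3Event) (err error) {
        defer func() {
            if r := recover(); r != nil {
                fmt.Printf("panic: %v\n%s\n", r, debug.Stack())
                err = fmt.Errorf("recovered from panic: %v", r)
            }
        }()
        return h(ctx, ev)
    }
}

func main() {
    lambda.Start(withRecovery(handler))
}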
Hey @cstyan. I tried using public.ecr.aws/grafana/lambda-promtail@sha256:db33e17246d4e713d743717a20ea1757534e58c26220cb2d22e1ce489bf3f697, which was the latest main image at the time, but it gave another error:
2022-06-29T11:05:32.561+12:00 | START RequestId: 5de33725-3312-4930-bcdc-eff2d08e7f82 Version: $LATEST
2022-06-29T11:05:32.563+12:00 | IMAGE Launch error: fork/exec /app/main: exec format error Entrypoint: [/app/main] Cmd: [] WorkingDir: [/app]
2022-06-29T11:05:32.569+12:00 | END RequestId: 5de33725-3312-4930-bcdc-eff2d08e7f82
Here's a redacted version of the AWS CloudFormation template I'm using:
AWSTemplateFormatVersion: "2010-09-09"
Description: Creates AWS resources for lambda-promtail
Parameters:
  LambdaPromtailPassword:
    Description: The basic auth password, for the external-promtail endpoint.
    Type: String
    Default: ""
    NoEcho: true
Outputs:
  LambdaPromtailFunction:
    Description: Lambda Promtail Function ARN
    Value: !GetAtt LambdaFunctionLambdaPromtail.Arn
    Export:
      Name: lambda-promtail
Resources:
  IamRoleLambdaPromtail:
    Type: AWS::IAM::Role
    Properties:
      RoleName: lambda-promtail
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: lambda-logs
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: arn:aws:logs:*:*:*
              - Effect: Allow
                Action: 's3:GetObject'
                Resource: 'arn:aws:s3:::aws-elb-logs/*'
  LambdaFunctionLambdaPromtail:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: "lambda-promtail"
      Code:
        ImageUri: "{PRIVATE_ECR}/lambda-promtail:main"
      MemorySize: 128
      PackageType: Image
      Timeout: 60
      Role: !GetAtt IamRoleLambdaPromtail.Arn
      ReservedConcurrentExecutions: 2
      Environment:
        Variables:
          WRITE_ADDRESS: "https://{REDACTED}/aws-lb"
          USERNAME: "lambda-promtail"
          PASSWORD: !Ref LambdaPromtailPassword
          KEEP_STREAM: "false"
          EXTRA_LABELS: ""
          TENANT_ID: ""
  LambdaPermissionLambdaPromtail:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt LambdaFunctionLambdaPromtail.Arn
      Action: lambda:InvokeFunction
      Principal: s3.amazonaws.com
      SourceAccount: !Ref 'AWS::AccountId'
      SourceArn: 'arn:aws:s3:::aws-elb-logs'
The CloudFormation template for the S3 bucket that the load balancer logs are stored in looks like:
AWSTemplateFormatVersion: "2010-09-09"
Description: Creates AWS resources for AWS load balancer logs
Resources:
  S3BucketAwsElbLogs:
    Type: AWS::S3::Bucket
    DeletionPolicy: "Retain"
    Properties:
      BucketName: "aws-elb-logs"
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: "AES256"
      LifecycleConfiguration:
        Rules:
          - Id: Delete-Logs-After-30Days
            Status: Enabled
            ExpirationInDays: 30
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: "s3:ObjectCreated:*"
            Filter:
              S3Key:
                Rules:
                  - Name: "prefix"
                    Value: "ingress/"
                  - Name: "suffix"
                    Value: ".gz"
            Function: !ImportValue lambda-promtail
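For context on how the notification is consumed: the function receives the bucket and object key from the S3 event and needs the s3:GetObject permission granted in the Lambda template above to fetch and decompress the log file (the notification filter only matches .gz keys). Here is a minimal sketch using aws-lambda-go and aws-sdk-go-v2, not lambda-promtail's actual implementation:

package main

import (
    "compress/gzip"
    "context"
    "fmt"
    "io"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func handler(ctx context.Context, ev events.S3Event) error {
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        return err
    }
    client := s3.NewFromConfig(cfg)

    for _, record := range ev.Records {
        bucket := record.S3.Bucket.Name
        key := record.S3.Object.Key

        // Requires the s3:GetObject permission granted in the IAM role above.
        obj, err := client.GetObject(ctx, &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
        })
        if err != nil {
            return err
        }

        // The notification filter only matches ".gz" keys, so decompress.
        gz, err := gzip.NewReader(obj.Body)
        if err != nil {
            obj.Body.Close()
            return err
        }
        data, err := io.ReadAll(gz)
        obj.Body.Close()
        if err != nil {
            return err
        }
        fmt.Printf("read %d bytes from s3://%s/%s\n", len(data), bucket, key)
    }
    return nil
}

func main() {
    lambda.Start(handler)
}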
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly review closed issues that have a stale label, sorted by thumbs-up.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.
Hey @cstyan, I think the error in my previous comment may have been caused by trying to run an arm64 image on an amd64 Lambda.
I've just tested again using the latest main image (public.ecr.aws/grafana/lambda-promtail:main-2766e0a) on both arm64 and amd64 Lambdas, but I am still getting the same error as in my original post:
2022-09-16T10:33:37.931+12:00 | write address: https://{REDACTED}/aws-lb
2022-09-16T10:33:37.931+12:00 | keep stream: false
2022-09-16T10:38:37.134+12:00 | START RequestId: 6553c683-f97e-4785-bd4a-bf656c6e70a7 Version: $LATEST
2022-09-16T10:38:37.397+12:00 | 2022-09-15 22:38:37.397454 I | calling the handler function resulted in a panic, the process should exit
2022-09-16T10:38:37.414+12:00 | END RequestId: 6553c683-f97e-4785-bd4a-bf656c6e70a7
2022-09-16T10:38:37.414+12:00 | REPORT RequestId: 6553c683-f97e-4785-bd4a-bf656c6e70a7 Duration: 278.55 ms Billed Duration: 639 ms Memory Size: 128 MB Max Memory Used: 32 MB Init Duration: 360.10 ms
2022-09-16T10:38:37.414+12:00 | Unknown application error occurred
Any suggestions would be much appreciated!
Upon further investigation of the lambda-promtail code, it looks like the S3 ingester was developed with AWS Application Load Balancers in mind. However, there are also AWS Network Load Balancers and AWS Classic Load Balancers that output access logs to S3. Unfortunately, both the file names and the log formats differ for each type of load balancer.
For reference I am using AWS Network Load Balancers.
I've managed to fix the panic in this commit by updating the regexes used to match the log file name and the timestamp in the log line (and the regexes should now support all three types of load balancers).
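To illustrate the kind of broadening involved (an illustrative sketch, not the actual pattern from my commit), a single expression can accept the app. prefix used by Application Load Balancer keys, the net. prefix used by Network Load Balancer keys, and the prefix-less Classic Load Balancer keys, assuming the standard access-log key layout:

package main

import (
    "fmt"
    "regexp"
)

// Illustrative only: matches the filename portion of ALB ("app."), NLB
// ("net."), and classic ELB (no prefix) access-log object keys.
var lbFilenameRegex = regexp.MustCompile(
    `(?P<account_id>\d+)_elasticloadbalancing_(?P<region>[\w-]+?)_(?:(?P<lb_type>app|net)\.)?(?P<lb_name>[^_]+)`,
)

func main() {
    // Hypothetical NLB-style key.
    key := "123456789012_elasticloadbalancing_ap-southeast-2_net.my-nlb.0123456789abcdef_20220916T0400Z_10.0.0.1_abcd.log.gz"
    m := lbFilenameRegex.FindStringSubmatch(key)
    if m == nil {
        fmt.Println("no match")
        return
    }
    for i, name := range lbFilenameRegex.SubexpNames() {
        if name != "" {
            fmt.Printf("%s=%q\n", name, m[i])
        }
    }
}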
However, I am still not receiving any logs in Loki. The lambda-promtail logs now look like:
2022-09-16T16:01:26.278+12:00 | write address: https://{REDACTED}/aws-lb
2022-09-16T16:01:26.278+12:00 | keep stream: false
2022-09-16T16:01:26.280+12:00 | START RequestId: a5cba339-b29b-44db-baa3-447331c25d84 Version: $LATEST
2022-09-16T16:01:26.587+12:00 | END RequestId: a5cba339-b29b-44db-baa3-447331c25d84
2022-09-16T16:01:26.587+12:00 | REPORT RequestId: a5cba339-b29b-44db-baa3-447331c25d84 Duration: 306.43 ms Billed Duration: 393 ms Memory Size: 128 MB Max Memory Used: 32 MB Init Duration: 86.48 ms
2022-09-16T16:01:31.873+12:00 | START RequestId: b7db0647-758d-441d-91f9-1f5f2b00f0eb Version: $LATEST
2022-09-16T16:01:31.894+12:00 | END RequestId: b7db0647-758d-441d-91f9-1f5f2b00f0eb
2022-09-16T16:01:31.894+12:00 | REPORT RequestId: b7db0647-758d-441d-91f9-1f5f2b00f0eb Duration: 19.39 ms Billed Duration: 20 ms Memory Size: 128 MB Max Memory Used: 32 MB
2022-09-16T16:03:36.592+12:00 | START RequestId: 17b7a6d3-4319-4500-b70a-dd16cf8095a5 Version: $LATEST
2022-09-16T16:03:36.691+12:00 | END RequestId: 17b7a6d3-4319-4500-b70a-dd16cf8095a5
2022-09-16T16:03:36.691+12:00 | REPORT RequestId: 17b7a6d3-4319-4500-b70a-dd16cf8095a5 Duration: 97.80 ms Billed Duration: 98 ms Memory Size: 128 MB Max Memory Used: 33 MB
Is there any debug logging in lambda-promtail that I can enable?
Since my last comment I have figured out that AWS Network Load Balancer log timestamps are not RFC3339-compliant, as they don't include the timezone they were recorded in. However, they are recorded in UTC.
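As an aside (a sketch of the idea, not the code from the fix): Go's time.Parse already treats a layout without a zone component as UTC, so a zone-less NLB timestamp can be parsed like this, assuming a value such as 2022-09-15T22:38:37:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Hypothetical NLB-style timestamp: no fractional seconds, no zone offset.
    raw := "2022-09-15T22:38:37"

    // Parsing it as RFC3339 fails because the offset (e.g. "Z") is missing.
    if _, err := time.Parse(time.RFC3339, raw); err != nil {
        fmt.Println("RFC3339 parse failed:", err)
    }

    // A layout without a zone component works; time.Parse treats it as UTC,
    // which matches how the NLB access logs record their timestamps.
    ts, err := time.Parse("2006-01-02T15:04:05", raw)
    if err != nil {
        panic(err)
    }
    fmt.Println(ts) // 2022-09-15 22:38:37 +0000 UTC
}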
I've created PR #7194, which fixes the regex and timestamp issues I found when ingesting AWS Network Load Balancer logs.
I was able to figure this out by replacing the return err lines with fmt.Println(err) so that the error was printed in the logs. I'm not sure why the error was not being output by the Lambda runtime in my case.
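For anyone else debugging this, the change was essentially the following sketch (with a hypothetical parseLine stand-in, not lambda-promtail's actual code):

package main

import (
    "errors"
    "fmt"
)

// parseLine is a hypothetical stand-in for the real log-line processing.
func parseLine(line string) error {
    return errors.New("timestamp not RFC3339: " + line)
}

func process(line string) error {
    if err := parseLine(line); err != nil {
        // Instead of only `return err` (which was not surfacing in CloudWatch
        // for me), print the error so it always shows up in the logs.
        fmt.Println(err)
        return err
    }
    return nil
}

func main() {
    _ = process("tls 2.0 2022-09-15T22:38:37 net/my-nlb/0123456789abcdef ...")
}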