results icon indicating copy to clipboard operation
results copied to clipboard

Storing logs fails with `operation error S3: UploadPart, https response error StatusCode: 404 ... api error NoSuchUpload: UnknownError`

Open iainsproat opened this issue 11 months ago • 4 comments

There is a step in a pipeline run which fails with an error (the application exits with a non-zero exit code). Unfortunately the log content is not stored.

All other pipelines and steps have their logs stored as expected, so this does not seem to be a general configuration problem.

Expected Behavior

All steps in the pipeline would have their logs stored, even (and especially) if the user application exited with a non-zero exit code. A copy of the pipeline can be found below.

This is consistently failing for all numerous repeats.

Actual Behavior

The Tekton results API logs the following two errors:

{
  "level": "error",
  "ts": 1709314179.3972826,
  "caller": "v1alpha2/logs.go:103",
  "msg": "operation error S3: UploadPart, https response error StatusCode: 404, RequestID: tx00000563224ac1343fcba-0065e21083-4fb58942-ams3c, HostID: 4fb58942-ams3c-ams3-zg03, api error NoSuchUpload: UnknownError",
  "stacktrace": "github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog.func1\n\tgithub.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:103\ngithub.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog\n\tgithub.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:156\ngithub.com/tektoncd/results/proto/v1alpha2/results_go_proto._Logs_UpdateLog_Handler\n\tgithub.com/tektoncd/results/proto/v1alpha2/results_go_proto/api_grpc.pb.go:686\ngithub.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/recovery.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware/[email protected]/interceptors/recovery/interceptors.go:48\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/server_metrics.go:121\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/auth.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/auth/auth.go:66\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/logging/zap/server_interceptors.go:53\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/[email protected]/server.go:1673\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/[email protected]/server.go:1787\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/[email protected]/server.go:1016"
}
{
  "level": "error",
  "ts": 1709314179.3973947,
  "caller": "zap/options.go:212",
  "msg": "finished streaming call with code Unknown",
  "grpc.auth_disabled": true,
  "grpc.start_time": "2024-03-01T17:29:38Z",
  "system": "grpc",
  "span.kind": "server",
  "grpc.service": "tekton.results.v1alpha2.Logs",
  "grpc.method": "UpdateLog",
  "peer.address": "10.244.1.76:58650",
  "grpc.user": "system:serviceaccount:tekton-pipelines:tekton-results-watcher",
  "grpc.issuer": "https://kubernetes.default.svc.cluster.local",
  "error": "operation error S3: UploadPart, failed to rewind transport stream for retry, request stream is not seekable",
  "grpc.code": "Unknown",
  "grpc.time_duration_in_ms": 766,
  "stacktrace": "github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.DefaultMessageProducer\n\tgithub.com/grpc-ecosystem/[email protected]/logging/zap/options.go:212\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/logging/zap/server_interceptors.go:61\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/[email protected]/server.go:1673\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/[email protected]/server.go:1787\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/[email protected]/server.go:1016"
}

I can see that a corresponding log record has been stored:

{
  "name": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/results/8b706226-37fa-4e79-add7-5e6f1d4e7534/records/2c4acb8d-3234-3176-93b9-6df94deb472c",
  "id": "7ca4d473-7f0f-41e1-92b3-a85083b9e7ad",
  "uid": "7ca4d473-7f0f-41e1-92b3-a85083b9e7ad",
  "data": {
    "type": "results.tekton.dev/v1alpha2.Log",
    "value": "eyJraW5kIjogIkxvZyIsICJzcGVjIjogeyJ0eXBlIjogIlMzIiwgInJlc291cmNlIjogeyJ1aWQiOiAiMDAxNDNmOTktODU5MC00MjJmLWFkMTQtM2FkOThjOGFhYWE3IiwgImtpbmQiOiAiVGFza1J1biIsICJuYW1lIjogImI1ZGQxNjUxYjUxYTJhOTljYzBkLTcxZmExNDI0MzctM2EwMGQ0YmJjMi0wIiwgIm5hbWVzcGFjZSI6ICJlMjcxZTkwNTUxLTY4NjIyZDc1ZWEtYjVkZDE2NTFiNTFhMmE5OWNjMGQifX0sICJzdGF0dXMiOiB7InBhdGgiOiAiZTI3MWU5MDU1MS02ODYyMmQ3NWVhLWI1ZGQxNjUxYjUxYTJhOTljYzBkLzJjNGFjYjhkLTMyMzQtMzE3Ni05M2I5LTZkZjk0ZGViNDcyYy9iNWRkMTY1MWI1MWEyYTk5Y2MwZC03MWZhMTQyNDM3LTNhMDBkNGJiYzItMC1sb2ciLCAic2l6ZSI6IDIwNzA5Mzc2MH0sICJtZXRhZGF0YSI6IHsidWlkIjogIjJjNGFjYjhkLTMyMzQtMzE3Ni05M2I5LTZkZjk0ZGViNDcyYyIsICJuYW1lIjogImI1ZGQxNjUxYjUxYTJhOTljYzBkLTcxZmExNDI0MzctM2EwMGQ0YmJjMi0wLWxvZyIsICJuYW1lc3BhY2UiOiAiZTI3MWU5MDU1MS02ODYyMmQ3NWVhLWI1ZGQxNjUxYjUxYTJhOTljYzBkIiwgImNyZWF0aW9uVGltZXN0YW1wIjogbnVsbH0sICJhcGlWZXJzaW9uIjogInJlc3VsdHMudGVrdG9uLmRldi92MWFscGhhMiJ9"
  },
  "etag": "7ca4d473-7f0f-41e1-92b3-a85083b9e7ad-1709314181023022210",
  "createdTime": "2024-03-01T17:29:38.541771Z",
  "createTime": "2024-03-01T17:29:38.541771Z",
  "updatedTime": "2024-03-01T17:29:41.023022Z",
  "updateTime": "2024-03-01T17:29:41.023022Z"
}

For convenience, the decoded base64 content of the above value is:

{
  "kind": "Log",
  "spec": {
    "type": "S3",
    "resource": {
      "uid": "00143f99-8590-422f-ad14-3ad98c8aaaa7",
      "kind": "TaskRun",
      "name": "b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0",
      "namespace": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d"
    }
  },
  "status": {
    "path": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/2c4acb8d-3234-3176-93b9-6df94deb472c/b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
    "size": 207093760
  },
  "metadata": {
    "uid": "2c4acb8d-3234-3176-93b9-6df94deb472c",
    "name": "b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
    "namespace": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d",
    "creationTimestamp": null
  },
  "apiVersion": "results.tekton.dev/v1alpha2"
}

When inspecting the content of the s3 blob storage directly I can see that no bucket has been created at the given path.

Likewise the logs cannot be retrieved via the API, as would be expected given the above symptoms.

Steps to Reproduce the Problem

Unfortunately I am not able to yet recreate a minimally failing example of the user workload generating the logs. The pipeline is as follows:

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  labels:
    app.kubernetes.io/instance: pipelines
    app.kubernetes.io/name: pipelines
    app.kubernetes.io/part-of: e271e90551
    app.kubernetes.io/version: 68622d75ea
    kubernetes.io/metadata.name: e271e90551-68622d75ea-b5dd1651b51a2a99cc0d
  name: e271e90551-68622d75ea
  namespace: e271e90551-68622d75ea-b5dd1651b51a2a99cc0d
spec:
  tasks:
  - name: 71fa142437-3a00d4bbc2-0
    taskSpec:
      metadata: {}
      spec: null
      steps:
      - args:
        - /userinput/values.json
        command:
        - dotnet
        - function.dll
        computeResources:
          limits:
            cpu: "1"
            memory: 1000Mi
          requests:
            cpu: "1"
            memory: 1000Mi
        image: private_registry/71fa142437:d66859c
        name: 71fa142437-3a00d4bbc2-0
        volumeMounts:
        - mountPath: /userinput
          name: 71fa142437-3a00d4bbc2-0
          readOnly: true
      volumes:
      - name: 71fa142437-3a00d4bbc2-0
        secret:
          secretName: 71fa142437-3a00d4bbc2-0

Additional Info

  • Kubernetes version:

v1.28.2

  • Tekton Pipeline version:

v0.56.1

  • Tekton Results version

v0.9.1

iainsproat avatar Mar 01 '24 20:03 iainsproat

Excerpt from the above:

"status": {
    "path": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/2c4acb8d-3234-3176-93b9-6df94deb472c/b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
    "size": 207093760

Does this status.size value indicate a log size of 207Mb? Or am I misinterpreting this?

iainsproat avatar Mar 01 '24 20:03 iainsproat

To answer my own comment, it seems to be bytes so this would be 207Mb: https://github.com/tektoncd/results/blob/32314594d4e5cf6de35e13cb7429ae216a969781/pkg/api/server/v1alpha2/logs.go#L173C21-L173C28

iainsproat avatar Mar 20 '24 10:03 iainsproat

@iainsproat I believe the size of the log has something to do with this. Are you still having difficulties in storing large logs?

sayan-biswas avatar Mar 28 '24 18:03 sayan-biswas

Hi @sayan-biswas - yes, this continues to be an issue for me.

Unfortunately I haven't had time to yet recreate a minimal replicating example so I'm as yet not sure if it's an issue with our cloud provider, my configuration of s3 or logs in tekton results, or something amiss with tekton results.

iainsproat avatar Mar 28 '24 19:03 iainsproat