openwhisk icon indicating copy to clipboard operation
openwhisk copied to clipboard

Webaction runs twice within one activation, and reported as failure when it did not actually fail

Open mretallack opened this issue 1 year ago • 2 comments

Environment details:

  • K8s Deployment using AWS ESK, 1.29

Steps to reproduce the issue:

  1. Have an incoming HTTP POST from AWS IoT (using HTTP as a Action Rule)
  2. Have a webaction (visible on internet), using python:3.9 action runner
  3. Monitor the failures of the web action.

Provide the expected results and outputs:

Expected output

2024-05-02 08:07:27 aa9313bdf52a490b9313bdf52ac90b6b python:3 warm  170ms      success         dataspace/iot_action:0.0.76:

    "logs": [
        "2024-05-02T08:07:27.203249047Z stdout: Message type is None",
        "2024-05-02T08:07:27.203353817Z stdout: Authenticated with deployment test",
        "2024-05-02T08:07:27.203359561Z stdout: Processing IoT message iot/statusResponse",
        "2024-05-02T08:07:27.203362535Z stdout: Message: {\"location\": {\"latitude\": \"x\", \"locationDescription\": \"\", \"longitude\": \"x\"}, \"tenantName\": \"LoadTest1\"}",
        "2024-05-02T08:07:27.203366872Z stdout: Publishing statusResponse to topic example.fct.iot_test_statusresponse.1",
        "2024-05-02T08:07:27.203369844Z stdout: New keys ['\"1454419940501\"']",
        "2024-05-02T08:07:27.203372379Z stdout: Finished processing",
        "2024-05-02T08:07:27.20337515Z  stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}"
    ],

Provide the actual results and outputs:

    "logs": [
        "2024-05-02T08:03:35.441164892Z stdout: Message type is None",
        "2024-05-02T08:03:35.441233404Z stdout: Authenticated with deployment test",
        "2024-05-02T08:03:35.44123762Z  stdout: Processing IoT message iot/statusResponse",
        "2024-05-02T08:03:35.441241006Z stdout: Message: {\"location\": {\"latitude\": \"X\", \"locationDescription\": \"Engineering Lab\", \"longitude\": \"X\"}, \"tenantName\": \"LoadTest1\"}",
        "2024-05-02T08:03:35.441244653Z stdout: Publishing statusResponse to topic topic example.fct.iot_test_statusresponse.1",
        "2024-05-02T08:03:35.441247605Z stdout: New keys ['\"1454419850988\"']",
        "2024-05-02T08:03:35.441250359Z stdout: Finished processing",
        "2024-05-02T08:03:35.441253345Z stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}",
        "2024-05-02T08:05:10.601380606Z stdout: Message type is None",
        "2024-05-02T08:05:10.601398036Z stdout: Authenticated with deployment test",
        "2024-05-02T08:05:10.601401152Z stdout: Processing Stratos message iot/status",
        "2024-05-02T08:05:10.601404588Z stdout: Message: {\"commonName\": \"1454419850988\", .......,
        "2024-05-02T08:05:10.6014228Z   stdout: Publishing status to topic example.fct.iot_test_status.1",
        "2024-05-02T08:05:10.601426211Z stdout: New keys ['\"1454419850988\"']",
        "2024-05-02T08:05:10.601428478Z stdout: Finished processing",
        "2024-05-02T08:05:10.601431725Z stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}",
        "2024-05-02T08:05:10.604225532Z stderr: The action did not initialize or run as expected. Log data might be missing."
    ],

Additional information you deem important:

It appears that the same activation is capturing the output from two runs. And therefore the second (or first) run is assumed to of failed.

I have tried adding a sleep into the while loop (thinking it was a timeing issue), but still does it·

https://github.com/apache/openwhisk-runtime-python/blob/1389697e3251ef08a930ed95629379551a8d3c9a/core/python3Action/lib/launcher.py#L54

mretallack avatar May 02 '24 08:05 mretallack

You might see this if the webaction returned a 50x status code - nginx may be automatically retrying the request and hence running the action again. You can confirm this by checking the nginx logs for the action in question.

rabbah avatar May 06 '24 04:05 rabbah

Thanks for the suggestion,

I will retry it a few times over the next day, but this is what I see on the access log in nginx:

10.0.0.65 - - [02/May/2024:10:31:17 +0000] [#tid_8daa7a35d7e1b39c3114d4c94d99ae12] POST /api/v1/web/dataspace/iot_package/iot_action HTTP/1.1 200 4 - AHC/2.1 172.20.41.68:8080
10.0.1.70 - - [02/May/2024:10:31:18 +0000] [#tid_63cd895864243ab89c8ed39fec47d83f] POST /api/v1/web/dataspace/iot_package/iot_action HTTP/1.1 200 4 - AHC/2.1 172.20.41.68:8080

From reading this it looks like both are returning 200.

I am wondering if AWS do something strange with there HTTP client and send multiple requests over the same connection and dont wait for a reply (only a theory)

I will keep looking at the logs until somthing else comes up :)

mretallack avatar May 07 '24 09:05 mretallack