artillery icon indicating copy to clipboard operation
artillery copied to clipboard

Artillery runs forever and never exit (v2)

Open endyjasmi opened this issue 7 years ago • 41 comments

When the command artillery quick -d 60 -r 256 https://www.company.com/ is executed, artillery will run fine for the first few request and later keep on looping with the following progress report and never exit.

Report for the previous 10s @ 2017-01-11T09:08:56.069Z
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

Following is the environment it runs in:

root@nodejs-512mb-lon1-01:~/artillery# node -v
v6.9.2
root@nodejs-512mb-lon1-01:~/artillery# artillery -V
1.5.0-22

Following is the log file generated by running DEBUG=* artillery quick -d 60 -r 256 https://www.bloomon.nl/ 2>&1 | tee debug.log: https://gist.github.com/endyjasmi/a597adc6a0fc5e1c874d7abbe9a93262

Thanks in advance for the helps.

endyjasmi avatar Jan 11 '17 01:01 endyjasmi

Thanks for the report @endyjasmi. I'm unable to reproduce the issue here. The following command completes fine for me:

artillery quick -d 60 -r 256 http://localhost:8080/

localhost:8080 is an nginx instance, the tests complete when left to run normally and when I kill nginx halfway through a run to simulate the target failing under load.

What does the normal Artillery output look like on your system? Do you see reported latency metrics for the full 60 seconds? Are there any errors reported? What does pgrep -lfa node output once Artillery is printing empty reports?

hassy avatar Jan 11 '17 10:01 hassy

@hassy From what I understand, this happens when the connection from artillery to the server is high.

endyjasmi avatar Jan 13 '17 03:01 endyjasmi

@endyjasmi Can you post a log? (Just the normal output from Artillery as it runs the test). Can you also check the CPU usage of node processes and check that there are two of them still running?

hassy avatar Jan 13 '17 14:01 hassy

I see the same problem.

Report for the previous 10s @ 2017-02-15T16:24:03.192Z
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  Concurrent users:    65
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

It ran against an environment that never went down.

The load generator machine did experience high cpu usage, but then it went down to normal levels.

portenez avatar Feb 15 '17 16:02 portenez

@portenez thanks, can you post the full log? Did you ever see metrics being reported while Artillery was running or always just NaN?

hassy avatar Feb 15 '17 18:02 hassy

We just figured it out. It was an exception being thrown during custom js code run via beforeRequest. We fixed the exception, and the test was able to complete.

So, I guess there's still a bug. That an exception in one of the preProcessors will cause the whole test to hang

portenez avatar Feb 15 '17 19:02 portenez

Hi folks, I'm seeing this too, but when I'm using a prepared Websocket test. It gets to the end of the test, everything's fine, cluster being tested is fine, and then Artillery just keeps doing this forever:

Report for the previous 10s @ 2017-02-16T17:36:42.428Z
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

igorclark avatar Feb 17 '17 08:02 igorclark

@portenez thanks, that's something to fix. @igorclark does it happen every time or only sometimes?

A reproducible test case would be amazing, but until then the only option I have is to review the code and try to spot any potential causes.

hassy avatar Feb 18 '17 19:02 hassy

@hassy don't know about every time, but certainly 9 times out of 10.

Would love to provide a full test case including back-end, but it's not a public project. Here's the majority of the test plan though - don't know if it'll be much use but hope so :-)

config:
  target: "ws://<host-name>/"
  phases:
    -
      duration: 60
      arrivalRate: 100
  ws:
    # Ignore SSL certificate errors
    # - useful in *development* with self-signed certs
    rejectUnauthorized: false
scenarios:
  -
    engine: "ws"
    flow:
      -
        loop:
          -
            send: '{<json-message-1>}'
          -
            send: '{<json-message-2>}'
          -
            send: '{<json-message-3>}'
          -
            send: '{<json-message-4>}'
        count: 100

igorclark avatar Feb 21 '17 10:02 igorclark

@igorclark Thanks for that. Just to confirm - you don't see any error messages printed to the console at any point during a run that ends up spinning forever?

If you don't mind, next time that happens, does the following command print anything:

pgrep -lfa node | grep worker.js

hassy avatar Feb 21 '17 21:02 hassy

Hey @hassy - no, no Artillery error messages. There are sometimes a couple of "ECONNRESET" or "connection closed by other party" when the server is getting really hammered, but that's normal, no? And when it ends up just spinning, it doesn't have any other errors, just the report every 10s.

I'll do another run tomorrow and run the pgrep command to get you some output.

igorclark avatar Feb 21 '17 21:02 igorclark

same issue image image I set a very height arrivalRate and it seems run forever and never stop.

hello.yml:

config:
  target: 'http://www.example.com'
  http:
    pool: 10
  phases:
    - duration: 60
      arrivalRate: 2000
  defaults:
    headers:
      Connection: close

scenarios:
  - flow:
    - get:
        url: "/"

tiaod avatar Mar 09 '17 06:03 tiaod

Ten minutes later, it's still running, image

tiaod avatar Mar 09 '17 06:03 tiaod

Thanks @tiaod. Seems like high arrival rates have something to do with it but I was unable to reproduce previously. I'll give it another go though.

hassy avatar Mar 10 '17 19:03 hassy

@tiaod are you wrapping Artillery in another script? Your issue may be related to #264 then.

hassy avatar Mar 10 '17 19:03 hassy

@hassy Oh I forgot to mention that. You are right, I use child_proccess spanw to run Artillery. But on a lower arrivalRate (eg: 1000), it could stop normally.

tiaod avatar Mar 11 '17 02:03 tiaod

Hi @hassy I have the same issue when testing a socket io server with high arrivalRate. From what I see, there are some connections which never get closed. I think when the server unable to handle that much request, artillery will just wait forever. The sever also seems to ignore these connections while artillery just waits and sends nothing. Is there a mechanism to let artillery close these connections itself instead of waiting for response from server? I would also like to consider these scenarios as failure.

The same thing happens if I don't start the server. Artillery will keep waiting.

peara avatar Mar 27 '17 04:03 peara

@peara Thanks for the info. Artillery does try to time out connections - with an explicit timeout setting for HTTP and the default value used by the underlying library for Socket.io - something to look into to try to solve this. Cheers

hassy avatar Mar 27 '17 05:03 hassy

Unsure if relevant but thought it worth posting.

Recently I came across this issue as well and it seemed to be issues with target and the scenarios url.

example:

  target: 'http://myurlexample.com/'
  http:
    pool: 10
  phases:
    - duration: 60
      arrivalRate: 10

scenarios:
  - flow:
    - get:
        url: "preview/1/"

resulted in: the NaN issue as described by @igorclark

Report @ 11:40:26(+0100) 2017-09-28
  Scenarios launched:  100
  Scenarios completed: 0
  Requests completed:  0
  Concurrent users:    289
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN
  Scenario duration:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

but changing the target to myurlexample.com and scenarios url to /preview/1/ removed the NaN's from the results.

Summary report @ 11:42:41(+0100) 2017-09-28
  Scenarios launched:  600
  Scenarios completed: 600
  Requests completed:  600
  Concurrent users:    60
  RPS sent: 9.93
  Request latency:
    min: 95.5
    max: 228.7
    median: 108.5
    p95: 146.8
    p99: 182.9
  Scenario duration:
    min: 105
    max: 303
    median: 132
    p95: 191.5
    p99: 239.5
  Scenario counts:
    0: 600 (100%)
  Codes:
    200: 600

cwthompson avatar Sep 28 '17 10:09 cwthompson

We suffered this when using an invalid request URL (literal url). This issue describes the details https://github.com/shoreditch-ops/artillery/issues/512.

This would explain why prefixing the path worked for @cwthompson, as any URL which is not prefixed by / is treated as an absolute URL.

Limess avatar Aug 07 '18 13:08 Limess

Hey @hassy 👋

I'm experiencing the same symptoms.

Artillery: 1.6.0-28 Node.js: v12.2.0 OS: darwin/x64

My config looks something like:

config:
  target: "https://example.com"
  phases:
    - duration: 60
      arrivalRate: 20
  defaults:
    headers:
      x-api-key: 'XXXXXXX'
scenarios:
    flow:
      - get:
          url: "/example"
          qs:
            foo: "bar"

The issue is intermittent (occurring ~2/3rds of the time). Below are two runs, one run immediately after the other with no config changes.

Run no. 1 (error)

$ artillery run example.yml

Started phase 0, duration: 60s @ 16:57:05(+0100) 2019-05-20
Report @ 16:57:15(+0100) 2019-05-20
Elapsed time: 10 seconds
  Scenarios launched:  199
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: 20.1
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

Run no. 2 (success)

$ artillery run example.yml

Started phase 0, duration: 60s @ 16:57:17(+0100) 2019-05-20
Report @ 16:57:27(+0100) 2019-05-20
Elapsed time: 10 seconds
  Scenarios launched:  199
  Scenarios completed: 86
  Requests completed:  86
  RPS sent: 20.12
  Request latency:
    min: 124.4
    max: 202.6
    median: 137.4
    p95: 182.5
    p99: 201.6
  Codes:
    200: 86

If I get to the bottom of it I'll post here. If you have any ideas about what I might be doing wrong, or if I can provide any more information please let me know!

Thanks

JoeScho avatar May 20 '19 16:05 JoeScho

I seem to be seeing the same issue, but have a bit more info that might be useful. First, see this screenshot, showing the issue with the hung test:

Screenshot_1

The log output on the service has stopped, all the received requests have completed successfully. The test runner itself is in the hung state and sending no more requests (the test should have ended at 1 minute).

The interesting part is that, when I kill the test runner, it seems like one last small batch of requests is fired off just before artillery exits:

Screenshot_2

Notice the last two requests in the server log, that are received immediately upon killing artillery. You can also see that these requests are then immediately aborted (the finished=false implying that the connection was closed on these requests before a response was sent, likely because artillery exited).

My guess would be that artillery is hung because it's waiting for these requests, but the requests were never actually sent. And then killing the process causes them to somehow get flushed out and finally sent.

kbjr avatar Dec 15 '19 21:12 kbjr

This is so annoying issue :sob:

I have artillery for stress tests and I want to find a limits of server - and artillery found it, but I am not able to get some reasonable output from artillery due this issue.

My setup - I run this 2x parallel in docker containers to avoid problems with CPU overload.

config:
  target: "ws://raspberry:3000/ws?templateId=1&username=JakeTheDog"
  phases:
    - duration: 120
      arrivalRate: 30
      name: "Stress creating connection"
  processor: "./utils/elasticsearch.js"

scenarios:
  - engine: "ws"
    flow:
      - send: "ping pong"
      - think: 1

The problem begins when the server begins to have data processing problems and starts to have timeouts and eventually crashes. After that, artillery starts show

Report @ 14:05:46(+0000) 2020-01-17
Elapsed time: 11 minutes, 10 seconds
  Scenarios launched:  0
  Scenarios completed: 0
  Requests completed:  0
  RPS sent: NaN
  Request latency:
    min: NaN
    max: NaN
    median: NaN
    p95: NaN
    p99: NaN

Any solution for that? It is so annoying due I send results from artillery to elasticsearch and now, it doesn't send anything.

vojtech-cerveny avatar Jan 17 '20 14:01 vojtech-cerveny

@hassy - Can you provide some information about this issue? Without it, it doesn't make any sense to use it :man_shrugging:

So please provide some information, if you care about that or this will be still open and we should look for another tool for stress testing.

vojtech-cerveny avatar Jan 27 '20 12:01 vojtech-cerveny

It's an extremely difficult issue to reproduce (and to create a reproducible test case for). In your case specifically @vojtech-cerveny, there's nothing unusual about the test script itself, so the issue is somewhere else, but there's no way to try to isolate what it might be unless you can provide more information.

hassy avatar Jan 27 '20 14:01 hassy

I can confirm that the issue is consistently reproducible with high arrival rate.

Artillery: 1.6.0-29
Artillery Pro: not installed (https://artillery.io/pro)
Node.js: v12.14.1
OS: linux/x64

The higher arrival rate, the longer it takes from Artillery to finish and print a report. E.g. with timeout = 30s it succeeds to finish in 30s with arrival rate = 10, but it takes 60s for arrival rate = 500

I'm running it on a pretty beefy machine that has a lot of free CPU, yet I see CPU usage warnings

whitebyte avatar Feb 13 '20 10:02 whitebyte

@whitebyte If Artillery still completes successfully, that's not an instance of the issue. It would make sense that the test takes a longer time to complete if you're creating more virtual users (50 times more if you're going from 10 to 500/second).

CPU warnings-wise - that also still makes sense. Artillery uses one CPU by default, so you're likely maxing a CPU, especially with a high arrival rate (each arrival is a whole new TCP connection by default).

hassy avatar Feb 13 '20 16:02 hassy

I'm using only one phase with a constant arrival rate. After some arrival rate threshold Artillery starts working indefinitely, eventually spitting NaNs

whitebyte avatar Feb 13 '20 17:02 whitebyte

Turns out that elapsed time is ignored during the run and only # of scenarios launched is taken into account. It looks like Artillery calculates the required number of scenarios to be launched before start, and then just sticks to this number, ignoring the actual time passed. Which is OK, but should be documented somewhere.

whitebyte avatar Feb 16 '20 16:02 whitebyte

Hi, I am running artillery too for load testing.

Artillery: 1.6.1
Node.js: v10.15.3
OS: linux/x64

I am getting the same issue. Did anyone able to resolve this issue? If yes, please share. I am attaching the screens from my trial run. Here's the config I used.

config:
    target: "someURL"
    ensure:
      maxErrorRate: 1
    phases:
      - duration: 10
        arrivalRate: 5
        rampTo: 10
        name: "Warming up"
      - duration: 20
        arrivalRate: 10
        rampTo: 50
        name: "Max load"
scenarios:
  - engine: "ws"
    flow:
      - send: "hello"
      - think: 3
      - send: "how are you?"

1 2

ihemantkumar avatar Jul 07 '20 16:07 ihemantkumar