ClusterMode doesn't report/end correctly with NBomber Studio

Open MattKeenum opened this issue 6 months ago • 15 comments
It doesn't happen every time, but occasionally, when running in Cluster Mode, NBomber Studio reports that the test is running but never receives any updates, and the session does not close when the test finishes.

This test has been "running" since yesterday (but was actually finished yesterday):

Image

MattKeenum avatar Apr 23 '25 13:04 MattKeenum

Hi @MattKeenum We noticed an issue where some agents became unavailable, which could disrupt reporting. We've implemented a fix—could you please give it a try? For this, please use these dependencies:

AntyaDev avatar May 07 '25 16:05 AntyaDev

@AntyaDev I have tried with the libraries suggested, but the problem still persists.

Image

For added clarity, I am running this in Kubernetes in cluster mode. In this example, I was using 19 Agents and 1 Coordinator. The job lasted about 30 minutes, but I never saw any stats and it shows that the test is still "running".

Image

MattKeenum avatar May 29 '25 13:05 MattKeenum

Hi @MattKeenum , Thanks. Very strange. I think we need more information from you.

AntyaDev avatar May 29 '25 13:05 AntyaDev

@MattKeenum Can you please check that the SessionID matches the session that you see in NBomber Studio?

AntyaDev avatar May 29 '25 13:05 AntyaDev

Yes, it looks like the session Id matches:

Image

MattKeenum avatar May 29 '25 17:05 MattKeenum

@MattKeenum, Does it always work this way for all tests, or is this behavior specific to the large load tests? What’s unclear to me is that the initial session data seems to be written to the database — we can see it in the UI — but for some reason, the real-time metrics aren't written at all, since they don’t show up. Could you try opening Chrome Developer Tools and check if there are any runtime errors when attempting to fetch the metrics via HTTP?

AntyaDev avatar May 30 '25 09:05 AntyaDev

@AntyaDev I tried with only 3 pods (1 coordinator and 2 agents) and it kind of worked. If I refreshed the page, I would momentarily get metrics as the test was running, but then it seems that a JavaScript function (I'm guessing) would fail and the metrics would go away. If I refreshed, I saw the metrics again, but only for a few seconds. Once the test completed, it showed as completed with all of the metrics as I would expect.

Image

I inspected with Chrome Dev Tools as it was running and nothing really stood out to me. There were a few errors that may be the cause of the invalid JS updates, but I'm not sure.

There was an error with JS needing to be enabled, but I confirmed that my browser has it enabled:

Image

Image

Next, I saw these issues:

Image

Lastly, there were some uncaught errors:

Image

Also, I watched the Network Tab, and all Status codes were success (200 range). I didn't see any network errors.


I will run this again with more pods to see if I can reproduce the metrics not appearing at all and the test never "completing".

MattKeenum avatar May 30 '25 16:05 MattKeenum

I ran it again with 20 pods (1 Coordinator and 19 Agents). The session appears at first as always, but no metrics were ever loaded and when the test completed, NBomber Studio still showed that it was running. I inspected with Chrome Dev Tools again, but nothing was any different from the above post.

MattKeenum avatar May 30 '25 17:05 MattKeenum

@MattKeenum We’ll try to fix this bug. Could you please clarify: are the metrics missing only from the 'Status code' field on the 'Summary' tab, or also from the 'Charts' tab?

OlenaKostash avatar Jun 02 '25 17:06 OlenaKostash

@OlenaKostash The metrics are missing from both:

Image

Image

MattKeenum avatar Jun 02 '25 18:06 MattKeenum

@MattKeenum When you run a load test that encounters such issues, have you checked the log file? Are there any errors in the logs?

OlenaKostash avatar Jun 02 '25 19:06 OlenaKostash

@OlenaKostash Yes, I reviewed the logs and there was an error reported that I have not seen before:

2025-05-30 16:54:01.969 +00:00 [INF] [ThreadId:27] Starting bombing...
2025-05-30 16:54:02.783 +00:00 [ERR] [ThreadId:8] 23505: duplicate key value violates unique constraint "nb_sessions_pkey"

DETAIL: Detail redacted as it may contain sensitive data. Specify 'Include Error Detail' in the connection string to include this information.
Npgsql.PostgresException (0x80004005): 23505: duplicate key value violates unique constraint "nb_sessions_pkey"

DETAIL: Detail redacted as it may contain sensitive data. Specify 'Include Error Detail' in the connection string to include this information.
   at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder1.StateMachineBox1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at Npgsql.NpgsqlDataReader.<ReadMessage>g__ReadMessageSequential|49_0(NpgsqlConnector connector, Boolean async)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteScalar(Boolean async, CancellationToken cancellationToken)
   at RepoDb.DbConnectionExtension.InsertAsyncInternalBase[TEntity,TResult](IDbConnection connection, String tableName, TEntity entity, IEnumerable1 fields, String hints, Nullable1 commandTimeout, String traceKey, IDbTransaction transaction, ITrace trace, IStatementBuilder statementBuilder, CancellationToken cancellationToken)
   at NBomber.Sinks.Timescale.TimescaleDbSink.Start(SessionStartInfo sessionInfo)
  Exception data:
    Severity: ERROR
    SqlState: 23505
    MessageText: duplicate key value violates unique constraint "nb_sessions_pkey"
    Detail: Detail redacted as it may contain sensitive data. Specify 'Include Error Detail' in the connection string to include this information.
    SchemaName: public
    TableName: nb_sessions
    ConstraintName: nb_sessions_pkey
    File: nbtinsert.c
    Line: 666
    Routine: _bt_check_unique


I see the same error in the agents as well.

MattKeenum avatar Jun 02 '25 19:06 MattKeenum
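The log in the previous comment says the constraint-violation detail is redacted unless 'Include Error Detail' is specified in the connection string. As a rough illustration only, here is a small Python sketch that adds that key to a semicolon-delimited, Npgsql-style connection string. The helper name and the host, database, and user values are placeholders invented for this example, not taken from the thread, and the naive split on `;` assumes no value contains a semicolon.

```python
def with_error_detail(conn_str: str) -> str:
    """Return conn_str with 'Include Error Detail=true' set exactly once."""
    # Split into key=value segments, ignoring empty ones (e.g. a trailing ';').
    parts = [p.strip() for p in conn_str.split(";") if p.strip()]
    # Drop any existing setting for the key; the key is compared case-insensitively here.
    kept = [
        p for p in parts
        if p.split("=", 1)[0].strip().lower() != "include error detail"
    ]
    kept.append("Include Error Detail=true")
    return ";".join(kept)


# Placeholder values (hypothetical), only to show the resulting shape.
print(with_error_detail("Host=localhost;Port=5432;Database=demo_db;Username=demo"))
```

As the log message itself warns, the extra detail may contain sensitive data, so this is probably best enabled only while debugging.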

@MattKeenum Thanks a lot for your contribution. We’ve already found and fixed this bug, and we're still working on other possible issues. I’ll let you know once there’s an update; we will most likely ship the fix tomorrow.

OlenaKostash avatar Jun 02 '25 20:06 OlenaKostash
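The thread does not describe how the bug was actually fixed. Purely as an illustration of the failure mode in the log (several nodes each inserting a row with the same primary key into `nb_sessions`), here is a self-contained Python sketch that uses an in-memory SQLite database as a stand-in for the real one. The column names, node names, and session id are hypothetical, and `ON CONFLICT ... DO NOTHING` is one common way to make such an insert idempotent, not necessarily what NBomber does. It requires SQLite 3.24 or newer.

```python
import sqlite3

# In-memory stand-in for the real database; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nb_sessions (session_id TEXT PRIMARY KEY, created_by TEXT)")

NODES = ["coordinator", "agent-1", "agent-2"]  # hypothetical node names
SESSION_ID = "demo-session"                    # placeholder value


def insert_plain(node: str) -> None:
    # Every node tries to create the same session row.
    conn.execute("INSERT INTO nb_sessions VALUES (?, ?)", (SESSION_ID, node))


def insert_idempotent(node: str) -> None:
    # The first writer wins; later writers silently skip the insert.
    conn.execute(
        "INSERT INTO nb_sessions VALUES (?, ?) ON CONFLICT(session_id) DO NOTHING",
        (SESSION_ID, node),
    )


# Plain inserts: every node after the first hits the primary-key constraint.
failures = []
for node in NODES:
    try:
        insert_plain(node)
    except sqlite3.IntegrityError as exc:
        failures.append((node, str(exc)))

# Idempotent inserts: no errors, and exactly one session row remains.
conn.execute("DELETE FROM nb_sessions")
for node in NODES:
    insert_idempotent(node)
row_count = conn.execute("SELECT COUNT(*) FROM nb_sessions").fetchone()[0]

print("nodes that failed with plain INSERT:", [n for n, _ in failures])
print("rows after idempotent INSERT:", row_count)
```

In the plain variant, every node after the first raises a uniqueness error, which resembles the 23505 error seen on both the coordinator and the agents.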

@MattKeenum, Hi, we have released a Studio update. Please use:

- NBomber Studio 0.3.0
- the latest release of NBomber.Sinks.Timescale (0.8.0)
- the latest release of NBomber (6.0.2)

One caveat: when you run NBomber in Cluster mode, NBomber Studio doesn't display the status code tables. We know about this issue and are going to fix it soon, but all scenario stats work correctly.

OlenaKostash avatar Jun 06 '25 15:06 OlenaKostash

Hi @MattKeenum We added a bug-related task to fix the display of status codes in Cluster Mode: https://github.com/PragmaticFlow/NBomber/issues/849

AntyaDev avatar Jun 06 '25 15:06 AntyaDev

This is working as expected. I ran several scenarios with the number of agents ranging from 2 to 300, and all of them closed at the end of the test, as we would expect.

MattKeenum avatar Jul 07 '25 17:07 MattKeenum