
Make CI known flakes and errors

Open lhoguin opened this issue 1 year ago • 10 comments

This ticket is meant to track flakes and errors that happen in Make CI (currently enabled in main and in pull requests).

OTP-27 bugs

  • [ ] inetrc Kernel parameter doesn't accept atoms anymore on 27 (resolution ongoing in https://github.com/erlang/otp/issues/8899)
  • [x] compatibility errors between the MQTT lib used by java_SUITE of rabbitmq_mqtt and Erlang/OTP 27.1.1 specifically (tracked in https://github.com/erlang/otp/issues/8908, fixed in 27.1.2)

Flakes

  • [ ] Failed to create SSL certificates - multiple test suites, one example in comment
  • [ ] rabbit > amqp_system > access_failure - details in comment
  • [ ] rabbit > metrics_SUITE > connection_metric_count_test - details in comment
  • [ ] rabbit > per_node_limit_SUITE > channel_consumers_limit - details in comment
  • [ ] rabbitmq_amqp_client > management_SUITE > cluster_size_3 > queue_topology - details in comment
  • [ ] rabbitmq_amqp_client > management_SUITE > cluster_size_3 > classic_queue_stopped - details in comment
  • [ ] rabbitmq_cli sometimes fails with 1383 tests, 5 failures - the failures are all in tests that set the disk free limit. It is not yet known whether they are genuine errors, ordinary flakes, or caused by differences between GH runners
  • [ ] rabbitmq_federation > exchange_SUITE > rolling_upgrade > child_id_format - timetrap timeout
  • [ ] rabbitmq_federation > queue_SUITE > classic_queue > without_disambiguate > cluster_size_1 > dynamic_plugin_stop_start - details in comment
  • [x] rabbitmq_management > clustering_SUITE > non_parallel_tests > queue_on_other_node - details in comment
  • [x] rabbitmq_management > clustering_prop_SUITE > non_parallel_tests > prop_connection_channel_counts_test - details in comment
  • [ ] rabbitmq_mqtt > parallel-ct-set-1 > mqtt_shared_SUITE > cluster_size_3 > v4 rabbit_mqtt_qos0_queue_kill_node - test failure (details in comment) leads to subsequent tests being unable to continue and to the job timing out after 30 minutes

lhoguin, Oct 01 '24 09:10