[Bug] unstable resource group cases: resgroup_cancel_terminate_concurrency/resgroup_dumpinfo/resgroup_views

Open avamingli opened this issue 1 year ago • 8 comments

Cloudberry Database version

main

What happened

https://github.com/cloudberrydb/cloudberrydb/actions/runs/11102859132/job/30844343864

DIFF FILE: ../gpdb_src/src/test/isolation2/regression.diffs
----------------------------------------------------------------------

diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /code/gpdb_src/src/test/isolation2/expected/resgroup/resgroup_cancel_terminate_concurrency.out /code/gpdb_src/src/test/isolation2/results/resgroup/resgroup_cancel_terminate_concurrency.out
--- /code/gpdb_src/src/test/isolation2/expected/resgroup/resgroup_cancel_terminate_concurrency.out	2024-09-30 17:27:33.490609064 +0800
+++ /code/gpdb_src/src/test/isolation2/results/resgroup/resgroup_cancel_terminate_concurrency.out	2024-09-30 17:27:33.497609032 +0800
@@ -293,9 +293,10 @@
 1q: ... <quitting>
 2q: ... <quitting>
 DROP ROLE role_concurrency_test;
-DROP
+DETAIL:  owner of table pg_temp_27.tmp
+ERROR:  role "role_concurrency_test" cannot be dropped because some objects depend on it
 DROP RESOURCE GROUP rg_concurrency_test;
-DROP
+ERROR:  resource group is used by at least one role
 
 DROP VIEW rg_concurrency_view;
 DROP
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /code/gpdb_src/src/test/isolation2/expected/resgroup/resgroup_dumpinfo.out /code/gpdb_src/src/test/isolation2/results/resgroup/resgroup_dumpinfo.out
--- /code/gpdb_src/src/test/isolation2/expected/resgroup/resgroup_dumpinfo.out	2024-09-30 17:28:21.174385431 +0800
+++ /code/gpdb_src/src/test/isolation2/results/resgroup/resgroup_dumpinfo.out	2024-09-30 17:28:21.176385421 +0800
@@ -43,7 +43,7 @@
 SELECT dump_test_check();
  dump_test_check 
 -----------------
- t               
+ f               
 (1 row)

What you think should happen instead

I'm pretty sure #649 has nothing to do with resource groups. This looks like an unstable resource group test case.
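
In the first diff, DROP ROLE fails on a temporary table (pg_temp_27.tmp) right after sessions 1 and 2 quit. That would be consistent with the test racing against session cleanup, though this is an assumption and not confirmed here. If so, polling for a condition instead of issuing the DROP immediately may help. A minimal Python sketch of such a poll-until-true helper follows; `wait_until` and its predicate are hypothetical names and not part of the isolation2 framework.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 10.0,
               interval: float = 0.1) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: in a real test the predicate might check that no
    backend of the test role is still running (an assumption, not taken
    from the test suite).
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The predicate is always evaluated at least once, so a condition that already holds returns immediately without sleeping.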

How to reproduce

https://github.com/cloudberrydb/cloudberrydb/actions/runs/11102859132/job/30844343864

Operating System

CI OS

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes, I am willing to submit a PR!

avamingli avatar Sep 30 '24 11:09 avamingli

Another failure, seen in a different PR: memory_usage in resgroup/resgroup_views.out. https://github.com/cloudberrydb/cloudberrydb/actions/runs/11136689360/job/30949248752?pr=653

/code/gpdb_src/src/test/isolation2/results/resgroup/resgroup_views.out
--- /code/gpdb_src/src/test/isolation2/expected/resgroup/resgroup_views.out	2024-10-02 10:39:17.270264381 +0800
+++ /code/gpdb_src/src/test/isolation2/results/resgroup/resgroup_views.out	2024-10-02 10:39:17.271264371 +0800
@@ -14,7 +14,7 @@
 select groupname , groupid , cpu_usage , memory_usage from gp_toolkit.gp_resgroup_status_per_host s join gp_segment_configuration c on s.hostname=c.hostname and c.content=-1 and role='p' where groupname='default_group';
  groupname     | groupid | cpu_usage | memory_usage 
 ---------------+---------+-----------+--------------
- default_group | 6437    | 0.00      | 0.00         
+ default_group | 6437    | 0.00      | 0.01         
 (1 row)
 
 select * from gp_toolkit.gp_resgroup_role where rrrolname='gpadmin';


real	6m5.137s
user	0m21.717s
sys	0m4.744s

avamingli avatar Oct 02 '24 03:10 avamingli

A memory usage between 0.00 and 0.01 is acceptable, so let's change the .sql/.out files.
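
As a rough illustration of comparing against a range instead of an exact value, here is a minimal Python sketch. The function name `memory_usage_ok` and the default bound are hypothetical, chosen only to mirror the 0.00 to 0.01 range mentioned above; this is not the actual change made to the test.

```python
from decimal import Decimal


def memory_usage_ok(observed: str, upper_bound: str = "0.01") -> bool:
    """Return True when the observed usage lies within [0, upper_bound].

    `observed` is the value as printed in the test output, for example
    "0.00" or "0.01". Decimal avoids float rounding at the boundary.
    """
    value = Decimal(observed.strip())
    return Decimal("0") <= value <= Decimal(upper_bound)


# Both values seen in the diff above fall inside the accepted range.
print(memory_usage_ok("0.00"), memory_usage_ok("0.01"))
```

With a check like this the expected output would contain a boolean, so small fluctuations in memory_usage no longer change the .out file.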

jiaqizho avatar Oct 08 '24 02:10 jiaqizho

Has this issue been resolved? Please confirm @avamingli , if so, I will close it, thanks!

my-ship-it avatar Dec 03 '24 01:12 my-ship-it

Sure, I haven't seen it anymore.

avamingli avatar Dec 03 '24 02:12 avamingli

https://github.com/apache/cloudberry/actions/runs/12193381532/job/34015800430?pr=725

diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /__w/cloudberry/cloudberry/src/test/isolation2/expected/resgroup/resgroup_cpu_max_percent.out /__w/cloudberry/cloudberry/src/test/isolation2/results/resgroup/resgroup_cpu_max_percent.out
--- /__w/cloudberry/cloudberry/src/test/isolation2/expected/resgroup/resgroup_cpu_max_percent.out	2024-12-05 21:30:09.592031381 -0800
+++ /__w/cloudberry/cloudberry/src/test/isolation2/results/resgroup/resgroup_cpu_max_percent.out	2024-12-05 21:30:09.596031363 -0800
@@ -794,7 +794,7 @@
 1:SELECT verify_cpu_usage('rg1_cpu_test', 10, 2);
  verify_cpu_usage 
 ------------------
- t                
+ f                
 (1 row)
 1:SELECT verify_cpu_usage('rg2_cpu_test', 20, 2);
  verify_cpu_usage 

avamingli avatar Dec 06 '24 07:12 avamingli

https://github.com/apache/cloudberry/actions/runs/15105659312/job/42454608346

This problem has been happening more often recently.

I have seen it in other PRs and commits on the main branch.

Run # Search for regression.diffs recursively
Found regression.diffs at: ./src/test/isolation2/regression.diffs
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /__w/cloudberry/cloudberry/src/test/isolation2/expected/resgroup/resgroup_cpu_max_percent.out /__w/cloudberry/cloudberry/src/test/isolation2/results/resgroup/resgroup_cpu_max_percent.out
--- /__w/cloudberry/cloudberry/src/test/isolation2/expected/resgroup/resgroup_cpu_max_percent.out	2025-05-18 23:34:39.937741504 -0700
+++ /__w/cloudberry/cloudberry/src/test/isolation2/results/resgroup/resgroup_cpu_max_percent.out	2025-05-18 23:34:39.945741526 -0700
@@ -224,7 +224,7 @@
 SELECT verify_cpu_usage('rg1_cpu_test', 90, 10);
  verify_cpu_usage 
 ------------------
- t                
+ f                
 (1 row)
 
 GP_IGNORE:-- start_ignor

avamingli avatar May 19 '25 07:05 avamingli

If the test case is not stable, how about we disable it?

my-ship-it avatar May 19 '25 07:05 my-ship-it

One more failure: https://github.com/apache/cloudberry/actions/runs/15127934989/job/42523879030

avamingli avatar May 20 '25 05:05 avamingli