
statement timeout and 'Forgot to call Stop()?' occurred when executing python run.py --run Balsa_JOBRandSplit --local

Open PengJiazhen408 opened this issue 2 years ago • 0 comments

I tried to run this project and ran into a problem. Following the instructions in README.md, I installed the requirements and ran this command: python run.py --run Balsa_JOBRandSplit --local. The first run failed with the following traceback:

Waiting on Ray tasks...value_iter=41
ray.get(tasks) received exception: ray::ExecuteSql() (pid=33391, ip=192.168.199.173)
  File "run.py", line 172, in ExecuteSql
    remote=not use_local_execution)
  File "/data/pjz_data/optimizer/balsa/balsa/util/postgres.py", line 89, in ExplainAnalyzeSql
    remote=remote)
  File "/data/pjz_data/optimizer/balsa/balsa/util/postgres.py", line 216, in _run_explain
    timeout_ms, cursor, remote)
  File "/data/pjz_data/optimizer/balsa/balsa/util/postgres.py", line 248, in _run_explain
    return pg_executor.Execute(s, verbose, geqo_off, timeout_ms, cursor)
  File "/data/pjz_data/optimizer/balsa/pg_executor/pg_executor/pg_executor.py", line 163, in Execute
    _SetGeneticOptimizer('default', cursor)
  File "/data/pjz_data/optimizer/balsa/pg_executor/pg_executor/pg_executor.py", line 102, in _SetGeneticOptimizer
    cursor.execute('set geqo = {};'.format(flag))
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
Canceling Ray tasks.
Retrying PlanAndExecute() (max_retries=3).
Traceback (most recent call last):
  File "run.py", line 1507, in PlanAndExecute
    refs = ray.get(tasks)
  File "/home/zju/anaconda3/envs/balsa/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/zju/anaconda3/envs/balsa/lib/python3.7/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(QueryCanceled): ray::ExecuteSql() (pid=33391, ip=192.168.199.173)
  File "run.py", line 172, in ExecuteSql
    remote=not use_local_execution)
  File "/data/pjz_data/optimizer/balsa/balsa/util/postgres.py", line 89, in ExplainAnalyzeSql
    remote=remote)
  File "/data/pjz_data/optimizer/balsa/balsa/util/postgres.py", line 216, in _run_explain
    timeout_ms, cursor, remote)
  File "/data/pjz_data/optimizer/balsa/balsa/util/postgres.py", line 248, in _run_explain
    return pg_executor.Execute(s, verbose, geqo_off, timeout_ms, cursor)
  File "/data/pjz_data/optimizer/balsa/pg_executor/pg_executor/pg_executor.py", line 163, in Execute
    _SetGeneticOptimizer('default', cursor)
  File "/data/pjz_data/optimizer/balsa/pg_executor/pg_executor/pg_executor.py", line 102, in _SetGeneticOptimizer
    cursor.execute('set geqo = {};'.format(flag))
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "run.py", line 2155, in <module>
    app.run(Main)
  File "/home/zju/anaconda3/envs/balsa/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/zju/anaconda3/envs/balsa/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run.py", line 2151, in Main
    agent.Run()
  File "run.py", line 2100, in Run
    has_timeouts = self.RunOneIter()
  File "run.py", line 1841, in RunOneIter
    is_test=False)
  File "run.py", line 1520, in PlanAndExecute
    max_retries=max_retries - 1)
  File "run.py", line 1401, in PlanAndExecute
    self.timer.Start('plan_test_set' if is_test else 'plan')
  File "/data/pjz_data/optimizer/balsa/train_utils.py", line 231, in Start
    assert self.curr_stage is None, 'Forgot to call Stop()?'
AssertionError: Forgot to call Stop()?
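
Reading the traceback, the retry in PlanAndExecute() seems to call self.timer.Start() again while the stage started by the failed attempt is still open, which would trip the assertion in train_utils.py. Below is a minimal sketch of that failure mode and one possible guard. StageTimer, plan_and_execute and the local QueryCanceled class are hypothetical stand-ins written for illustration; only the assertion message is taken from the traceback, and this is not the project's actual code.

```python
import time


class StageTimer:
    """Hypothetical stand-in for the timer in train_utils.py.

    Only the 'Forgot to call Stop()?' assertion is taken from the traceback.
    """

    def __init__(self):
        self.curr_stage = None
        self.curr_stage_start = None
        self.elapsed = {}

    def Start(self, stage):
        # Fails if a previous stage was never closed.
        assert self.curr_stage is None, 'Forgot to call Stop()?'
        self.curr_stage = stage
        self.curr_stage_start = time.time()

    def Stop(self, stage):
        assert self.curr_stage == stage, 'Stage mismatch'
        spent = time.time() - self.curr_stage_start
        self.elapsed[stage] = self.elapsed.get(stage, 0.0) + spent
        self.curr_stage = None


class QueryCanceled(Exception):
    """Stand-in for psycopg2.errors.QueryCanceled, so no database is needed."""


def plan_and_execute(timer, execute, max_retries=3):
    """Runs `execute`, closing the open timer stage before each retry."""
    timer.Start('plan')
    try:
        result = execute()
    except QueryCanceled:
        # Close the stage first; otherwise the recursive call below would
        # hit the assertion in Start().
        timer.Stop('plan')
        if max_retries <= 0:
            raise
        return plan_and_execute(timer, execute, max_retries - 1)
    timer.Stop('plan')
    return result
```

With this guard, a query that times out a couple of times and then succeeds no longer raises the AssertionError on retry. It does not address why 'set geqo = default' itself hits the statement timeout, which may be a separate problem.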

PengJiazhen408 · May 09 '23 02:05