[VL][CI] Enable Celeborn tests & Gluten CPP tests

Open PHILO-HE opened this issue 1 year ago • 15 comments

What changes were proposed in this pull request?

Simply a follow-up for https://github.com/apache/incubator-gluten/pull/4936.

PHILO-HE, Mar 26 '24 01:03

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and the pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions[bot], Mar 26 '24 01:03

Run Gluten Clickhouse CI

github-actions[bot], Mar 26 '24 01:03

Hi @kerwin-zk, I'm re-enabling the Celeborn tests in the new CI. The error below is reported when stopping the worker/master. Have you encountered this issue before?

waiting for worker graceful shutdown, wait for 599s
waiting for worker graceful shutdown, wait for 600s
Failed to stop server(pid=6960) after 600s
Error: Process completed with exit code 1.

PHILO-HE, Mar 28 '24 07:03

@PHILO-HE This usually happens when a shuffle has not yet finished at the time the worker is stopped, so the worker keeps waiting for graceful shutdown until the 600s limit is reached.

kerwin-zk, Mar 28 '24 08:03
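The log lines quoted earlier in this thread suggest that the stop script polls the worker process about once per second and gives up after a fixed 600s budget; that timeout is what turns an unfinished shuffle into a failed CI step (exit code 1). The Python sketch below models that loop under those assumptions only. The function name stop_server, its parameters, and the one-second poll interval are hypothetical and are not Celeborn's actual stop script.

```python
import time
from typing import Callable


def stop_server(pid: int,
                is_alive: Callable[[], bool],
                timeout_s: int = 600,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical model of a stop script's graceful-shutdown wait loop.

    Polls once per second until the process exits or timeout_s seconds
    have elapsed. Returns a shell-style exit code: 0 if the process
    stopped in time, 1 if the timeout was hit.
    """
    waited = 0
    while is_alive():
        if waited >= timeout_s:
            # Same wording as the CI log; the caller would exit with this code.
            print(f"Failed to stop server(pid={pid}) after {timeout_s}s")
            return 1
        sleep(1)
        waited += 1
        print(f"waiting for worker graceful shutdown, wait for {waited}s")
    return 0
```

With a worker that never exits (for example, because a shuffle is still in progress), stop_server(6960, lambda: True) prints the 599s and 600s lines followed by the failure message and returns 1, which matches the CI output above. Passing a fake sleep keeps the loop fast in tests.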

There's also one error line in the log. Is it related?
/opt/celeborn/conf/celeborn-env.sh: line 1: -e: command not found

zhouyuan, Apr 02 '24 00:04

https://github.com/apache/incubator-gluten/issues/4917

github-actions[bot], Apr 02 '24 01:04

@PHILO-HE @zhouyuan This error (-e: command not found) appears to come from a command written in celeborn-env.sh. I recommend printing out the contents of celeborn-env.sh and checking them.

kerwin-zk, Apr 02 '24 02:04
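One plausible cause, not confirmed in this thread: the file was generated with something like echo -e "..." > celeborn-env.sh under a shell whose echo builtin does not accept -e (for example dash, which is /bin/sh on Debian and Ubuntu). In that case "-e" is written into the file literally, line 1 starts with "-e", and sourcing the file fails with exactly this message. The Python sketch below follows the advice above: it prints the script with line numbers and flags any line whose first word is a literal "-e". The helper name check_env_script is hypothetical, and the sample content in the usage note is made up for illustration.

```python
from typing import List, Tuple


def check_env_script(text: str) -> List[Tuple[int, str]]:
    """Print a shell env script with line numbers and return suspicious lines.

    A line whose first word is a literal "-e" is flagged, because a shell
    sourcing the file would try to run "-e" as a command.
    """
    suspicious = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        print(f"{lineno:4d}: {line}")
        words = line.split()
        if words and words[0] == "-e":
            suspicious.append((lineno, line))
    return suspicious
```

Inside the CI container this could be called as check_env_script(open("/opt/celeborn/conf/celeborn-env.sh").read()). For hypothetical content such as "-e export FOO=bar", it returns [(1, "-e export FOO=bar")], pointing at the same line 1 as the error message.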

@kerwin-zk, thanks for your suggestion! I will check that in a local Docker container. FYI, I just created a separate PR to enable the Celeborn tests: https://github.com/apache/incubator-gluten/pull/5247

PHILO-HE, Apr 02 '24 02:04