
[V1.4.0] gazelle plugin crash

Open Manoj-red-hat opened this issue 2 years ago • 9 comments

Describe the bug

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007f2687f73fa4, pid=27656, tid=0x00007f269e6f9700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libspark_columnar_jni.so+0x3a5fa4]  exprs::IfNode::InitAsDefaultInstance()+0x24
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hadoop-manojk/nm-local-dir/usercache/manojk/appcache/application_1658472520631_0009/container_1658472520631_0009_01_000003/hs_err_pid27656.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#


Manoj-red-hat avatar Jul 22 '22 14:07 Manoj-red-hat

FYI @PHILO-HE

Manoj-red-hat avatar Jul 22 '22 14:07 Manoj-red-hat

Hi @Manoj-red-hat, I guess it is related to protobuf. You can try to build the project again with -Dbuild_protobuf=ON to make sure a consistent protobuf version is used. If the issue still exists, please let me know your SQL statement for reproducing it.
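
For example: mvn -Phadoop-3.2,spark-3.2 clean package -DskipTests -Dbuild_protobuf=ON (adjust the profiles for your Spark/Hadoop versions; exact flags may vary by release).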

PHILO-HE avatar Jul 25 '22 03:07 PHILO-HE

I tried the above with the prebuilt jar at https://github.com/oap-project/gazelle_plugin/releases/download/v1.4.0/gazelle-plugin-1.4.0-spark-3.2.1.jar, and it is still crashing.

> please let me know your SQL statement for reproducing it.

Even a simple SELECT statement is crashing.

As per your suggestion, I am now trying to build the jar in my local environment and will update you further.

Manoj-red-hat avatar Jul 25 '22 05:07 Manoj-red-hat

@weiting-chen, please help follow up. I think we could provide a conda env package to help users quickly deploy Gazelle for evaluation.

PHILO-HE avatar Jul 26 '22 02:07 PHILO-HE

@Manoj-red-hat Could you help check whether the issue happens "during" the query or "after" the query? I would like to identify whether this is a known issue. Also, please let me know which OS version you are running.

weiting-chen avatar Jul 26 '22 02:07 weiting-chen

The release at https://github.com/oap-project/gazelle_plugin/releases/download/v1.4.0/gazelle-plugin-1.4.0-spark-3.2.1.jar was built on Ubuntu 20.04, so there may be dependency library issues when using another OS such as CentOS. It is better to compile it yourself to make sure all the dependency libraries can be found on your system.

weiting-chen avatar Jul 26 '22 02:07 weiting-chen

@PHILO-HE @weiting-chen

I tried the conda env, but it also did not work out.

[screenshot]

Machine info:
OS: Ubuntu 18.04.6 LTS x86_64
CPU: Intel i5-8400 (6) @ 4.000GHz

Finally, I did a git clone and built with the command below:

mvn -Phadoop-3.2,spark-3.2 package -DskipTests -Dcheckstyle.skip -Dmaven.test.skip

Now I am able to use the Gazelle plugin.
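
For anyone hitting the same issue, the plugin is enabled through Spark config along these lines (the jar path and off-heap size are placeholders; the exact settings per the Gazelle user guide may differ by version):

spark.plugins com.intel.oap.GazellePlugin
spark.driver.extraClassPath /path/to/gazelle-plugin-1.4.0-spark-3.2.1.jar
spark.executor.extraClassPath /path/to/gazelle-plugin-1.4.0-spark-3.2.1.jar
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 20g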

I am doing some debugging on TPC-H Q1.

I am wondering why we do the filtering in the conditional filter operator when it could be handled by the Parquet Arrow dataset reader; pushing the filter down would save us the projection time.

This is only possible if the input batches come from the native Parquet reader, but if we can do that, it should give us a speedup.

Anyway, to demonstrate the above, I am creating a standalone Q1 C++ Gazelle test case (see the sketch below).
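
As a minimal standalone sketch of the pushdown idea against the Arrow C++ Dataset API (the file path, column names, and predicate are illustrative placeholders, not Gazelle's actual operator code):

```cpp
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/dataset/api.h>
#include <arrow/filesystem/localfs.h>

namespace cp = arrow::compute;
namespace ds = arrow::dataset;

// Scan a lineitem Parquet file with the predicate pushed into the
// dataset scanner, so rows are filtered before any projection runs.
arrow::Result<std::shared_ptr<arrow::Table>> ScanWithPushdown(
    const std::string& path) {
  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
  auto format = std::make_shared<ds::ParquetFileFormat>();
  ARROW_ASSIGN_OR_RAISE(
      auto factory,
      ds::FileSystemDatasetFactory::Make(fs, {path}, format,
                                         ds::FileSystemFactoryOptions{}));
  ARROW_ASSIGN_OR_RAISE(auto dataset, factory->Finish());
  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  // The filter is evaluated inside the scan, so the batches reaching
  // any downstream operator already satisfy the predicate.
  ARROW_RETURN_NOT_OK(builder->Filter(
      cp::less_equal(cp::field_ref("l_quantity"), cp::literal(50.0))));
  ARROW_RETURN_NOT_OK(builder->Project(
      {"l_returnflag", "l_linestatus", "l_quantity", "l_extendedprice"}));
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());
  return scanner->ToTable();
}
```

If the batches instead come from a non-native reader, the filter has to run as a separate operator on already-materialized batches, which is the extra projection cost mentioned above.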

Manoj-red-hat avatar Jul 26 '22 12:07 Manoj-red-hat

> Could you help check whether the issue happens "during" the query or "after" the query? Also, please let me know which OS version you are running.

During the query.

Machine info:
OS: Ubuntu 18.04.6 LTS x86_64
CPU: Intel i5-8400 (6) @ 4.000GHz

Manoj-red-hat avatar Jul 26 '22 12:07 Manoj-red-hat

> I tried the conda env, but it also did not work out. […] Anyway, to demonstrate the above, I am creating a standalone Q1 C++ Gazelle test case.

Were there any issues when you ran TPC-H Q1 with Gazelle? I assume you have run it successfully, right?

weiting-chen avatar Aug 02 '22 00:08 weiting-chen

@weiting-chen Sorry for the late reply; I got stuck in some other work. Anyway, thanks for your support, and indeed it is a great project. Now I am exploring Gazelle on the TPC-DS side.

> Were there any issues when you ran TPC-H Q1 with Gazelle? I assume you have run it successfully, right?

Everything looks fine on TPC-H.

Now I have switched to TPC-DS and started validating that benchmark. There are a few problems there, such as a wrong result in Q4; I am reporting them in separate issues and will keep reporting if I find more.

Manoj-red-hat avatar Aug 26 '22 08:08 Manoj-red-hat