Hongze Zhang
@zml1206 Can you add a line to the PR description showing how to enable this cost model? Thanks.
> root cause is Spark merge two parquet part file's schema when `spark.sql.parquet.mergeSchema=true`, file1 schema is `s struct`, file2 schema is `s struct`, merged schema is `s struct`. > >...
@taiyang-li Just out of curiosity, are there more sources on the benefit of `Gluten + CH + JIT` compared with `Gluten + CH`? Or is it also an experimental work...
> The libvelox.so is missing links for folly, protobuf, and arrow (from nm -u) @PHILO-HE And do you have a Docker environment set up? If yes, I'd recommend you follow https://github.com/apache/incubator-gluten/tree/main/tools/gluten-te/centos/examples/buildhere-veloxbe-portable-libs...
> It seems like we'll need a third-party lib jar with these libraries, in addition to a jar with gluten/velox Usually only `libgluten.so` / `libvelox.so` are needed when using static...
> So maybe the PR is still useful. I am thinking about using a mechanism similar to the one added in https://github.com/apache/incubator-gluten/pull/6009 as a general solution. So we don't have...
> > So maybe the PR is still useful. > > I am thinking about using a mechanism similar to the one added in #6009 as a general solution. So...
> @zhztheplayer can you check how the memory is allocated during the conversion? Where the arrow memory is allocated? how many memcpy during the conversion? Is there onheap=>offheap copy? @boneanxs...
> This PR uses ArrowFieldWriter to do the conversion, it's widely used by pyspark, so reusing it here should be safe. Ahh then it's fine enough. Some of the code...
I think it's OK to have it disabled by default. @boneanxs Can you add a CI case for the feature to run TPC-H / TPC-DS tests? Example: https://github.com/apache/incubator-gluten/blob/8ab9b012db7ebbd7110ba1288b5c4e8a702ccc09/.github/workflows/velox_docker.yml#L293 You can...