Performance issue in TreeRegionJoin

Open fnothaft opened this issue 8 years ago • 2 comments

Cross-linking to https://github.com/bigdatagenomics/avocado/issues/202. I'm seeing really odd performance issues in Avocado when using the broadcast region join code, e.g., slowdowns of multiple orders of magnitude.

fnothaft, Jan 24 '17 05:01

@fnothaft is this still an issue? The linked issue in Avocado was closed.

heuermh, Jan 09 '18 19:01

Yes, this is still an issue. We can rewrite the region join APIs to use Spark SQL, which appears to yield a large performance gain. Moving to Spark SQL would also make #1728 easier to close.

fnothaft, Jan 09 '18 19:01
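
For readers unfamiliar with the idea, here is a rough sketch of what expressing a region join as a SQL query could look like. This is not ADAM's code and not the proposed Spark SQL implementation: it uses Python's built-in `sqlite3` as a stand-in for Spark SQL, and the table names, column names, and the `region_join` helper are all hypothetical. It also assumes 0-based, half-open coordinates.

```python
import sqlite3


def region_join(left, right):
    """Join two lists of (name, contig, start, end) tuples on region overlap.

    Hypothetical helper for illustration only. sqlite3 stands in for
    Spark SQL; coordinates are assumed to be 0-based and half-open.
    """
    conn = sqlite3.connect(":memory:")
    for table in ("left_regions", "right_regions"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(name TEXT, contig TEXT, start_pos INTEGER, end_pos INTEGER)"
        )
    conn.executemany("INSERT INTO left_regions VALUES (?, ?, ?, ?)", left)
    conn.executemany("INSERT INTO right_regions VALUES (?, ?, ?, ?)", right)

    # Two half-open intervals on the same contig overlap when each one
    # starts before the other one ends.
    rows = conn.execute(
        """
        SELECT l.name, r.name
        FROM left_regions AS l
        JOIN right_regions AS r
          ON l.contig = r.contig
         AND l.start_pos < r.end_pos
         AND r.start_pos < l.end_pos
        ORDER BY l.name, r.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    reads = [
        ("read1", "chr1", 100, 200),
        ("read2", "chr1", 300, 400),
        ("read3", "chr2", 100, 200),
    ]
    features = [("geneA", "chr1", 150, 350), ("geneB", "chr2", 200, 300)]
    for pair in region_join(reads, features):
        print(pair)
```

The point of the sketch is only that the overlap condition becomes a declarative join predicate, which leaves the choice of join strategy to the query engine rather than to hand-written tree or broadcast code. Whether that accounts for the gain mentioned above is not established in this thread.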