adam
Performance issue in TreeRegionJoin
Crosslinking to https://github.com/bigdatagenomics/avocado/issues/202. I'm seeing really odd performance issues in Avocado when using the broadcast region join code, e.g., slowdowns of multiple orders of magnitude.
@fnothaft is this still an issue? The linked issue in Avocado was closed.
Yes, this is still an issue. We can rewrite the region join APIs to use Spark SQL, which appears to yield a large performance gain. Moving to Spark SQL would also make #1728 easier to close.
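For anyone following along, here is a rough sketch of what a region join looks like when written as a SQL join predicate. This is not ADAM's API or the proposed implementation: it uses Python's built-in `sqlite3` as a stand-in for Spark SQL so it runs anywhere, and the table names, column names, and sample rows are all hypothetical. It assumes half-open `[start_pos, end_pos)` intervals.

```python
import sqlite3

# Hypothetical sample data: (name, contig, start_pos, end_pos), half-open intervals.
READS = [
    ("read1", "chr1", 100, 200),
    ("read2", "chr1", 500, 600),
    ("read3", "chr2", 100, 200),
]
FEATURES = [
    ("geneA", "chr1", 150, 300),
    ("geneB", "chr2", 300, 400),
]


def overlap_join(reads, features):
    """Return (read, feature) name pairs whose regions overlap on the same contig."""
    conn = sqlite3.connect(":memory:")
    schema = "(name TEXT, contig TEXT, start_pos INTEGER, end_pos INTEGER)"
    conn.execute("CREATE TABLE reads " + schema)
    conn.execute("CREATE TABLE features " + schema)
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)", reads)
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?)", features)
    # Two half-open intervals overlap when each one starts before the other ends.
    rows = conn.execute(
        """
        SELECT r.name, f.name
        FROM reads r
        JOIN features f
          ON r.contig = f.contig
         AND r.start_pos < f.end_pos
         AND f.start_pos < r.end_pos
        ORDER BY r.name, f.name
        """
    ).fetchall()
    conn.close()
    return rows


pairs = overlap_join(READS, FEATURES)
print(pairs)
```

Writing the join as a declarative predicate leaves the choice of join strategy to the query engine rather than to hand-written tree lookup code. Whether that accounts for the gain seen with Spark SQL is not something this sketch demonstrates.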