velox
velox copied to clipboard
Improve OutputBufferManager initialization
Starting from C++11, the C++ standard guarantees that the initialization of function-local static variables is thread-safe. This is better than using a global mutex, especially for subsequent getInstance() calls. This is because the overhead of using a static local variable only needs to do a simple check to see if the variable has already been initialized, while for the global mutex case, all getInstance() calls need to aquire this lock exclusively.
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | 9d64dd414bb37083c57e58f741e756648e057107 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/67cfa214aac4a400088538f4 |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@yingsu00 CI is red. Please, take a look. We need a green CI to merge.
@yingsu00 CI is red. Please, take a look. We need a green CI to merge.
CI is green now. Thanks for porting it.
Thanks for working on this!
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@yingsu00 Seeing failures in Prestissimo. If this change is not backwards compatible it needs to be fixed.
fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PeriodicTaskManager.cpp:239:55: error: no member named 'lock' in 'std::shared_ptr<facebook::velox::exec::OutputBufferManager>'
239 | velox::exec::OutputBufferManager::getInstance().lock()->numBuffers());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
buck-out/v2/gen/fbcode/4ac7a641bc650342/velox/common/base/__velox_stats_reporter__/buck-headers/velox/common/base/StatsReporter.h:208:43: note: expanded from macro 'RECORD_METRIC_VALUE'
208 | reporter->addMetricValue((key), ##__VA_ARGS__); \
| ^~~~~~~~~~~
1 error generated.
@yingsu00 Seeing failures in Prestissimo. If this change is not backwards compatible it needs to be fixed.
fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PeriodicTaskManager.cpp:239:55: error: no member named 'lock' in 'std::shared_ptr<facebook::velox::exec::OutputBufferManager>' 239 | velox::exec::OutputBufferManager::getInstance().lock()->numBuffers()); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ buck-out/v2/gen/fbcode/4ac7a641bc650342/velox/common/base/__velox_stats_reporter__/buck-headers/velox/common/base/StatsReporter.h:208:43: note: expanded from macro 'RECORD_METRIC_VALUE' 208 | reporter->addMetricValue((key), ##__VA_ARGS__); \ | ^~~~~~~~~~~ 1 error generated.
I'll send a PR to Presto repo to fix this.
@yingsu00 Seeing failures in Prestissimo. If this change is not backwards compatible it needs to be fixed.
fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PeriodicTaskManager.cpp:239:55: error: no member named 'lock' in 'std::shared_ptr<facebook::velox::exec::OutputBufferManager>' 239 | velox::exec::OutputBufferManager::getInstance().lock()->numBuffers()); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ buck-out/v2/gen/fbcode/4ac7a641bc650342/velox/common/base/__velox_stats_reporter__/buck-headers/velox/common/base/StatsReporter.h:208:43: note: expanded from macro 'RECORD_METRIC_VALUE' 208 | reporter->addMetricValue((key), ##__VA_ARGS__); \ | ^~~~~~~~~~~ 1 error generated.I'll send a PR to Presto repo to fix this.
Draft Presto PR: https://github.com/prestodb/presto/pull/23970. How do you want to proceed? Once this Velox PR is merged, I can add an advance Velox commit in https://github.com/prestodb/presto/pull/23970. The "advance Velox" will need the hash of this commit.
@yingsu00 We cannot merge Velox PR that breaks Prestissimo. The changes need to be made in backward-compatible manner.
@yingsu00 Some examples: #5953 #6459
@yingsu00 Some examples: #5953 #6459
@mbasmanova Thanks for the examples, but could you elaborate a bit how to follow these examples? Since OutputbufferManager::getInstance() now returns shared_ptr, all callsites, including the ones in Presto, needs to remove the lock() from OutputbufferManager::getInstance().lock(). Did you mean you wanted to keep it returning weak_ptr? Returning weak_ptr is not quite appropriate here, as @Yuhta suggested. It makes sense to keep any class fields to this object a weak_ptr, e.g. in PartitionedOutput const std::weak_ptr<exec::OutputBufferManager> bufferManager_;, but it's better to return a shared_ptr for OutputbufferManager::getInstance().
Also I don't quite understand why merging this PR would break Presto given that Presto needs to advance Velox version to pick up this change. Presto won't break unless someone else advances Velox for Presto without merging https://github.com/prestodb/presto/pull/23970 together. To avoid that situation, I will add "Advance Velox" change to https://github.com/prestodb/presto/pull/23970 once I get the hash after this change is merged to Velox. Do you think if it works this way? Or do you have other suggestions? Appreciate it if you could elaborate more. Thanks!
@ManManson @mbasmanova @Yuhta Can anyone provide some advice how to proceed? Thank you!
The most straightforward way would be to split the work into several pieces as follows:
- Introduce
std::shared_ptr<...> getInstanceStrong()and convert uses that are strictly confined to Velox internals, but leave public APIs (which are used by Prestissimo) as is. - Upgrade Velox version in Presto C++ and convert all references to
OutputBufferManagerto usegetInstanceStrong(). This ensures that we don't have references to the old method signature anymore. - Make old
getInstance()to also returnstd::shared_ptr<...>(sogetInstance()andgetInstanceStrong()become the same thing). - Upgrade Velox version in Presto C++ once again and convert all uses of
getInstanceStrong()togetInstance(). - Now we are free to finally drop the
getInstanceStrong()method.
The most straightforward way would be to split the work into several pieces as follows:
- Introduce
std::shared_ptr<...> getInstanceStrong()and convert uses that are strictly confined to Velox internals, but leave public APIs (which are used by Prestissimo) as is.- Upgrade Velox version in Presto C++ and convert all references to
OutputBufferManagerto usegetInstanceStrong(). This ensures that we don't have references to the old method signature anymore.- Make old
getInstance()to also returnstd::shared_ptr<...>(sogetInstance()andgetInstanceStrong()become the same thing).- Upgrade Velox version in Presto C++ once again and convert all uses of
getInstanceStrong()togetInstance().- Now we are free to finally drop the
getInstanceStrong()method.
Cool! Thanks for the idea @ManManson. I have updated this PR to the state of 1) as you suggested. Please re-review.
@ManManson @Yuhta Will you folks be able to review this PR again?
Thank you @ManManson ! @Yuhta @mbasmanova Do you have other comments? If not I'm marking this PR to be "ready-to-merge" again. Once this one is merged I'll send follow up PRs as @ManManson suggested.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@yingsu00 Ying, would you rebase so we can try landing this PR?
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@yingsu00 Thank you for rebasing. Seeing CI failures. Would you take a look?
Details
@mbasmanova Thanks for importing it! The test failures seems to be caused by https://github.com/facebookincubator/velox/pull/11737 which was exported and not reviewed. https://github.com/facebookincubator/velox/pull/11743 is supposed to fix it. We can wait it to be merged first then I'll rebase again.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
CI is still red.
hi @mbasmanova All tests pass now except the Facebook internal ones:
Facebook Internal - Builds & Tests: 3 warning_suppressed, 3 warning, 6 warning_preexisting, 94 warning_nosignal, 893 advice_no_signal 3 warning_suppressed, 3 warning, 6 warning_preexisting, 94 warning_nosignal, 893 advice_no_signal internally at Meta. warning: infer-checkers-fbcode-diff-analysis-with-buck
I'm unable to view them though. Would it be possible to show me what the warnings are? I can fix them once I know what's wrong. Thank you!
@yingsu00 I just noticed this has been sitting here a while, could you rebase your change? I can reimport it then and see what issues if any there are in our internal tests.
@yingsu00 Can you fix conflicts if you are still interested in landing this change ?
CI has some failures. I wonder if they are related. @yingsu00 can you please confirm?
@majetideepak There were unrelated issues in the CI pipeline. It is clean now.
@pedroerp can you please help merge? Thanks.