MDEV-36737: Research and Estimation for Adapting VIDEX to MariaDB
Description
This PR introduces VIDEX as a new storage engine plugin for MariaDB. VIDEX is a Disaggregated and Extensible Virtual Index Engine designed to perform efficient and accurate what-if analysis for tasks like index recommendation.
The VIDEX architecture is composed of two core parts:
- VIDEX-Optimizer: Implemented as a MariaDB storage engine, it hooks into the query optimizer and simulates the behavior and cost models of other engines (in this PR, InnoDB). It allows developers and DBAs to evaluate the impact of potential indexes on query plans without the overhead of building them on actual data.
- VIDEX-Statistic-Server: This is a decoupled service that handles complex statistical computations. The VIDEX-Optimizer forwards requests for cardinality and Number of Distinct Values (NDV) estimation to this server via an HTTP protocol. This design allows users to plug in their own estimation models—from simple heuristics on sampled data to sophisticated AI-powered algorithms—and deploy them independently.
The statistic server can be implemented in any language or framework. A reference implementation using Python/Flask has already been merged in bytedance/videx#47.
As discussed, this PR contains the implementation for the VIDEX-Optimizer. The corresponding VIDEX-Statistic-Server (developed mainly in Python) will be submitted in a follow-up PR.
As tested on the TPC-H benchmark, VIDEX is capable of producing query plans that are 100% identical to those from MariaDB's native InnoDB engine. The detailed results can be found in the description of bytedance/videx#47.
Features
- Statistics Service Integration: Communicates with VIDEX statistics server through HTTP
- Query Optimizer Support: Implements key interfaces such as
records_in_range,info_low, etc. - InnoDB Compatibility: Simulates InnoDB's cost model and cardinality estimation behavior
- Pluggable Architecture: Supports both dynamic loading and static linking
File Structure
storage/videx/
├── ha_videx.cc # Main storage engine implementation file
├── videx_utils.cc # Utility function implementation file
├── videx_utils.h # Utility function header file
├── CMakeLists.txt # CMake build configuration file
└── mysql-test/ # Test suite directory
└── videx/
├── suite.opt # Test suite configuration options
├── include/ # Test include files directory
│ └── have_videx.inc # VIDEX engine availability check
├── create-table-and-index.test # Table creation and index test
├── create-table-and-index.result # Table creation and index test expected results
├── set-debug-skip-http.test # Debug variable setting test
└── set-debug-skip-http.result # Debug variable setting test expected results
Configuration Variables
debug_skip_http: skip HTTP requests for debuggingserver_ip: VIDEX server addressoptions: connection options in JSON format
SET SESSION debug_skip_http = TRUE;
SET SESSION server_ip = 'your_own_ip';
SET SESSION options = '{"timeout": 30, "retry": 3}';
How to start VIDEX-Server
see videx/PR-47: Implemented VIDEX-Server on MariaDB
mysql-test
Test Case 1: create-table-and-index.test
Validates basic VIDEX engine functionality:
- Table creation and deletion
- Index creation and management
- Primary key and foreign key support
Test Case 2: set-debug-skip-http.test
Validates debugging functionality:
DEBUG_SKIP_HTTPvariable setting- HTTP skip logic
- EXPLAIN query execution
Build Configuration
Dependencies
- libcurl: HTTP client library
- zlib: Compression support
CMake Options
PLUGIN_VIDEX=YES: Enable VIDEX plugin (default)PLUGIN_VIDEX=STATIC: Static linkingPLUGIN_VIDEX=DYNAMIC: Dynamic loading
Compilation Command Example
cmake -DPLUGIN_VIDEX=YES \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-G Ninja --fresh \
-S /path/to/mariadb \
-B /path/to/build
Contributor Information
- Authors: Haibo Yang, Rong Kang
Future Plans
- [ ] Add trace recording in HTTP requests for tracking
- [ ] Complete column_bitmaps_signal to support indexed virtual columns for VIDEX engine
- [ ] Improve server-side cardinality precision by introducing AI model
Summary
The VIDEX storage engine provides MariaDB with a flexible and extensible statistics information management solution. Through external statistics services, it achieves high compatibility with InnoDB while maintaining architectural flexibility and maintainability. This plugin is particularly suitable for scenarios requiring rapid iteration of statistics strategies or deployment of distributed statistics services.
Release Notes
Added a VIDEX engine in storage/videx to support what-if analysis for index strategies, integrates AI-based cardinality and NDV (number of distinct values) estimation algorithms
How can this PR be tested?
TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended. Consult the documentation on "Writing good test cases".
If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.
Basing the PR against the correct MariaDB version
- [x] This is a new feature or a refactoring, and the PR is based against the
11.8tag. - [ ] This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
PR quality check
- [ ] I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
- [x] For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.
Hi @svoj @gl-sergei @kr11,
This PR implemented the Videx storage engine with the Optimizer part. For videx server usage, you can refer to bytedance/videx#47. The server will be covered in a future PR.
One of the CI tests is currently failing — could you please help check it, or try re-running the workflow?
Thanks a lot!
@YoungHypo regarding failing test: please disregard it. It is unrelated to this PR, we will sort it out when we're ready to merge.
Thanks for your feedback @svoj. We’ve completed the changes: all non-Videx code has been removed, and the core files containes only two parts, ha_videx.cc and videx_utils. The mysql-test directory has also been simplified as you suggested, and the commits have been squashed. Please let me know if further adjustments are needed.
storage/videx/
├── ha_videx.cc # Main storage engine implementation file
├── videx_utils.cc # Utility function implementation file
├── videx_utils.h # Utility function header file
├── CMakeLists.txt # CMake build configuration file
└── mysql-test/ # Test suite directory
└── videx/
├── suite.opt # Test suite configuration options
├── include/ # Test include files directory
│ └── have_videx.inc # VIDEX engine availability check
├── create-table-and-index.test # Table creation and index test
├── create-table-and-index.result # Table creation and index test expected results
├── set-debug-skip-http.test # Debug variable setting test
└── set-debug-skip-http.result # Debug variable setting test expected results
Thanks for your feedback @svoj ! I've removed some unnecessary code from ha_videx.cc and updated suite.opt and suite.pm to support dynamic builds. I also updated the tests and results to align with the current Videx code.
In my local testing, cmake -DPLUGIN_VIDEX=NO resulted in skipped, while setting DPLUGIN_VIDEX to YES/DYNAMIC/STATIC all passed.
Currently, the tests set skip_http to True, so they do not depend on the Videx server. In the next PR, we plan to discuss how to introduce a Python-based server implementation. Look forward to your further review and guidance. Thanks again!
BTW, once it’s ready to be merged, I’ll squash all commits into one again.
Thanks @svoj. I’ve removed videx_debug_skip_http and now use videx_server_ip to control server enable/disable. I’ve also updated the mysql-test. If it’s merge-ready, I’ll squash the commits.
@YoungHypo I believe it is in a good enough shape for the merge, so please do squash. Still other developers may request some extra changes in the meantime.
@YoungHypo it'd also be good to reword main commit description so that it says something like:
MDEV-36737: Research and Estimation for Adapting VIDEX to MariaDB
VIDEX is a Disaggregated and Extensible Virtual Index Engine designed
to perform efficient and accurate what-if analysis for tasks like
index recommendation.
I’ve squashed the commits and updated the description. Thanks again @svoj for your guidance and feedback — it’s truly great to have your help.
@YoungHypo I believe it is in a good enough shape for the merge, so please do squash. Still other developers may request some extra changes in the meantime.
@svoj May I ask what the next steps in the review process will be? We’re planning to share our current progress on Jira and Zulip — do you think that’s a good idea? Really looking forward to receiving feedback from other developers.
@YoungHypo I believe it is in a good enough shape for the merge, so please do squash. Still other developers may request some extra changes in the meantime.
@svoj May I ask what the next steps in the review process will be? We’re planning to share our current progress on Jira and Zulip — do you think that’s a good idea? Really looking forward to receiving feedback from other developers.
I will ask some other developers for feedback here. Feel free to share our current progress via jira/zulip.
@vuvova, @spetrunia I believe initial VIDEX version is in a good enough shape. I aim to get it merged to 11.8, disabled by default, plugin marked as experimental. Do you have any suggestions/objections? Will you want to review this PR too?
Hi @svoj , just following up on the status of this PR. I was wondering if there’s been any further feedback from other developers. I’m happy to help with any changes if needed — looking forward to the next steps!
Hi @YoungHypo. No feedback yet. I will be trying to get things rolling. In the meantime, unless absolutely necessary, it'd be good to keep this PR intact, no need to perform merges. So that we can anchor to certain revision. We can update the tree when we're ready to merge. It'd be good to revert recent merge to rev 2f8993d72d12faaedfa6d5749f4bdb297e49eced, as it was at the time I approved it.
Hi @YoungHypo. No feedback yet. I will be trying to get things rolling. In the meantime, unless absolutely necessary, it'd be good to keep this PR intact, no need to perform merges. So that we can anchor to certain revision. We can update the tree when we're ready to merge. It'd be good to revert recent merge to rev 2f8993d, as it was at the time I approved it.
Thanks @svoj! I've reset the branch to commit 2f8993d as suggested
Hi @svoj , the branch has been reset as you suggested. It looks like the CI workflow is now awaiting approval (1 workflow awaiting approval). Could you please approve the workflow to get it running?
@kr11 done, though it was just Windows on ARM, rather minor builder.
@YoungHypo, just wanted to say, that we're still testing VIDEX — a couple of developers have it installed and run various queries. Unfortunately, it's not very visible in the PR, but we are working on it
@vuvova @svoj @kr11 Thank you very much for the update, and really happy to hear that everything is moving forward.
VIDEX - PR 47 already includes the installation and execution steps for VIDEX in MariaDB and its dependency (Statistic Server), as well as the TPC-H benchmark results. If anything in the PR description is unclear, we can continue the discussion either here in this PR or in the Zulip channel. Please feel free to let me know if there’s anything I can assist with.
Hi, @svoj
The new commit-5a7b576 fixed a gcc linking error (undefined symbol: VidexJsonItem::add_property_nonanadd_property_nonan definition from the .cc to the header file. Clang silently ignored this issue, but GCC requires template definitions to be visible at compile time.
Last minute packaging fix. Can I get you to correct this as follows:
Note on CI resutls for 4217 that the deb-autobake builders are failing.
to correct:
- storage/videx/CMakeLists.txt append "COMPONENT videx-engine" to "MYSQL_ADD_PLUGIN"
- debian/control - add a
Package: mariadb-plugin-videx
Architecture: any
Depends: mariadb-server (= ${server:Version}),
${misc:Depends},
${shlibs:Depends}
Description: ....
Notably don't depend explictly on curl, the others are a mistake that I'll correct soon.
- add debian/mariadb-plugin-videx.install
etc/mysql/mariadb.conf.d/videx.cnf
usr/lib/mysql/plugin/ha_videx.so
Last minute packaging fix. Can I get you to correct this as follows:
Note on CI resutls for 4217 that the deb-autobake builders are failing.
to correct:
- storage/videx/CMakeLists.txt append "COMPONENT videx-engine" to "MYSQL_ADD_PLUGIN"
- debian/control - add a
Package: mariadb-plugin-videx Architecture: any Depends: mariadb-server (= ${server:Version}), ${misc:Depends}, ${shlibs:Depends} Description: ....Notably don't depend explictly on curl, the others are a mistake that I'll correct soon.
- add debian/mariadb-plugin-videx.install
etc/mysql/mariadb.conf.d/videx.cnf usr/lib/mysql/plugin/ha_videx.so
Hi Daniel @grooverdan ,
Thank you for the detailed and clear guidance on the packaging fixes. We have pushed the changes in the latest commit 2eb8329.
Specifically, we have:
- Appended
COMPONENT videx-engineto theMYSQL_ADD_PLUGINmacro instorage/videx/CMakeLists.txt. - Added a new package definition for
mariadb-plugin-videxindebian/control. As requested, we have relied on${shlibs:Depends}for automatic dependency detection and did not add an explicit dependency oncurl. - Created the new
debian/mariadb-plugin-videx.installfile to include the plugin's.soand.cnffiles in the package.
Please let me know if any further adjustments are needed.
Hi, @grooverdan cc @svoj
The latest commit has successfully resolved the Debian packaging failures.
We also reviewed the CI results and noticed a failure in the amd64-ubuntu-2204-debug-ps job related to main.debug_sync. As the log shows, the test passed on retry. We thought it might be a flaky test.
Given that our changes are confined to the storage/videx engine and do not alter core server logic, we are not sure whether this failure is related to this PR. We would appreciate your expert opinion on the matter.
Retrying test main.debug_sync, attempt(2/3)...
worker[02] > Restart - not started
main.debug_sync w2 [ retry-pass ] 79
Retrying test main.debug_sync, attempt(3/3)...
main.debug_sync w2 [ retry-pass ] 85
...
Failing test(s): main.debug_sync
@kr11 yes, main.debug_sync failure is unrelated and was fixed in da5cffeab3694546b44f200ee9bd3cc2c8f2b278 and 692f32852bcb78b4eb7cc27a03e2f070b83fc115. Please disregard it, alas we have many sporadic failures.