ibis
feat(E6data): initial implementation for E6data SQL Analytics platform
Description of changes
This pull request integrates E6data, a distributed SQL analytics engine, with Ibis. E6data is designed for high-performance analytics on large-scale data, and this integration lets Ibis users run their expressions against E6data.
Key changes and considerations:
- E6data Backend Implementation:
  - A new E6data backend class was added, inheriting from SQLBackend.
  - Implemented connection handling, query execution, and result fetching specific to E6data.
  - Adapted the backend to handle E6data's unique features, such as catalog support and cluster management.
- SQL Dialect Customization:
  - Created an E6data dialect class extending the MySQL dialect, as E6data's syntax is similar to MySQL's.
  - Customized the Tokenizer to use double quotes for identifiers.
  - Modified the Generator to map certain data types (VARCHAR, CHAR, TEXT) to STRING, aligning with E6data's type system.
  - Added custom TRANSFORMS for specific SQL functions such as concat and length.
- Compiler Modifications:
  - Updated E6DataCompiler to use the new E6data dialect and E6DataType for type mapping.
  - Retained existing rewrites, including a custom limit rewrite, to ensure compatibility with E6data's query execution model.
- Connection String and Authentication:
  - Implemented support for E6data's connection string format, including catalog name, secure connection, auto-resume, and cluster UUID parameters.
- Schema and Metadata Handling:
  - Adapted schema retrieval and table listing functions to work with E6data's multi-level hierarchy (catalog, database, table).
- Testing: I need some guidance on how to go about adding tests for the integration.
- Documentation: Could you give me some pointers on how to add relevant documentation?
- Dependencies:
  - Currently, the platform is not open for public access; existing users require authentication keys to use the analytics engine. I would like guidance on how to give the maintainers access for testing and how to provide credentials for automated tests once they are supported. We're also working on a Minikube-based single-node testing infrastructure, which might make it easier to add test automation to the CI.
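
As a rough illustration of the connection parameters listed under "Connection String and Authentication", the sketch below parses a URL-style connection string into a small settings object. The e6data:// scheme, the query parameter names, the defaults (port 443, secure off, auto-resume on), and the names E6ConnectionParams and parse_e6data_url are all assumptions made for this example; they are not necessarily the format the backend in this PR accepts.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


@dataclass(frozen=True)
class E6ConnectionParams:
    """Hypothetical container for the options named in the PR description."""

    host: str
    port: int
    username: Optional[str]
    password: Optional[str]
    database: Optional[str]
    catalog: Optional[str]
    secure: bool
    auto_resume: bool
    cluster_uuid: Optional[str]


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings found in query strings."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def parse_e6data_url(url: str) -> E6ConnectionParams:
    """Parse an assumed e6data://user:token@host:port/database?... URL."""
    parts = urlsplit(url)
    if parts.scheme != "e6data":
        raise ValueError(f"expected an e6data:// URL, got scheme {parts.scheme!r}")

    # Keep the last value when a query parameter is repeated.
    query = {key: values[-1] for key, values in parse_qs(parts.query).items()}

    return E6ConnectionParams(
        host=parts.hostname or "localhost",
        port=parts.port or 443,  # assumed default, not taken from E6data docs
        username=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
        database=parts.path.lstrip("/") or None,
        catalog=query.get("catalog"),
        secure=_as_bool(query.get("secure", "false")),
        auto_resume=_as_bool(query.get("auto_resume", "true")),
        cluster_uuid=query.get("cluster_uuid"),
    )
```

Keeping the parsing in one pure function like this would make it easy to unit test without a live cluster, which may help while credentials for CI are still being sorted out.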
I'm new to the Ibis community, and this PR could be much better. I appreciate the time and guidance from maintainers in improving it further. Any comments are welcome; again, I appreciate your time and patience.
> Testing: I need some guidance on how to go about adding tests for the integration.
The best place to start is to try running pytest -m e6data, assuming you can reach an E6data instance from your development host.
> Documentation: Could you give me some pointers on how to add relevant documentation?
Take a look at the individual backend docs pages in docs/backends/. Those are a good place to start with backend-specific docs.
> Dependencies: Currently, the platform is not open for public access; existing users require authentication keys to use the analytics engine. I would like guidance on enabling the maintainers to test and provide credentials for automated tests once they are supported. We're also working on a Minikube-based single-node testing infrastructure, which might make adding testing automation for the CI easier.
We'll need to store whatever credentials/auth information is needed to log in as a GitHub Actions secret. Let's chat in a DM on Zulip about this.
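
For illustration only: GitHub Actions secrets reach a job as environment variables once the workflow maps them, so the test setup could read them and skip the E6data tests when they are missing (for example on forks, where secrets are not available). The sketch below assumes hypothetical variable names (E6DATA_HOST, E6DATA_USERNAME, E6DATA_TOKEN) and a hypothetical helper name; the real names would be whatever we agree on.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; the actual secret names are still to be decided.
REQUIRED_VARS = ("E6DATA_HOST", "E6DATA_USERNAME", "E6DATA_TOKEN")


def load_e6data_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Return the credentials as a dict, or None if any of them is missing.

    A test fixture could call this and skip the E6data tests on None
    instead of failing with an authentication error.
    """
    source = os.environ if env is None else env
    values = {name: source.get(name, "").strip() for name in REQUIRED_VARS}
    if not all(values.values()):
        return None
    return values
```

Accepting an optional mapping instead of always reading os.environ keeps the helper testable without touching the real environment.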
Hey @cpcloud,
I've addressed most of your comments apart from the tests and documentation. Please let me know if these changes look good. I'll be working on writing the tests now.
> We'll need to store whatever credentials/auth information is needed to log in as a GitHub Actions secret. Let's chat in a DM on Zulip about this.
I've DM'ed you on Zulip and would appreciate your assistance.
Please create a new PR if you're still interested in working on this.