amundsen
feat: Introduce Nebula Metadata Proxy and Databuilder
Metadata:
- Nebula Proxy
Databuilder:
- Nebula Extractor
- Nebula Search Data Extractor
- Nebula CSV Loader
- Nebula CSV Publisher
- Nebula Serializer
- Nebula Sample Data Loader
Summary of Changes
see #1816
Tests
All new components are covered by unit tests.
Documentation
docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d
cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py # this is necessary to trigger schema creation
CheckList
Make sure you have checked all steps below to ensure a timely review.
- [x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- [x] PR includes a summary of changes.
- [x] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what it does
Note: search is currently broken after the recent rebase (ff1c42e). It may be related to the recently merged search change; I will look into it later.
2022-05-15T11:16:51+0000.802 [ERROR] es_proxy_v2.execute_queries:311 (1:Thread-21) - Failed to execute ES search queries. TransportError(N/A, 'index_not_found_exception')
Is there a chance we could minimize this by reusing neo4j classes for some components? Basically, if neo4j speaks openCypher and nebula speaks it too, can we reuse things like the neo4j extractor, neo4j search data extractor, neo4j metadata proxy (but differently configured), etc.? I understand this might be difficult for write operations (though maybe not impossible), but reads could surely reuse neo4j components?
I like the idea of nebula as an alternative to neo4j, especially since it also speaks openCypher, but I'd like to know how this can be achieved while reusing neo4j code as much as possible. Do we need so much new code, or can this be avoided?
Thanks @mgorsk1 for your time to look into the proposal!
Indeed, while implementing the reference PR for the proposal I also saw that yet another storage backend increases the burden of introducing new features. I told myself I would keep an eye on all PRs after it was merged and carry that extra work myself.
But, as you pointed out, that doesn't scale at all, and this is most likely a good opportunity to build a Cypher-based backend with some level of abstraction so that code can be shared where possible.
I will keep this context and goal in mind and see what can be done in a refactor.
There are some challenges: nebula supports openCypher only as a dialect, so reusing the query strings themselves isn't directly possible (see here). However, the approach behind each read function is similar, so finding a way to move the Cypher-speaking DB implementation from code into configuration looks possible (and worth it).
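To make the "from code to configuration" idea concrete, here is a minimal sketch, not the actual Amundsen design. `CypherDialect`, `CypherReadProxy`, the `table_by_key` template name, and both query strings are hypothetical and only illustrate that one read function can be shared while the query text varies per backend. The executor is assumed to be any callable that takes a query and parameters and returns rows as dicts.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Optional

# Assumption: a backend driver can be wrapped as (query, params) -> list of row dicts.
Executor = Callable[[str, Mapping[str, Any]], List[Dict[str, Any]]]


@dataclass(frozen=True)
class CypherDialect:
    """Hypothetical per-backend configuration: a name plus named query templates."""
    name: str
    templates: Mapping[str, str]

    def query(self, query_name: str) -> str:
        if query_name not in self.templates:
            raise NotImplementedError(
                f"{self.name} has no template for {query_name!r}"
            )
        return self.templates[query_name]


# Illustrative query strings only; real queries depend on the schema in use.
NEO4J = CypherDialect(
    name="neo4j",
    templates={
        "table_by_key": "MATCH (t:Table {key: $key}) RETURN t.name AS name",
    },
)
NEBULA = CypherDialect(
    name="nebula",
    templates={
        "table_by_key": (
            "MATCH (t:`Table`) WHERE id(t) == $key "
            "RETURN t.`Table`.name AS name"
        ),
    },
)


class CypherReadProxy:
    """Shared read logic; only the dialect and the executor differ per backend."""

    def __init__(self, dialect: CypherDialect, execute: Executor) -> None:
        self._dialect = dialect
        self._execute = execute

    def get_table_name(self, key: str) -> Optional[str]:
        rows = self._execute(self._dialect.query("table_by_key"), {"key": key})
        return rows[0]["name"] if rows else None
```

With this shape, adding a backend would mean supplying a new `CypherDialect` and an executor rather than a new proxy class. Write paths would likely need more than template swapping.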
Thanks.
Closing as abandoned
Thanks @Golodhros, I'll reopen the PR once the RFC is settled. https://github.com/amundsen-io/rfcs/pull/48 BR//Wey