amundsen feat: Introduce Nebula Metadata Proxy and Databuilder

Metadata: Nebula Proxy

Databuilder:

Nebula Extractor
Nebula Search Data Extractor
Nebula CSV Loader
Nebula CSV Publisher
Nebula Serializer
Nebula Sample Data Loader

Summary of Changes

see #1816

Tests

All new components are covered by unit tests.

Documentation

docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d

cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py # this is necessary to trigger schema creation

CheckList

Make sure you have checked all steps below to ensure a timely review.

[x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
[x] PR includes a summary of changes.
[x] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
[x] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what it does

Apr 15 '22 10:04 wey-gu

Note: search has been broken since the recent rebase (ff1c42e). It may be related to the recently merged search change; I will look into it later.

2022-05-15T11:16:51+0000.802 [ERROR] es_proxy_v2.execute_queries:311 (1:Thread-21) - Failed to execute ES search queries. TransportError(N/A, 'index_not_found_exception')

May 15 '22 08:05 wey-gu

Is there a chance we could minimize this by reusing Neo4j classes for some components? Basically, if Neo4j speaks openCypher and Nebula speaks it too, can we reuse things like the Neo4j extractor, Neo4j search data extractor, and Neo4j metadata proxy (configured differently)? I understand this might be difficult for write operations (though maybe not impossible), but reads could surely reuse the Neo4j components?

I like the idea of Nebula as an alternative to Neo4j, especially since it also speaks openCypher, but I'd like to know how this can be achieved while reusing as much of the Neo4j code as possible. Do we need so much new code, or can this be avoided?

May 16 '22 18:05 mgorsk1

Thanks @mgorsk1 for your time to look into the proposal!

Indeed, while implementing the reference PR for the proposal I also saw that yet another storage backend increases the burden of introducing new features. I had told myself I would keep an eye on all PRs after it was merged and carry that burden through my own efforts.

But, as you pointed out, that doesn't scale at all, and this is very likely a good opportunity to build the Cypher-based backends on some level of abstraction so that code is shared where possible.

I will keep this context and goal in mind and see what can be done in the refactor.

There are some challenges: Nebula supports openCypher only as a dialect, so reusing the query strings themselves isn't directly possible (see here). The approach behind each read function is similar, though, so finding a way to move the Cypher-speaking DB specifics out of code and into configuration looks possible (and worth it).
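To make the idea concrete, here is a rough sketch of what "from code to configuration" could look like. It is not taken from this PR or from Amundsen: `CypherDialect`, `CypherMetadataProxy`, `Executor`, and both query strings are hypothetical, and the Nebula query text in particular is only an illustrative placeholder for a dialect difference. The read logic is written once, and each backend supplies its own query templates and its own executor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# All names below are hypothetical; none of them exist in Amundsen today.


@dataclass(frozen=True)
class CypherDialect:
    """Per-backend configuration: logical operation name -> query template."""
    name: str
    queries: Dict[str, str]


# Illustrative query strings only; the real ones would come from each backend.
NEO4J = CypherDialect(
    name="neo4j",
    queries={
        "get_table_description": (
            "MATCH (t:Table {key: $key})-[:DESCRIPTION]->(d:Description) "
            "RETURN d.description AS description"
        ),
    },
)

NEBULA = CypherDialect(
    name="nebula",
    queries={
        "get_table_description": (
            "MATCH (t:`Table`)-[:DESCRIPTION]->(d:`Description`) "
            "WHERE id(t) == $key "
            "RETURN d.Description.description AS description"
        ),
    },
)

# An executor hides the driver: it takes a query and parameters, returns rows.
Executor = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


class CypherMetadataProxy:
    """Read path shared by every Cypher-speaking backend."""

    def __init__(self, dialect: CypherDialect, execute: Executor) -> None:
        self._dialect = dialect
        self._execute = execute

    def get_table_description(self, table_key: str) -> Optional[str]:
        # Same control flow for every backend; only the query text differs.
        query = self._dialect.queries["get_table_description"]
        rows = self._execute(query, {"key": table_key})
        return rows[0]["description"] if rows else None
```

With this shape, adding a backend would mean adding one `CypherDialect` entry and one executor rather than a parallel set of proxy classes. Write operations are left out on purpose, since they are likely to differ more between backends.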

Thanks.

May 17 '22 10:05 wey-gu

Closing as abandoned

Feb 06 '23 22:02 Golodhros

Thanks @Golodhros. Once the RFC is settled, I'll reopen the PR. https://github.com/amundsen-io/rfcs/pull/48 BR//Wey

Feb 07 '23 01:02 wey-gu
