feat: Introduce Nebula Metadata Proxy and Databuilder

Open wey-gu opened this issue 2 years ago • 3 comments

Metadata:

  • Nebula Proxy

Databuilder:

  • Nebula Extractor
  • Nebula Search Data Extractor
  • Nebula CSV Loader
  • Nebula CSV Publisher
  • Nebula Serializer
  • Nebula Sample Data Loader

Summary of Changes

see #1816

Tests

All new components are covered by unit tests.

Documentation

docker-compose -f docker-amundsen-nebula.yml build
docker-compose -f docker-amundsen-nebula.yml up -d

cd databuilder
python3 -m venv venv
source venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
python3 setup.py install
python3 example/scripts/sample_data_loader_nebula.py # this is necessary to trigger schema creation

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • [x] PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • [x] PR includes a summary of changes.
  • [x] PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • [x] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

wey-gu avatar Apr 15 '22 10:04 wey-gu

Note: search is broken after the recent rebase (ff1c42e). It may be related to the recently merged search change; I will look into it later.

2022-05-15T11:16:51+0000.802 [ERROR] es_proxy_v2.execute_queries:311 (1:Thread-21) - Failed to execute ES search queries. TransportError(N/A, 'index_not_found_exception')

wey-gu avatar May 15 '22 08:05 wey-gu

Is there a chance we could minimize this by reusing the neo4j classes for some components? Basically, if neo4j speaks openCypher and Nebula speaks it too, can we reuse things like the neo4j extractor, the neo4j search data extractor, the neo4j metadata proxy (configured differently), etc.? I understand this might be difficult for write operations (though maybe not impossible), but reads could surely reuse the neo4j components?

I like the idea of Nebula as an alternative to neo4j, especially since it also speaks openCypher, but I'd like to know how this can be achieved while reusing the neo4j code as much as possible. Do we need so much new code, or can this be avoided?

mgorsk1 avatar May 16 '22 18:05 mgorsk1

Thanks @mgorsk1 for your time to look into the proposal!

Indeed, while implementing the reference PR for the proposal, I also saw that yet another storage backend increases the burden of introducing new features. I told myself I would keep an eye on all PRs after it was merged and carry that burden through my own efforts.

However, as you pointed out, that doesn't scale at all, and this is very likely a good opportunity to build the Cypher-based backends on some level of abstraction so that code can be shared where possible.

I will keep this context and goal in mind and see what can be done in the refactor.

There are some challenges: Nebula only supports openCypher as a dialect, so reusing the query strings themselves isn't directly possible (see here). The approach for each read function is similar, though, so finding a way to move the Cypher-speaking DB specifics out of code and into configuration looks possible (and worth it).
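To make the "code to configuration" idea concrete, here is a minimal sketch of one way it could look. Everything in it is an assumption for illustration: the names CypherDialect, DIALECTS and build_read_query are hypothetical and do not exist in Amundsen, and the two query strings are illustrative only (the Nebula one is written from memory of nGQL conventions and has not been checked against a running cluster).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CypherDialect:
    """Per-backend query templates. Hypothetical class, not part of Amundsen."""

    name: str
    templates: Dict[str, str]

    def render(self, query_name: str) -> str:
        # Look up the backend-specific query text for a logical query name.
        try:
            return self.templates[query_name]
        except KeyError:
            raise KeyError(
                f"{self.name} dialect has no query named {query_name!r}"
            ) from None


# Illustrative query strings only; they are NOT taken from the Amundsen code base.
NEO4J = CypherDialect(
    name="neo4j",
    templates={
        "table_description": (
            "MATCH (t:Table {key: $key}) "
            "RETURN t.description AS description"
        ),
    },
)

# Assumption: the Nebula dialect addresses vertices by id and tag-qualifies properties.
NEBULA = CypherDialect(
    name="nebula",
    templates={
        "table_description": (
            "MATCH (t:Table) WHERE id(t) == $key "
            "RETURN t.Table.description AS description"
        ),
    },
)

DIALECTS: Dict[str, CypherDialect] = {d.name: d for d in (NEO4J, NEBULA)}


def build_read_query(backend: str, query_name: str) -> str:
    """Shared read path: the calling code is the same for every backend."""
    return DIALECTS[backend].render(query_name)
```

With this shape, a read function would ask for "table_description" by name and only the template table would differ per backend. Write paths would likely need more than string templates.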

Thanks.

wey-gu avatar May 17 '22 10:05 wey-gu

Closing as abandoned

Golodhros avatar Feb 06 '23 22:02 Golodhros

Thanks @Golodhros, once the RFC is settled, I'll reopen the PR. https://github.com/amundsen-io/rfcs/pull/48 BR//Wey

wey-gu avatar Feb 07 '23 01:02 wey-gu