cartography
Migrate to Neo4j 4.x
This PR "lifts" support (pun intended) for Neo4j to the 4.x line, as v3.5 has a current "End of Support" date of November 28, 2021.
This PR touches a LOT of files, so I fully expect it to take some time for review. It passes integration tests on my machine, but might need some testing in the wild. (Another item missing is a migration guide for existing 3.5 databases, but that could be provided in a separate PR.)
Updating the Driver
The Neo4j Python Driver is updated to the latest 4.2 driver. The neobolt dependency is dropped as it is no longer required due to Neo4j Exception classes being included in the driver package nowadays.
Cypher Updates
All Cypher templates are updated to use the new parameter syntax of $-prefixed naming instead of the previous {} syntax. This small syntax change results in a LOT of the codebase being touched for very small changes.
As background, the {} syntax was deprecated in Neo4j 3.5 and support was dropped in 4.0. See https://neo4j.com/docs/cypher-manual/4.2/syntax/parameters/#cypher-parameters-introduction for more details.
Note: As of the time of this commit, it was required to manually install version 1.25.x of the google-api-core package as the 1.26-dev version being installed was breaking test execution. No changes were made to setup.py to accommodate this as it's outside the scope of this PR to address Google dependencies.
Thanks for letting us know the deadline for EOL support and for getting started! Really appreciate the work here.
Will need help to review and test.
This PR "lifts" support (pun intended)
<3
I'm going to do some manual testing this week and push any tweaks to my branch as needed. I only have a GCP environment to test in, but should be able to uncover some low hanging fruit issues common to all environments after the 4.x upgrade.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
What is the status of this pull request? Will support for Neo4j v4 be added to Cartography?
@pvasilevich Yes - we are targeting before the EOL date toward the end of the year.
Hey @voutilad, thanks again for the work you've done with this. We're going to try to get this in around the end of November/beginning of December.
There's been a bunch of new code updates since you first submitted this (and there probably will continue to be code updates between now and December) - what's the best way for us to help update the branch? Should we submit PRs against your branch? I don't want to make you be the only one taking this on :)
@achantavy I believe I set the PR to be editable by contributors to the cartography project. That should allow you to push changes to the branch associated with the PR in the cartography project itself.
I just wanted to send a reminder about this task (it was originally meant to be finished before 28 November). However, according to https://neo4j.com/developer/kb/neo4j-supported-versions/, support for version 3.5 has been extended to 27 May 2022.
Do you need any help with this PR? Maybe it will be better to refactor these changes and introduce support of Neo4J gradually? I mean, support both Neo4j 3.5 and 4.x simultaneously. In that case I could be a beta tester of 4.x support in my own environment.
@pvasilevich thanks for the bump and for keeping track of cartography progress. Work tends to slow down around this time of year, so it's good to know that the EOL deadline has been extended.
Maybe it will be better to refactor these changes and introduce support of Neo4J gradually?
That's an option, although we'd want to avoid fragmentation, with 3.5.x support in some modules and 4.x in others. It also doesn't help that our project has gotten big, with a lot of copy-pasted ingestion query boilerplate.
I think we have a couple of paths.
A.) Use Dave's approach in this PR to hand edit each query to work on 4.x.
B.) Introduce a set of generalized query functions to abstract this away for intel module writers. This will handle transaction retries for reliability, perform neo4j unwind + merge queries for speed, and perform any necessary 4.x specific behaviors. I got started with this in https://github.com/lyft/cartography/pull/629 and https://github.com/lyft/cartography/pull/631 but did not finish. There will be one PR to introduce the query functions, and then each intel module can adopt them one by one so that PR reviews are as clean and easy as possible.
I prefer option B but am open to other ideas. In any case, expect to see movement on this after the holiday season here in the US.
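As a rough illustration of what a generalized query function under option B might look like, here is a minimal Python sketch that builds a single batched unwind + merge query. The function name build_unwind_merge, its arguments, and the firstseen/lastupdated property handling are assumptions made for the example; this is not the API from the linked PRs.

```python
from typing import Iterable


def build_unwind_merge(label: str, id_field: str, fields: Iterable[str]) -> str:
    """Build one batched upsert query (hypothetical helper, not cartography's API).

    Labels and property names cannot be passed as Cypher parameters, so they
    are interpolated here and must come from trusted module code, never from
    ingested data.
    """
    assignments = ["n.lastupdated = $update_tag"]
    assignments += [f"n.{name} = item.{name}" for name in fields]
    return "\n".join([
        "UNWIND $batch AS item",
        f"MERGE (n:{label} {{id: item.{id_field}}})",
        "ON CREATE SET n.firstseen = timestamp()",
        "SET " + ", ".join(assignments),
    ])


# Example: one query upserts every account dict passed in $batch.
query = build_unwind_merge("AWSAccount", "id", ["name"])
print(query)
```

An intel module would then send a list of dicts as the batch parameter in one round trip instead of running one query per node. With the 4.x Python driver, running the query inside a function passed to session.write_transaction should also give the driver's managed retry behavior, though that wiring is left out of this sketch.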
My vote is for Option B. This makes life easy for incorporating the Neo4j v4.x support without repeating a lot of boilerplate code.
With just a couple of months left for Neo4j 3.5 to reach end of life, I was wondering whether there's been any movement in this space?
Just wanted to mention, I tried the branch that this PR is based on, and I had errors related to index creation. I managed to fix this by changing all the statements in indexes.cypher to use IF NOT EXISTS, e.g.
CREATE INDEX IF NOT EXISTS FOR (x:AWSAccount) ON (x.id);
instead of
CREATE INDEX ON :AWSAccount(id);
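For anyone wanting to apply the same fix in bulk, the rewrite above can be sketched in Python roughly as follows. The helper name upgrade_index_statement is made up for illustration, and the sketch assumes every statement is a single-property index in the old form shown above; composite indexes and constraints are left untouched.

```python
import re

# Old form: CREATE INDEX ON :Label(prop);  (single-property indexes only)
_OLD_INDEX = re.compile(
    r"^\s*CREATE\s+INDEX\s+ON\s+:(\w+)\s*\(\s*(\w+)\s*\)\s*;?\s*$",
    re.IGNORECASE,
)


def upgrade_index_statement(statement: str) -> str:
    """Rewrite an old-style index statement (hypothetical helper, illustration only)."""
    match = _OLD_INDEX.match(statement)
    if match is None:
        return statement  # leave anything unrecognised exactly as it was
    label, prop = match.groups()
    return f"CREATE INDEX IF NOT EXISTS FOR (x:{label}) ON (x.{prop});"


old_statement = "CREATE INDEX ON :AWSAccount(id);"
new_statement = upgrade_index_statement(old_statement)
print(new_statement)
```

Applied line by line to a file such as indexes.cypher, this leaves already-converted statements unchanged, so it can be run more than once.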
+1 to this. Unfortunately, we're unable to use Cartography with any non-EOL'd version of Neo4j now :/. Is there anything we can do to help move this along?
Hey, @andrewnicolalde! We're planning the upgrade for Cartography OSS very soon, but for now you can try our auto-syntax upgrader. We've had a lot of success with it internally. You just need a Neo4j 4.x database, and then set an environment variable or use a CLI switch to enable it.
This effort is being tracked in #914