
Feature Request: Add Support for Amazon Neptune (and other openCypher-compatible graph databases)

Open triggan opened this issue 5 months ago • 4 comments

Title: Add Support for Amazon Neptune (and other openCypher-compatible graph databases)

Description:

I would like to propose adding support for Amazon Neptune as an alternative graph database backend for Cartography, while maintaining full backward compatibility with existing Neo4j deployments. This enhancement would enable users to leverage AWS-managed graph database services without requiring code changes to their existing Cartography workflows.

Background:

Amazon Neptune supports the Neo4j Bolt protocol and openCypher query language, making it compatible with Cartography's existing graph operations. Many organizations already using AWS infrastructure would benefit from being able to use Neptune as their graph database backend, taking advantage of AWS's managed service benefits (automated backups, scaling, security, etc.) while maintaining their existing Cartography investment.

Proposed Enhancement

The enhancement would introduce database backend selection through CLI parameters while preserving all existing functionality:

  1. Database Type Selection: Add a --graph-database-type parameter to choose between neo4j (default) and neptune, with room to add further values as other graph databases gain support
  2. Unified Connection Parameters: Introduce generic --graph-* parameters alongside existing --neo4j-* parameters for backward compatibility
  3. Neptune-Specific Authentication: Neptune clusters with IAM Authentication enabled would require AWS IAM Authentication. Most of this integration is already available in the AWS modules within the project, though some additional support may require using botocore directly.
  4. Transparent Operation: Leverage Neptune's Neo4j Bolt driver compatibility to minimize code changes.

Example Usage

Existing Neo4j usage (unchanged)

  cartography --neo4j-uri bolt://localhost:7687

New Neptune usage

  cartography --graph-database-type neptune \
    --graph-uri bolt://neptune-cluster.amazonaws.com:8182
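
A minimal sketch of how the two invocations above could be parsed side by side, using only Python's standard argparse module. The function names, the default URI, and the conflict rule are assumptions made for illustration; this is not Cartography's actual cli.py.

```python
import argparse

# Assumed default for illustration; the project's real default may differ.
DEFAULT_URI = "bolt://localhost:7687"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with the proposed flags next to the existing one."""
    parser = argparse.ArgumentParser(prog="cartography-sketch")
    # Existing flag, kept so current Neo4j invocations keep working.
    parser.add_argument("--neo4j-uri", default=None)
    # Proposed generic flag from this issue.
    parser.add_argument("--graph-uri", default=None)
    # Proposed backend selector; neo4j stays the default.
    parser.add_argument(
        "--graph-database-type",
        choices=["neo4j", "neptune"],
        default="neo4j",
    )
    return parser


def resolve_uri(args: argparse.Namespace) -> str:
    """Prefer --graph-uri, fall back to --neo4j-uri, then to the default."""
    if args.graph_uri and args.neo4j_uri and args.graph_uri != args.neo4j_uri:
        raise ValueError("--graph-uri and --neo4j-uri disagree; pass only one")
    return args.graph_uri or args.neo4j_uri or DEFAULT_URI


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.graph_database_type, resolve_uri(args))
```

Rejecting conflicting URIs, rather than silently preferring one, is just one possible policy; a deprecation warning on the older flag would be another.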

Implementation Considerations

Areas Requiring Changes

  1. CLI Parameter Structure (cartography/cli.py)
    • Add a database type selection parameter. An alternative approach would be to parse the URI and infer Neptune from the presence of neptune.amazonaws.com, though that approach would not extend well to other graph databases.
    • Add unified graph connection parameters.
    • Maintain backward compatibility with existing --neo4j-* parameters.
  2. Connection Management (cartography/sync.py)
    • Implement a database driver factory pattern
    • Add Neptune-specific connection logic (IAM auth, encryption settings)
  3. Indexes
    • Neptune automatically indexes all components of the graph and does not allow user-defined indexes, so the commands in cartography/data/indexes.cypher would not be used.
  4. Config class (cartography/config.py)
    • Addition of graph-* fields
    • neo4j-uri field would need to become optional w/ introduction of graph-uri field
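
The factory idea and the index caveat listed above could be sketched roughly as follows. Everything here (BackendSettings, the BACKENDS registry, settings_for, the iam_auth argument) is a hypothetical name invented for illustration. The sketch only computes per-backend settings and never opens a connection, so it makes no claim about the real driver calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical per-backend decisions a driver factory would act on."""
    uri: str
    force_tls: bool        # whether the sketch would insist on an encrypted connection
    use_iam_auth: bool     # whether requests would be signed with AWS credentials
    create_indexes: bool   # whether indexes.cypher would be executed


def _neo4j_settings(uri: str, iam_auth: bool) -> BackendSettings:
    # Existing behavior is left alone: no IAM auth, indexes created as today.
    return BackendSettings(uri, force_tls=False, use_iam_auth=False, create_indexes=True)


def _neptune_settings(uri: str, iam_auth: bool) -> BackendSettings:
    # Neptune manages its own indexes, so the index statements are skipped.
    # IAM signing applies only when the cluster has IAM authentication enabled.
    return BackendSettings(uri, force_tls=True, use_iam_auth=iam_auth, create_indexes=False)


# Registry keyed by the proposed --graph-database-type value; adding another
# openCypher-compatible backend would mean adding one entry here.
BACKENDS: Dict[str, Callable[[str, bool], BackendSettings]] = {
    "neo4j": _neo4j_settings,
    "neptune": _neptune_settings,
}


def settings_for(db_type: str, uri: str, iam_auth: bool = False) -> BackendSettings:
    """Look up the builder for db_type and return its settings."""
    try:
        builder = BACKENDS[db_type]
    except KeyError:
        raise ValueError(f"unsupported graph database type: {db_type}") from None
    return builder(uri, iam_auth)
```

A registry keeps the backend choice explicit, which avoids the URI-sniffing alternative mentioned in item 1.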

Dependencies

  • Addition of AWS SDK dependencies (botocore on top of existing boto3 support) for IAM authentication
  • Continued use of existing neo4j driver (compatible with both databases)

Benefits

  • Zero Breaking Changes: Existing Neo4j users experience no disruption
  • Minimal Code Impact: Leverages Neptune's Neo4j Bolt compatibility and openCypher support
  • AWS Integration: Native support for AWS-managed graph database services
  • Future Extensibility: Framework could support additional openCypher-compatible databases
  • Cost Optimization: Enables use of AWS managed services with automated operations

Future Extensibility

This approach would establish a foundation for supporting other openCypher-compatible graph databases, such as Memgraph, FalkorDB, and KuzuDB.

Request for Feedback

I recognize this is a significant enhancement that would impact core connection management. I'm happy to:

  • Provide more detailed technical specifications if helpful
  • Assist with implementation if the maintainers are interested
  • Test against Neptune instances during development
  • Assist with on-going maintenance of Neptune support
  • Help with documentation updates

I believe this enhancement would significantly expand Cartography's deployment options while maintaining its excellent existing functionality. Thank you for considering this feature request, and I appreciate all the work that has gone into making Cartography such a valuable tool for infrastructure analysis.

Additional Context: I am part of the Amazon Neptune development team at AWS and we've seen significant interest in using graphs for various security workloads. We've previously integrated in similar libraries such as Altimeter from Tableau and the AWS Workload Discovery solution. Cartography has the added benefit that it includes so many other Intel modules for other infrastructure besides just AWS.

triggan avatar Jun 28 '25 21:06 triggan

Hey @triggan, thanks for the thorough write-up!

We'd absolutely love some assistance. I'll write some technical considerations off the top of my head:

  1. we have an orm. we can add neptune support now if wanted

Like what Kunaal said on Slack, Cartography has a lightweight ORM, so it should be straightforward to support a different graph database backend for any plugin, as long as it uses the ORM. Many plugins still need to be refactored, but I don't think this is a blocker: we could have Neptune support only for modules that use the ORM.

  2. we need subqueries

To support Neptune, we need support for some Cypher features that may not be in the openCypher/ISO GQL/whatever-the-standard-is spec. Cartography's ORM generates Cypher queries so that a node type can declare what other types it can attach to without needing to special-case too many situations. For example, EC2 instances may relate to security groups, EBS volumes, or instance profiles. The ORM builds a query that ingests EC2 instance data and attempts to create all defined relationships. However, in practice, not all EC2 instances will have every relationship defined; some may lack an instance profile, for example. If we generate a flat Cypher query using just MERGE statements, encountering a missing value can cause the query to fail early or skip processing the rest of that instance’s data.

To avoid this problem, we use subqueries with OPTIONAL MATCH. This allows each relationship clause to run independently and gracefully handle missing data. It ensures that ingestion continues even if some relationships aren’t present, and we don’t have to worry about the order in which relationship merge statements appear. This pattern is essential for processing partially complete records in a robust way, and it’s why support for subqueries is critical for Cartography to run on Neptune.

You can see this test case. It demonstrates how our ORM model constructs a Cypher query that ingests a list of dictionaries and creates the appropriate relationships based on the model’s relationship declarations.
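
As a rough illustration of the subquery pattern described above, the sketch below assembles a Cypher string in which each relationship gets its own OPTIONAL MATCH branch inside a CALL subquery. It is not the ORM's actual query builder or its exact output. The function name, the tuple layout, and the labels and fields in the usage example are all hypothetical.

```python
from typing import List, Tuple

# (target_label, target_property, item_field, relationship_label)
RelSpec = Tuple[str, str, str, str]


def build_ingest_query(node_label: str, id_field: str, rels: List[RelSpec]) -> str:
    """Build an illustrative ingest query with one subquery branch per relationship.

    Labels and field names are interpolated directly, so they are assumed to
    come from trusted model definitions, never from user input.
    """
    lines = [
        "UNWIND $DictList AS item",
        f"MERGE (i:{node_label} {{id: item.{id_field}}})",
        "WITH i, item",
        "CALL {",
    ]
    branches = []
    for target_label, target_prop, item_field, rel_label in rels:
        branches.append("\n".join([
            "    WITH i, item",
            # OPTIONAL MATCH yields a null n instead of dropping the row.
            f"    OPTIONAL MATCH (n:{target_label} {{{target_prop}: item.{item_field}}})",
            # Only rows where the target exists reach the MERGE.
            "    WITH i, n WHERE n IS NOT NULL",
            f"    MERGE (i)-[:{rel_label}]->(n)",
        ]))
    # Branches are independent, so a missing target in one does not block the others.
    lines.append("\n    UNION\n".join(branches))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical labels and fields, chosen only to mirror the EC2 example above.
query = build_ingest_query(
    "EC2Instance",
    "InstanceId",
    [
        ("EC2SecurityGroup", "id", "GroupId", "MEMBER_OF"),
        ("InstanceProfile", "arn", "ProfileArn", "HAS_PROFILE"),
    ],
)
print(query)
```

Whether a given backend accepts this CALL form is the open question for Neptune support; the sketch only shows the shape of the query, not that it runs there.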

  3. misc: memgraph does work

For Memgraph, Daniel experimented with that and it basically works out of the box: https://gigi.nullneuron.net/gigilabs/migrating-cartography-to-memgraph/


I'm happy to get into even more detail to spec out the changes to CLI and config. What does your timeline look like?

achantavy avatar Jun 29 '25 02:06 achantavy

Hey @triggan, how far did you get on this, anything I can help with?

achantavy avatar Jul 10 '25 04:07 achantavy

This would be a very helpful addition!

mrpackethead avatar Sep 23 '25 01:09 mrpackethead

At this point I'm running a cron job to export Neo4j to Neptune after running Cartography. All that to get out of the freemium Neo4j ecosystem.

Molaire avatar Oct 23 '25 16:10 Molaire