Request for comments on using Apache AGE

Open dpdjvhxm opened this issue 2 years ago • 6 comments

This RFC seeks comments on the following aspects of Apache AGE:

  • Integration with existing systems (e.g., PostgreSQL)
  • Performance considerations and optimization
  • Scalability and handling of large datasets
  • Use cases and potential applications within our project
  • Compatibility with other tools and systems
  • Security implications and best practices
  • Maintenance and support requirements

dpdjvhxm avatar Jan 29 '24 01:01 dpdjvhxm

I think it would be fair to say that having something like a "Cypher Query Tool" in pgAdmin would be extremely helpful. Integrating it with the AGE graph viewer would certainly bring the product closer to the level expected of a well-behaved graph database product.

On the performance side, I recently used a plpgsql procedure with a FOR loop over a dataset with 4 properties to create approximately 120,000 records for a labeled graph node. On a Windows 10 laptop with an Intel Core CPU and an SSD, it takes roughly 1 millisecond to create a record, so all 120,000 rows complete in roughly 2 minutes. In my humble opinion, this is extremely slow. At that rate, a 1-billion-row table would take 1 million seconds to commit, which is about 11.6 days. A scalable way to commit a table's worth of records from a standard PostgreSQL rowset store to the graph representation would be a nice improvement.

For reference, a fairly common speed to keep in mind is 25,000 rows/second for a simple, straightforward ad hoc script with just a "CREATE" statement. For bulk loads, a few million rows committed per second is more or less common (depending on how the connection is handled, for instance a TCP-based connection vs. a shared-memory connection, with shared memory generally considered significantly faster). Please note, I am purposefully avoiding any complex architecture such as clusters. Comparative charts of graph database record ingestion speeds can easily be found with your favorite search engine. There are well-established "number sense" values to look for.
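The back-of-the-envelope estimate above can be reproduced with a short sketch. It assumes a constant ingestion rate, which real loads rarely have. The function name and constants are illustrative only, and the bulk rate of 2,000,000 rows/second is a hypothetical stand-in for "a few million", not a measured value.

```python
SECONDS_PER_DAY = 86_400


def load_time_seconds(rows: int, rows_per_second: float) -> float:
    """Estimate wall-clock time to ingest `rows` at a constant rate."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return rows / rows_per_second


# Rates taken from the comment above.
OBSERVED_AGE_RATE = 1_000       # about 1 ms per record in the plpgsql loop
AD_HOC_SCRIPT_RATE = 25_000     # simple script issuing plain CREATE statements
# Assumption: a hypothetical value standing in for "a few million rows/second".
ASSUMED_BULK_RATE = 2_000_000

if __name__ == "__main__":
    one_billion = 1_000_000_000
    for label, rate in [
        ("observed plpgsql loop", OBSERVED_AGE_RATE),
        ("ad hoc CREATE script", AD_HOC_SCRIPT_RATE),
        ("assumed bulk load", ASSUMED_BULK_RATE),
    ]:
        secs = load_time_seconds(one_billion, rate)
        print(f"{label}: {secs:,.0f} s ({secs / SECONDS_PER_DAY:.2f} days)")
```

At the observed rate, 120,000 rows come out to 120 seconds and 1 billion rows to 1,000,000 seconds, matching the figures quoted in the comment.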

MironAtHome avatar Mar 23 '24 11:03 MironAtHome

@MironAtHome, have you, by any chance, applied this same test to another graph database tool, such as Neo4j, for instance?

markgomer avatar Mar 25 '24 15:03 markgomer

@markgomer not really in a shareable way; the work I did with different graph DB engines is tied to specific projects. I do have an esoteric ETL benchmark in the works that I plan to use as a standard benchmark. I will share it once I have completed its comparison across various graph engines. Here is its legacy overview; since then the FAA data has changed a lot, and the framework needs to be reworked to stay relevant to real-world data. I see it as a project in and of itself, so it's not something with a quick turnaround. But I did follow up on your ask and searched for graph engine performance comparisons, and found quite a few links. Unfortunately, nothing looked like a "bulk load time" that I could share here. I will update if something comes my way.

MironAtHome avatar Mar 28 '24 14:03 MironAtHome

Thanks for your effort, @MironAtHome! Please do share any findings here when you have them!

markgomer avatar Apr 03 '24 19:04 markgomer

This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 19 '24 00:05 github-actions[bot]

This issue is stale because it has been open 60 days with no activity. Remove "Abondoned" label or comment or this will be closed in 14 days.

github-actions[bot] avatar Jul 20 '24 00:07 github-actions[bot]