
Feature request: native support for the Dgraph database

Open • all-seeing-code opened this issue 4 years ago • 14 comments

I see the project already supports Neo4j and TigerGraph for accessing data. I was wondering if folks would be interested in supporting Dgraph as well, which is an open-source, fast, distributed, and transactional database. I work at Dgraph and am happy to help with this addition to pygraphistry. Some of our users have shown interest in the visualization capabilities that pygraphistry offers.

all-seeing-code, Jun 04 '20 10:06

Hi @anurags92!

Happy to help get that landed, especially as we're about to launch a cloud tier that will help users get going faster. Our plugins are fairly short in practice: basically, we just need to implement a few methods, such as dgraph_auth() and dgraph_query_to_nodes_and_edges_dataframes(), ideally using the optimized Apache Arrow binary format.

Maybe we collaborate on the above methods in a https://colab.research.google.com/ notebook against a Dgraph sandbox DB, and I shepherd it into the main release from there?

lmeyerov, Jun 08 '20 07:06

Hi @lmeyerov, apologies for the delay; I've been really busy. I may have misunderstood, but do you want me to expose a sandbox DB in one of the notebooks over there?

all-seeing-code, Jun 22 '20 19:06

Hi @anurags92! Yes, having a live Dgraph instance + Colab notebook to collaborate on would help, especially with an example of going from query to dataframe.

Some recent updates on our side should make this easier & faster:

  • Graphistry Hub launched, starting with free dev API accounts: https://www.graphistry.com/get-started

  • Our 2.0 upload API is out, which supports fast binary uploads via Apache Arrow:

    -- Blog: https://www.graphistry.com/blog/graphistry-2-29-5-upload-100x-more-rapids-0-13-learnrapids-com-and-more
    -- REST: https://hub.graphistry.com/docs/api/#upload2
    -- ... via Python: https://github.com/graphistry/pygraphistry/blob/master/graphistry/plotter.py#L700 + https://github.com/graphistry/pygraphistry/blob/master/graphistry/arrow_uploader.py (auto-coercions when doing graphistry.edges(pd.read_csv('...')), graphistry.edges(cudf.read_csv('...')), etc.)

My thinking is we start with a simple integration for PyGraphistry using the above; we just need a notebook sandbox we can collaborate in. After that, we can look at another popular client, like JS.

lmeyerov, Jun 26 '20 17:06

@anurags92 Just wanted to ping on this.

A good first step may be a sample notebook going from a Dgraph query to a Graphistry viz, even before it's built in.

lmeyerov, Aug 05 '20 22:08

@lmeyerov Apologies for being MIA; I had taken a hiatus from Dgraph. Now that I'm back, I'm looking to clear out old items, and this looks like an easy win. I have a Dgraph setup with a DQL query. Since we first discussed this, Dgraph has become available in the cloud via its own offering at cloud.dgraph.io. Let me know if you'd still be interested in collaborating on landing this.

all-seeing-code, Feb 05 '23 10:02

Great, this would be of high interest!

Things are advancing a bit as we prep for ChatGPT support + our no-code SaaS launch, but in both cases, the work starts with the above. For a Dgraph Cloud demo dataset, can you start a notebook that goes from query -> node + edge pandas dataframes? If an official Dgraph Python library maintains that step, even better!

lmeyerov, Feb 06 '23 15:02

  • @DataBoyTX for visibility

lmeyerov, Feb 06 '23 15:02

@lmeyerov I've set up a Colab notebook here. It has a very minimal dataset and query, so we can start working from it. Let me know the next steps.

all-seeing-code, Feb 13 '23 10:02

I made a pass last weekend but wasn't able to get data out of the DB instance in the Google Colab; let me pass it to folks here to see if someone can riff on it. I think the next step is to drop the results into a dataframe.

Does Dgraph support introspecting the DB schema for datatypes, rather than just returning JSON? E.g., to know that some field is a timestamp. Ultimately, we want to get the data into pandas/cuDF node and edge dataframes, ideally with Apache Arrow-conformant datatypes for fast & safe processing. Alternatively, is there a manual approach, or a client library used by others doing Dgraph <> dataframes that we can align on? (We've found that aligning on one simplifies long-term stability.)
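For reference on the datatype question: DQL does have a `schema {}` query that lists each predicate with its declared scalar type. The sketch below assumes a response shaped like `{"schema": [{"predicate": ..., "type": ..., "list": ...}]}` (optionally wrapped in `"data"`, as over HTTP) and maps those types onto pandas dtype names, separating out `uid` predicates since those are edges rather than columns. The `DGRAPH_TO_PANDAS` table and the `schema_to_dtypes()` helper are hypothetical, and the individual mapping choices (e.g. leaving `geo` and `password` as generic objects) are assumptions.

```python
# Hypothetical mapping from Dgraph scalar types (as reported by a DQL
# `schema {}` query) to pandas dtype names; the choices are assumptions.
DGRAPH_TO_PANDAS = {
    "string": "string",
    "default": "string",
    "int": "Int64",  # nullable integer, since nodes may omit a predicate
    "float": "float64",
    "bool": "boolean",
    "datetime": "datetime64[ns, UTC]",
    "geo": "object",  # GeoJSON values stay as Python objects
    "password": "object",
}


def schema_to_dtypes(schema_response):
    """Split a schema response into column dtypes and edge predicates.

    Assumes entries look like {"predicate": ..., "type": ..., "list": ...},
    optionally wrapped in a top-level "data" key.
    """
    body = schema_response.get("data", schema_response)
    dtypes, edge_predicates = {}, []
    for entry in body.get("schema", []):
        pred, kind = entry["predicate"], entry.get("type", "default")
        if pred.startswith("dgraph."):
            continue  # skip Dgraph's internal predicates such as dgraph.type
        if kind == "uid":
            edge_predicates.append(pred)  # uid predicates become edges, not columns
        elif entry.get("list"):
            dtypes[pred] = "object"  # list-valued scalars stay as Python lists
        else:
            dtypes[pred] = DGRAPH_TO_PANDAS.get(kind, "object")
    return dtypes, edge_predicates


example = {
    "schema": [
        {"predicate": "name", "type": "string"},
        {"predicate": "age", "type": "int"},
        {"predicate": "joined", "type": "datetime"},
        {"predicate": "friend", "type": "uid", "list": True},
        {"predicate": "dgraph.type", "type": "string", "list": True},
    ]
}
dtypes, edges = schema_to_dtypes(example)
print(dtypes)  # {'name': 'string', 'age': 'Int64', 'joined': 'datetime64[ns, UTC]'}
print(edges)  # ['friend']
```

With pandas, most of these could then be applied via `df.astype(...)`, though datetime columns are usually safer to convert with `pd.to_datetime(..., utc=True)`.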

lmeyerov, Feb 21 '23 17:02

Hey everyone. I'm also with Dgraph and am exploring using Graphistry to visualize a large Dgraph cluster at an upcoming conference. I got @anurags92's notebook (I made a copy) connecting to a Dgraph Cloud instance, and I managed to transform the Dgraph query result into your

{
  "graph": [
     ...
  ],
  "bindings": {
     ...
  },
  "labels": [
    ...
  ]
}

format that I've seen in the docs for graphing JSON. However, I couldn't find a pygraphistry function that would render this JSON format. Probably something obvious...
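Since that JSON already separates edges (`graph`) from nodes (`labels`), one option is to skip a JSON renderer entirely and turn it into the node and edge tables that pygraphistry's `.nodes()`/`.edges()` bindings take. A minimal sketch, assuming the `bindings` object uses the `sourceField`, `destinationField`, and `idField` keys from Graphistry's older JSON upload format (adjust if your payload differs); `split_graphistry_json()` is a hypothetical helper.

```python
def split_graphistry_json(payload):
    """Hypothetical helper: split a {"graph", "bindings", "labels"} payload
    into (node_rows, edge_rows, node_col, src_col, dst_col)."""
    bindings = payload.get("bindings", {})
    # Assumed binding keys; fall back to common defaults when absent
    src_col = bindings.get("sourceField", "src")
    dst_col = bindings.get("destinationField", "dst")
    node_col = bindings.get("idField", "node")
    edge_rows = list(payload.get("graph", []))  # one dict per edge
    node_rows = list(payload.get("labels", []))  # one dict per node
    return node_rows, edge_rows, node_col, src_col, dst_col


payload = {
    "graph": [{"src": "0x1", "dst": "0x2", "predicate": "friend"}],
    "bindings": {"sourceField": "src", "destinationField": "dst", "idField": "node"},
    "labels": [{"node": "0x1", "name": "Alice"}, {"node": "0x2", "name": "Bob"}],
}
nodes, edges, node_col, src_col, dst_col = split_graphistry_json(payload)
```

From there, after `graphistry.register(...)`, something like `graphistry.nodes(pd.DataFrame(nodes), node_col).edges(pd.DataFrame(edges), src_col, dst_col).plot()` should render it.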

If you want to see the updated notebook, I've opened it up here: https://colab.research.google.com/drive/1EDv8IFNI-A6cqqbVArGNGyEMOK6BYZ0i?usp=sharing

Note that this notebook uses getpass for obtaining both the Dgraph cloud api key and the Graphistry account password, so you'll need to hit me up privately if you want to run it.

We'd be delighted to hop on a call with Graphistry to iron this out and maybe explore how we can get a native Dgraph data connector integrated with pygraphistry.

matthewmcneely, Mar 24 '23 21:03

@matthewmcneely awesome

If you're in pandas, you can load JSON directly if it's already flat:

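import pandas as pd
import graphistry

# Log in first, e.g. graphistry.register(api=3, username='...', password='...')

# Flat JSON records load directly into a dataframe
people = [{'first_name': 'a', 'age': 20}, {'first_name': 'bb', 'age': 30}]
nodes_df = pd.DataFrame(people)

# Repeat for edges; these rows are only an illustrative example
edges_df = pd.DataFrame([{'user_1_name': 'a', 'user_2_name': 'bb'}])

# Bind the node ID column, then the edge source/destination columns
graphistry.nodes(nodes_df, 'first_name').edges(edges_df, 'user_1_name', 'user_2_name').plot()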

There are a bunch of flattening tricks if the data isn't flat yet, e.g., https://towardsdatascience.com/how-to-convert-json-into-a-pandas-dataframe-100b2ae1e0d8
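As a rough illustration of one such trick, nested records can be flattened into dot-separated columns, similar in spirit to what `pandas.json_normalize` does for nested dicts. The stdlib-only sketch below is a hypothetical `flatten_record()` helper; it deliberately leaves lists untouched, since in graph results those usually hold related nodes that a graph-aware parser should turn into edges instead.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into one level with dot-separated keys.

    Lists are kept as-is; in graph data they usually represent edges,
    which need separate handling rather than flattening.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, full_key, sep))  # recurse into sub-dicts
        else:
            flat[full_key] = value
    return flat


row = {"name": "a", "address": {"city": "Oakland", "geo": {"lat": 37.8}}, "tags": ["x"]}
print(flatten_record(row))
# {'name': 'a', 'address.city': 'Oakland', 'address.geo.lat': 37.8, 'tags': ['x']}
```

`pd.DataFrame([flatten_record(r) for r in records])` then yields one column per leaf field.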

Note: Underneath, graphistry will convert the pandas dataframe to Apache Arrow before uploading, so for bigger graphs the python->graphistry side ends up being quite fast, even on 1M-row files, and I'm guessing there are similar tricks to make dgraph->python snappy too!

Ping leo@<our site . com> + tcook@, and we can help out?

lmeyerov, Mar 24 '23 21:03

@lmeyerov

Thanks for your guidance. I was able to get a basic graph working. I had to write a custom Python parser to extract nodes and edges from our JSON results. It's nowhere near complete, but it's a good starting point. BTW, the notebook is updated.
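For readers following along without the notebook, here is a minimal sketch of what such a parser might look like; it is not the parser from the notebook. It assumes a DQL query that requests `uid` on every block, so the parsed result looks like `{"q": [{"uid": "0x1", "name": ..., "friend": [{"uid": "0x2", ...}]}]}` (e.g. from `json.loads` on a pydgraph query response's `.json`). Each object with a `uid` becomes a node, scalar fields become node attributes, and every nested object becomes an edge labeled with its predicate. `dgraph_to_nodes_edges()` and its `src`/`dst`/`predicate` column names are hypothetical.

```python
import json


def dgraph_to_nodes_edges(result):
    """Walk a parsed DQL result and collect node rows and edge rows.

    Assumes every nested object includes a "uid" field.
    """
    nodes = {}  # uid -> attribute dict, so repeated nodes merge
    edges = []

    def visit(obj):
        uid = obj["uid"]
        attrs = nodes.setdefault(uid, {"uid": uid})
        for pred, value in obj.items():
            if pred == "uid":
                continue
            children = value if isinstance(value, list) else [value]
            if children and all(isinstance(c, dict) for c in children):
                # Nested objects are edges: uid -[pred]-> child uid
                for child in children:
                    edges.append({"src": uid, "dst": child["uid"], "predicate": pred})
                    visit(child)
            else:
                attrs[pred] = value  # scalars (and scalar lists) are attributes

    for block in result.values():  # one list per named query block
        for obj in block:
            visit(obj)
    return list(nodes.values()), edges


raw = json.loads("""
{"q": [{"uid": "0x1", "name": "Alice",
        "friend": [{"uid": "0x2", "name": "Bob"},
                   {"uid": "0x3", "name": "Carol", "friend": {"uid": "0x1"}}]}]}
""")
nodes, edges = dgraph_to_nodes_edges(raw)
print(len(nodes), len(edges))  # 3 3
```

`pd.DataFrame(nodes)` and `pd.DataFrame(edges)` can then be bound with `graphistry.nodes(..., 'uid').edges(..., 'src', 'dst')`. Keying nodes by `uid` is the main design choice: the same node reached via several paths collapses into one row.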


matthewmcneely, Mar 25 '23 19:03

Awesome!

I should also mention: when Dgraph returns "just" a single table, rather than a node table and an edge table, another handy binding here is .hypergraph():


import pandas as pd
import graphistry

df = pd.read_csv('logs.csv')

# Extract & connect unique values from the src_ip, dst_ip, and alert columns;
# `direct` controls whether each row also gets its own node (direct=True skips it).
# Remaining table columns appear as attributes
g1 = graphistry.hypergraph(
  df,
  ['src_ip', 'dst_ip', 'alert'],
  direct=True
)['graph']

g1.plot()

# Control options like which edges to generate and which IDs live in the same namespace
g2 = graphistry.hypergraph(
  df,
  ['src_ip', 'dst_ip', 'alert'],
  direct=True,
  opts={
    'CATEGORIES': {
      'ip': ['src_ip', 'dst_ip']
    },
    'EDGES': {
      'src_ip': ['dst_ip'],
      'alert': ['src_ip', 'dst_ip']
    }
  }
)['graph']

g2.plot()
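To build intuition for the `direct=True` mode above, here is a rough stdlib approximation of the idea; it is not graphistry's actual implementation. Each distinct (category, value) pair becomes one node, `CATEGORIES` lets several columns share a namespace (so an IP seen as both `src_ip` and `dst_ip` is a single node), and `EDGES` picks which column pairs get linked within each row. The `direct_hypergraph()` helper, its `category::value` node IDs, its `edgeType` labels, and the default of linking every column pair when no edge spec is given are all assumptions for illustration.

```python
from itertools import combinations


def direct_hypergraph(rows, columns, categories=None, edges_spec=None):
    """Approximate a direct hypergraph: values become nodes, rows become edges."""
    # Map each column to its shared category name (defaults to the column itself)
    col_to_cat = {col: col for col in columns}
    for cat, cols in (categories or {}).items():
        for col in cols:
            col_to_cat[col] = cat
    # Default assumption: link every pair of listed columns within a row
    if edges_spec is None:
        pairs = list(combinations(columns, 2))
    else:
        pairs = [(src, dst) for src, dsts in edges_spec.items() for dst in dsts]

    def node_id(col, row):
        return f"{col_to_cat[col]}::{row[col]}"

    nodes, edges = set(), []
    for row in rows:
        for src, dst in pairs:
            if row.get(src) is None or row.get(dst) is None:
                continue  # skip missing values
            a, b = node_id(src, row), node_id(dst, row)
            nodes.update([a, b])
            edges.append({"src": a, "dst": b, "edgeType": f"{src}::{dst}"})
    return sorted(nodes), edges


logs = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "alert": "scan"},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.3", "alert": "scan"},
]
nodes, edges = direct_hypergraph(
    logs,
    ["src_ip", "dst_ip", "alert"],
    categories={"ip": ["src_ip", "dst_ip"]},
    edges_spec={"src_ip": ["dst_ip"], "alert": ["src_ip", "dst_ip"]},
)
print(nodes)
# ['alert::scan', 'ip::10.0.0.1', 'ip::10.0.0.2', 'ip::10.0.0.3']
print(len(edges))  # 6
```

Note that `10.0.0.2` ends up as one node even though it appears as a destination in the first row and a source in the second; without the `ip` category it would split into `src_ip::10.0.0.2` and `dst_ip::10.0.0.2`.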

lmeyerov, Mar 25 '23 23:03

This may be worth reexamining as louie.ai expands to more cohorts & reaches GA.

lmeyerov, Jul 23 '23 09:07