aws-sdk-pandas
Add ability to write DF as a bulk load job in Amazon Neptune
Is your idea related to a problem? Please describe. Add ability to load/update data from a data frame via the bulk loader.
Describe the solution you'd like The fastest way to load/update data in Neptune is to use the Bulk Loader. I would like to see a method for the Neptune integration that would take a dataframe, write it out to the supported file type (CSV/n-quads/n-triples/TTL) for the data model (LPG/RDF), and then trigger and monitor a bulk load of that file.
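To make the "write it out" step concrete, here is a minimal sketch of serializing tabular rows into the Gremlin (LPG) CSV layout, which uses the `~id` and `~label` system headers for vertices. The function name `to_gremlin_vertex_csv` and its parameters are hypothetical and not part of the library; it uses only the standard library, so the rows are passed as plain tuples rather than a dataframe.

```python
import csv
import io


def to_gremlin_vertex_csv(columns, rows, id_column, label):
    """Serialize tabular rows as a Gremlin vertex CSV string (hypothetical helper)."""
    columns = list(columns)
    id_index = columns.index(id_column)
    prop_columns = [c for c in columns if c != id_column]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # System headers first, then the remaining columns as vertex properties.
    writer.writerow(["~id", "~label", *prop_columns])
    for row in rows:
        row = list(row)
        props = [value for i, value in enumerate(row) if i != id_index]
        # Every vertex in this sketch gets the same label.
        writer.writerow([row[id_index], label, *props])
    return buffer.getvalue()


columns = ["person_id", "name", "age"]
rows = [("p1", "Alice", 30), ("p2", "Bob", 41)]
csv_text = to_gremlin_vertex_csv(columns, rows, id_column="person_id", label="person")
print(csv_text)
```

With a dataframe `df`, the same function could presumably be called with `df.columns` and `df.itertuples(index=False)`. The resulting text would still need to be uploaded to S3 before a load can be requested, and edge files and the RDF formats would need their own handling.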
Great idea! Where are files usually staged for such an operation? Is it S3 like for Redshift COPY?
This is something my team would be interested in as well. The Neptune Bulk Loader is all about S3: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
There are basically two APIs:
- One to request the loading. This returns a load id
- One to check the load status given a load id
Based on https://github.com/awslabs/aws-data-wrangler/blob/main/awswrangler/neptune/client.py, I can envision implementing the methods there.
Here's my attempt to test whether things would work with an already-initialized client, and they do.
Create a load request - a potential implementation for a load method
# Loader request parameters; the <...> placeholders must be filled in.
data = {
    "source": "<s3 path>",
    "format": "nquads",
    "iamRoleArn": "<role arn>",
    "mode": "AUTO",
    "region": "us-west-2",
    "failOnError": "TRUE",
    "parallelism": "MEDIUM",
}
# `client` is an already-initialized Neptune client.
url = f"https://{client.host}:{client.port}/loader"
req = client._prepare_request("POST", url, data=data)
res = client._http_session.send(req)
Query load status - a potential implementation for a load_status method
# The load id comes from the response to the load request above.
load_id = res.json()["payload"]["loadId"]
urlStatus = f"https://{client.host}:{client.port}/loader/{load_id}"
reqStatus = client._prepare_request("GET", urlStatus, data="")
resStatus = client._http_session.send(reqStatus)
resStatus.json()
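Since the request also asks for monitoring, the status call above could be wrapped in a polling loop. The following is only a sketch: `wait_for_load` and `fetch_status` are hypothetical names, the response is assumed to carry the state under `payload.overallStatus.status`, and the set of "still running" states is an assumption to check against the loader documentation. The status fetcher is passed in as a callable so the loop can be tried without a cluster.

```python
import time

# Assumption: these states mean the load has not finished yet;
# any other status is treated as terminal (completed, failed, cancelled, ...).
PENDING_STATES = {"LOAD_NOT_STARTED", "LOAD_IN_QUEUE", "LOAD_IN_PROGRESS"}


def wait_for_load(fetch_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll fetch_status() until the load leaves the pending states (hypothetical helper).

    fetch_status is any callable returning the parsed JSON body of the
    loader status response, assumed to look like
    {"payload": {"overallStatus": {"status": "..."}}}.
    """
    waited = 0.0
    while True:
        body = fetch_status()
        status = body["payload"]["overallStatus"]["status"]
        if status not in PENDING_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"load still in state {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated responses standing in for real status calls.
_fake_responses = iter(
    [
        {"payload": {"overallStatus": {"status": "LOAD_NOT_STARTED"}}},
        {"payload": {"overallStatus": {"status": "LOAD_IN_PROGRESS"}}},
        {"payload": {"overallStatus": {"status": "LOAD_COMPLETED"}}},
    ]
)
final_status = wait_for_load(lambda: next(_fake_responses), poll_seconds=0, sleep=lambda s: None)
print(final_status)
```

Against a real cluster, `fetch_status` would be a small function that sends the GET request shown above and returns its `.json()` body. Returning the final status string instead of raising on failure leaves it to the caller to decide how to treat a failed or cancelled load.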
Makes sense to me 👍
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.