aws-sdk-pandas

Documentation missing information on required schema for dataframe storage in DynamoDB

Open fsiler opened this issue 4 years ago • 4 comments

Is your idea related to a problem? Please describe.

The examples provided for DynamoDB are too brief to be of practical use. It would be easiest if there were a function, create_table or similar, that could take a dataframe as input and create an appropriate storage table. At minimum, please provide a few examples of dataframes and their corresponding table setups. I get large stack traces that don't help me get to the root of the data modeling problem:

Traceback (most recent call last):
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/Users/fms/build/fatpitch/app.py", line 113, in puts
    wr.dynamodb.put_df(df=df, table_name=USERS_TABLE, boto3_session=boto3_session)
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/awswrangler/dynamodb/_write.py", line 146, in put_df
    put_items(items=items, table_name=table_name, boto3_session=boto3_session)
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/awswrangler/dynamodb/_write.py", line 183, in put_items
    _validate_items(items=items, dynamodb_table=dynamodb_table)
  File "/Users/fms/.pyenv/versions/3.9.6/lib/python3.9/site-packages/awswrangler/dynamodb/_utils.py", line 54, in _validate_items
    raise exceptions.InvalidArgumentValue("All items need to contain the required keys for the table.")
awswrangler.exceptions.InvalidArgumentValue: All items need to contain the required keys for the table.

Describe the solution you'd like

Ideally, create_table and create_table_json functions that take a dataframe as input and either create the table or return the necessary schema information. Also: documentation of how dataframes should be laid out and how tables can be laid out to correspond with them; a few examples; and useful error messages such as "I didn't find a 'PK' entry in your dataframe, so I can't index it against this DynamoDB table."

fsiler • Sep 01 '21 05:09

The error message means that your df is missing a column that you defined as a key when creating the table. Here's an example including creating the table:

import boto3
import awswrangler as wr
import pandas as pd

# Define df; the "key" column matches the table's hash key below
df = pd.DataFrame({
    "key": [1, 2],
    "value": ["foo", "boo"]
})

# Create table with "key" (a number) as the partition key
dynamo = boto3.client("dynamodb")
dynamo.create_table(
    TableName="test",
    KeySchema=[{"AttributeName": "key", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "key", "AttributeType": "N"}],
    ProvisionedThroughput={
        "ReadCapacityUnits": 10,
        "WriteCapacityUnits": 10
    })

# create_table is asynchronous; wait until the table is ACTIVE before writing
dynamo.get_waiter("table_exists").wait(TableName="test")

# Insert
wr.dynamodb.put_df(df=df, table_name="test")
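To find out which rows trigger the "required keys" error before calling put_df, a pre-flight check along these lines may help. This is only a sketch: missing_keys is a hypothetical helper, not part of awswrangler, and it assumes the library's validation amounts to "every item must contain every attribute named in the table's key schema", which is what the error message suggests.

```python
from typing import Any, Dict, List, Mapping, Sequence


def missing_keys(
    items: Sequence[Mapping[str, Any]],
    key_schema: Sequence[Mapping[str, str]],
) -> Dict[int, List[str]]:
    """Return {item index: [key attribute names absent from that item]}.

    Hypothetical helper for illustration; key_schema uses the same shape
    as the KeySchema argument passed to create_table.
    """
    key_names = [entry["AttributeName"] for entry in key_schema]
    problems: Dict[int, List[str]] = {}
    for index, item in enumerate(items):
        absent = [name for name in key_names if name not in item]
        if absent:
            problems[index] = absent
    return problems


# Example: the table is keyed on "PK", but the second item has no "PK" entry.
key_schema = [{"AttributeName": "PK", "KeyType": "HASH"}]
items = [{"PK": "user#1", "name": "a"}, {"name": "b"}]
print(missing_keys(items, key_schema))  # → {1: ['PK']}
```

Against a real table, the key schema can be read from boto3.resource("dynamodb").Table("test").key_schema, and the items can be built from a dataframe with df.to_dict("records").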

I agree that a create_table function would be convenient, and perhaps wr.dynamodb.put_df and the other write functions could even create the table if it doesn't exist. Note that you would still have to define the keys, so there's no way around the slightly painful data modelling stage. Something along the lines of:

wr.dynamodb.create_table(
    table_name="test", 
    keys=[{"AttributeName": "key", "KeyType": "HASH"}],
)

wr.dynamodb.put_df(
    df=df, 
    table_name="test", 
    keys=[{"AttributeName": "key", "KeyType": "HASH"}],  # Create the table with these keys if it doesn't exist
)
wr.dynamodb.put_json
...
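Until something like that exists, the arguments for boto3's create_table can be derived from one sample row plus the chosen key names. The following is a minimal sketch under stated assumptions: attribute_type and create_table_kwargs are hypothetical helper names, the type mapping covers only the three scalar types DynamoDB allows for keys (S, N, B), and on-demand billing is an arbitrary choice made to avoid picking capacity numbers.

```python
import numbers
from decimal import Decimal
from typing import Any, Dict, Mapping, Optional


def attribute_type(value: Any) -> str:
    """Map a sample key value to a DynamoDB scalar type code (S, N, or B)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but it is not a valid key type.
        raise TypeError("boolean values cannot be used as DynamoDB key attributes")
    if isinstance(value, str):
        return "S"
    if isinstance(value, (numbers.Real, Decimal)):
        return "N"
    if isinstance(value, (bytes, bytearray)):
        return "B"
    raise TypeError(f"unsupported key type: {type(value).__name__}")


def create_table_kwargs(
    table_name: str,
    sample_item: Mapping[str, Any],
    hash_key: str,
    range_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's DynamoDB create_table call."""
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if range_key is not None:
        key_schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
    # Only key attributes are declared; non-key columns need no definition.
    attribute_definitions = [
        {
            "AttributeName": entry["AttributeName"],
            "AttributeType": attribute_type(sample_item[entry["AttributeName"]]),
        }
        for entry in key_schema
    ]
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


kwargs = create_table_kwargs("test", {"key": 1, "value": "foo"}, hash_key="key")
print(kwargs)
```

The result can be passed on as boto3.client("dynamodb").create_table(**kwargs). Choosing which columns act as hash and range keys remains a modelling decision that the dataframe alone cannot answer.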

kukushking • Sep 06 '21 16:09

This issue requires triage and should be assigned.

github-actions[bot] • Mar 10 '22 18:03

Is help still needed for that?

snikolakis • Aug 05 '22 10:08

@snikolakis New methods for dynamo mentioned by @kukushking above are not implemented or on the roadmap at this time, so any contributions would be welcomed!

malachi-constant • Aug 05 '22 15:08