aws-sdk-go-v2 icon indicating copy to clipboard operation
aws-sdk-go-v2 copied to clipboard

marshal dynamodb results directly to JSON

Open jonsmirl opened this issue 6 years ago • 10 comments

When writing lambda functions that query dynamodb I often simply send this data back to the lamdba caller as a JSON result. I would find it useful to have marshalling functions that go directly from the dynamodb result to json (in addition to the existing ones). This would optionally let me skip a lot of pointless conversions and memory allocations building the intermediate go structure which is promptly destroyed one line later in my code. Note, I'm not bringing back just a single item, usually it is an array of several thousand items.

jonsmirl avatar Feb 26 '18 15:02 jonsmirl

Thanks for posting this feature request, @jonsmirl. Are you looking for the DynamoDB API calls to return a JSON string of the result directly, instead of the AttributeValues that are normally returned?Or are you looking for a utility that allows you to convert AttributeValue collections to the DynamoDB JSON representation?

jasdel avatar Feb 26 '18 18:02 jasdel

Does it work like this?

http request goes off to DynamoDB and the results come back as JSON. Go then turns that JSON into a structure like this: { "ConsumedCapacity": null, "Count": 23, "Items": [ { "thing": { "B": null, "BOOL": null, "BS": null, "L": null, "M": null, "N": null, "NS": null, "NULL": null, "S": "70b2d28ecf7e01dd3a4ba88da3ddde44", "SS": null }, I marshal it with a structure definition and it turns in to: error := dynamodbattribute.UnmarshalListOfMaps(result.Items, &items) [ { "timestamp": 1519510162585, "type": "clip", "thing": "70b2d28ecf7e01dd3a4ba88da3ddde44" }, { Which I then turn back into json with this: res, _ := json.Marshal(items)


Two optimizations seem possible

  1. Leave the data in JSON when it comes back from dynamodb. Let me get access to it without doing any transforms to it. The idea here is to make it easy to just send the result set on out to the caller.

  2. Leave the data in JSON until dynamodbattribute.UnmarshalListOfMaps(result.json, &items) is called. ie avoid that first expanded representation and make the Unmarshal functions directly work on json.

Also, parallel optimizations can be applied to writing data to DynamoDB. My incoming data is already in a JSON string, it would be great if I could simply pass that string into DynamoDB unchanged.

My Go code is used to validate the user and to ensure they don't read/write from unauthorized places (I use CognitoIDs). The Go code doesn't touch the data in either direction.

On the plus side I am converting these functions from Javascript (where they run 500-1000ms) over to Go which so far is running in the 200-300ms range. Getting rid of pointless shuffling of the data by Go might reduce my run-time down to 100ms.

jonsmirl avatar Feb 26 '18 18:02 jonsmirl

A function that would convert AttributeValue collections directly to a valid JSON string would be very handy.

The JSON strings can be passed to all manner of library functions and other things -- without the need to pass thru Go specific struct definitions.

My current need is simply to pass the retrieved data via a .JSON file -- the Go code never uses the content of the object at all.

warrenstephens avatar Aug 29 '19 16:08 warrenstephens

Thanks for the update @jonsmirl and @warrenstephens. Since the AttributeValue type is used within as a field within a JSON document, there is not a clean way to inject the AttributeValue as a JSON string/byte slice.

Since we have a good understanding of the use case we can use this issue to track design proposals for this feature. There are a few ways I think this could be implemented with varying levels of "completeness"

Ideas:

1.) Add custom member to AttributeValue

Adding a custom member, (e.g. Raw) to the AttributeValue that can contain a JSON byte slice. This new member would always be used if not-nil. Serializing the AttributeValue to the request would write the Raw member instead of serializing the AttributeValue's other members.

Reading a response from DynamoDB, the Raw member would contain the response's raw value for the AttributeValue.

This would be limited to AttributeValue type only, does not cover map[string]AttributeValue, nor []AttributeValue. Requires the user to split a JSON document's map/list into separate AttributeValues.

2.) Add Additional byte slice API input/output parameters

For every API input/output that contains an AttributeValue parameter the SDK's generator would also generate a new Raw<ParamName> parameter that contains a raw JSON byte slice for the JSON document to be sent.

The second option seems to be more of a sledge hammer to the problem, and it would be better to find a cleaner, more concise solution to this issue.

jasdel avatar Sep 19 '19 23:09 jasdel

Any updates on this issue? Approach #1 would greatly increase the performance of some applications I maintain.

tarndt avatar Mar 10 '20 22:03 tarndt

I'm also interested on the resolution of this enhancement. thank you.

neoramos avatar Mar 31 '20 12:03 neoramos

Any updates on this? I'm interested as well.

codeherk avatar Feb 13 '21 04:02 codeherk

I think a better example would be that json.Unmarshal of DynamoDB JSON would result in a map[string]types.AttributeValue because then one can use attributevalue.UnmarshalMap will get you your go type which can be json.Marshaled to normal JSON.

I had to port dynamodbattribute from v1 SDK so I could get my go type from a DynamoDB Streams Event

marc-ostrow avatar Feb 13 '21 20:02 marc-ostrow

Hi all, I have discussed with the team, and we are planning to implement this. However, there is no timeline for this yet, I will comment if we have any update.

vudh1 avatar Apr 22 '22 17:04 vudh1

There is already a function https://github.com/aws/aws-sdk-go-v2/blob/98519e67e1d575b00ed51b831a877a4b9bbf6a43/service/dynamodb/serializers.go#L3374 that does almost what we need.

This can be copied into a public function. This should only take a couple of days to implement.

ryan-ju avatar Jul 26 '22 11:07 ryan-ju

This is a critical path feature that should already be supported.

duaneking avatar Nov 25 '22 20:11 duaneking

I would love to see some traction on this, but want to revise the ideas proposed earlier: https://github.com/aws/aws-sdk-go-v2/issues/134#issuecomment-533351383

My use case would prefer to not allocate everything in memory, but rather accept an io.Writer because I prefer to process everything in a stream. It looks to me like there is only one thing blocking this:

  • ~smithyjson.newValue could be made public~ can be done with NewEncoder().Value
  • awsAwsjson10_serializeDocumentAttributeUpdates could be made public

Those honestly appear to be the only blockers. I may just fork the repo and make those changes to unblock myself

rw-access avatar Jan 23 '23 23:01 rw-access

Respectfully, while the Writer interface is commonly used for a lot of things in go and that is kind of a golang thing, I don't believe that that alone should block this.

First version doing allocations is fine, as long as the interface allows updates and optimizations later. If the initial requirement is that it support the Write interface to cll Write(*), then provide an option for that, but don't focus too much on optimization prematurely. It's better to get the interfaces correct, first.

The initial implementation could be as simple as a strings.Builder consuming the write of the json to return the result.

Separation of concerns is also an issue, and I don't believe that overloading concerns is going to be useful here. It might be better for backwards compatibility to have additional API's.

duaneking avatar Jan 26 '23 17:01 duaneking

Chiming in on this given the amount of attention/upvotes it's received in the past. As an FYI, jasdel@ and vudh1@ are no longer on the project, so the only context I have is what I can read here.

I'd like to highlight a few very salient points from the discussion so far-

  • @jonsmirl There are real use cases, for operations which return structured data in some form that is useful/valid outside of a deserialization context, that would be served by being able to skip SDK deserialization and instead access that data directly.
    • I would assert this extends beyond just dynamodb scan results, although I can't necessarily provide a concrete example at this time.
    • The way the response (or parts of it) is exposed is ultimately at the whim of the service and will vary therein. Cloudformation templates, for example, are emitted into the response type directly as a string, so the use case described here doesn't require any additional SDK support.
  • @rw-access io.Reader/Writer and friends are ubiquitous in the language and are certainly the most idiomatic when working with io/buffers/etc. We've certainly seen this come up in other areas, for example in the FR to add better support for copying to Writers w/ transfer manager (#2036)

Given those points I think the way I'd like us to support this is through something entirely generic like the following:

// WithHijackToReader skips deserialization and instead passes the response
// body to the provided reader.
func WithHijackToReader(r *io.ReadCloser) func(*middleware.Stack) error

// WithHijackToWriter skips deserialization and instead directly copies
// the contents of the response bodyto the provided writer.
func WithHijackToWriter(r io.Writer) func(*middleware.Stack) error

// example
var body io.ReadCloser
svc.Scan(context.Background(), &dynamodb.ScanInput{
    // ...
}, dynamodb.WithAPIOptions(WithHijackToReader(&body)))

These would exist as escape hatches which you could apply to any SDK operation. WithHijackToWriter is mostly a convenience, the real value-add lies in the ability to break out of normal deserialization flow and work with the response body directly.

A crude version of this is actually entirely possible today. You can just blow out the Deserialize part of the operation and replace it with something that steals the response body:

func withHijackResponseBody(body *io.ReadCloser) func(*dynamodb.Options) {
    return func(o *dynamodb.Options) {
        o.APIOptions = append(o.APIOptions, func(s *middleware.Stack) error {
            s.Deserialize.Clear()
            // implementation of hijackResponseBody is omitted but it essentially just plumbs the body
            // from out.RawResponse back to the iface ptr
            return s.Deserialize.Add(&hijackResponseBody{body: body}, middleware.After)
        })
    }
}

I can't necessarily sanction this as a workaround at this time- will get into it below, but if you do this today you're liable to bypass a set of behavior that you really shouldn't.

There are definitely a few things as-written that prevent us from supporting this today:

  • Success and error deserialization right now are coupled within the same middleware. If you remove one, you remove the other. I suspect we'd want to preserve error response deserialization with these new APIs.
  • Features that need to "middle-man" the response body, such as dynamodb checksum response validation and s3 flexible checksums would likely be broken.
  • The "hijack" would be all-in: header-bound fields of the response wouldn't be deserialized anymore.
    • Also, in taking ownership of the response body you get everything it contains. e.g. for Scan, you'd have to fish the actual results out of the Items field in your code that consumes the body, and they'd be in their raw "union" form.
  • Right now the Deserialize step MUST return a pointer to the expected output type, or the type assertion at the end of the operation will panic. There's no way to generically do that for every response type in the proposed API, I expect we could change how the outer type assertion is performed to remove the need for this, but there are behavioral differences to be considered there.

Some of the points above could probably be written off as risks/tradeoffs accepted due to the (imo) advanced nature of this use case, others would have to be addressed before we implement.

lucix-aws avatar Sep 19 '23 16:09 lucix-aws

Hi all,

We have recently discussed this as a team. At this time we don't plan on supporting this functionality as it's out of the SDK's scope.

Ran~

RanVaknin avatar Feb 15 '24 21:02 RanVaknin

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Feb 15 '24 21:02 github-actions[bot]