aws-sdk-ios

Unable to read S3 Select Records Payload stream

Open CSolanaM opened this issue 4 years ago • 3 comments

State your question

I'm trying to use the S3 Select feature directly from an iOS app, without success. From the docs, I understand that the result is an event stream object that handles the chunked transfer. I've been able to read it in Node and Python, but not in iOS. I'm aware that I'm missing something that reads every received chunk, since the Payload is an AWSS3SelectObjectContentEventStream, but I'm a bit lost in the iOS documentation and haven't found anything after several days of Googling and trial and error.

I also tried the AWS CLI, which works perfectly (I believe it uses the Python SDK under the hood), as does the S3 Select console (web dashboard).

Any clues?

Thank you.

Which AWS Services are you utilizing?

AWS S3, S3 Select

Provide code snippets (if applicable)

This is my test approach in Swift, not working:

        // AWSServiceManager.default().defaultServiceConfiguration has been initialized
        let request: AWSS3SelectObjectContentRequest = AWSS3SelectObjectContentRequest()
        request.bucket = "mybucket-dev"
        request.key = "myfolder/mydata.json.gz"
        request.expression = "SELECT s.title FROM s3object[*][*] s limit 1"
        request.expressionType = .sql
        let inputSerialization = AWSS3InputSerialization()
        inputSerialization?.compressionType = .gzip
        let jsonInput = AWSS3JSONInput()
        jsonInput?.types = .document
        inputSerialization?.json = jsonInput
        request.inputSerialization = inputSerialization
        let outputSerialization = AWSS3OutputSerialization()
        let jsonOutput = AWSS3JSONOutput()
        jsonOutput?.recordDelimiter = ","
        outputSerialization?.json = jsonOutput
        request.outputSerialization = outputSerialization
        AWSS3.default().selectObjectContent(request) { output, error in
            if error != nil {
                debugPrint(error)
            } else {
                debugPrint(output)
            }
        }

The output in console is:

Optional(<AWSS3SelectObjectContentOutput: 0x6000024c84e0> {
    payload = "<AWSS3SelectObjectContentEventStream: 0x60000289bf00> {\n}";
})

I also debugged and tried to print the values of output.payload, output.payload.records, etc., but they are all nil.

If I use Proxyman to intercept the response, I receive the following, so the SDK definitely receives some data. (I also tried the plain REST API with the same result, but I'm aware that Postman doesn't support chunked responses.)

���ر���Uت�,�
:message-type���event�:event-type���Records
:content-type���application/octet-stream{"title":"The title that I want"},�:�0���ط���C�"l

:message-type���event�:event-type���Stats
:content-type��text/xml<Stats xmlns=""><BytesScanned>164558</BytesScanned><BytesProcessed>655360</BytesProcessed><BytesReturned>108</BytesReturned></Stats>�U�U���8���(ءئ�ش
:message-type���event�:event-type���Endد�س�

My test in a Node.js Lambda, working:

    const params = {
        Bucket: "mybucket-dev",
        Key: "myfolder/mydata.json.gz",
        ExpressionType: "SQL",
        Expression: "select s.title from S3Object[*][*] s limit 10",
        InputSerialization: {
            CompressionType: "GZIP",
            JSON: {
                Type: "DOCUMENT"
            },
        },
        OutputSerialization: {
            JSON: {
                RecordDelimiter: ","
            }
        }
    }
    return s3.selectObjectContent(params).promise()
        .then(data => {
            const records = []
            return new Promise((resolve) => {
                data.Payload.on("data", ({ Records, End }) => {
                    if (Records) {
                        records.push(Records.Payload)
                    } else if (End) {
                        let result = Buffer.concat(records).toString()
                        result = `[${result.replace(/\,$/, '')}]`
                        resolve(JSON.parse(result))
                    }
                });
            })
        }).catch(error => {
            return error
        })

My test in Python, working as well:

import boto3

s3 = boto3.client('s3',
                  endpoint_url='https://s3.myregion.amazonaws.com',
                  aws_access_key_id='myaccesskey',
                  aws_secret_access_key='mysecretkey',
                  region_name='myregion')

r = s3.select_object_content(
    Bucket='mybucket-dev',
    Key='myfolder/mydata.json.gz',
    ExpressionType='SQL',
    Expression="select s.title from s3object[*][*] s limit 5",
    InputSerialization={
        'JSON': {
            "Type": "DOCUMENT",
        },
        'CompressionType': 'GZIP',
    },
    OutputSerialization={'JSON': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(statsDetails['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(statsDetails['BytesProcessed'])
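
A hedged side note on the loop above: as far as I understand the S3 Select documentation, a single Records event may carry a partial record, so decoding each chunk on its own can split a multi-byte character or a JSON object. The sketch below mirrors the Node.js approach instead, joining all payload bytes before decoding. `collect_records` is a hypothetical helper name, and it assumes `RecordDelimiter` is set to "," as in the Node.js test (the boto3 call above uses the default delimiter).

```python
import json


def collect_records(events, delimiter=","):
    """Join the Payload bytes of every Records event, then parse them once.

    `events` is any iterable of event dicts shaped like r['Payload'] above.
    Assumes the output was requested with RecordDelimiter="," (an assumption,
    not what the boto3 call above uses).
    """
    chunks = []
    for event in events:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "End" in event:
            break  # End marks a complete result
    # Decode only after joining, so a character split across chunks survives.
    text = b"".join(chunks).decode("utf-8").rstrip(delimiter)
    return json.loads(f"[{text}]")
```

With the client above, this would be called as `collect_records(r['Payload'])`.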

Environment (please complete the following information):

  • SDK Version: 2.23.3
  • Dependency Manager: Swift Package Manager
  • Swift Version : 5.0

Device Information (please complete the following information):

  • Device: Simulator
  • iOS Version: 14.4
  • Specific to simulators: iPhone 12 Pro Max (or any other)

CSolanaM • Apr 12 '21 09:04

Hi @CSolanaM ,

Thank you for reporting the issue. Could you provide more debug information? You can enable verbose logging with the code below to see whether any errors are printed to the Xcode logs:

import AWSCore

AWSDDLog.sharedInstance.logLevel = .verbose
AWSDDLog.sharedInstance.add(AWSDDTTYLogger())

royjit • Jan 25 '22 01:01

I am guessing the response format for SelectObjectContent is not supported in the iOS SDK: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTSelectObjectAppendix.html

royjit • Jan 25 '22 02:01

Taking this as a feature request to add proper parsing of the streamed response from S3 SelectObjectContent.

royjit • Feb 04 '22 16:02

Note: The response format (as outlined in the documentation linked above) is event stream encoded. We have an existing implementation for encoding / decoding in AWSTranscribeStreamingEventDecoder, which should mostly just work but may require some tweaks.

That said, adding this would be a pretty large lift. Considering the minimal community interest in this request and AWS SDK for Swift supporting this feature, it's unlikely we'll add this to AWS SDK for iOS.

atierian • Dec 15 '23 18:12
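
For readers wondering what decoding the event stream involves, here is a minimal sketch of the message framing in Python (the language of the working boto3 test in the question). It is not the SDK's implementation and not a port of AWSTranscribeStreamingEventDecoder. It assumes the whole response body is already in memory and that every header value is a string (type 7), which matches the `:message-type`, `:event-type` and `:content-type` headers visible in the intercepted response. `decode_event_stream` and `encode_message` are hypothetical names, and the encoder exists only to exercise the decoder.

```python
import struct
import zlib


def decode_event_stream(data: bytes):
    """Yield (headers, payload) for each message in an event-stream body."""
    offset = 0
    while offset < len(data):
        # Prelude: total length, headers length, CRC32 of those first 8 bytes.
        total_len, headers_len, prelude_crc = struct.unpack_from(">III", data, offset)
        if zlib.crc32(data[offset:offset + 8]) != prelude_crc:
            raise ValueError("prelude CRC mismatch")
        if total_len < 16 or offset + total_len > len(data):
            raise ValueError("invalid or truncated message")
        message = data[offset:offset + total_len]
        # The trailing 4 bytes are a CRC32 over everything before them.
        (message_crc,) = struct.unpack_from(">I", message, total_len - 4)
        if zlib.crc32(message[:-4]) != message_crc:
            raise ValueError("message CRC mismatch")

        headers = {}
        pos, headers_end = 12, 12 + headers_len
        while pos < headers_end:
            name_len = message[pos]
            name = message[pos + 1:pos + 1 + name_len].decode("utf-8")
            pos += 1 + name_len
            value_type = message[pos]
            if value_type != 7:  # simplification: only string headers handled
                raise ValueError(f"unsupported header type {value_type}")
            (value_len,) = struct.unpack_from(">H", message, pos + 1)
            pos += 3
            headers[name] = message[pos:pos + value_len].decode("utf-8")
            pos += value_len

        yield headers, message[headers_end:total_len - 4]
        offset += total_len


def encode_message(headers: dict, payload: bytes) -> bytes:
    """Build one message with string headers; used only to test the decoder."""
    header_bytes = b""
    for name, value in headers.items():
        n, v = name.encode("utf-8"), value.encode("utf-8")
        header_bytes += bytes([len(n)]) + n + b"\x07" + struct.pack(">H", len(v)) + v
    total_len = 12 + len(header_bytes) + len(payload) + 4
    prelude = struct.pack(">II", total_len, len(header_bytes))
    prelude += struct.pack(">I", zlib.crc32(prelude))
    body = prelude + header_bytes + payload
    return body + struct.pack(">I", zlib.crc32(body))


# Round trip over a synthetic two-message stream (Records, then End).
stream = encode_message(
    {":message-type": "event", ":event-type": "Records"}, b'{"title":"x"},'
) + encode_message({":message-type": "event", ":event-type": "End"}, b"")
for headers, payload in decode_event_stream(stream):
    print(headers[":event-type"], payload)
```

A real client would also need to buffer partial messages as chunks arrive from the network, which this sketch leaves out.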