moto icon indicating copy to clipboard operation
moto copied to clipboard

Support for S3 select?

Open JPFrancoia opened this issue 4 years ago • 6 comments

It looks like the S3 select feature isn't supported by moto at the moment. Moto version: moto==1.3.16.

Here is a small snippet of code to demonstrate:

import boto3
import awswrangler as wr
import pandas as pd
from moto import mock_s3, mock_glue


@mock_s3
@mock_glue
def test_s3_select():

    s3_client = boto3.client("s3", region_name="us-east-1")
    s3_client.create_bucket(Bucket="bucket")

    wr.s3.to_csv(df=pd.DataFrame({'col': [1, 2, 3]}), path="s3://bucket/prefix/test.csv")

    r = s3_client.select_object_content(
        Bucket="bucket",
        Key="prefix/test.csv",
        ExpressionType="SQL",
        Expression="select count(*) from s3object",
        InputSerialization={
            "CSV": {"FileHeaderInfo": "Use", "AllowQuotedRecordDelimiter": True}
        },
        OutputSerialization={"CSV": {}},
    )

    print(r)

test_s3_select()

It just creates a bucket, and upload a dummy csv file into the bucket using aws-data-wrangler.

This code works on a real csv file stored in a real bucket, but not when S3 is mocked with moto. Here is the stacktrace I get:

Traceback (most recent call last):
  File "test_select.py", line 34, in <module>
    test_s3_select()
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/core/models.py", line 100, in wrapper
    result = func(*args, **kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/core/models.py", line 100, in wrapper
    result = func(*args, **kwargs)
  File "test_select.py", line 26, in test_s3_select
    OutputSerialization={"CSV": {}},
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 256, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/retryhandler.py", line 269, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/endpoint.py", line 197, in _do_get_response
    responses = self._event_emitter.emit(event_name, request=request)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/core/models.py", line 323, in __call__
    request, request.url, request.headers
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/s3/responses.py", line 958, in key_or_control_response
    response = self._key_response(request, full_url, headers)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/s3/responses.py", line 1122, in _key_response
    return self._key_response_post(request, body, bucket_name, query, key_name)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/s3/responses.py", line 1654, in _key_response_post
    self._set_action("KEY", "POST", query)
  File "/Users/jpfrancoia/Library/Caches/pypoetry/virtualenvs/glue-jobs-KVdNkci1-py3.6/lib/python3.6/site-packages/moto/s3/responses.py", line 524, in _set_action
    self.data["Action"] = ACTION_MAP[action_resource_type][method]["DEFAULT"]
KeyError: 'DEFAULT'

Is this expected? If yes, do you know if it's possible to support this feature?

Kind regards

JPFrancoia avatar Sep 09 '20 09:09 JPFrancoia

Hi @JPFrancoia, what version of moto are you on? The expected behaviour for the latest version would be to throw a NotYetImplemented-exception, to make it more obvious what's going on.

It should be possible to support this though, with the limitation that it would always return a static result (regardless of the data/SQL input).

Marking it as an enhancement.

bblommers avatar Sep 09 '20 10:09 bblommers

Hi @bblommers , thanks for your answer. I amended my question to include moto's version (it's the latest on pypi at the moment).

It should be possible to support this though, with the limitation that it would always return a static result (regardless of the data/SQL input).

Just to make sure I understand, in this case the SQL query is a simple count. Will I get a number back? What about the serialization formats?

JPFrancoia avatar Sep 09 '20 11:09 JPFrancoia

Hi @JPFrancoia, the exact implementation details will depend on whoever wants to tackle this, and how specific they want to make it. As a community-driven project, anyone can send a PR with this feature - so I can't say how many SQL features/functions will be included.

bblommers avatar Sep 10 '20 07:09 bblommers

Some thoughts on this: S3 uses PartiQL to execute these statements

DynamoDB now also supports the ability to query data using PartiQL (using the new execute_statement())

Supposedly it's also used in Redshift and QLDR.

Support for PartiQL in moto would be very useful, considering the range of services. As far as I know the only client is currently written in Kotlin, so there's no easy integration though.

bblommers avatar Dec 05 '20 11:12 bblommers

Any chances this one is in the roadmap? Would be a helpful testing enhancement for the project I'm working on.

ospikovets avatar Apr 11 '22 10:04 ospikovets

Any chances this one is in the roadmap? Would be a helpful testing enhancement for the project I'm working on.

  • 1 ??

ketan-nabera avatar Jun 30 '22 05:06 ketan-nabera

An initial, experimental implementation of S3 Select is now available as of moto >= 4.1.6.dev16.

@JPFrancoia Your example above now returns the correct result. Note that this feature includes a new dependency, so make sure that you're installing moto[s3] or moto[all] to automatically pull this in.

Not all Select-features are covered yet - considering the amount of possible queries, I expect to only add support for them on a case-by-case basis. I know that this is a popular feature, so if anyone runs into an issue with a specific query, please open a new issue for that - that is going to make it much easier to track.

bblommers avatar Mar 21 '23 21:03 bblommers