aws-sdk-js
aws-sdk-js copied to clipboard
Query JSON content from S3 not working with where clause
Describe the bug
I'm trying to query some json content from s3 bucket it works fine when I select *
from the bucket but once I specify LIMIT
id do nothing returns whole data and when specify WHERE
it returns empty response.
-
data
[ { "id": 1, "name": "abcd", "dataIds": [ 2, 1, 2 ], "attributes": [ "name", "picUrl", "email", "linkedinUrl", "job" ] }, { "id": 2, "name": "zxy", "dataIds": [ 2, 1, 1 ] } ]
-
aws s3api select-object-content --bucket e-commerce-uploads --key Content.json --expression "select * from s3object data where data.id = 1" --expression-type 'SQL' --input-serialization '{"JSON": {"Type":"DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "json_output.json"
Expected Behavior
When the query is like
SELECT * from s3object data where data.id = 1
I should get
{ "id": 1, "name": "abcd", "dataIds": [ 2, 1, 2 ], "attributes": [ "name", "picUrl", "email", "linkedinUrl", "job" ] }
or even when specify limit like this
select * from s3object limit 1
I should get one object
{ "id": 1, "name": "abcd", "dataIds": [ 2, 1, 2 ], "attributes": [ "name", "picUrl", "email", "linkedinUrl", "job" ] }
Current Behavior
returns empty file.
Reproduction Steps
aws s3api select-object-content --bucket e-commerce-uploads --key Content.json --expression "select * from s3object data where data.id = 1" --expression-type 'SQL' --input-serialization '{"JSON": {"Type":"DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "json_output.json"
aws s3api select-object-content --bucket e-commerce-uploads --key Content.json --expression "select * from s3object data LIMIT 1" --expression-type 'SQL' --input-serialization '{"JSON": {"Type":"DOCUMENT"}, "CompressionType": "NONE"}' --output-serialization '{"JSON": {}}' "json_output.json"
** DB.js**
import { S3 } from '@aws-sdk/client-s3';
class S3DB {
constructor(s3Region) {
this.region = s3Region;
this.client = new S3({ region: s3Region });
}
query(queryString, bucketName, fileName) {
console.log('query string', queryString);
return new Promise((resolve, reject) => {
this.client.selectObjectContent({
Bucket: bucketName,
Key: fileName,
ExpressionType: 'SQL',
Expression: queryString,
InputSerialization: {
JSON: {
Type: "DOCUMENT"
},
CompressionType: 'NONE'
},
OutputSerialization: {
JSON: {
RecordDelimiter: ' '
}
}
}).then(async ({ $metadata, Payload }) => {
console.log("meta data===>", $metadata);
console.log("Payload", Payload);
let array = [];
for await (const data of Payload) {
console.log(data);
if (data.Records)
array.push(Buffer.from(data.Records.Payload).toString('utf8'));
}
console.log('result array', array);
});
});
}
}
export default S3DB;
handler
const bucketName = "e-commerce-uploads";
const fileName = "Content.json";
const region = 'eu-west-1';
const db = new DB(region);
db.query(`SELECT * FROM S3Object[*] AS data WHERE id = 1`, bucketName, fileName)
.then(data => {
console.log('query data', data);
}).catch(err => {
console.log('query error', err);
});
Possible Solution
No response
Additional Information/Context
No response
SDK version used
"@aws-sdk/client-s3": "^3.110.0" aws-cli/2.7.3 Python/3.9.11 Windows/10 exe/AMD64 prompt/off
Environment details (OS name and version, etc.)
Windows 10
@IIslam thanks for opening this issue, I see you are using V3 of the SDK, can you share what error you get when you run selectobjectcontent
operation.
Hi @ajredniwja I didn't get any errors but I get no data when I add the where clause
Hi @ajredniwja I'm waiting for your reply.