Can't access objects that have slashes in the key (ID)
Hello,
I'm testing out the new version, specifically the cloudfront-enabled version and I'm running into a wall. Logs are all showing this kind of error below. I deployed from the SAR. Just a basic deployment to test.
2022-07-11T21:16:44.995Z 5a2df24f-5d7d-43f8-95c8-40646f2eb2c1 ERROR NotFound: null at Request.extractError (/opt/nodejs/node_modules/aws-sdk/lib/services/s3.js:691:35) at Request.callListeners (/opt/nodejs/node_modules/aws-sdk/lib/sequential_executor.js:106:20) at Request.emit (/opt/nodejs/node_modules/aws-sdk/lib/sequential_executor.js:78:10) at Request.emit (/opt/nodejs/node_modules/aws-sdk/lib/request.js:686:14) at Request.transition (/opt/nodejs/node_modules/aws-sdk/lib/request.js:22:10) at AcceptorStateMachine.runTo (/opt/nodejs/node_modules/aws-sdk/lib/state_machine.js:14:12) at /opt/nodejs/node_modules/aws-sdk/lib/state_machine.js:26:10 at Request.<anonymous> (/opt/nodejs/node_modules/aws-sdk/lib/request.js:38:9) at Request.<anonymous> (/opt/nodejs/node_modules/aws-sdk/lib/request.js:688:12) at Request.callListeners (/opt/nodejs/node_modules/aws-sdk/lib/sequential_executor.js:116:18) { code: 'NotFound', region: null, time: 2022-07-11T21:16:44.990Z, requestId: 'C0C0V2MZNE9SSCKT', extendedRequestId: 'm2x+Yj50FwGfFicNhs+U2JYmw/Pe4EUuwqgMDc1TD6W3QGbA7CUVHenECjtDJ8fHouYtYp2kg3o=', cfId: undefined, statusCode: 404, retryable: false, retryDelay: 8.697074109187763 }
Thanks,
Joseph
What URL are you requesting?
- If it's
/iiif/2, try an actual image request. The formerOKresponse to the root resource broke when I switched from API Gateway to Function URL. - If it's an image request (either
info.jsonor an actual image response), make sure the image file corresponding to the ID you passed in actually exists in the S3 bucket you specified when you deployed.
I'm going to put in a fix for both of these – the former should give an OK response and the latter should return a 404 Not Found instead of failing the way it does.
Splitting into issues #74 and #75 and closing.
Hi Michael,
After some testing, it appears the issue is with accessing objects that have slashes in the key, and I am properly encoding them.
So, in the pre-lambda url version which I'm also running, this works: https://d2lezwyxk8gi6a.cloudfront.net/iiif/2/ad-test%2Fph_000207/info.json
But it doesn't work with the new version: https://bocmgyklbvlgynuva5z4kwvqny0ybsxd.lambda-url.us-east-1.on.aws/iiif/2/ad-test%2Fph_000207/info.json but this object without slashes does work https://bocmgyklbvlgynuva5z4kwvqny0ybsxd.lambda-url.us-east-1.on.aws/iiif/2/US.NNFIT.SC.8.1.2.1941-001/info.json
I wonder if this is something inherent in the Lambda URL approach because it doesn't look like anything changed regarding slashes in the code itself.
Thanks,
Joseph
Also, occurs in both cloudfront-enabled and not. It's just that I was first testing in the cloudfront version...
Hello again,
After some more testing, I've confirmed what's going. It seems that the Lambda URL converts the encoded slashes to real slashes before it ever gets to the function. So even though the initial URL is https://bocmgyklbvlgynuva5z4kwvqny0ybsxd.lambda-url.us-east-1.on.aws/iiif/2/ad-test%2Fph_000207/info.json the path in the event is /iiif/2/ad-test/ph_000207/info.json.
I've reopened this issue as a bug and renamed it to reflect what's actually going on. This is similar to (but not exactly the same as) #52, and I believe they both need to be fixed at the node-iiif level.
I have a good idea of how to fix this reliably, but it might take me a few days to find time for it.
Ok, thanks so much.
Just curious, are you thinking in the parse_url function, instead of getting id by popping off the last segment before region, getting the string between region and .../iiif/2/?
Edit: though, I guess that would presume that iiif/2 is the prefix, so maybe that wouldn't work
In node-iiif, I was trying not to dictate the path prefix, even though serverless-iiif always uses /iiif/2/. The fix (which is almost done) involves node-iiif using a default path prefix of /iiif/2/ while allowing a different one to be specified in the constructor.
The PR is in. As soon as samvera-labs/node-iiif#23 is reviewed and approved for merging, I'll be able to release v4.0.2 with the fix.
Excellent, thanks so much
Update: I was able to add CORS entries for the function URL and things work great.
Question though: is there a reason the manifest uses the function URL rather than the cloudfront one? If it did then one would only need to configure CORS on the cloudfront end point.
Thanks for the fix @mbklein. The keys I was using also comprised of slashes and this latest release addresses that.
A couple of changes I had to make to get things working for me are the following:
1 . Update resolvers.js Make the lambda handle non-tif files
const defaultStreamLocation = (id) => {
const key = id + '.tif';
return { Bucket: sourceBucket, Key: key };
};
to
const defaultStreamLocation = (id) => {
return { Bucket: sourceBucket, Key: id };
};
2. Update index.js to enable CORS
const makeResponse = (result, event) => {
const { isBase64 } = helpers;
const base64 = isBase64(result);
const content = base64 ? result.body.toString('base64') : result.body;
return {
statusCode: 200,
headers: {
'Content-Type': result.contentType,
'Access-Control-Allow-Origin': event.headers.origin,
'Access-Control-Allow-Credentials': 'true'
},
isBase64Encoded: base64,
body: content
};
};
3. Add a Response headers policy to the Cloudfront distribution created to satisfy CORS
So now my front end calls the end point at https://xxxxxxxx.cloudfront.net/iiif/2/path/to/file/in/s3/info.json that resolves fine but the @id in the manifest refers to @id: https://xxxxxxxx.lambda-url.us-east-1.on.aws/iiif/2//path/to/file/in/s3".
Not quite sure why it refers to the lambda-url rather than the cloudfront one...but then when my viewer starts asking for tiles I get a CORS error
Access to XMLHttpRequest at 'https://xxxxxxxx.lambda-url.us-east-1.on.aws/iiif/2//path/to/file/in/s3/4096,0,305,1024/153,/0/default.jpg' from origin 'https://mydomain.com' has been blocked by CORS policy: The 'Access-Control-Allow-Origin' header contains multiple values 'https://mydomain.com, *', but only one is allowed.
Am I doing something wrong here? Any pointers very much appreciated.
.tifis going to remain the default for the moment, but I've created #76 that should give you the flexibility you need without changing code.- CORS handling was removed from
index.js(in favor of letting the function URL config handle the most common case) in 4c9c741 as a fix for #71. If you want to add CORS back into the function, you'll also need to remove it from the template. This will also fix themultiple valuesissue in your third point. - Ah, this is a regression. I'll add a bug issue and fix the
@idissue as soon as I get a chance.
Thanks @mbklein.
Question: would it be problematic to turn off public access to the cache bucket? I know I can override the template myself, just wondering if having it be default would work for your common use case. So you adding the following to the template.
PublicAccessBlockConfiguration:
BlockPublicAcls : true
BlockPublicPolicy : true
IgnorePublicAcls : true
RestrictPublicBuckets : true
The cache bucket was never supposed to be publicly accessible. If it is, that's a bug. It should only be accessible by the CloudFront Origin Access Identity by policy.
Please see this comment for a discussion of the @id hostname issue. I had run into this before, but forgot what the real issue is (because we use a custom hostname).
Thanks @mbklein.
Question: would it be problematic to turn off public access to the cache bucket? I know I can override the template myself, just wondering if having it be default would work for your common use case. So you adding the following to the template.
Added in 5be56ee
Thanks @mbklein. Question: would it be problematic to turn off public access to the cache bucket? I know I can override the template myself, just wondering if having it be default would work for your common use case. So you adding the following to the template.
Added in 5be56ee
Thanks for adding this @mbklein. One another nicety would be to add a comment for the distribution just so it displays more obviously in the AWS console.
I added the following to the template:
CachingEndpoint:
Type: "AWS::CloudFront::Distribution"
Properties:
DistributionConfig:
Comment: !Sub "${AWS::StackName} Cloudfront Distribution"
Hi @mbklein,
Back to the original slash issue, everything is working (thanks so much!!), but I did notice that the id in the info.json file that gets written includes the decoded slashes. That's probably an easy fix (just re-encode the id when writing that file in iiif-processor), but I'm not sure if it's really a big issue because everything works for the application either way, whether they're decoded or encoded. I just happened to notice it because it didn't jive with the authentication function I was working on, but I was able get that working on my end.
But for instance if you make a request at .../iiif/2/ad-test%2Fph_000198/info.json the info.json reads "@id":".../iiif/2/ad-test/ph_000198"
Thanks,
Joseph