django-sendfile2
Optimize using the library with Nginx as a proxy to AWS S3
Hello,
Introduction
Many years ago I made some changes for the sole purpose of integrating this module into personal and professional projects that run on Amazon Web Services and use S3 as the backend for media assets.
This module helped me build a secure yet scalable and efficient asset storage by proxying S3 with Nginx. The applications offload data transfers to Nginx thanks to the X-Sendfile mechanism (the X-Accel-Redirect header, in Nginx's case).
In such a setup, trivial operations such as computing the MIME type or the file size require unnecessary round trips (and data transfers) from S3 to the servers. I investigated the issue and found a neat way to optimize this setup.
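To illustrate the offloading, here is a minimal sketch of what an Nginx-style backend returns: instead of streaming bytes, the view emits an X-Accel-Redirect header pointing at an internal Nginx location, and Nginx fetches and serves the file itself (here, from the S3 proxy). `MEDIA_ROOT`, `INTERNAL_PREFIX` and `accel_redirect_headers` are hypothetical names, and the mapping assumes Nginx declares an `internal` location at that prefix; content headers are deliberately left out so that Nginx / S3 can provide them.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical settings: media files live under MEDIA_ROOT, and Nginx exposes
# them through an `internal` location mounted at INTERNAL_PREFIX (proxied to S3).
MEDIA_ROOT = PurePosixPath("/srv/media")
INTERNAL_PREFIX = "/protected"


def accel_redirect_headers(path: str) -> dict:
    """Return the header that asks Nginx to serve `path` instead of Django."""
    # relative_to() raises ValueError if `path` is not under MEDIA_ROOT.
    relative = PurePosixPath(path).relative_to(MEDIA_ROOT)
    if ".." in relative.parts:
        raise ValueError("path escapes MEDIA_ROOT")
    # Percent-encode so spaces and non-ASCII file names survive in the header.
    return {"X-Accel-Redirect": quote(f"{INTERNAL_PREFIX}/{relative.as_posix()}")}


print(accel_redirect_headers("/srv/media/albums/2017/IMG 001.jpg"))
# → {'X-Accel-Redirect': '/protected/albums/2017/IMG%20001.jpg'}
```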
Basically, I had two ways to optimize the storage:
- Precompute values (e.g. the MIME type) and store them in the application's DB, or define them in code.
- Let the backend (Nginx + S3) do its job (return HTTP 404 if a file is missing, add the Content-Type and Content-Length headers, etc.).
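The first option can be as simple as guessing the MIME type once, when the file is stored, and keeping it next to the record, so that serving the file later never requires inspecting the object in S3. `StoredFile` and `register_upload` are hypothetical names used only for illustration, and `mimetypes.guess_type` looks at the file extension only.

```python
import mimetypes
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFile:
    """Hypothetical DB record: metadata computed once, at upload time."""
    key: str                 # object key in the S3 bucket
    mimetype: Optional[str]  # None means "let Nginx / S3 decide"
    size: int


def register_upload(key: str, data: bytes) -> StoredFile:
    # Guessing from the extension needs no I/O against S3 at all.
    mimetype, _encoding = mimetypes.guess_type(key)
    return StoredFile(key=key, mimetype=mimetype, size=len(data))


record = register_upload("albums/2017/IMG_001.jpg", b"\xff\xd8\xff")
print(record.mimetype, record.size)  # → image/jpeg 3
```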
There are currently two projects running in production (Django and Flask) that use this module (my version of it).
Link
You can find the original commits here: https://github.com/davidfischer-ch/django-sendfile (branch dev).
Changes
Various features dating from 25/12/2017:
- Add check_exist parameter
- Avoid I/O ops when not necessary
- Make Content headers optional: the web server can also manage them!
And some tiny cleanups.
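A rough sketch of how these three changes can fit together: each content header is either computed from the local file, supplied by the caller, or omitted (`None`) so that the web server adds it, and the filesystem is only touched when a value actually has to be computed. The `AUTO` sentinel and the `content_headers` function are hypothetical; the parameter names mirror `sendfile_image_args` in the project example (`check_exist`, `encoding`, `filesize`, `mimetype`), but this is not the code of the branch.

```python
import mimetypes
import os

AUTO = object()  # hypothetical sentinel: "compute this value from the file"


def content_headers(path, check_exist=True, mimetype=AUTO, encoding=AUTO, filesize=AUTO):
    """Build content headers, touching the filesystem only when required."""
    if check_exist and not os.path.exists(path):  # filesystem call
        raise FileNotFoundError(path)

    if mimetype is AUTO or encoding is AUTO:
        guessed_type, guessed_encoding = mimetypes.guess_type(path)  # no I/O
        if mimetype is AUTO:
            mimetype = guessed_type or "application/octet-stream"
        if encoding is AUTO:
            encoding = guessed_encoding
    if filesize is AUTO:
        filesize = os.path.getsize(path)  # filesystem call (stat)

    headers = {}
    # None means "leave it to the web server" (Nginx / S3 sets it instead).
    if mimetype is not None:
        headers["Content-Type"] = mimetype
    if encoding is not None:
        headers["Content-Encoding"] = encoding
    if filesize is not None:
        headers["Content-Length"] = str(filesize)
    return headers


# With every value precomputed or delegated, the local file is never touched:
print(content_headers("/mnt/s3/missing.jpg", check_exist=False,
                      mimetype="image/jpeg", encoding=None, filesize=None))
# → {'Content-Type': 'image/jpeg'}
```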
TODO
- Adding some documentation
- Adding some more tests
Contributions to both are welcome!
I am glad the project is maintained and covered by tests!
This project is powered by Django REST Framework and delivers both pictures and videos (HTTP streaming):
```python
...
# Shared sendfile arguments (referenced below as base.sendfile_image_args):
# the MIME type is known in advance, while the existence check, encoding
# and file size are left to Nginx / S3 to avoid unnecessary I/O.
sendfile_image_args = {
    'check_exist': False,
    'encoding': None,
    'filesize': None,
    'mimetype': 'image/jpeg',
}


class PictureDownloadView(base.LoggedAPIView):
    def get(self, request, picture_pk):
        picture = get_object_or_404(models.Picture.objects.for_user(request.user), pk=picture_pk)
        picture.downloaded_by.add(request.user)
        return sendfile(
            request,
            picture.filename_absolute,
            attachment=True,
            **base.sendfile_image_args)


class PictureThumbnailView(base.LoggedAPIView):
    def get(self, request, picture_pk, picture_checksum):
        # Checksum not used since it's only relevant for cache invalidation
        picture = get_object_or_404(models.Picture.objects.for_user(request.user), pk=picture_pk)
        return sendfile(
            request,
            picture.thumbnail_absolute,
            attachment_filename=False,
            **base.sendfile_image_args)
...
```
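The thumbnail view ignores `picture_checksum` because the checksum only serves cache invalidation. A minimal sketch of that idea, assuming a hypothetical URL pattern `/pictures/<pk>/thumbnail/<checksum>/` (the project's real routes and checksum algorithm are not shown): the checksum changes whenever the thumbnail content changes, so each URL can be cached indefinitely by browsers and proxies.

```python
import hashlib


def thumbnail_url(picture_pk: int, thumbnail_bytes: bytes) -> str:
    """Build a cache-busting URL: new content -> new checksum -> new URL."""
    # In practice the checksum would be computed once, when the thumbnail is
    # generated, and stored with the picture rather than recomputed per request.
    checksum = hashlib.sha256(thumbnail_bytes).hexdigest()[:12]
    return f"/pictures/{picture_pk}/thumbnail/{checksum}/"


old = thumbnail_url(42, b"old thumbnail")
new = thumbnail_url(42, b"regenerated thumbnail")
print(old != new)  # → True
```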
@davegaeddert, out of curiosity: since you are using S3, why are you still using django-xsendfile when you could use a pre-signed URL and eliminate binary data transfer through your infrastructure entirely?
See, for example, how GitHub serves release assets:
```
$ curl 'https://github.com/moggers87/django-sendfile2/releases/download/v0.6.0/django-sendfile2-0.6.0.tar.gz' -I
HTTP/1.1 302 Found
date: Sat, 28 Nov 2020 17:49:28 GMT
content-type: text/html; charset=utf-8
server: GitHub.com
status: 302 Found
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With, Accept-Encoding
location: https://github-production-release-asset-2e65be.s3.amazonaws.com/126026074/ef7b7180-b0e7-11ea-95cf-db432d93813a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20201128%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201128T174928Z&X-Amz-Expires=300&X-Amz-Signature=aeb75cb5d6190aae13c1808c31dd219ef522b3d83beae34051337fb4026e24af&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=126026074&response-content-disposition=attachment%3B%20filename%3Ddjango-sendfile2-0.6.0.tar.gz&response-content-type=application%2Foctet-stream
cache-control: no-cache
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
referrer-policy: no-referrer-when-downgrade
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events wss://alive.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/socket-worker.js gist.github.com/socket-worker.js
Set-Cookie: _gh_sess=2UauZhgJ1BrtF8UKHq4YKLsvDg6ZMAGgTJO6slfP2OYyePT8gAuM2RbJFc6uyGb4HGgBJZxDXDSbrr1QeHEpYLDHvF5NvRJFYbXJb9sRP77kvRJVM0df4NvzoD5JNeC8GtgXW52G6dIq%2BaZl1h9vzWN%2BMC2rZ1oUU2IdjAwh%2BbGe0ApqCAfROgBK1sjE4uZUmzUvUasmrOsFGT%2Fb5piIsjOPtilL06GUfMiQ66ze4Nfv%2Bq6qKHqhBWzELdMhKqhyv8w%2BaDf%2F9XmBIdfDhkri6A%3D%3D--fpTwqiEkBJ0ntjD1--8ktqEONCNT4GO%2FTeURlMtQ%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
Set-Cookie: _octo=GH1.1.1596684094.1606585767; Path=/; Domain=github.com; Expires=Sun, 28 Nov 2021 17:49:27 GMT; Secure; SameSite=Lax
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Sun, 28 Nov 2021 17:49:27 GMT; HttpOnly; Secure; SameSite=Lax
Content-Length: 655
X-GitHub-Request-Id: 9DAE:ED58:760B954:9B1369A:5FC28DA7
```
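The `location` header in that response is an S3 pre-signed URL: the AWS Signature Version 4 signature in its query string lets whoever holds the URL fetch that one object until `X-Amz-Expires` seconds after `X-Amz-Date`. In practice boto3 generates such URLs with `client.generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=300)`; the stdlib-only sketch below merely shows what goes into the signature. `presign_get` is a hypothetical helper (not part of django-sendfile2) that assumes virtual-hosted-style URLs on `s3.amazonaws.com`, an unsigned payload and only the `host` header signed, so treat it as an illustration rather than a drop-in replacement for boto3.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def presign_get(bucket, key, region, access_key, secret_key,
                expires=300, now=None, extra_params=None):
    """Hypothetical sketch of an S3 SigV4 query-string (pre-signed) GET URL."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = amz_date[:8]
    host = f"{bucket}.s3.amazonaws.com"  # assumes virtual-hosted-style URLs
    scope = f"{date}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
        # e.g. response-content-type / response-content-disposition, as in the GitHub URL
        **(extra_params or {}),
    }
    # Canonical query string: keys sorted, keys and values RFC 3986-encoded.
    query = "&".join(
        f"{quote(k, safe='~')}={quote(v, safe='~')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET",
        quote("/" + key, safe="/~"),  # canonical URI
        query,
        f"host:{host}\n",             # canonical headers end with a newline
        "host",                       # signed headers
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac(_hmac(_hmac(_hmac(("AWS4" + secret_key).encode(), date),
                                    region), "s3"), "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{quote(key, safe='/~')}?{query}&X-Amz-Signature={signature}"


url = presign_get(
    "example-bucket", "albums/2017/IMG_001.jpg", "us-east-1",
    "AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    extra_params={"response-content-disposition": "attachment; filename=IMG_001.jpg"},
)
print(url)
```

The access check then happens once, when the application hands out the URL, instead of on every byte transferred.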
Hello @ad-m,
The website hosts more than 25'000 pictures and 2'000 videos from my private collection. It is used by more than 60 family members and friends, and each of them has access to a different set of albums.
The access policy is based on albums, and access is granted to users by group.
So, to guarantee fine-grained access to pictures and videos, the application has to check, for every requested file, whether the user is granted access, and then let Nginx deliver it. The first version (back in 2016) was hosted on a dedicated server, not on AWS. I took this path, never had any scaling issues, and so never looked back to experiment with pre-signed URLs.
However, I did implement them for the video streaming part.
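The access rule described above (albums granted to groups, users belonging to groups) boils down to a set intersection checked on every request before the file is handed over to Nginx. The classes and functions below are hypothetical in-memory stand-ins for the application's models; as with `get_object_or_404` in the views, a forbidden picture is reported as 404.

```python
from dataclasses import dataclass, field


@dataclass
class Album:
    name: str
    granted_groups: set = field(default_factory=set)


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


def can_access(user: User, album: Album) -> bool:
    """A user may see an album if at least one of their groups is granted."""
    return bool(user.groups & album.granted_groups)


def serve_picture(user: User, album: Album, path: str) -> tuple:
    # Check permissions in the application, then let Nginx do the transfer.
    if not can_access(user, album):
        return 404, {}  # hide albums the user is not allowed to see
    return 200, {"X-Accel-Redirect": path}


holidays = Album("Holidays 2017", granted_groups={"family"})
alice = User("alice", groups={"family", "friends"})
bob = User("bob", groups={"friends"})
print(serve_picture(alice, holidays, "/protected/holidays/001.jpg"))
# → (200, {'X-Accel-Redirect': '/protected/holidays/001.jpg'})
print(serve_picture(bob, holidays, "/protected/holidays/001.jpg"))
# → (404, {})
```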
Thank you.
Any feedback is welcome.
Those quote changes still haven't been removed :cat:
Codecov Report
Merging #45 (a337404) into main (fdf4d72) will decrease coverage by 1.08%. The diff coverage is 76.47%.
```diff
@@           Coverage Diff            @@
##             main      #45      +/-   ##
==========================================
- Coverage   81.95%   80.88%   -1.08%
==========================================
  Files           7        7
  Lines         133      136       +3
  Branches       17       20       +3
==========================================
+ Hits          109      110       +1
+ Misses         19       18       -1
- Partials        5        8       +3
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| django_sendfile/utils.py | 94.66% <75.00%> (-2.52%) | :arrow_down: |
| django_sendfile/backends/xsendfile.py | 100.00% <100.00%> (ø) | |