django-sendfile2

Optimize using the library with Nginx as a proxy to AWS S3

Open davidfischer-ch opened this issue 3 years ago • 7 comments

Hello,

Introduction

Many years ago I made some changes to this module so that I could integrate it into personal and professional projects that run on Amazon Web Services and use S3 as the backend for media assets.

This module helped me build a secure yet scalable and efficient asset storage by proxying S3 through Nginx. The applications offload data transfers to Nginx thanks to the X-Sendfile header.

In such a setup, trivial operations such as computing the MIME type or file size require unnecessary round trips (and data transfer) from S3 to the servers. I investigated the issue and found a neat way to optimize this.

Basically, I had two ways to optimize the storage:

  • Precompute values (e.g. mimetype) and store them in the application's DB or define them in code.
  • Let the backend (Nginx + S3) do its job (return HTTP 404 if the file is missing, add Content-Type and Content-Length headers, etc.).
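
As a rough sketch of the two options above, the function below builds the response headers for an offloaded download without touching storage. It is not the module's actual code: `build_offload_headers` and its parameters are hypothetical names chosen for illustration, and `X-Accel-Redirect` is the header Nginx uses for internal redirects (its counterpart to X-Sendfile).

```python
from typing import Dict, Optional


def build_offload_headers(
    internal_path: str,
    mimetype: Optional[str] = None,
    filesize: Optional[int] = None,
) -> Dict[str, str]:
    """Build headers that hand the transfer over to Nginx.

    Hypothetical helper: no file is opened or stat'ed here, so nothing
    is fetched from S3 just to fill in a header.
    """
    # Nginx serves the file itself when it sees this header.
    headers = {"X-Accel-Redirect": internal_path}
    # Only set Content-* headers when the value is already known
    # (precomputed in the DB or defined in code); otherwise leave
    # them out and let Nginx + S3 provide them.
    if mimetype is not None:
        headers["Content-Type"] = mimetype
    if filesize is not None:
        headers["Content-Length"] = str(filesize)
    return headers


# Option 1: the MIME type is defined in code, the size is left to the backend.
known = build_offload_headers("/protected/a.jpg", mimetype="image/jpeg")
# Option 2: everything is delegated to Nginx + S3.
delegated = build_offload_headers("/protected/a.jpg")
```

In both calls the application server performs no I/O against the stored file; it only decides which headers to emit.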

There are currently two projects running in production (Django and Flask) with this module (my version of it).

Link

You can find the original commits here: https://github.com/davidfischer-ch/django-sendfile (branch dev).

Changes

Various features dating from 25/12/2017:

  • Add check_exist parameter
  • Avoid I/O ops when not necessary
  • Make Content headers optional: Web Engine can also manage them!

And some tiny cleanups.

TODO

  • Adding some documentation
  • Adding some more tests

Contributions to both are welcome!

davidfischer-ch avatar Nov 27 '20 21:11 davidfischer-ch

I am glad the project is maintained and covered with tests!

davidfischer-ch avatar Nov 27 '20 22:11 davidfischer-ch

This is powered by Django REST Framework and delivers both pictures and videos (HTTP streaming):

...

sendfile_image_args = {
    'check_exist': False,
    'encoding': None,
    'filesize': None,
    'mimetype': 'image/jpeg'
}

class PictureDownloadView(base.LoggedAPIView):

    def get(self, request, picture_pk):
        picture = get_object_or_404(models.Picture.objects.for_user(request.user), pk=picture_pk)
        picture.downloaded_by.add(request.user)
        return sendfile(
            request,
            picture.filename_absolute,
            attachment=True,
            **base.sendfile_image_args)


class PictureThumbnailView(base.LoggedAPIView):

    def get(self, request, picture_pk, picture_checksum):
        # Checksum not used since it's only relevant for cache invalidation
        picture = get_object_or_404(models.Picture.objects.for_user(request.user), pk=picture_pk)
        return sendfile(
            request,
            picture.thumbnail_absolute,
            attachment_filename=False,
            **base.sendfile_image_args)

...

davidfischer-ch avatar Nov 27 '20 22:11 davidfischer-ch

@davegaeddert, out of curiosity: since you are using S3, why are you still using django-xsendfile when you could use a pre-signed URL and eliminate binary data transfer through your infrastructure entirely?

See the example of a release asset on GitHub:

$ curl 'https://github.com/moggers87/django-sendfile2/releases/download/v0.6.0/django-sendfile2-0.6.0.tar.gz' -I
HTTP/1.1 302 Found
date: Sat, 28 Nov 2020 17:49:28 GMT
content-type: text/html; charset=utf-8
server: GitHub.com
status: 302 Found
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With, Accept-Encoding
location: https://github-production-release-asset-2e65be.s3.amazonaws.com/126026074/ef7b7180-b0e7-11ea-95cf-db432d93813a?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20201128%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20201128T174928Z&X-Amz-Expires=300&X-Amz-Signature=aeb75cb5d6190aae13c1808c31dd219ef522b3d83beae34051337fb4026e24af&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=126026074&response-content-disposition=attachment%3B%20filename%3Ddjango-sendfile2-0.6.0.tar.gz&response-content-type=application%2Foctet-stream
cache-control: no-cache
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
referrer-policy: no-referrer-when-downgrade
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events wss://alive.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/socket-worker.js gist.github.com/socket-worker.js
Set-Cookie: _gh_sess=2UauZhgJ1BrtF8UKHq4YKLsvDg6ZMAGgTJO6slfP2OYyePT8gAuM2RbJFc6uyGb4HGgBJZxDXDSbrr1QeHEpYLDHvF5NvRJFYbXJb9sRP77kvRJVM0df4NvzoD5JNeC8GtgXW52G6dIq%2BaZl1h9vzWN%2BMC2rZ1oUU2IdjAwh%2BbGe0ApqCAfROgBK1sjE4uZUmzUvUasmrOsFGT%2Fb5piIsjOPtilL06GUfMiQ66ze4Nfv%2Bq6qKHqhBWzELdMhKqhyv8w%2BaDf%2F9XmBIdfDhkri6A%3D%3D--fpTwqiEkBJ0ntjD1--8ktqEONCNT4GO%2FTeURlMtQ%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
Set-Cookie: _octo=GH1.1.1596684094.1606585767; Path=/; Domain=github.com; Expires=Sun, 28 Nov 2021 17:49:27 GMT; Secure; SameSite=Lax
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Sun, 28 Nov 2021 17:49:27 GMT; HttpOnly; Secure; SameSite=Lax
Content-Length: 655
X-GitHub-Request-Id: 9DAE:ED58:760B954:9B1369A:5FC28DA7
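
To make the redirect above easier to read, here is a small sketch that works out when such a pre-signed URL stops being valid. It uses only the `X-Amz-Date` and `X-Amz-Expires` query parameters visible in the `location` header; the helper name `presigned_expiry` is made up for this example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the Location header above: only the two query
# parameters needed for the expiry calculation are kept.
location = (
    "https://github-production-release-asset-2e65be.s3.amazonaws.com/"
    "126026074/ef7b7180-b0e7-11ea-95cf-db432d93813a"
    "?X-Amz-Date=20201128T174928Z&X-Amz-Expires=300"
)


def presigned_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 pre-signed URL expires."""
    query = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in ISO 8601 basic format (UTC).
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the lifetime of the URL in seconds.
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


expiry = presigned_expiry(location)
print(expiry.isoformat())  # 2020-11-28T17:54:28+00:00
```

So the link is valid for five minutes after it was signed, and the download itself goes straight from S3 to the client.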

ad-m avatar Nov 28 '20 17:11 ad-m

Hello @ad-m,

The website hosts more than 25'000 pictures and 2'000 videos from my private collection. More than 60 family members and friends use it, and they do not all have access to the same albums (pictures and videos).

Access is defined per album and granted to users by group.

So basically, to guarantee fine-grained access to pictures and videos, I had to check for every requested file whether the user is granted access, and then deliver it with Nginx. The first version (back in 2016) was hosted on a dedicated server, not on AWS. I took this path and never had any scaling issues, so I never looked into presigned URLs.

However, I did implement them for the video streaming part.
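
The per-file check described above (albums granted to groups) could look roughly like the following. This is only an illustration and not the site's real code: the album names, group names and the `can_access` function are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical data: which groups may see each album, and which
# groups each user belongs to.
ALBUM_GROUPS: Dict[str, Set[str]] = {
    "holidays-2019": {"family"},
    "hiking": {"family", "friends"},
}
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"family"},
    "bob": {"friends"},
}


def can_access(user: str, album: str) -> bool:
    """True when the user shares at least one group with the album."""
    user_groups = USER_GROUPS.get(user, set())
    album_groups = ALBUM_GROUPS.get(album, set())
    # Unknown users or albums have no groups, so access is denied.
    return not user_groups.isdisjoint(album_groups)
```

Only when such a check passes would the view hand the transfer over to Nginx.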

Thank you.

davidfischer-ch avatar Nov 28 '20 19:11 davidfischer-ch

Any feedback is welcome...

davidfischer-ch avatar Dec 14 '20 14:12 davidfischer-ch

Those quote changes still haven't been removed yet :cat:

moggers87 avatar Dec 14 '20 17:12 moggers87

Codecov Report

Merging #45 (a337404) into main (fdf4d72) will decrease coverage by 1.08%. The diff coverage is 76.47%.

@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
- Coverage   81.95%   80.88%   -1.08%     
==========================================
  Files           7        7              
  Lines         133      136       +3     
  Branches       17       20       +3     
==========================================
+ Hits          109      110       +1     
+ Misses         19       18       -1     
- Partials        5        8       +3     

Impacted Files                           Coverage Δ
django_sendfile/utils.py                 94.66% <75.00%> (-2.52%) :arrow_down:
django_sendfile/backends/xsendfile.py    100.00% <100.00%> (ø)


codecov[bot] avatar Jun 20 '23 22:06 codecov[bot]