django-seo-js

Django with django-seo-js returning "invalid code length" error to browser and curl

Open jdotjdot opened this issue 9 years ago • 2 comments

Hey,

All of a sudden, Google, Facebook, and other crawlers started reporting my website as unavailable. We're using django-seo-js==0.2.4 and the paid version of Prerender.io. Testing with _escaped_fragment_ in the browser and with cURL showed that the responses are coming back invalid:

curl: (61) Error while processing content unencoding: invalid code lengths set

I made no changes to anything affecting Prerender.io or the configuration of this library; this just started happening out of the blue.

Stepping through the code more closely, Prerender.io appears to cache the content correctly, and calling self.backend.get_response_for_url(url) also returns a response with the correctly rendered HTML content. That call covers both fetching the response from Prerender and converting the requests response into a Django HttpResponse object.

When that gets returned, though, for some reason both the browser and curl think it's invalid.

I've done plenty of debugging but I'm a bit at a loss here; all I can think of is that base.py:56 is too naive with r['content-length'] = len(response.content), or that it's some type of gzip issue, where headers or encodings are getting passed on that shouldn't be.
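
Both hypotheses above can be checked against a captured response without touching the site. The following is only a rough sketch: `find_header_mismatches` is a hypothetical helper (not part of django-seo-js or requests), and it assumes the headers and raw body have already been captured as a dict and a bytes object.

```python
def find_header_mismatches(headers, body):
    """Return a list of human-readable problems found in a captured response."""
    # Header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    problems = []

    # Hypothesis 1: the declared length does not match the bytes actually sent.
    declared_length = normalised.get("content-length")
    if declared_length is not None and int(declared_length) != len(body):
        problems.append(
            "content-length says %s but body is %d bytes" % (declared_length, len(body))
        )

    # Hypothesis 2: the body is declared as gzip but is not gzip data.
    # A gzip stream always starts with the two magic bytes 0x1f 0x8b.
    encoding = normalised.get("content-encoding", "").lower()
    if encoding == "gzip" and not body.startswith(b"\x1f\x8b"):
        problems.append("content-encoding says gzip but body is not gzip data")

    return problems


html = b"<html><body>rendered page</body></html>"
consistent = find_header_mismatches({"Content-Length": str(len(html))}, html)
suspicious = find_header_mismatches(
    {"Content-Encoding": "gzip", "Content-Length": str(len(html))}, html
)
```

Here `consistent` comes back empty, while `suspicious` flags the gzip header sitting on top of a plain HTML body.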

Ultimately, though, my site is currently not crawlable, and that's obviously a major issue for us.

jdotjdot Apr 09 '16 20:04

Some more research suggests this might be because django-seo-js depends on requests 2.2.1, an older version of requests, and may be incompatible with the current requests 2.9.1.

jdotjdot Apr 09 '16 21:04

It turned out that the issue was django_seo_js passing on a Content-Encoding header from PrerenderIO, which was causing all the problems.
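
To illustrate why a forwarded Content-Encoding header breaks clients: requests decodes gzip bodies transparently, so response.content is already plain HTML while response.headers still says gzip. The sketch below simulates that round trip with the standard library only; `client_decode` is a hypothetical stand-in for a browser or curl, not real client code, and the exact error a real client prints will differ.

```python
import gzip

page = b"<html><body>rendered page</body></html>"

# What the upstream service sends: a gzip-compressed body plus a matching header.
upstream_headers = {"Content-Encoding": "gzip"}
upstream_body = gzip.compress(page)

# What the proxying code ends up holding: the body has already been decoded,
# but the original header is still attached.
decoded_body = gzip.decompress(upstream_body)


def client_decode(headers, body):
    """Mimic a client that trusts the Content-Encoding header."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


# Forwarding the stale header with the decoded body fails on the client side,
# because the client tries to gunzip plain HTML.
try:
    client_decode(upstream_headers, decoded_body)
    forwarded_ok = True
except OSError:
    forwarded_ok = False

# Dropping the header lets the client read the body as-is.
without_header = client_decode({}, decoded_body)
```

With the header forwarded, decoding fails; with the header dropped, the client gets the original page back.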

Subclassing with the code below fixed it:

from django.http import HttpResponse
from django_seo_js.backends import PrerenderIO
from django_seo_js.backends.base import RequestsBasedBackend, IGNORED_HEADERS

class FixedRequestsBasedBackend(RequestsBasedBackend):

    def build_django_response_from_requests_response(self, response):
        # Key difference -- we're excluding "content-encoding" from the response
        r = HttpResponse(response.content)
        for k, v in response.headers.items():
            if k.lower() not in IGNORED_HEADERS:
                r[k] = v
        r['content-length'] = len(response.content)
        r.status_code = response.status_code
        return r


class FixedPrerenderIO(FixedRequestsBasedBackend, PrerenderIO):
    pass
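
The header filtering in the subclass can also be tried in isolation, without Django or the library installed. This is a minimal sketch under stated assumptions: `ASSUMED_IGNORED_HEADERS` is a made-up stand-in for the library's IGNORED_HEADERS (the real set may contain different names), and `forwardable_headers` is a hypothetical helper that drops content-encoding explicitly rather than relying on the ignore set to list it.

```python
# Hypothetical stand-in for the library's IGNORED_HEADERS; the real set may differ.
ASSUMED_IGNORED_HEADERS = frozenset((
    "connection", "keep-alive", "transfer-encoding", "content-length",
))


def forwardable_headers(upstream_headers, ignored=ASSUMED_IGNORED_HEADERS):
    """Return the upstream headers that are safe to copy onto the proxied response."""
    # Always drop content-encoding, whether or not the ignore set lists it,
    # because the body being forwarded has already been decoded.
    blocked = set(ignored) | {"content-encoding"}
    return {
        name: value
        for name, value in upstream_headers.items()
        # Compare in lowercase: header names are case-insensitive.
        if name.lower() not in blocked
    }


upstream = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Encoding": "gzip",
    "Content-Length": "1234",
    "Cache-Control": "no-cache",
}
kept = forwardable_headers(upstream)
```

To put the subclass itself to use, the project would need to point its backend configuration at `FixedPrerenderIO`. If the backend is selected through the `SEO_JS_BACKEND` setting, that would be the dotted path to wherever the class is defined in the project.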

jdotjdot Apr 09 '16 22:04