django-robots
[BUG] HOST url scheme without https
Hello!
I am using django + nginx + https
If I set ROBOTS_USE_SCHEME_IN_HOST = True,
I get this result in the robots.txt: Host: http://site.com
But I expected: Host: https://site.com
Maybe it happens because of nginx: nginx proxies traffic to gunicorn over plain http.
location / {
proxy_pass http://web:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
How can I fix it?
I think you're right that your issue stems from how you use nginx. According to the docs, ROBOTS_USE_SCHEME_IN_HOST uses the protocol of the current request. I imagine what's happening with your setup is that since the requests gunicorn receives are plain http, ROBOTS_USE_SCHEME_IN_HOST adds http instead of https.
I'm using gunicorn without nginx in front of it and redirect everything in my DNS to https, so ROBOTS_USE_SCHEME_IN_HOST = True properly sets the Host to use https for me.
The way this is handled in Django is by leveraging a trusted header (i.e. one you have scrubbed and maintain as safe, typically X-Forwarded-Proto), which seems like it could be pulled into this project as well. See this link for their implementation details. I would guess that this part of the code could be adjusted to use a setting (defaulting to X-Forwarded-Proto if enabled, and allowing other headers to be specified if needed):
https://github.com/jazzband/django-robots/blob/f484c2a6abcae6244f860ad077af28d4be62037e/robots/views.py#L43-L47
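A minimal sketch of what that setting-driven lookup could look like. Everything here is hypothetical: the header default and the helper scheme_for_request are illustrations of the idea, not part of django-robots' actual API.

```python
# Hypothetical sketch: pick the scheme for the Host line, preferring a
# configurable forwarded-proto header over the scheme gunicorn saw.
# The helper name and default are illustrative, not django-robots' real API.

DEFAULT_SCHEME_HEADER = "HTTP_X_FORWARDED_PROTO"  # WSGI key for X-Forwarded-Proto

def scheme_for_request(meta, fallback_scheme, header=DEFAULT_SCHEME_HEADER):
    """Return 'https' or 'http' from the forwarded header, else the fallback."""
    forwarded = meta.get(header, "").split(",")[0].strip().lower()
    if forwarded in ("http", "https"):
        return forwarded
    return fallback_scheme

# Behind nginx that sets X-Forwarded-Proto, the original scheme survives
# even though gunicorn itself only ever sees plain http:
print(scheme_for_request({"HTTP_X_FORWARDED_PROTO": "https"}, "http"))  # https
print(scheme_for_request({}, "http"))                                   # http
```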
To fix this issue, you can try updating your nginx configuration to add the X-Forwarded-Proto header. This will enable django-robots to correctly detect the protocol used and generate the correct Host value in the robots.txt file.
Here's an example of how to update your nginx configuration:
location / {
    proxy_pass http://web:8080/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;  # Add this line
}
This will set the X-Forwarded-Proto header to the value of $scheme, which is https when the client connects over HTTPS. Django can then use this header (together with the SECURE_PROXY_SSL_HEADER setting described below) to detect the protocol and generate the correct Host value in the robots.txt file.
It looks like the issue you're encountering is that your Django application is not aware it's being served over HTTPS by Nginx. When the X-Forwarded-Proto header is not set, Django assumes the request came over HTTP, hence the Host line in robots.txt using the wrong scheme.
To fix this, you have a couple of options:
- Set the X-Forwarded-Proto header in your Nginx configuration. You can do this by adding the following line to your Nginx location block:
proxy_set_header X-Forwarded-Proto $scheme;
This tells Nginx to forward the scheme (HTTP or HTTPS) that the user connected with to your Django application. Then, in your Django settings file (settings.py), add the following line to make Django aware of the X-Forwarded-Proto header:
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
This tells Django to trust the X-Forwarded-Proto header and treat the connection as HTTPS if the header is set to 'https'.
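As a rough illustration of what that setting does: Django's is_secure() check essentially compares the configured header in request.META against the expected value. A simplified standalone sketch of that comparison (not Django's actual source):

```python
# Simplified standalone sketch of the SECURE_PROXY_SSL_HEADER logic
# (not Django's actual source): the request counts as secure when the
# configured proxy header carries the expected value.

SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")

def is_secure(meta):
    header, secure_value = SECURE_PROXY_SSL_HEADER
    return meta.get(header) == secure_value

# With nginx forwarding the original scheme, HTTPS requests are detected:
print(is_secure({"HTTP_X_FORWARDED_PROTO": "https"}))  # True
print(is_secure({"HTTP_X_FORWARDED_PROTO": "http"}))   # False
print(is_secure({}))                                   # False
```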
- Alternatively, if you want your site to strictly use HTTPS, you can modify the django-robots package directly to always use https as the scheme in the Host line. To do this, locate the robots/views.py file in the django-robots package and find the following line:
host = full_host(request)
Change this line to:
host = 'https://' + request.get_host()
This will directly set the scheme to https for the Host line in your robots.txt. Keep in mind that modifying the package directly is not recommended, as it could cause issues when updating the package or deploying your project.
After making the necessary changes, restart your Nginx and Django services to ensure your changes take effect.