RSS fetcher doesn't fetch conditionally
The RSS fetcher should be doing conditional requests, according to https://github.com/MarginaliaSearch/MarginaliaSearch/issues/136#issuecomment-2563756729, but apparently it doesn't:
$ ll /var/www/michaelnordmeyer.com/feed.xml*
-rw-r--r-- 1 user group 136066 Apr 5 11:10 /var/www/michaelnordmeyer.com/feed.xml
-rw-r--r-- 1 user group 21350 Apr 5 11:10 /var/www/michaelnordmeyer.com/feed.xml.gz
$ grep marginalia michaelnordmeyer.com.log
[04/Apr/2025:15:57:10.133 +0000] 200 193.183.0.165 "GET /feed.xml HTTP/2.0" "-" "search.marginalia.nu" gzip/21225
[05/Apr/2025:18:32:57.705 +0000] 200 193.183.0.165 "GET /feed.xml HTTP/2.0" "-" "search.marginalia.nu" gzip/21350
[06/Apr/2025:21:08:11.011 +0000] 200 193.183.0.165 "GET /feed.xml HTTP/2.0" "-" "search.marginalia.nu" gzip/21350
The last request at 06/Apr/2025:21:08:11.011 should have been conditional and should have resulted in a 304. The server's timezone is set to UTC.
I run nginx mainline:
$ nginx -v
nginx version: nginx/1.27.5
- Files are pre-gzipped and used by nginx with gzip_static on.
- Only default ETags are being used.
Short version: I think this is nginx jank in how it deals with If-Modified-Since (which is to say, poorly). In the short term I'm altering the logic to only send the If-None-Match header if it is available, and omit If-Modified-Since unless that's the only option, as that seems to solve the problem.
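For illustration, the short-term change described above could look roughly like the sketch below. This is not the code from the actual fix; the function name `conditional_headers` and its parameters are hypothetical, and it assumes the fetcher has stored the validators from the previous response.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Choose validators for a revalidation request (hypothetical helper).

    Prefer If-None-Match when an ETag was stored, and fall back to
    If-Modified-Since only when no ETag is available.
    """
    if etag:
        # The ETag alone is enough; sending If-Modified-Since as well is
        # what triggered the 200 responses described in this thread.
        return {"If-None-Match": etag}
    if last_modified:
        # Reuse the server's Last-Modified value verbatim.
        return {"If-Modified-Since": last_modified}
    # No validators stored: make an unconditional request.
    return {}
```

With both validators stored, only `If-None-Match` is sent; with only a `Last-Modified` value, the request falls back to `If-Modified-Since`.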
Long version: I've done some digging. I can't seem to make your server respect if-modified-since. With curl, I did a request and got headers from the server that were
last-modified: Thu, 24 Apr 2025 12:08:51 GMT
etag: "680a29d3-21a6d"
So I ran
curl \
  -H "If-Modified-Since: Fri, 25 Apr 2025 10:49:13 GMT" \
  -H "If-None-Match: \"680a29d3-21a6d\"" \
  -H "User-Agent: search.marginalia.nu" \
  https://michaelnordmeyer.com/feed.xml
and this gives me a 200.
Though this gives me a 304:
curl \
  -H "If-None-Match: \"680a29d3-21a6d\"" \
  -H "User-Agent: search.marginalia.nu" \
  https://michaelnordmeyer.com/feed.xml
I can't get it to 304 on just If-Modified-Since.
At least according to MDN, the server should ignore If-Modified-Since in the presence of If-None-Match.
(Curiously, I'm also getting different ETags for the same endpoint when I run the search engine's feed fetcher, though I see the same behavior there: it works with just the ETag, but not with both fields populated.)
As mentioned, I think this is nginx not dealing well with the If-Modified-Since header. I'm seeing the same weird behavior when testing with my blog.
Committed fix 77f727a5babff0cd6445af6f9270f410b2bc98dd
Thank you for looking into it.
As you can see from my output above, I host static files that I have pre-gzipped to reduce request processing overhead even further. I have updated my issue above with the nginx settings.
I will investigate on my side and use your findings as well.
I even dug up the specs on this. Seems really clear on how the server should act in this scenario.
A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field; the condition in If-None-Match is considered to be a more accurate replacement for the condition in If-Modified-Since, and the two are only combined for the sake of interoperating with older intermediaries that might not implement If-None-Match.
Though I'm too used to specs and reality being two wildly different things when dealing with web servers :P I'll write an issue on the nginx issue tracker and see what they have to say about it.
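To make the quoted rule concrete, here is a minimal sketch of how a server following it might evaluate the two headers. It is only an illustration of the precedence, not nginx's implementation; the function name `is_not_modified` is hypothetical, and the comma split is a simplification that assumes ETags contain no commas.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, etag, last_modified):
    """Return True if a 304 would be appropriate under the quoted rule."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match is present, so If-Modified-Since is ignored entirely.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates:
            return True

        def opaque(tag):
            # Weak comparison: drop an optional W/ prefix before comparing.
            return tag[2:] if tag.startswith("W/") else tag

        return opaque(etag) in {opaque(tag) for tag in candidates}

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # Not modified if the resource is no newer than the given date.
            return parsedate_to_datetime(last_modified) <= since
        except (TypeError, ValueError):
            # An unparsable date means the condition is ignored.
            return False
    return False
```

Under this logic, the request with both headers from earlier in the thread would get a 304, because the ETag matches and the date is never consulted.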
Testing with your example from above, for
curl \
  -H "If-Modified-Since: Thu, 24 Apr 2025 12:08:51 GMT" \
  -H "User-Agent: search.marginalia.nu" \
  https://michaelnordmeyer.com/feed.xml
nginx returns a 304, which is correct. But if the date in If-Modified-Since is not the exact date from last-modified, nginx returns a 200.
This raises the question of whether a request using an arbitrary date for If-Modified-Since is a valid conditional request, because the relevant date is the last-modified date and not the request date.
But you are quite right that nginx does not ignore If-Modified-Since in the presence of If-None-Match.
By the way, because the site is built by Jekyll, the feed.xml will be regenerated every time I push a change. And even if the feed's content didn't change, the <updated> tag of the feed always will, because it's the build date, resulting in a new ETag and last-modified response header.
And thank you for opening an issue with nginx.
Regarding the relevant date for If-Modified-Since, RFC 9110 states:
When used for cache updates, a cache will typically use the value of the cached message's Last-Modified header field to generate the field value of If-Modified-Since. This behavior is most interoperable for cases where clocks are poorly synchronized or when the server has chosen to only honor exact timestamp matches (due to a problem with Last-Modified dates that appear to go "back in time" when the origin server's clock is corrected or a representation is restored from an archived backup).
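The "exact timestamp matches" case in that passage corresponds to what was observed in this thread. nginx documents an `if_modified_since` directive with the values `off`, `exact` and `before`, where `exact` is the default. The sketch below models those documented semantics only; it is not nginx's code, and the function name `ims_not_modified` is made up for this example.

```python
from email.utils import parsedate_to_datetime


def ims_not_modified(if_modified_since, last_modified, mode="exact"):
    """Model the three documented modes of comparing If-Modified-Since."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    if mode == "exact":
        # Only the exact Last-Modified timestamp yields "not modified".
        return since == modified
    if mode == "before":
        # Any date at or after the modification time yields "not modified".
        return modified <= since
    # mode == "off": the header is ignored, so never "not modified".
    return False
```

With the dates used earlier, a synthesized `Fri, 25 Apr 2025 10:49:13 GMT` fails in `exact` mode but passes in `before` mode, while the verbatim `Thu, 24 Apr 2025 12:08:51 GMT` passes in both.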
Oh, sorry, I forgot to comment on this:
(Curiously, I'm also getting different ETags for the same endpoint when I run the search engine's feed fetcher, though I see the same behavior there: it works with just the ETag, but not with both fields populated.)
Well, at least the curl requests you did above don't have gzip turned on. They're missing the --compressed parameter.
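One way to check this is to decode the ETags. The sketch below assumes nginx's default ETag format, which is the file's modification time and content length, both in hexadecimal and separated by a hyphen. If that assumption holds, the uncompressed feed.xml and the pre-gzipped feed.xml.gz would carry different ETags simply because their sizes differ. The function name `decode_nginx_etag` is hypothetical.

```python
from datetime import datetime, timezone


def decode_nginx_etag(etag):
    """Split a default-style nginx ETag into (mtime, size in bytes).

    Assumes the format "<hex mtime>-<hex size>", optionally prefixed
    with W/ for a weak validator.
    """
    value = etag.removeprefix("W/").strip('"')
    mtime_hex, size_hex = value.split("-")
    mtime = datetime.fromtimestamp(int(mtime_hex, 16), tz=timezone.utc)
    return mtime, int(size_hex, 16)


# The ETag from the curl output earlier in this thread.
mtime, size = decode_nginx_etag('"680a29d3-21a6d"')
print(mtime.isoformat(), size)
```

For the ETag quoted earlier, the decoded time is 2025-04-24 12:08:51 UTC, which agrees with the last-modified header from the same response, and the decoded size is 137837 bytes.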
Hm, yeah, the feed fetcher is synthesizing the date based on the local clock, with the idea that since it won't revisit until at least a day later, clock skews shouldn't matter. That might cause problems for servers that do an exact match against the mtime of the file itself (as seems to be the nginx default behavior), though it shouldn't matter with an etag present.
[..] By the way, because the site is built by Jekyll, the feed.xml will be regenerated every time I push a change. And even if the feed's content didn't change, the <updated> tag of the feed always will, because it's the build date, [..]
I have a fix for this at https://github.com/jekyll/jekyll-feed/pull/368. If anyone knows someone in the Jekyll community that can land this, that'd be great!