souin icon indicating copy to clipboard operation
souin copied to clipboard

[bug] blank page cached on context canceled

Open frederichoule opened this issue 1 year ago • 53 comments

I have a recurring error where the homepage of the website is blank.

When I check the key value in redis, it looks like this:

HTTP/0.0 200 OK
Date: Fri, 21 Apr 2023 18:44:19 GMT
X-Souin-Stored-Ttl: 24h0m0s
Content-Length: 0

What strikes to me is the key name, which is GET-https-www.website.com-/, which all the other keys have {-VARY-} appended to them. If I delete the key, and refresh the website, the new key that appears is GET-https-www.website.com-/{-VARY-}Accept-Encoding:gzip, deflate, br, and now the page isn't blank anymore.

What would cause that?

frederichoule avatar Apr 21 '23 18:04 frederichoule

For reference, a working key contains this:

HTTP/0.0 200 OK
Cache-Control: public, max-age=86400, must-revalidate
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Date: Fri, 21 Apr 2023 18:58:53 GMT
Server: Caddy
Server: Google Frontend
Vary: Accept-Encoding
X-Cloud-Trace-Context: 63177d3330a11c9b0a44a650162499a1

Meaning it really comes from the backend server.

frederichoule avatar Apr 21 '23 19:04 frederichoule

Unrelated: why the HTTP/0.0 ?

frederichoule avatar Apr 21 '23 20:04 frederichoule

Hey @frederichoule that's weird. Do you have a reproducible repository/example please? To explain how the vary works under the hood, it will list all keys starting with (in your case) GET-https-www.website.com-/($|\{VARY\}), either the key ends by the trailing or has the vary separator. If the first case is the first key to be returned, the key doesn't have any vary headers to validate, so it will match the client expectations and returns the content to the user. It will loop over the other keys and try to validate the varied headers.

What I think at the moment is on the first time a user tried to load the page but the backend returned nothing with an HTTP 200 OK status code.

Btw I don't know why your server returns an HTTP/0.0 protocol version, I don't deal with it and don't try to override it. Can you post at least your caddy configuration?

darkweak avatar Apr 21 '23 20:04 darkweak

1- I'll try to reproduce the issue locally and let you know.

2- The output is clearly not from the backend, since there's no Server: Google Frontend or X-Cloud-Trace-Context.

3- Backend returns HTTP/1.1, Caddy returns HTTP/1.1, but stored key is HTTP/0.0 - This is a non-problem, I was just wondering.

frederichoule avatar Apr 21 '23 20:04 frederichoule

Here is the Caddyfile:

{
	order cache before handle
	email [email protected]

	admin off

	on_demand_tls {
		ask http://localhost:5555/__ask
		interval 2m
		burst 5
	}

	cache {
		cache_name Tarmac
		default_cache_control public, max-age=86400, must-revalidate
		allowed_http_verbs GET
		ttl 86400s
		redis {
			url 127.0.0.1:6379
		}
		key {
			hide
		}
	}

}

(cache_regular) {
	cache {
		key {
			disable_body
			disable_query
		}
	}
}

(cache_qs) {
	cache {
		key {
			disable_body
		}
	}
}

(upstream) {

	request_header -Cache-Control

	tls {
		on_demand
	}

	header {
		-X-Cloud-Trace-Context
		Server Tarmac
	}

	@notwww {
		not host localhost
		not header_regexp www Host ^www\.[a-zA-Z0-9-]+\.[a-zA-Z]{2,}$
	}

	redir @notwww https://www.{host}{uri} permanent

	@get {
		method GET
	}

	@hits {
		path /__hits
		query id=*
		method GET
	}

	@redis {
		path /__redis
		method POST
	}

	@query_caliente query caliente=1
	
	@query_search {
		path /search
		query q=*
	}

	@query_after query after=*

	handle @query_caliente {
		import cache_qs
	}

	handle @query_search {
		import cache_qs
	}

	handle @query_after {
		import cache_qs
	}

	handle @hits {
		reverse_proxy http://127.0.0.1:5555
	}

	handle @redis {
		reverse_proxy http://127.0.0.1:5555
	}

	handle @get {
		import cache_regular
	}

	handle {
		abort
	}

	reverse_proxy backend.com {
		header_down -Cache-Control
		header_up -Cache-Control
		header_up Host {upstream_hostport}
		header_up X-Backend-Hostname {host}		
	}

}

localhost {
	tls internal
	import upstream
}

https:// {
	import upstream
}

http:// {
	redir https://{host}{uri} permanent
}

frederichoule avatar Apr 21 '23 20:04 frederichoule

Unable to reproduce the problem locally for now. I am closing this for now.

frederichoule avatar Apr 24 '23 15:04 frederichoule

That could be something like cache poisoning (I wonder if @jenaye could help us about that). It could explain the

Server: Google Frontend
X-Cloud-Trace-Context: 63177d3330a11c9b0a44a650162499a1

darkweak avatar Apr 24 '23 16:04 darkweak

It's a brand new website, not really public yet. I think it might have to do with robot scanning the website, generating incorrect cached page. I'll investigate it more thoroughly soon.

frederichoule avatar Apr 24 '23 17:04 frederichoule

I was able to pinpoint when the blank page bug happens.

Whenever I have a blank page in cache, it comes from a request that contains this in the logs:

"error": "context canceled"

Any idea where it's coming from? I'll investigate after lunch, but if you have a hint for me, let me know!

frederichoule avatar May 17 '23 16:05 frederichoule

Nice, that means I don't handle correctly the client disconnections. It should be quite easy to reproduce if that's this case. Let's reopen this issue then 🙂

darkweak avatar May 17 '23 16:05 darkweak

Have you been able to reproduce the issue yet @darkweak ? I'll give it a try tomorrow if you didn't have time yet.

frederichoule avatar May 18 '23 18:05 frederichoule

Reproducible with the following code

// runner/requester.go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	ctx, _ := context.WithTimeout(context.Background(), 10*time.Millisecond)
	rq, _ := http.NewRequestWithContext(ctx, http.MethodGet, "http://localhost:4443/handler", nil)
	res, err := http.DefaultClient.Do(rq)

	fmt.Printf("-----\nres => %+v\nerr => %+v\n-----\n", res, err)
}
// handlers/handler.go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("Sleep...")
		time.Sleep(3 * time.Second)
		fmt.Println("Wake up")
		w.Write([]byte("Hello after sleep"))
	})
	http.ListenAndServe(":80", nil)
}

With the given caddyfile

{
    order cache before rewrite
    debug
    log {
        level debug
    }
    cache {
      ttl 100s
    }
}

:4443

route /handler {
    cache {
        ttl 10s
    }
    reverse_proxy 127.0.0.1:80
}
./caddy run
go run handler/handler.go
go run runner/requester.go
curl -I http://localhost:4443/handler
# Should return the right headers without content

darkweak avatar May 19 '23 11:05 darkweak

To keep you updated, I'll work on that tomorrow.

darkweak avatar May 22 '23 19:05 darkweak

Amazing, thanks! Haven't had time to take a look yet, sorry about that.

frederichoule avatar May 22 '23 19:05 frederichoule

It should work using @frederichoule

xcaddy build --with github.com/darkweak/souin/plugins/caddy@47ea558146d978d6be0ec6a42804f679200d6d70 --with github.com/darkweak/souin@47ea558146d978d6be0ec6a42804f679200d6d70

darkweak avatar May 23 '23 20:05 darkweak

Amazing! Will try it ASAP.

frederichoule avatar May 23 '23 20:05 frederichoule

use the commit 2cffe5ac9c989e963b3e28c327e6b6c63eab371d instead of the previous one. I pushed a fix on the uncached responses.

darkweak avatar May 24 '23 07:05 darkweak

Testing right away.

frederichoule avatar May 25 '23 18:05 frederichoule

First page load works, second page load throws a panic and the server stops.

2023/05/25 19:42:38.124	ERROR	http.log.error	context canceled	{...}, "duration": 0.628182}
panic: Header called after Handler finished

goroutine 56 [running]:
net/http.(*http2responseWriter).Header(0x40003584c0?)
	net/http/h2_bundle.go:6569 +0x80
github.com/darkweak/souin/pkg/middleware.(*CustomWriter).Header(...)
	github.com/darkweak/[email protected]/pkg/middleware/writer.go:46
github.com/darkweak/souin/pkg/middleware.(*SouinBaseHandler).Upstream(0x4000990640, 0x40003584c0, 0x4000656c00, 0x40005948e0, 0x4000287c80?, {0x4000144660, 0x22})
	github.com/darkweak/[email protected]/pkg/middleware/middleware.go:274 +0x404
github.com/darkweak/souin/pkg/middleware.(*SouinBaseHandler).ServeHTTP.func2()
	github.com/darkweak/[email protected]/pkg/middleware/middleware.go:478 +0x40
created by github.com/darkweak/souin/pkg/middleware.(*SouinBaseHandler).ServeHTTP
	github.com/darkweak/[email protected]/pkg/middleware/middleware.go:477 +0x1324

frederichoule avatar May 25 '23 19:05 frederichoule

I hope that this commit fix your issue f9f8ab6da7e98e2e7898d10122766865e0b4884c

darkweak avatar May 25 '23 21:05 darkweak

Seems OK locally. I just pushed to prod for a small segment of our network and will let you know if we still get blank pages. Thanks!

frederichoule avatar May 25 '23 22:05 frederichoule

I still have blank pages. I'm unable to debug right now as my internet connection is pretty weak, but I'll give you more informations later.

frederichoule avatar May 28 '23 13:05 frederichoule

Reopened again 😅

darkweak avatar May 28 '23 14:05 darkweak

So I do have the same "context canceled" in the logs. I'm not sure if I can provide more informations than that.

{
  "level": "debug",
  "ts": 1685364382.2595735,
  "logger": "http.handlers.cache",
  "msg": "Incomming request &{Method:GET URL:/requested-uri Proto:HTTP/2.0 ProtoMajor:2 ProtoMinor:0 Header:map[...] Body:0xc0008b0e10 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.website.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:X.X.X.X:XXX RequestURI:/url TLS:0xc0000d4fd0 Cancel:<nil> Response:<nil> ctx:0xc0008b1020}"
}
{
  "level": "debug",
  "ts": 1685364382.8981557,
  "logger": "http.handlers.reverse_proxy",
  "msg": "upstream roundtrip",
  "upstream": "upstream.com:80",
  "duration": 0.637657811,
  "request": {
    "remote_ip": "1.1.1.1",
    "remote_port": "111",
    "proto": "HTTP/2.0",
    "method": "GET",
    "host": "upstream.com:80",
    "uri": "/requested-uri",
    "headers": {...},
    "tls": {
      "resumed": false,
      "version": 772,
      "cipher_suite": 4865,
      "proto": "h2",
      "server_name": "www.website.com"
    }
  },
  "error": "context canceled"
}

frederichoule avatar May 29 '23 12:05 frederichoule

Please wait before checking anything. I just redeployed everything with cache disabled on the docker build, I think I had some cached steps.

frederichoule avatar May 29 '23 13:05 frederichoule

Let me know if something weird happens.

darkweak avatar May 30 '23 09:05 darkweak

So far everything works perfectly. Let's close this! Thanks @darkweak

frederichoule avatar May 30 '23 15:05 frederichoule

Caddy crashed completely twice in the last 2 days. The last line in the log each time is related to context canceled. I don't know how to reproduce the crash locally yet. I'll find a way to make Caddy log the crash output somewhere.

{"level":"error","ts":1685643137.1032906,"logger":"http.log.error","msg":"context canceled","request":{},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"www.website.com"}},"duration":4.829884199}
{"level":"debug","ts":1685643137.1034744,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"upstream.com:80","duration":4.251552925,"request":{},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"www.website.com"}},"error":"context canceled"}

frederichoule avatar Jun 01 '23 18:06 frederichoule

Do you have some debug logs please ? Because these are not really explicit. I will write the patch to add more debug/info logs during the request process in the next days.

darkweak avatar Jun 01 '23 19:06 darkweak

This is all I got from Caddy. I'll read the doc to see what i can do to get more.

frederichoule avatar Jun 01 '23 19:06 frederichoule