echo
echo copied to clipboard
Race on echo.maxParams, unable to use some context methods safely concurrently
Issue Description
Hi there!
I've been using echo for a relatively large api service, upgrading from 3.x to the latest 4.x recently and fixing some issues along the way.
While the initial issue I ran into may be related to #1678 (some groups or routes randomly "forget" their handler and start returning 404's) I discovered some other issues in echo.
First I'd like to thank the maintainers for the project, I don't mind discussing this issue or attempting to contribute a fix.
Anyways, I noticed here a significant race condition introduced via #1535 .
echo.maxParam cannot be changed safely, even with atomics (which would fix the data race), as the algorithm itself is racy.
Overwriting that value via the context I realized was causing panics here for me (not sure why that isn't just using the len of pvalues to begin with).
Further, manipulation of the context/echo types seems to lead to issues with the Find algorithm in the Router which is already hard to follow, and may be leading to my initial issue above.
Perhaps unrelated, is there a particular usage for the context type and methods I'm missing? It's defined as an interface but cannot be (re)implemented due to unsafe assertions such as this. I was hoping to override/shadow some methods for debugging purposes but cannot.
Thanks again, as for the initial issue I mentioned above regarding 404's I'd like to discuss/tackle after this one.
Checklist
- [ x] Dependencies installed
- Latest Go
- [ x] No typos
- [x ] Searched existing issues and docs
Expected behaviour
Safely use context methods
Actual behaviour
Panic from incorrect len / race.
Steps to reproduce
Noted above.
Version/commit
Latest echo release, v4.1.17. Note I haven't tried with master, I noticed ~50 commits to master since the last release, I briefly looked at them but didn't see anything that would fix this.
Hi @iaburton. Yes you are right, the use of maxParam is not thread safe, and changing it from a context could lead to have Context instances on the sync.Pool that don't have the proper length for pvalues (panics when maxParam is changed to a bigger value, the other way around it could lead to a memory leak) I need to continue checking the Router code to see which is the best way to tackle this, but as you mention doesn't make any sense to use maxParam instead of len(c.pvalues) on Context#Reset
I'll continue trying to dig into the Git history to find out what is the intended purpose of Context#SetParamNames, because at a first glance it looks like as a "testing helper". If that is true, that could be the reason of why Echo's Context/Router assumes that maxParams shouldn't be protected (in production maxParam always is set by the Router and don't change). Please don't interpret this as me saying that maxParams shouldn't be protected, I'm just trying to discover what is the design behind the code, to then try to propose the best solution (not breaking change/keep performance)
Hey @pafuent , nice to get some confirmation on what I was observing.
I noticed after posting that the original issue was related to #1492 that effected several people. I have a workaround right now that might be of help in terms of a partial solution. Note this workaround below assumes you've pinned a certain echo version where the context is described below, and compiling for a 64bit system.
//inside custom setParamNamesUnsafe function
eptr := unsafe.Pointer(reflect.ValueOf(ectx).Pointer())
/*
type context struct {
request *http.Request // 64bit pointer
response *Response // 64bit pointer
path string // size of string header
pnames []string // value we want to change
pvalues []string
query url.Values
handler HandlerFunc
store Map
echo *Echo
logger Logger
lock sync.RWMutex
}
Use unsafe pointer arithmetic to access and change pnames.
stringSize := reflect.TypeOf(reflect.StringHeader{}).Size()
*/
pnames := (*[]string)(unsafe.Pointer(uintptr(eptr) + 8 + 8 + stringSize))
copy(*pnames, vals)
Basically I use unsafe to set pnames myself using copy rather than a directly set (like currently) or append like how it was originally. This makes sure the values always stay within len and avoid the race since I'm not using the Context method.
Perhaps removing the *c.echo.maxParam from SetParamNames would be good enough if we also change the code to use copy? I haven't been running this code long enough to say if it fixes everything, but it's a start. It has stopped the extra panics.
I'm still not sure what is causing the 404's I mentioned prior though, I'm still doing more testing as I cannot easily replicate the issue (leading me to believe it is a concurrency issue). It's like certain groups or routes lose their ability to parse out params (or rather params look like part of a path so it becomes 404), which is what led me to this code in the first place, a hot fix for echo that lets me set the name and params on a context before going into the final handler.
Btw putting a mutex or using atomics on maxParam will solve the data race but not the inherit race condition with this design. Let me know if I can help.
We need to look at the original concept behind maxParam. I'd like to avoid introducing unsafe pointer arithmetics for working around a conceptional issue.
We need to look at the original concept behind maxParam. I'd like to avoid introducing unsafe pointer arithmetics for working around a conceptional issue.
Hi @lammel, my apologizes for the ambiguity. I didn't mean to imply that unsafe should be added to echo, on the contrary that was meant as a workaround (for others if their code is also panicing) until something better was found. The echo version of this unsafe code is simply:
func (c *context) SetParamNames(names ...string) {
copy(c.pnames, names)
}
Your restating the need to look at maxParam is correct, as the above is not enough on its own.
@iaburton No offense taken, just wanted to clarify we are not finished yet ;-)
Bit of an interesting development; my 404 issue has gone away this this, but instead of 404s now I get invalid contexts essentially. My unsafe code is honoring the original len of pnames, but some other code seems to be changing that len.
I'm guessing it's Reset again, like before where the panic was occuring
func (c *context) Reset(r *http.Request, w http.ResponseWriter) {
c.request = r
c.response.reset(w)
c.query = nil
c.handler = NotFoundHandler
c.store = nil
c.path = ""
c.pnames = nil
c.logger = nil
// NOTE: Don't reset because it has to have length c.echo.maxParam at all times
for i := 0; i < *c.echo.maxParam; i++ {
c.pvalues[i] = ""
}
}
Once pnames becomes nil, my code with the copy doesn't actually copy anymore so I get requests without params in the context. 🤔
EDIT: Since the issue is nondeterministic let me get back to this after trying another patch.
EDIT 2: Added:
if len(*pnames) == 0 {
*pnames = make([]string, len(vals))
}
To my unsafe code, and that seems to have stopped the params from "disappearing", but its ultimately the same issue, bad contexts getting into the sync.Pool and messing up internal state in the router.Find and other related code.
We'll use this issue for a general improvement for the current quite messy handling with echo.maxParam. So tagged v5 now.
Maybe if we decide to return errors from our router (see https://github.com/labstack/echo/pull/1762#issuecomment-770794367), we can implement the solution already proposed by @lammel. Basically if we set a fixed value for maxParam, we can get rid of all this kind of issues. If a new route is added that needs to increment this value, we will be able to return an error. To let Echo users fix this error, maxParam could be transformed in a configuration value for Echo (something like Echo#NewWithConfig() with maxParam as a parameter or as a member of a config struct). This could help us to remove the dependency of our Router to know the Echo instance. Also Context#SetParamNames() should be modified to return an error if maxParam needs to be increased to handle all the new params names. Another side effect of this change, is remove the need to reference maxParam in Echo Context, because we can safely rely on the length of pnames (this is set when the Context is created and it won't change later on)
Any thoughts?
Basically yes. As we have benchmarking in place can check for negative impact on allocations or processing time. We can use a sane default (like 16 or 32 params) for the router and allow to change this value when initializing the router (or echo), so all further processing can rely on the params array to be available.
You are right, that the router can be decoupled from the echo instance, it just needs to know the maximum number of parameters that will be processed.
We just ran into this, and the panic in context.Reset(...) can be hit without it being a race condition at all - it's just a fundamental mismatch of context (or at least context.Reset(...)) expecting c.pvalues to always equal Echo.maxParam, but then allowing Echo.maxParam to change (i.e. when registering a new endpoint with more params), and at the same time trying to reuse pooled contexts.
In other words, if you execute a request and a context is created and pooled, then that c.pvalues length is whatever the Echo.maxParam is at the time of that request. If you then come later and register a new endpoint with the echo server that has more params than the previous Echo.maxParam, then the next time that previous context is pulled from the pool for reuse you're guaranteed to hit the index-out-of-range panic in context.Reset(...).
Here's a quick unit test to explain and reproduce the problem:
func Test_show_context_pvalues_vs_echo_maxParam_panic_during_pooled_context_reset(t *testing.T) {
// Start a fresh echo server.
e := echo.New()
port := 9999
go func() {
_ = e.Start(fmt.Sprintf("localhost:%d", port))
}()
time.Sleep(100)
// Register a no-params endpoint - at this point e.maxParam is 0.
noParamsPath := "/foo"
e.GET(noParamsPath, func(c echo.Context) error {
return c.String(200, "doesnotmatter")
})
// Execute a request. This will create a context that gets put into the e.pool pool of contexts,
// and that pooled context will be created with len(c.pvalues) of 0 to match the
// _current_ e.maxParam value.
_, _ = http.Get(fmt.Sprintf("http://localhost:%d%s", port, noParamsPath))
// Now register an endpoint that does have a param. This will cause e.maxParam to change to 1.
withPathParamBase := "/bar"
withParamPathTmplt := fmt.Sprintf("%s/:stuff", withPathParamBase)
e.GET(withParamPathTmplt, func(c echo.Context) error {
return c.String(200, "doesnotmatter")
})
// Execute some more requests. Eventually this will pull the pooled context with len(c.pvalues) of 0,
// but now e.maxParam is 1, leading to a panic in context.Reset() due to the for loop logic in
// context.Reset(...) expecting len(c.pvalues) >= e.maxParam.
urlWithParams := fmt.Sprintf("http://localhost:%d%s%s", port, withPathParamBase, "/stuff-param")
for i := 0; i < 1000; i++ {
_, err := http.Get(urlWithParams)
if err != nil {
t.Errorf("call failed due to %v", err)
}
}
}
If there's a design decision that you should never add endpoints after any requests have been processed (in order to guarantee that all pooled context.pvalues equal Echo.maxParam), then the endpoint registration methods should panic or return an error or otherwise prevent that registration once any context has been pooled.
But assuming you do want people to be able to add to the Echo server's endpoints after it's started processing requests, then IMO you have a few options:
- Dump the
Echo.poolcontext pool wheneverEcho.maxParamchanges. - Bulletproof
context.Reset(...)against the reality thatEcho.maxParammight have changed sincecontext.pvalueswas created.
Assuming that (2) would be preferred for performance reasons, then this would be a fix that would improve things a little:
func (c *context) Reset(r *http.Request, w http.ResponseWriter) {
c.request = r
... etc
// c.echo.maxParam might have changed since this context was used last. Make sure our len(c.pvalues)
// is big enough to handle c.echo.maxParams, otherwise we end up with a panic.
if len(c.pvalues) < *c.echo.maxParam {
c.pvalues = make([]string, *c.echo.maxParam)
}
// NOTE: Don't reset because it has to have length c.echo.maxParam at all times
for i := 0; i < *c.echo.maxParam; i++ {
c.pvalues[i] = ""
}
}
There would still be a potential for a race condition if Echo.maxParam changed while the context.Reset(...) was occurring. So a more robust and complete solution for v5 would be best. But this would at least improve things for v4 if you wanted to roll in a small fix.
At this point - routes (adding a route, note: there is no way to remove a route) or any of the echo fields is not meant to be mutated after server has started. This is not limited to c.Reset. Every single field on Echo instance is public and therefore potentially datarace place. And echo.Context instance must not be passed to any other coroutine out of original handler coroutine - whis will be another datarace place as context instance will be put back into pool and leaked reference will mess up next request response etc.
At this point - routes (adding a route, note: there is no way to remove a route) or any of the echo fields is not meant to be mutated after server has started.
Is this an official and intentional design decision (and if so is it documented somewhere)? Or just an implicit assumption in the codebase?
If it's official and intentional, then IMO any of the mutation methods should fail-fast if they're called after the server has started.
If the intention is that you should be able to mutate after the server is started, then it looks like a significant overhaul is needed to clean up all the race condition potential (not just this one in c.Reset(...)).
I really can not say what intentions @vishr had but echo.Echo fields are public since v3. Probably assumptions is that getters are annoying when 99.9% cases users do not change things after server has started - and this is decision, whether we agree or not, that can not be changed within v4. This is not even limited to public fields - you can call e.Use and have datarace with middlewares as their slice is accessed in http.Server.ServeHTTP() coroutine when handling clients request.
Basically - trade-off is that for ease of use you must not mutate anything related to running Echo and Router instance.
This is not something that is specific to Echo. As Echo and Gin are quite similar both have similar limitation (having public fields which are accessed or accessible from context inside handler coroutines) - they are not meant to be mutated after server has been started.
Personally - router not being safe after server has started has been bothering me from design perspective but addressing everything related to potentially dataracey parts is questionable, as there are not much demand for it (adding routes etc after startup) and this would change the API quite drastically (no public fields anymore) - which would rub people, who do not need this feature, wrong way and they seem to be vast majority.
There is trick to avoid it: We can register an empty route with many param to set an enough maxParam
e.GET("/api/v1/:id/:idx/:idy/:idz/:idxx", func(c echo.Context) error {
return nil
})
Hello all :wave: long time
I got the notification that this had been closed @aldas , but don't see a commit that references a fix. Was this addressed in the upcoming v5?
This will not be changed in v4. Router is not goroutine safe and is not designed to be safe. v5 addresses that part.
For v4 - If there is a middleware that needs to modify param names/values. You can manipulate maxParam to be high enough before server start thus avoiding changing that field when serving requests.
See
e := echo.New()
// workaround to get echo.maxParam value to high enough before server starts to routing requests
{
tmp := e.NewContext(nil, nil)
tmp.SetParamNames(make([]string, 256)...) // echo.maxParam will be now 256
e.ReleaseContext(tmp)
}
e.Use(middleware.Logger())
but there is nothing for guarding you from races if you start adding/registering routes after HTTP server has been started.
Ah okay, so the suggestion is to use the workaround until v5. Thanks for the info!
Edit: Though I do wonder, for those who don't know about this, if echo.New should simply set the size of maxParams before returning, thus not requiring the workaround (or people racing without knowing by calling some of those middleware/context methods.
@iaburton could you first describe the use-case where it is necessary mutate params count? It sound like some kind of dynamic routing like thing. I do not understand why it is necessary.
For example dynamically changing routes is pretty much dead end at the moment because router does not support removing routes at all.
The original issue was simply about using the context method SetParamNames at all specifically the race that could be caused here and here. There wasn't any changing routes (on the echo instance or otherwise) or mutating param count (beyond calling those methods which were unknown to be unsafe at the time).
But is there actual use case when you want to SetParam* in when serving the request? Every field that echo.Echo struct contains is unsafe when you start to mutate and read them in multiple goroutines.
SetParamNames is not a problem unless something decides that param names need to be replaced with bigger amount than actual route that Router matched has (maybe * match?). For that case we can trick Echo prematurely increase that number to some safe value so when http server started goroutines are serving the request access that we will not have to increase that value.
And if routes needs to be added after server has started. something like that could be created to provide locking for router
type safeServe struct {
mu *sync.RWMutex
e *echo.Echo
}
func newSafeServe(e *echo.Echo) *safeServe {
// workaround to get echo.maxParam value to high enough before server starts to routing requests
{
tmp := e.NewContext(nil, nil)
tmp.SetParamNames(make([]string, 256)...) // echo.maxParam will be now 256
e.ReleaseContext(tmp)
}
return &safeServe{
mu: new(sync.RWMutex),
e: e,
}
}
func (s *safeServe) ServeHTTP(w http.ResponseWriter, r *http.Request) {
c := s.e.NewContext(r, w)
defer s.e.ReleaseContext(c)
s.find(r, c)
if err := c.Handler()(c); err != nil {
s.e.HTTPErrorHandler(err, c)
}
}
func (s *safeServe) find(r *http.Request, c echo.Context) {
s.mu.RLock()
defer s.mu.RUnlock()
s.e.Router().Find(r.Method, echo.GetPath(r), c)
}
func (s *safeServe) Add(method, path string, handler echo.HandlerFunc, middleware ...echo.MiddlewareFunc) {
for i := len(middleware) - 1; i >= 0; i-- {
handler = middleware[i](handler)
}
s.mu.Lock()
defer s.mu.Unlock()
s.e.Router().Add(method, path, handler)
}
func main() {
e := echo.New()
safe := newSafeServe(e)
//e.Use(middleware.Logger()) // DO NOT USE e.Use
handler := func(c echo.Context) error {
return c.String(http.StatusOK, c.Param("param"))
}
safe.Add(http.MethodGet, "/:param", handler, middleware.Logger()) // add middlewares to route here
s := http.Server{
Addr: ":8080",
Handler: safe,
}
if err := s.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
log.Fatal(err)
}
}
but this looks silly. + you can not still remove these added routes which is probably something that you want if dynamic routing is something that is required.
coming back to maxParam and Context we probably can remove references to maxParam there and allow it grow if needed by SetParamValues and not to shrink. And reset it to current capacity and we are good to go.
assuming it can only grow we do not need to access echo.Echo.maxParam field.
#2611 addresses this issue. middlewares using SetParamValues are now in better place.
Hey @aldas thanks for the replies
I get what you're asking now. So originally (a few years ago now so the details are a bit fuzzy) I was updating an echo service from 3 to 4 and ran into an issue regarding routes disappearing, as mentioned in the OP
upgrading from 3.x to the latest 4.x recently and fixing some issues along the way. While the initial issue I ran into may be related to https://github.com/labstack/echo/issues/1678 (some groups or routes randomly "forget" their handler and start returning 404's) I discovered some other issues in echo.
I think I ended up trying the SetParamNames method as a workaround, but then that caused problems of its own. Originally my intent was never to change routes or params after starting the server, it just happened without knowing how some of these methods were implemented. Given SetParamNames was on the context type it was surprising to find it was modifying the base echo instance.
Anyways, I see your #2611 addresses it nicely though, fixing the race caused by the original MR that added it.