appengine
appengine copied to clipboard
delay: Function key generation is obscure and need to be documented
Calling delay.Func
uses the name of the file from whom this function was called to generate the func key that used in the task invocation. It means that deploying new version of service can break execution of tasks that were scheduled before, if the call of delay.Func
moved to the another file in code.
This behaviour is not documented and can cause a confusion.
Related lines of code: https://github.com/golang/appengine/blob/5bee14b453b4c71be47ec1781b0fa61c2ea182db/delay/delay.go#L105-L107
Yes, deploying a new version breaks all previously scheduled tasks. I'm getting a lot of logs like this each time I redeploy:
2018/10/25 07:21:03 ERROR: delay: no func with key "/tmp/staging873052621/srv/main.go:editMessage" found
Fortunately all my tasks are non-critical and is OK to be purged after a deployment, but I think this a serious issue if someone is relying on the task to be processed even after a deployment.
Can we either consider this a bug, or document this feature and provide a workaround?
Shoot. This will break compatibility between Go 1.9 and Go 1.11 on app engine, and between every deployment on Go 1.11 since we build from a temp directory without a set name. I'm looking into a fix.
Thanks for the report @manisenkov, and the ping @yegle
So I spent some time looking into this today. Our old runtime environment patches filesystem access, and special cases filenames for your app to obscure the actual location on-disk of the file. For example, that results in a file at /path/to/foo/bar/baz.go
having a reported filename of bar/baz.go
. We don't do that in our second generation runtimes because you have a normal filesystem to work with.
We also did super weird things with GOPATH
that I won't go into details about, but it results in a mismatch of app vs dependency filenames. For example, if you had a single-package app, the filename would be main.go
instead of example.com/myapp/main.go
, but your dependencies would have filenames of github.com/foo/bar.go
.
I don't think this is behavior we want to replicate in our new runtime.
What we can do, though, is ensure that the temp directory, in which we build your app, is consistent across deploys. That would still cause 1.9->1.11 to be a breaking change, but it would not break tasks across deployments of the same runtime.
@yegle WDYT?
@sbuss can you handle 1.9 => 1.11 as a special case and try to map old path to a new one?
And I wonder why just don’t replicate old behavior by stripping path prefix from task? The prefix is not neeeded for “task to function” mapping. Seems like an unnecessary breaking change to me. Is there reasons for this change that I miss?
When thinking on how to migrate an app that relies on delay functions for consistency it seems to be a hell difficult if you can’t pause delays firing.
@sbuss I'm fine with that solution (break 1.9 -> 1.11 once but not after switching to 1.11).
To be clear I think there are multiple issues here and they might need to be tracked separately:
- the original report by @manisenkov that moving a
delay.Func
to another file in the same package will break the scheduled tasks - My current 1.11 specific issue that each redeploy will break scheduled tasks regardless
- @sbuss you mentioned the one-time breakage switching from 1.9 -> 1.11
- And I think a subtle issue here: those scheduled tasks are lost forever without retry, and customer cannot mitigate/workaround the issue by simply switching back to the old version.
@yegle, @sbuss can you explain why old behavior is bad and you don't replicate it? Doing that would solve all issues all together without any breaking change and also will be future proof from infrastructure changes.
Citing @sbuss:
We also did super weird things with GOPATH that I won't go into details about, but it results in a mismatch of app vs dependency filenames. For example, if you had a single-package app, the filename would be main.go instead of example.com/myapp/main.go, but your dependencies would have filenames of github.com/foo/bar.go
So my understanding it's a good and valid behavior:
- App insert tasks with a file name relative from app root.
- On receiving task the full file name restored by concatenating app root + the path received from the task.
- If app root changed or some temp directory is used to build tasks are not broken/lost.
What's wrong with this approach?
@astec
can you handle 1.9 => 1.11 as a special case and try to map old path to a new one?
Any 1.9-1.11 special case would mean we would have to maintain that special case for all future versions as well, so there isn't a one-time special case we can do.
why just don’t replicate old behavior by stripping path prefix from task?
It's not as simple as stripping the path prefix. The old runtime environment had a virtual filesystem that faked filenames in a way that isn't straightforwardly compatible with the new environment. In addition, the filename which got reported depended on 1) your local filesystem set up (eg if you were developing on or off your GOPATH and 2) whether or not this call originated from package main
.
Replicating the old behavior would require some information we no longer have in the new runtime environment.
can you explain why old behavior is bad and you don't replicate it?
Hopefully the preceding paragraph answers this.
App insert tasks with a file name relative from app root.
Not quite. Some files are relative to app root, some are without a prefix entirely.
On receiving task the full file name restored by concatenating app root + the path received from the task.
Nope, due to how the old runtime handled translating local filesystem structure to what we used in the builder.
@sbuss thanks for reply. I understand it's tricky.
I'm continuing to classify exactly how the old runtime munges filenames. After exploring everything we do to modify paths, it turns out that it can all be summarized rather simply, despite the complexity in how we get there.
On the 1.9 runtime:
- For calls from
package main
: strip all leading path entries, leaving just the filename as the path - For calls from anywhere else, strip
$GOPATH/src
Ok, seems easy enough (I'm actually extremely surprised it all collapsed to two rules). So to replicate this behavior for 1.11...
A. For an app on GOPATH, not using go.mod
:
- Have
appengine.Main()
store the location it was called from, which indicates the path ofpackage main
- If file path is in the same dir as
package main
, strip all but the filename - Change the builder to have the same
GOPATH
as the runtime image - Always remove
$GOPATH/src
from the path
This isn't too bad, actually.
B. For an app using go.mod
:
- We don't build a GOPATH, so the app files are at /tmp/staging12345/srv/...; so after making the builder have a stable directory we can just always strip it.
- But handling dependencies declared via
go.mod
is more complex...
go.mod versioned dependency
A dependency declared via go.mod
will have a path on the builder that looks something like /go/pkg/mod/github.com/foo/[email protected]/baz.go
. What would you expect the filename to be in that case?
This makes it apparent that upgrading your dependency will also orphan your delay.Func
s, even with a stable build directory. It might be correct to just strip any component that looks like a go.mod version number (@v0.0.0-20181026220418-f595d03440dc
), and the leading /go/pkg/mod
path. Or it might not be.
replace
If a user uses a go.mod replace directive, the path to the same dependency above would be something like /tmp/staging12345/srv/replacements/foo/bar/baz.go
. There's a strong argument to make that anything included via replace
should get the same path as a non-replaced package; otherwise your tasks would get orphaned between deployments when you went from regular to replaced dependencies. Only... we don't have any way of knowing which dependency is from a replace
and which is from a require
at run-time, when all of this is computed. Well... maybe there's something in the build
package that can reveal that information, but I'm not sure.
What do to?
Since go.mod
will be the dominant pattern going forward, getting go module support right is the most important consideration for me. However, we are advising that all customers migrate to cloudtasks (docs), so we could declare that if you want to be on the bleeding edge and use go modules, you have to upgrade to the more modern tool. I'll have to consult with product management to decide if that's a reasonable trade off.
The correct initial choice would have been to let the user be fully in control of the key, and not add filepath information to it to try to be clever about key uniqueness, but here we are. Cloudtasks doesn't try to be clever; it's a much more sane API.
Fixed in v1.4.0
Fixed in v1.4.0
@sbuss I am using Go 1.11 with AppEngine Standard and this fix doesn't work for me. The reason is that fileKey
(called by delay.Func
) contains this check-
if !internal.IsSecondGen() || internal.MainPath == "" {
return file, nil
}
However, internal.MainPath
won't be set until appengine.Main
is called (which happens in my func main
after all initialization is done).
In other words, the order is-
- Global
var foo = delay.Func(...
are executed, which in turn callfileKey
which expectsinternal.MainPath
to be populated. -
func main()
is called which callsappengine.Main
which in turn populatesinternal.MainPath
Maybe the /srv/
check should happen even if internal.MainPath
is blank?
@shaan7 I had the same problem and made a PR to fix it: https://github.com/golang/appengine/pull/211
Yep, looks like we had a regression. Thanks for the fix, @bashtian
Fixed in v1.6.4
I just got an error like this today after a deploy.
Suddenly my registered functions weren't recognised anymore.
The delay.Func is declared in the subdirectory "game" in the file "game.go", and I got a bunch of errors looking like delay: no func with key "game/game.go:QUEUENAME" found
.
I tried debugging and messing around, and at one point I instead got errors like delay: no func with key "/workspace/game/game.go:QUEUENAME" found
.
Now my cloud tasks are silently (except for a warning in the logs - but no retries) dropping all my tasks, and some of them are pretty hard to retrigger, so this is a sad day for me.
What's up with this? How can I make my task queues stop dropping tasks?
I wish there was a way to explicitly provide the function key instead of letting unknown and sometimes breaking magic handle it.
Apologies for that. This error was caused by a regression in a new release of the builder image, which has been now been rolled back. Please redeploy to restore the previous behavior. If you experience any more issues, please do let us know.
I became so afraid of these kinds of regressions - what if (during the regression) the function key for new tasks is changed, so that I not only drop old tasks during the regression, but also create new tasks that will be dropped after it.
I strongly suggest that tasks never get dropped (without retry), so that I can at least debug them in the Cloud Task console, see their logs, and debug them/fix it so that they run in the future - dropping without retry is scary.
What I did now is fork the delay package completely, and running my own clone where I completely ignore package/filename and only key the functions on the queue name, which I make sure is unique per delay task.