appengine icon indicating copy to clipboard operation
appengine copied to clipboard

delay: Function key generation is obscure and need to be documented

Open manisenkov opened this issue 6 years ago • 17 comments

Calling delay.Func uses the name of the file from whom this function was called to generate the func key that used in the task invocation. It means that deploying new version of service can break execution of tasks that were scheduled before, if the call of delay.Func moved to the another file in code. This behaviour is not documented and can cause a confusion.

Related lines of code: https://github.com/golang/appengine/blob/5bee14b453b4c71be47ec1781b0fa61c2ea182db/delay/delay.go#L105-L107

manisenkov avatar Feb 21 '18 16:02 manisenkov

Yes, deploying a new version breaks all previously scheduled tasks. I'm getting a lot of logs like this each time I redeploy:

2018/10/25 07:21:03 ERROR: delay: no func with key "/tmp/staging873052621/srv/main.go:editMessage" found

Fortunately all my tasks are non-critical and is OK to be purged after a deployment, but I think this a serious issue if someone is relying on the task to be processed even after a deployment.

Can we either consider this a bug, or document this feature and provide a workaround?

yegle avatar Oct 25 '18 07:10 yegle

Shoot. This will break compatibility between Go 1.9 and Go 1.11 on app engine, and between every deployment on Go 1.11 since we build from a temp directory without a set name. I'm looking into a fix.

Thanks for the report @manisenkov, and the ping @yegle

sbuss avatar Oct 25 '18 18:10 sbuss

So I spent some time looking into this today. Our old runtime environment patches filesystem access, and special cases filenames for your app to obscure the actual location on-disk of the file. For example, that results in a file at /path/to/foo/bar/baz.go having a reported filename of bar/baz.go. We don't do that in our second generation runtimes because you have a normal filesystem to work with.

We also did super weird things with GOPATH that I won't go into details about, but it results in a mismatch of app vs dependency filenames. For example, if you had a single-package app, the filename would be main.go instead of example.com/myapp/main.go, but your dependencies would have filenames of github.com/foo/bar.go.

I don't think this is behavior we want to replicate in our new runtime.

What we can do, though, is ensure that the temp directory, in which we build your app, is consistent across deploys. That would still cause 1.9->1.11 to be a breaking change, but it would not break tasks across deployments of the same runtime.

@yegle WDYT?

sbuss avatar Oct 25 '18 23:10 sbuss

@sbuss can you handle 1.9 => 1.11 as a special case and try to map old path to a new one?

And I wonder why just don’t replicate old behavior by stripping path prefix from task? The prefix is not neeeded for “task to function” mapping. Seems like an unnecessary breaking change to me. Is there reasons for this change that I miss?

When thinking on how to migrate an app that relies on delay functions for consistency it seems to be a hell difficult if you can’t pause delays firing.

trakhimenok avatar Oct 26 '18 08:10 trakhimenok

@sbuss I'm fine with that solution (break 1.9 -> 1.11 once but not after switching to 1.11).

To be clear I think there are multiple issues here and they might need to be tracked separately:

  1. the original report by @manisenkov that moving a delay.Func to another file in the same package will break the scheduled tasks
  2. My current 1.11 specific issue that each redeploy will break scheduled tasks regardless
  3. @sbuss you mentioned the one-time breakage switching from 1.9 -> 1.11
  4. And I think a subtle issue here: those scheduled tasks are lost forever without retry, and customer cannot mitigate/workaround the issue by simply switching back to the old version.

yegle avatar Oct 26 '18 17:10 yegle

@yegle, @sbuss can you explain why old behavior is bad and you don't replicate it? Doing that would solve all issues all together without any breaking change and also will be future proof from infrastructure changes.

Citing @sbuss:

We also did super weird things with GOPATH that I won't go into details about, but it results in a mismatch of app vs dependency filenames. For example, if you had a single-package app, the filename would be main.go instead of example.com/myapp/main.go, but your dependencies would have filenames of github.com/foo/bar.go

So my understanding it's a good and valid behavior:

  • App insert tasks with a file name relative from app root.
  • On receiving task the full file name restored by concatenating app root + the path received from the task.
  • If app root changed or some temp directory is used to build tasks are not broken/lost.

What's wrong with this approach?

trakhimenok avatar Oct 26 '18 20:10 trakhimenok

@astec

can you handle 1.9 => 1.11 as a special case and try to map old path to a new one?

Any 1.9-1.11 special case would mean we would have to maintain that special case for all future versions as well, so there isn't a one-time special case we can do.

why just don’t replicate old behavior by stripping path prefix from task?

It's not as simple as stripping the path prefix. The old runtime environment had a virtual filesystem that faked filenames in a way that isn't straightforwardly compatible with the new environment. In addition, the filename which got reported depended on 1) your local filesystem set up (eg if you were developing on or off your GOPATH and 2) whether or not this call originated from package main.

Replicating the old behavior would require some information we no longer have in the new runtime environment.

can you explain why old behavior is bad and you don't replicate it?

Hopefully the preceding paragraph answers this.

App insert tasks with a file name relative from app root.

Not quite. Some files are relative to app root, some are without a prefix entirely.

On receiving task the full file name restored by concatenating app root + the path received from the task.

Nope, due to how the old runtime handled translating local filesystem structure to what we used in the builder.

sbuss avatar Oct 26 '18 21:10 sbuss

@sbuss thanks for reply. I understand it's tricky.

trakhimenok avatar Oct 26 '18 21:10 trakhimenok

I'm continuing to classify exactly how the old runtime munges filenames. After exploring everything we do to modify paths, it turns out that it can all be summarized rather simply, despite the complexity in how we get there.

On the 1.9 runtime:

  1. For calls from package main: strip all leading path entries, leaving just the filename as the path
  2. For calls from anywhere else, strip $GOPATH/src

Ok, seems easy enough (I'm actually extremely surprised it all collapsed to two rules). So to replicate this behavior for 1.11...

A. For an app on GOPATH, not using go.mod:

  1. Have appengine.Main() store the location it was called from, which indicates the path of package main
  2. If file path is in the same dir as package main, strip all but the filename
  3. Change the builder to have the same GOPATH as the runtime image
  4. Always remove $GOPATH/src from the path

This isn't too bad, actually.

B. For an app using go.mod:

  1. We don't build a GOPATH, so the app files are at /tmp/staging12345/srv/...; so after making the builder have a stable directory we can just always strip it.
  2. But handling dependencies declared via go.mod is more complex...

go.mod versioned dependency

A dependency declared via go.mod will have a path on the builder that looks something like /go/pkg/mod/github.com/foo/[email protected]/baz.go. What would you expect the filename to be in that case?

This makes it apparent that upgrading your dependency will also orphan your delay.Funcs, even with a stable build directory. It might be correct to just strip any component that looks like a go.mod version number (@v0.0.0-20181026220418-f595d03440dc), and the leading /go/pkg/mod path. Or it might not be.

replace

If a user uses a go.mod replace directive, the path to the same dependency above would be something like /tmp/staging12345/srv/replacements/foo/bar/baz.go. There's a strong argument to make that anything included via replace should get the same path as a non-replaced package; otherwise your tasks would get orphaned between deployments when you went from regular to replaced dependencies. Only... we don't have any way of knowing which dependency is from a replace and which is from a require at run-time, when all of this is computed. Well... maybe there's something in the build package that can reveal that information, but I'm not sure.

What do to?

Since go.mod will be the dominant pattern going forward, getting go module support right is the most important consideration for me. However, we are advising that all customers migrate to cloudtasks (docs), so we could declare that if you want to be on the bleeding edge and use go modules, you have to upgrade to the more modern tool. I'll have to consult with product management to decide if that's a reasonable trade off.

The correct initial choice would have been to let the user be fully in control of the key, and not add filepath information to it to try to be clever about key uniqueness, but here we are. Cloudtasks doesn't try to be clever; it's a much more sane API.

sbuss avatar Oct 26 '18 22:10 sbuss

Fixed in v1.4.0

sbuss avatar Dec 20 '18 17:12 sbuss

Fixed in v1.4.0

@sbuss I am using Go 1.11 with AppEngine Standard and this fix doesn't work for me. The reason is that fileKey (called by delay.Func) contains this check-

	if !internal.IsSecondGen() || internal.MainPath == "" {
		return file, nil
	}

However, internal.MainPath won't be set until appengine.Main is called (which happens in my func main after all initialization is done). In other words, the order is-

  1. Global var foo = delay.Func(... are executed, which in turn call fileKey which expects internal.MainPath to be populated.
  2. func main() is called which calls appengine.Main which in turn populates internal.MainPath

Maybe the /srv/ check should happen even if internal.MainPath is blank?

shaan7 avatar Sep 26 '19 09:09 shaan7

@shaan7 I had the same problem and made a PR to fix it: https://github.com/golang/appengine/pull/211

bashtian avatar Sep 26 '19 11:09 bashtian

Yep, looks like we had a regression. Thanks for the fix, @bashtian

sbuss avatar Sep 26 '19 21:09 sbuss

Fixed in v1.6.4

sbuss avatar Sep 26 '19 22:09 sbuss

I just got an error like this today after a deploy.

Suddenly my registered functions weren't recognised anymore.

The delay.Func is declared in the subdirectory "game" in the file "game.go", and I got a bunch of errors looking like delay: no func with key "game/game.go:QUEUENAME" found.

I tried debugging and messing around, and at one point I instead got errors like delay: no func with key "/workspace/game/game.go:QUEUENAME" found.

Now my cloud tasks are silently (except for a warning in the logs - but no retries) dropping all my tasks, and some of them are pretty hard to retrigger, so this is a sad day for me.

What's up with this? How can I make my task queues stop dropping tasks?

I wish there was a way to explicitly provide the function key instead of letting unknown and sometimes breaking magic handle it.

zond avatar Aug 18 '20 13:08 zond

Apologies for that. This error was caused by a regression in a new release of the builder image, which has been now been rolled back. Please redeploy to restore the previous behavior. If you experience any more issues, please do let us know.

lukasberger avatar Aug 21 '20 00:08 lukasberger

I became so afraid of these kinds of regressions - what if (during the regression) the function key for new tasks is changed, so that I not only drop old tasks during the regression, but also create new tasks that will be dropped after it.

I strongly suggest that tasks never get dropped (without retry), so that I can at least debug them in the Cloud Task console, see their logs, and debug them/fix it so that they run in the future - dropping without retry is scary.

What I did now is fork the delay package completely, and running my own clone where I completely ignore package/filename and only key the functions on the queue name, which I make sure is unique per delay task.

zond avatar Aug 21 '20 12:08 zond