Freeze on download from non-main DC
What version of gotd are you using?
github.com/gotd/td v0.99.1
Can this issue be reproduced with the latest version?
Yes
What did you do?
I was downloading stickers using some code from this example. I'm using github.com/celestix/gotgproto as a wrapper for gotd/td:
func downloadStickerMessagesProcessor(ctx *ext.Context, update *ext.Update) error {
	if update.EffectiveMessage.Media == nil {
		return nil
	}

	var buf bytes.Buffer
	_, err := ctx.DownloadMedia(update.EffectiveMessage.Media, ext.DownloadOutputStream{Writer: &buf}, nil)
	if err != nil {
		return errors.Wrap(err, "failed to download media")
	}

	// doing other things
	return nil
}
DownloadMedia works almost the same as in td example:
downloader.NewDownloader().Download(api, loc).Stream(ctx, &buf)
If the sticker is located in the same DC as my main one (2), there is no problem. If it's in another DC, the code freezes with these logs:
2024-03-24T13:07:01.115+0300 DEBUG conn.mtproto mtproto/new_encrypted_msg.go:62 Request {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xbe5335be", "type_name": "upload.getFile#be5335be", "msg_id": 7349869390778362016}
2024-03-24T13:07:01.185+0300 DEBUG conn.mtproto mtproto/handle_message.go:19 Handle message {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xf35c6d01", "type_name": "rpc_result", "size_bytes": 36, "msg_id": 7349869391774926849}
2024-03-24T13:07:01.185+0300 DEBUG conn.mtproto mtproto/handle_result.go:24 Handle result {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x2144ca19", "type_name": "rpc_error#2144ca19", "size_bytes": 24, "msg_id": 7349869390778362016}
2024-03-24T13:07:01.190+0300 DEBUG conn.mtproto mtproto/handle_result.go:53 Got error {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349869390778362016, "err_code": 303, "err_msg": "FILE_MIGRATE_4"}
2024-03-24T13:07:01.191+0300 DEBUG conn.mtproto.rpc rpc/engine.go:99 Handler called {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349869390778362016}
2024-03-24T13:07:01.192+0300 DEBUG conn.mtproto mtproto/rpc.go:44 Invoke end {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349869390778362016}
2024-03-24T13:07:01.192+0300 DEBUG conn manager/conn.go:113 Invoke {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "duration": "77.847208ms", "ongoing": 0}
2024-03-24T13:07:01.197+0300 DEBUG telegram/invoke.go:67 Invoking on target DC {"v": "v0.99.1", "error_type": "FILE_MIGRATE", "target_dc": 4}
2024-03-24T13:07:01.199+0300 DEBUG telegram/pool.go:61 Creating pool {"v": "v0.99.1", "dc_id": 4, "max": 1, "candidates": 5}
2024-03-24T13:07:09.344+0300 DEBUG conn.mtproto mtproto/new_encrypted_msg.go:62 Request {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x62d6b459", "type_name": "msgs_ack#62d6b459", "msg_id": 7349869425367492384}
For a sticker located in my DC I get these logs:
2024-03-24T13:17:25.449+0300 DEBUG conn.mtproto mtproto/handle_message.go:19 Handle message {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x74ae4240", "type_name": "updates#74ae4240", "size_bytes": 796, "msg_id": 7349872072621388801}
2024-03-24T13:17:25.523+0300 DEBUG conn.mtproto mtproto/rpc.go:29 Invoke start {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.523+0300 DEBUG conn.mtproto.rpc rpc/engine.go:87 Do called {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.523+0300 DEBUG conn.mtproto.rpc rpc/ack.go:34 Waiting for acknowledge {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "ack_id": 7349872071245901720}
2024-03-24T13:17:25.524+0300 DEBUG conn.mtproto mtproto/new_encrypted_msg.go:62 Request {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xbe5335be", "type_name": "upload.getFile#be5335be", "msg_id": 7349872071245901720}
2024-03-24T13:17:25.672+0300 DEBUG conn.mtproto mtproto/handle_message.go:19 Handle message {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xf35c6d01", "type_name": "rpc_result", "size_bytes": 25124, "msg_id": 7349872073635940353}
2024-03-24T13:17:25.672+0300 DEBUG conn.mtproto mtproto/handle_result.go:24 Handle result {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x96a18d5", "type_name": "upload.file#96a18d5", "size_bytes": 25112, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.672+0300 DEBUG conn.mtproto.rpc rpc/engine.go:99 Handler called {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.675+0300 DEBUG conn.mtproto mtproto/rpc.go:47 Invoke end {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.675+0300 DEBUG conn manager/conn.go:113 Invoke {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "duration": "152.638917ms", "ongoing": 0}
What did you expect to see?
Downloading completes successfully with stickers from any DC.
What did you see instead?
Downloading completes successfully only with stickers from my DC.
What Go version and environment are you using?
go version go1.22.0 darwin/arm64
go env Output
$ go env
GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/teadove/Library/Caches/go-build'
GOENV='/Users/teadove/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/teadove/go/pkg/mod'
GONOPROXY='gitlab.whoosh-cloud.com'
GONOSUMDB='gitlab.whoosh-cloud.com'
GOOS='darwin'
GOPATH='/Users/teadove/go'
GOPRIVATE='gitlab.whoosh-cloud.com'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/teadove/go/go1.22.0'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/teadove/go/go1.22.0/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='0'
GOMOD='/Users/teadove/projects/fun-telegram/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/1l/sljqjzq579vfs5tgztmg_x_w0000gn/T/go-build1516207097=/tmp/go-build -gno-record-gcc-switches -fno-common'
I can also freeze the client by sending a gif via gif search (which triggers a download); sending the same gif as a file downloads normally.
I'll try to reproduce this, but I'm not sure how to do it easily. Most of my clients are on DC4; what is the simplest way to trigger this?
I figured out this only happens when the floodwait middleware is present.
For some reason, the transfer call, which transfers auth from one connection to another, hangs when floodwait is present.
I figured out why it happens: Download -> Floodwait -> (runs in its goroutine) Invoke -> Transfer -> Floodwait. But floodwait's single goroutine is still stuck executing the download, so the nested Transfer is never scheduled, causing a deadlock.
@cyoung06 you are right, I've disabled floodwaiter and there are no more problems with downloads. Hm, should I create an issue for https://github.com/gotd/contrib/blob/master/middleware/floodwait?
P.S. I checked: there is no error with SimpleWaiter, only with the plain Waiter.
So, @TeaDove did you find a solution to fix Waiter?
Unfortunately no :(
I've just moved to SimpleWaiter
api.UploadGetFile() makes a nested tg.AuthExportAuthorizationRequest{} call; before the UploadGetFileRequest completes, it blocks the scheduler's run loop (see the code of waiter.Run()), so it is bound to deadlock when the DC changes.
Is this a bug in floodwait, or intended behavior? An API call that maps to a single MTProto call works as expected, but one that fans out into multiple MTProto calls does not. How can this be resolved?
Here is my workaround: I forked the waiter code into my own repo and patched it so that AuthExportAuthorizationRequest and UploadGetFileRequest are not sent to the scheduler (a lone request works just fine).
// Handle implements telegram.Middleware.
func (w *Waiter) Handle(next tg.Invoker) telegram.InvokeFunc {
	return func(ctx context.Context, input bin.Encoder, output bin.Decoder) error {
		if !w.running.Load() {
			// Return explicit error if waiter is not running.
			return errors.New("the Waiter middleware is not running: Run(ctx) method is not called or exited")
		}
		// Bypass the scheduler for requests that can trigger a nested
		// invoke during DC migration, so they cannot deadlock the worker.
		switch input.(type) {
		case *tg.UploadGetFileRequest, *tg.AuthExportAuthorizationRequest:
			return next.Invoke(ctx, input, output)
		default:
		}
		select {
		case err := <-w.sch.new(ctx, input, output, next):
			return err
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
In practice, floodwait is necessary for long-running, robust programs, so simply disabling it for UploadGetFileRequest feels like an ugly solution to me. Any better ideas?
I found another way to work around this problem temporarily: run two goroutines serving the scheduler, so that while one worker is blocked inside the outer call, the other can still pick up the nested request.
func (w *Waiter) Run(ctx context.Context, f func(ctx context.Context) error) (err error) {
	w.running.Store(true)
	defer w.running.Store(false)

	ctx, cancel := context.WithCancel(ctx)
	wg, ctx := errgroup.WithContext(ctx)
	wg.Go(func() error {
		defer cancel()
		return f(ctx)
	})

	ticker := w.clock.Ticker(w.tick)
	defer ticker.Stop()
	// Two scheduler workers (Go 1.22 integer range): while one is blocked
	// inside a download, the other can serve the nested Transfer request.
	for range 2 {
		wg.Go(func() error {
			var requests []scheduled
			for {
				select {
				case <-ticker.C():
					requests = w.sch.gather(requests[:0])
					if len(requests) < 1 {
						continue
					}
					for _, s := range requests {
						ret, err := w.send(s)
						if ret {
							select {
							case s.request.result <- err:
							default:
							}
						}
					}
				case <-ctx.Done():
					return nil
				}
			}
		})
	}
	return wg.Wait()
}