
Freeze on download from non-main DC

TeaDove opened this issue 2 years ago • 5 comments

What version of gotd are you using?

github.com/gotd/td v0.99.1

Can this issue be reproduced with the latest version?

Yes

What did you do?

I was downloading stickers using some code from this example. I'm using github.com/celestix/gotgproto as a wrapper for gotd/td:

func downloadStickerMessagesProcessor(ctx *ext.Context, update *ext.Update) error {
	if update.EffectiveMessage.Media == nil {
		return nil
	}

	var buf bytes.Buffer
	_, err := ctx.DownloadMedia(update.EffectiveMessage.Media, ext.DownloadOutputStream{Writer: &buf}, nil)
	if err != nil {
		return errors.Wrap(err, "failed to download media")
	}

	// doing other things
	return nil
}
DownloadMedia works almost the same as in the td example: downloader.NewDownloader().Download(api, loc).Stream(ctx, &buf)

If the sticker is located in the same DC as my main one (2), there is no problem. If it's in another DC, the code freezes with these logs:

2024-03-24T13:07:01.115+0300    DEBUG   conn.mtproto    mtproto/new_encrypted_msg.go:62 Request {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xbe5335be", "type_name": "upload.getFile#be5335be", "msg_id": 7349869390778362016}
2024-03-24T13:07:01.185+0300    DEBUG   conn.mtproto    mtproto/handle_message.go:19    Handle message  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xf35c6d01", "type_name": "rpc_result", "size_bytes": 36, "msg_id": 7349869391774926849}
2024-03-24T13:07:01.185+0300    DEBUG   conn.mtproto    mtproto/handle_result.go:24     Handle result   {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x2144ca19", "type_name": "rpc_error#2144ca19", "size_bytes": 24, "msg_id": 7349869390778362016}
2024-03-24T13:07:01.190+0300    DEBUG   conn.mtproto    mtproto/handle_result.go:53     Got error       {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349869390778362016, "err_code": 303, "err_msg": "FILE_MIGRATE_4"}
2024-03-24T13:07:01.191+0300    DEBUG   conn.mtproto.rpc        rpc/engine.go:99        Handler called  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349869390778362016}
2024-03-24T13:07:01.192+0300    DEBUG   conn.mtproto    mtproto/rpc.go:44       Invoke end      {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349869390778362016}
2024-03-24T13:07:01.192+0300    DEBUG   conn    manager/conn.go:113     Invoke  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "duration": "77.847208ms", "ongoing": 0}
2024-03-24T13:07:01.197+0300    DEBUG   telegram/invoke.go:67   Invoking on target DC   {"v": "v0.99.1", "error_type": "FILE_MIGRATE", "target_dc": 4}
2024-03-24T13:07:01.199+0300    DEBUG   telegram/pool.go:61     Creating pool   {"v": "v0.99.1", "dc_id": 4, "max": 1, "candidates": 5}
2024-03-24T13:07:09.344+0300    DEBUG   conn.mtproto    mtproto/new_encrypted_msg.go:62 Request {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x62d6b459", "type_name": "msgs_ack#62d6b459", "msg_id": 7349869425367492384}

For a sticker located in my DC I get these logs:

2024-03-24T13:17:25.449+0300    DEBUG   conn.mtproto    mtproto/handle_message.go:19    Handle message  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x74ae4240", "type_name": "updates#74ae4240", "size_bytes": 796, "msg_id": 7349872072621388801}
2024-03-24T13:17:25.523+0300    DEBUG   conn.mtproto    mtproto/rpc.go:29       Invoke start    {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.523+0300    DEBUG   conn.mtproto.rpc        rpc/engine.go:87        Do called       {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.523+0300    DEBUG   conn.mtproto.rpc        rpc/ack.go:34   Waiting for acknowledge {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "ack_id": 7349872071245901720}
2024-03-24T13:17:25.524+0300    DEBUG   conn.mtproto    mtproto/new_encrypted_msg.go:62 Request {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xbe5335be", "type_name": "upload.getFile#be5335be", "msg_id": 7349872071245901720}
2024-03-24T13:17:25.672+0300    DEBUG   conn.mtproto    mtproto/handle_message.go:19    Handle message  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0xf35c6d01", "type_name": "rpc_result", "size_bytes": 25124, "msg_id": 7349872073635940353}
2024-03-24T13:17:25.672+0300    DEBUG   conn.mtproto    mtproto/handle_result.go:24     Handle result   {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "type_id": "0x96a18d5", "type_name": "upload.file#96a18d5", "size_bytes": 25112, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.672+0300    DEBUG   conn.mtproto.rpc        rpc/engine.go:99        Handler called  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.675+0300    DEBUG   conn.mtproto    mtproto/rpc.go:47       Invoke end      {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "msg_id": 7349872071245901720}
2024-03-24T13:17:25.675+0300    DEBUG   conn    manager/conn.go:113     Invoke  {"v": "v0.99.1", "conn_id": 0, "dc_id": 2, "duration": "152.638917ms", "ongoing": 0}

What did you expect to see?

Downloading completes successfully with stickers from any DC.

What did you see instead?

Downloading completes successfully only for stickers on my DC.

What Go version and environment are you using?

go version go1.22.0 darwin/arm64

go env Output
$ go env
GO111MODULE='on' GOARCH='amd64' GOBIN='' GOCACHE='/Users/teadove/Library/Caches/go-build' GOENV='/Users/teadove/Library/Application Support/go/env' GOEXE='' GOEXPERIMENT='' GOFLAGS='' GOHOSTARCH='arm64' GOHOSTOS='darwin' GOINSECURE='' GOMODCACHE='/Users/teadove/go/pkg/mod' GONOPROXY='gitlab.whoosh-cloud.com' GONOSUMDB='gitlab.whoosh-cloud.com' GOOS='darwin' GOPATH='/Users/teadove/go' GOPRIVATE='gitlab.whoosh-cloud.com' GOPROXY='https://proxy.golang.org,direct' GOROOT='/Users/teadove/go/go1.22.0' GOSUMDB='sum.golang.org' GOTMPDIR='' GOTOOLCHAIN='auto' GOTOOLDIR='/Users/teadove/go/go1.22.0/pkg/tool/darwin_arm64' GOVCS='' GOVERSION='go1.22.0' GCCGO='gccgo' GOAMD64='v1' AR='ar' CC='clang' CXX='clang++' CGO_ENABLED='0' GOMOD='/Users/teadove/projects/fun-telegram/go.mod' GOWORK='' CGO_CFLAGS='-O2 -g' CGO_CPPFLAGS='' CGO_CXXFLAGS='-O2 -g' CGO_FFLAGS='-O2 -g' CGO_LDFLAGS='-O2 -g' PKG_CONFIG='pkg-config' GOGCCFLAGS='-fPIC -arch x86_64 -m64 -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/1l/sljqjzq579vfs5tgztmg_x_w0000gn/T/go-build1516207097=/tmp/go-build -gno-record-gcc-switches -fno-common'

TeaDove avatar Mar 24 '24 10:03 TeaDove

The client also freezes for me when I download a GIF that was sent via GIF search; sending the same GIF as a file downloads normally.

k-qg avatar Apr 11 '24 03:04 k-qg

I'll try to reproduce this, but I'm not sure how to do it easily. Most of my clients are on DC4; what is the simplest way to trigger this?

ernado avatar Apr 18 '24 17:04 ernado

I figured out this only happens when the floodwait middleware is present.

For some reason, the transfer call, which transfers auth from one connection to another, hangs when floodwait is present.

cyoung06 avatar May 18 '24 16:05 cyoung06

I figured out why it happens: Download -> Floodwait -> (running in the floodwait scheduler goroutine) Invoke -> Transfer -> Floodwait. But floodwait's scheduler goroutine is stuck executing the download, causing a deadlock.

cyoung06 avatar May 18 '24 17:05 cyoung06

@cyoung06 you are right, I've disabled floodwaiter and there are no more problems with downloads. Hm, should I create an issue for https://github.com/gotd/contrib/blob/master/middleware/floodwait?

P.S. I checked: there is no error with SimpleWaiter, only with Waiter.

TeaDove avatar May 19 '24 18:05 TeaDove

So, @TeaDove did you find a solution to fix Waiter?

XCVYN6UGT85JywuH avatar Nov 08 '24 09:11 XCVYN6UGT85JywuH

So, @TeaDove did you find a solution to fix Waiter?

Unfortunately no :( I've just moved to SimpleWaiter

TeaDove avatar Nov 08 '24 10:11 TeaDove

api.UploadGetFile() makes a nested tg.AuthExportAuthorizationRequest{} call; until the UploadGetFileRequest is done, it blocks the scheduler's run loop (see the code of waiter.Run()), so it must cause a deadlock when the DC has changed.

Is this a bug in floodwaiter, or is it by design? A single API call issuing only one MTProto call is expected, but multiple MTProto calls are not. How can this problem be resolved?

Here is my workaround: I forked the waiter code into my own repo and patched it so that AuthExportAuthorizationRequest and UploadGetFileRequest are never sent to the scheduler (bypassing the scheduler for just these requests works well).

// Handle implements telegram.Middleware.
func (w *Waiter) Handle(next tg.Invoker) telegram.InvokeFunc {
	return func(ctx context.Context, input bin.Encoder, output bin.Decoder) error {
		if !w.running.Load() {
			// Return explicit error if waiter is not running.
			return errors.New("the Waiter middleware is not running: Run(ctx) method is not called or exited")
		}

		switch input.(type) {
		case *tg.UploadGetFileRequest, *tg.AuthExportAuthorizationRequest:
			return next.Invoke(ctx, input, output)
		default:
		}

		select {
		case err := <-w.sch.new(ctx, input, output, next):
			return err
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

In fact, floodwait is necessary for long-running, robust programs, so just disabling floodwait for UploadGetFileRequest is a rather ugly solution for me right now. Any more ideas?

Student414 avatar Dec 09 '25 23:12 Student414

I found another way to solve this problem temporarily: just create 2 goroutines to work with the scheduler.

func (w *Waiter) Run(ctx context.Context, f func(ctx context.Context) error) (err error) {
	w.running.Store(true)
	defer w.running.Store(false)

	ctx, cancel := context.WithCancel(ctx)
	wg, ctx := errgroup.WithContext(ctx)
	wg.Go(func() error {
		defer cancel()
		return f(ctx)
	})

	ticker := w.clock.Ticker(w.tick)
	defer ticker.Stop()

	for range 2 {
		wg.Go(func() error {
			var requests []scheduled
			for {
				select {
				case <-ticker.C():
					requests = w.sch.gather(requests[:0])
					if len(requests) < 1 {
						continue
					}

					for _, s := range requests {
						ret, err := w.send(s)
						if ret {
							select {
							case s.request.result <- err:
							default:
							}
						}
					}
				case <-ctx.Done():
					return nil
				}
			}
		})
	}

	return wg.Wait()
}

Student414 avatar Dec 14 '25 02:12 Student414