okhttp icon indicating copy to clipboard operation
okhttp copied to clipboard

OkHttpClient POST request fails when Request body size beyond certain limit

Open saravr opened this issue 3 years ago • 11 comments

I am making a sequence of POST requests with each subsequent request passing the next token received the previous response in the request body of the new request. I am noticing that if POST HTTP request body size is beyond a certain limit, the request times out after certain number of requests in the sequence. Typical request body size is about 14KB. With OkHttpClient event listener I notice that (for the failing call)

override fun requestBodyStart(call: Call)

is invoked but not

override fun requestBodyEnd(call: Call, byteCount: Long)

Instead override fun requestFailed(call: Call, ioe: IOException) is called with Socket timeout exception.

It works if make this call before each request (but looks like a hack)

okHttpClient.connectionPool.evictAll()

Not sure if OkHTTPClient maintains some state within. Charles is showing Failure status with IO: Stream cancelled by CLIENT message.

I am using

implementation "com.squareup.okhttp3:okhttp:4.9.3"

Backend is not in my control and I cannot optimize the request body size. Apparently, a truncated-body request reaches server which doesn't response and hence the timeout. Is there a way to fix this issue. Note: If I replace OkHttp with Volley I do not see this issue.

saravr avatar Apr 20 '22 22:04 saravr

Hmmm... not enough information to suggest any fixes here. I'm curious if your calling code is blocked waiting for the server to read?

Can you isolate this into a standalone test case?

swankjesse avatar Apr 21 '22 00:04 swankjesse

@swankjesse thanks. Here is my test code. Possibly I can try using a fake server to create a standalone test.

object NetworkModule {
    private lateinit var webservice: Webservice
    private lateinit var okHttpClient: OkHttpClient

    fun initRetrofit() {
        val loggingInterceptor = HttpLoggingInterceptor()
        loggingInterceptor.setLevel(HttpLoggingInterceptor.Level.BODY)

        val builder = OkHttpClient.Builder()
        builder.addInterceptor(loggingInterceptor)
        //builder.eventListener(MyEventListener())
        okHttpClient = builder.build()

        val retrofit = Retrofit.Builder()
            .addConverterFactory(GsonConverterFactory.create())
            .baseUrl(BASE_URL)
            .client(okHttpClient)
            .build()

        webservice = retrofit.create(Webservice::class.java)
    }

    private var next: String? = ""

    suspend fun getData() {
        do {
            val data = webservice.getData(getHeaders(), makeRequest(next))
            next = data.items?.result?.get(0)?.meta?.next
            val itemCount = data.items?.result?.get(0)?.items?.size
            
            if (next?.isEmpty() == true) {
                break
            }
            /*CoroutineScope(Dispatchers.IO).launch { 
                okHttpClient.connectionPool.evictAll() /// <-- THIS WILL PREVENT THE ERROR/TIMEOUT CRASH
            }*/
            delay(2000)
        } while (next != null && next?.isNotEmpty() == true)
    }

    private fun getHeaders(): Map<String, String> {
        val headers = mutableMapOf<String,String>()
        headers["Content-Type"] = "application/json"
        headers["Cookie"] = COOKIE
        headers["User-Agent"] = UA_STR
        return headers
    }

    private fun makeRequest(next: String?): MyRequest {
        return MyRequest(next = next, topics = listOf(Topic()))
    }

    private const val TAG = "NetworkModule"
    private const val BASE_URL = "https://<host>"
    private const val UA_STR = "..."
    private const val COOKIE = "..."
}

interface Webservice {
    @POST("... url ....")
    suspend fun getData(@HeaderMap headerMap: Map<String, String>, @Body body: MyRequest): MyResponse
}

class MyApplication: Application() {
    
    override fun onCreate() {
        super.onCreate()
        
        NetworkModule.initRetrofit()
        CoroutineScope(Dispatchers.IO).launch {
            NetworkModule.getData()
        }
    }
}

saravr avatar Apr 21 '22 01:04 saravr

Yeah this looks completely reasonable. Any chance you can post a thread dump when it stalls? I’m wondering if it’s stalling in OkHttp or possibly in coroutines somewhere?

swankjesse avatar Apr 21 '22 02:04 swankjesse

+1 to a thread dump.

Also compare the working and non working case with an event listener printing all events including connections. https://square.github.io/okhttp/features/events/

yschimke avatar Apr 21 '22 07:04 yschimke

Yeah this looks completely reasonable. Any chance you can post a thread dump when it stalls? I’m wondering if it’s stalling in OkHttp or possibly in coroutines somewhere?

@swankjesse attached the thread dump okhttp-thread-dump.txt

saravr avatar Apr 21 '22 14:04 saravr

+1 to a thread dump.

Also compare the working and non working case with an event listener printing all events including connections. https://square.github.io/okhttp/features/events/

Here is the log for not-working case. okhttp-events.txt

saravr avatar Apr 21 '22 15:04 saravr

Thread dump shows it's an issue with HTTP/2 flow control. It's blocking on this line:

           [email protected]() // Wait until we receive a WINDOW_UPDATE.

Somehow OkHttp and your HTTP/2 server don't agree on flow control things, because the server has not granted OkHttp permission to send more bytes.

If you do an HTTP/2 frame dump we should be able to assign blame to either OkHttp or your HTTP/2 server.

https://square.github.io/okhttp/contribute/debug_logging/#http2-frame-logging

swankjesse avatar Apr 22 '22 11:04 swankjesse

@swankjesse attaching the frame dump okhttp.timeout.log

You can see the timeout happing after the log with timestamp 15:27:38.620 1950-2863

Will this help?

saravr avatar May 04 '22 22:05 saravr

@saravr if you’ll forgive me for sitting on this for 2 months, I’ve got a PR out to add even more information to aid debugging. https://github.com/square/okhttp/pull/7325

I’ve also got a hypothesis for what’s happening...

A few streams (0x00000005, 0x00000013) are canceled by the client:

...
2022-05-04 15:27:19.341 1950-2971/com.myapp.debug W/System.err: ODL: [2022-05-04 15:27:19] >> 0x00000005     4 RST_STREAM
...
2022-05-04 15:27:48.650 1950-2857/com.myapp.debug W/System.err: ODL: [2022-05-04 15:27:48] >> 0x00000013     4 RST_STREAM

Later the server does a connection-wide window update:

2022-05-04 15:27:48.698 1950-2851/com.myapp.debug W/System.err: ODL: [2022-05-04 15:27:48] << 0x00000000     4 WINDOW_UPDATE

But is this window update big enough? In particular does it account for the received bytes of the canceled streams? The additional debugging in the PR should tell us.

swankjesse avatar Jun 12 '22 15:06 swankjesse

Relevant section from the HTTP/2 spec

  Flow-controlled frames (i.e., DATA) received after sending
  RST_STREAM are counted toward the connection flow-control window.
  Even though these frames might be ignored, because they are sent
  before the sender receives the RST_STREAM, the sender will
  consider the frames to count against the flow-control window.

swankjesse avatar Jun 12 '22 15:06 swankjesse

One other gotcha – if you’re canceling a stream but not closing the ResponseBody, you will have flow control errors. Make sure you’re closing the response body.

swankjesse avatar Jun 12 '22 15:06 swankjesse

@swankjesse Any chance this is fixed by https://github.com/square/okhttp/pull/7801

or likely needs a similar writer side change?

yschimke avatar May 20 '23 20:05 yschimke

#7801 is the right fix.

swankjesse avatar Dec 09 '23 15:12 swankjesse