cpp_client_telemetry icon indicating copy to clipboard operation
cpp_client_telemetry copied to clipboard

Very large timeout when sending event

Open k-bren opened this issue 2 years ago • 3 comments
trafficstars

Windows 11 SDK version 3.7.5.7

Steps to reproduce. Send 100000 events. Some timeout errors will occur. Receiving service will have missing events The 1ds integration is here https://github.com/AzureAD/mip-sdk-for-cpp/blob/develop/src/core/api_impl/diagnostic/oneds/oneds_helper.cpp

What is the expected behavior? Timeout should be much lower

What is the actual behavior? Timeout is in hours

Additional context. Output from telemetry log file: 2023-03-30 04:05:12.571 T#27140 <W> [MATSDK] HTTP request OneDS_HTTP-13084 failed after 6646128 ms, a network error has occurred and events will be sent at a different time

k-bren avatar Apr 06 '23 00:04 k-bren

@k-bren - Can't access to above repo for integration code. Correct me if wrong, but the MIP SDK uses custom HTTP client library configured through 1ds-sdk C-API. The timeout value would be defined by this custom library. 1DS SDK uses by default libcurl for posix connection (with timeout of 5 secs), and WinINet for windows (with default timeout).

Also regarding missing events, these may be lost due to network issue mentioned by you above. This may be issue with the collector intermittently not accessible, or some other network issue. The 1DS-SDK will keep retrying sending events with exponential backoff and eventually the event would be dropped. You can further troubleshoot the network issue by intercepting the traffic using fiddler.

lalitb avatar Apr 06 '23 05:04 lalitb

Thanks @lalitb , that was helpful information. Below is the mip snippet where it handles the HTTP communication for events.
Would any of these failure callback cases result in the event being dropped? Also, the MIP timeout for HTTP is 30 seconds, this very long delay is very suspicious. I would think that either something is getting lost or the 1DS SDK is getting overwhelmed with very large number of events.

const char* requestId = operation->GetId().c_str(); if (operation->IsCancelled()) { callback(requestId, HTTP_RESULT_CANCELLED, nullptr /response/); } else { shared_ptrmip::HttpResponse mipResponse; try { mipResponse = operation->GetResponse(); } catch (const mip::BadInputError& ex) { MIPLOG(MIPERROR) << GetFormattedErrorMessage(ex, "OneDS HTTP operation failed"); callback(requestId, HTTP_RESULT_LOCAL_FAILURE, nullptr /response/); return; } catch (...) { MIPLOG(MIPERROR) << GetCurrentExceptionFormattedMessage("OneDS HTTP operation failed"); callback(requestId, HTTP_RESULT_NETWORK_FAILURE, nullptr /response/); return; }

  if (mipResponse->GetStatusCode() != ToIntegral(HttpStatusCode::HTTP_RESPONSE_SUCCESS)) {
    MIPLOG(WARNING) << "Unsuccessful OneDS response: " << GetHttpResponseDescription(*mipResponse);
  }

  // Convert to OneDS response
  http_response_t response;
  response.statusCode = mipResponse->GetStatusCode();
  response.bodySize = static_cast<int32_t>(mipResponse->GetBody().size());
  response.body = mipResponse->GetBody().data();
  vector<http_header_t> headers;
  for (const auto& mipHeader : mipResponse->GetHeaders()) {
    http_header_t header;
    header.name = mipHeader.first.c_str();
    header.value = mipHeader.second.c_str();
    headers.push_back(header);
  }
  response.headersCount = static_cast<int32_t>(headers.size());
  response.headers = headers.data();

  // Invoke OneDS callback
  callback(requestId, HTTP_RESULT_OK, &response);

k-bren avatar Apr 06 '23 15:04 k-bren

@lalitb did u mention before that 1DS will 'never' drop the event ? We use winhttp so it will use the default way winhttp timeout.

aks3090 avatar Apr 06 '23 17:04 aks3090