dd-sdk-flutter

Allow flushing of logs (force send to server)

Open btrautmann opened this issue 6 months ago • 8 comments

Feature description

We have a feature in our app that shows the user an error code whenever an error is experienced. The idea is that a user can hand this over to customer support to aid in triaging. We log the error code to Datadog to allow us to find the exact point in time the user experienced the issue. Unfortunately, that log is not always sent despite us configuring the SDK to send with max frequency and with minimal batch size.
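For reference, the configuration described above might look roughly like the sketch below. It assumes the 2.x `datadog_flutter_plugin` API; the client token, environment name, and site are placeholders, and the function name `initDatadog` is made up for illustration. Parameter names should be checked against the installed SDK version.

```dart
import 'package:datadog_flutter_plugin/datadog_flutter_plugin.dart';

// Hypothetical helper name; call it once during app startup.
Future<void> initDatadog() async {
  final configuration = DatadogConfiguration(
    clientToken: '<CLIENT_TOKEN>', // placeholder
    env: '<ENV_NAME>', // placeholder
    site: DatadogSite.us1, // assumption: pick the site for your org
    // Upload as often as the SDK allows.
    uploadFrequency: UploadFrequency.frequent,
    // Keep batches as small as the SDK allows.
    batchSize: BatchSize.small,
    loggingConfiguration: DatadogLoggingConfiguration(),
  );

  await DatadogSdk.instance.initialize(configuration, TrackingConsent.granted);
}
```

Even with these settings, uploads still happen in batches on the SDK's own schedule, which is why an explicit flush is being requested here.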

We'd love to have a public method we could invoke in this scenario that would force-flush the logs and have them sent. Is this possible and I've just missed it?

Proposed solution

As a user I'd love to see a _sdk.flush() function that would attempt to send the existing client-side logs to the server.

Other relevant information

No response

btrautmann avatar Jun 10 '25 18:06 btrautmann

@fuzzybinary any thoughts on this?

I was looking recently at the Posthog SDK and they have something similar.

btrautmann avatar Jun 24 '25 18:06 btrautmann

Hi @btrautmann -

We don't have this, and it would rely on some native functionality to do so. I'll talk with some folks here about whether we can support it, as a full "flush" has some implications we'll have to think about.

If you have a CSM, can you reach out to them and have them reach out to us so we can get more information about your use case?

fuzzybinary avatar Jun 30 '25 12:06 fuzzybinary

> If you have a CSM, can you reach out to them and have them reach out to us so we can get more information about your use case?

@fuzzybinary sorry, I'm not sure what a CSM is (and my googling didn't help me narrow it down much). Is there any other information you need RE: our use case? I'd probably be the right person to share it.

btrautmann avatar Jun 30 '25 13:06 btrautmann

CSM - Customer Success Manager on the Datadog side. Not all customers have them.

The questions we have are around when you're expecting the logs and why.

Are folks trying to diagnose issues for customers in real time and the batches are taking too long to arrive?

Are you seeing the logs eventually, or not seeing them at all?

If it's taking too long, how long does it take on average, and what would be an acceptable length of time? If you're not seeing them at all, this would point to a different issue we might need to take care of.

fuzzybinary avatar Jun 30 '25 14:06 fuzzybinary

> Are folks trying to diagnose issues for customers in real time and the batches are taking too long to arrive?

Generally the workflow is: 1) A customer calls our customer support. 2) Support triages the issue and, if they need developer help, they card up a ticket and include a screenshot of the error screen with the error code. 3) The developer(s) look up the error code in Datadog. It's at step 3 that we've had multiple instances of the log being missing. I just looked through Slack briefly and found 3 instances in recent weeks. Notably, that's only the times the missing data has been reported in Slack, and I believe it has a fairly high incidence rate.

> Are you seeing the logs eventually, or not seeing them at all?

For the 3 above, I checked Datadog just now and none of them ever arrived. We do get error codes logged, but I suspect those come from sessions where the user continues using the app and doesn't close it. A user might close the app after an error for a number of reasons (frustration, the error being "terminal" in that the app must be restarted, etc.).

> If it's taking too long, how long does it take on average, and what would be an acceptable length of time? If you're not seeing them at all, this would point to a different issue we might need to take care of.

Anecdotally, our normal (non-error-code) logs arrive in a timely fashion. I suspect that logs that are not sent before a process death are never sent, and that the nature of error codes and when they appear in the user's journey makes them much more likely to be dropped. I am curious whether this behavior (logs not being sent if a process death occurs before the batch is uploaded) is expected or whether it would be a bug. I have not yet spent time attempting to prove that this is the case for us, but will do so if it's not expected behavior.

I'm happy to inquire internally about a CSM if you still think that's the correct path to take.

btrautmann avatar Jun 30 '25 22:06 btrautmann

Logs that have not been uploaded before process death are not sent during the current session -- they will, however, be sent during the next session (and, if you are using RUM, they will be attributed to the correct RUM session in this case). If the death happens incredibly fast after you attempt a log, we might miss out on that log, but that would be unusual.

If you don't have a CSM, it may be worth opening a standard support request with Datadog so we can actually look at some examples in your org and get more specifics. Point them to this issue and they'll escalate it fairly quickly.

fuzzybinary avatar Jul 01 '25 19:07 fuzzybinary

I reached out to the person I believe is my CSM to discuss this further. In the meantime, can we leave this open? I think that, as a feature request, it's desirable more broadly than just for our use case.

btrautmann avatar Jul 03 '25 16:07 btrautmann

Yes, we'll leave it open. I do think there's potential use for it in the future, but I want to make sure we're solving your actual problem first 🙂

fuzzybinary avatar Jul 07 '25 14:07 fuzzybinary