[BUG]: Undefined DD_API_KEY + Weird behavior for LLMObs
Tracer Version(s)
Lambda Layer 123
Node.js Version(s)
20
Bug Report
Hi all! I'm using the Datadog layers to trace some LLMObs calls inside my Lambda (the Datadog agent/extension layer plus the Datadog Node.js layer, both on their latest versions).
Everything works fine, but every now and then my Lambda returns a 500 with the error message below. It's weird because it only happens occasionally, usually when I send a request very close to a finished one (i.e. right after one request ends, I send another).
{
"errorType": "TypeError",
"errorMessage": "Invalid value \"undefined\" for header \"DD-API-KEY\"",
"code": "ERR_HTTP_INVALID_HEADER_VALUE",
"stack": [
"TypeError [ERR_HTTP_INVALID_HEADER_VALUE]: Invalid value \"undefined\" for header \"DD-API-KEY\"",
" at ClientRequest.setHeader (node:_http_outgoing:658:3)",
" at new ClientRequest (node:_http_client:293:14)",
" at Object.request (node:https:381:10)",
" at /var/task/node_modules/dd-trace/packages/datadog-instrumentations/src/http/client.js:72:31",
" at run (node:diagnostics_channel:162:14)",
" at DatadogStorage.run (/var/task/node_modules/dd-trace/packages/datadog-core/src/storage.js:78:22)",
" at node:diagnostics_channel:94:18",
" at Channel.runStores (node:diagnostics_channel:171:12)",
" at Object.request (/var/task/node_modules/dd-trace/packages/datadog-instrumentations/src/http/client.js:50:27)",
" at makeRequest (/var/task/node_modules/dd-trace/packages/dd-trace/src/exporters/common/request.js:133:24)"
]
}
Is this normal? Could it be my implementation? (I'm just using `import tracer from 'dd-trace';`.)
Cheers!
Reproduction Code
No response
Error Logs
{
"errorType": "TypeError",
"errorMessage": "Invalid value \"undefined\" for header \"DD-API-KEY\"",
"code": "ERR_HTTP_INVALID_HEADER_VALUE",
"stack": [
"TypeError [ERR_HTTP_INVALID_HEADER_VALUE]: Invalid value \"undefined\" for header \"DD-API-KEY\"",
" at ClientRequest.setHeader (node:_http_outgoing:658:3)",
" at new ClientRequest (node:_http_client:293:14)",
" at Object.request (node:https:381:10)",
" at /var/task/node_modules/dd-trace/packages/datadog-instrumentations/src/http/client.js:72:31",
" at run (node:diagnostics_channel:162:14)",
" at DatadogStorage.run (/var/task/node_modules/dd-trace/packages/datadog-core/src/storage.js:78:22)",
" at node:diagnostics_channel:94:18",
" at Channel.runStores (node:diagnostics_channel:171:12)",
" at Object.request (/var/task/node_modules/dd-trace/packages/datadog-instrumentations/src/http/client.js:50:27)",
" at makeRequest (/var/task/node_modules/dd-trace/packages/dd-trace/src/exporters/common/request.js:133:24)"
]
}
Tracer Config
DD_API_KEY_SECRET_ARN: datadogApiKeySecretArn,
DD_ENV: 'dev',
DD_SERVICE: 'my-service',
DD_SITE: 'datadoghq.com',
DD_LOGS_ENABLED: 'true',
DD_TRACE_ENABLED: 'true',
DD_VERSION: APP_VERSION,
DD_LLMOBS_AGENTLESS_ENABLED: 0,
DD_LLMOBS_ENABLED: 1,
DD_LAMBDA_HANDLER: 'index.handler',
DD_LLMOBS_ML_APP: 'my-llm-app',
Operating System
No response
Bundling
ESBuild
Hi @abarone-btf! Thanks for opening this issue.
This is an interesting scenario - a few questions/suggestions from my end:
- Are you doing any custom evaluation metric submissions through `llmobs.submitEvaluation(...)` (where we rely on the API key)? Otherwise, a small script of your Lambda that could aid a reproduction would be appreciated, so I can also set this up and try to spam it. It's a very unlikely scenario for us to try and attach an API key for submitting LLMObs spans if you have disabled agentless (which I can see you have with `DD_LLMOBS_AGENTLESS_ENABLED: 0`).
- I would also call `llmobs.flush()` before your handler returns, just in case (see the sketch after this list). We try to do this on process exit, but as we iterate on our serverless support and investigate, it's possible something like this might happen. `llmobs.flush` should add a callback to the event loop so data is sent before the Lambda exits.
- How often would you say this happens / you see these logs?
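As a rough sketch of what I mean (adapt to however your handler is structured; `doWork` here is just a stand-in for your existing logic, and this assumes the layer has already initialized the tracer):

```js
import tracer from 'dd-trace';

// Stand-in for the handler's existing logic.
async function doWork(event) {
  return { ok: true };
}

export const handler = async (event) => {
  try {
    return await doWork(event);
  } finally {
    // Flush LLMObs data before the Lambda execution environment is frozen.
    tracer.llmobs.flush();
  }
};
```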
We're also working on some changes to improve the logic in the writers (LLMObs spans and evaluation metrics), so stay tuned there, as it could help here as well.
Hi @sabrenner, thank you for your help!
I tested more and realized that fetching the secret (`DD_API_KEY_SECRET_ARN`) was causing the error (I updated the Lambda to use `DD_API_KEY` directly and the error never appeared).
This happens when agentless is set to 0 (or 1, for that matter) and the dd-agent is actually present on the Lambda. It may be a combination of multiple things; the only way I got it to work on my Lambda (using both layers, the agent and dd-trace) was setting agentless = 1 and providing `DD_API_KEY` directly as an env var. Every other combination didn't work for me (in some cases I got this error, and in other cases the trace never appeared in the LLM Obs panel).
I am not submitting any evaluations, just making use of `llmobs.trace`, `wrap` (for the workflow), and a lot of `annotate`.
It's also worth noting that in my tests the flushing is kind of weird: many times the trace doesn't appear in the LLM Obs panel until I make another request (literally, only after sending a second request does the first one show up in the DD console). I don't know if it's related to this, but it's kind of odd and frustrating, haha.
(I make an `llmobs.flush()` call before returning in the Lambda handler.)
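For reference, my usage looks roughly like this (heavily simplified; `callModel` is a placeholder for the real LLM call, and the exact option names are just how I'm calling the SDK, not necessarily canonical):

```js
import tracer from 'dd-trace';

const llmobs = tracer.llmobs;

// Placeholder for the actual model call.
async function callModel(prompt) {
  return `answer for: ${prompt}`;
}

// Wrap the workflow so a workflow span is created on each call.
const runWorkflow = llmobs.wrap({ kind: 'workflow', name: 'run-workflow' }, async (prompt) => {
  const answer = await callModel(prompt);
  // Annotate the active span with the input/output I care about.
  llmobs.annotate({ inputData: prompt, outputData: answer });
  return answer;
});

export const handler = async (event) => {
  const result = await runWorkflow(event.prompt);

  // Flush before returning so data is sent before the Lambda freezes.
  llmobs.flush();

  return { statusCode: 200, body: JSON.stringify({ result }) };
};
```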
Thanks for the extra clarification @abarone-btf!
For both of the following points:
I tested more and realized that fetching the secret (`DD_API_KEY_SECRET_ARN`) was causing the error (I updated the Lambda to use `DD_API_KEY` directly and the error never appeared).
and
It's also worth noting that in my tests the flushing is kind of weird: many times the trace doesn't appear in the LLM Obs panel until I make another request (literally, only after sending a second request does the first one show up in the DD console). I don't know if it's related to this, but it's kind of odd and frustrating.
Flushing is definitely something we need to improve, and I think as we rework the Node.js support for LLM Observability in AWS Lambda to be more integrated with the layer, these issues should be resolved. I'll keep you updated on this issue; I should be able to take a poke at it over the next week or so.
For now I would keep `DD_API_KEY`, and if you get traces more consistently when not calling `llmobs.flush()`, then don't call it for now. We'll introduce automatic support in the Node.js DD layer soon; it's something we're working on.
Feel free to bump this issue again if I don't reply within the next week or so!
No, thank you for your help! @sabrenner
I get almost the same behavior with or without the flush; do you recommend removing it for now, then?
I'll keep checking in on this issue every now and then 👌
Hi @sabrenner, hope you are doing great! Just checking whether there has been any improvement to the flush behavior (not to rush anything, just so I know when to upgrade my Lambdas and layer versions to test them out again).
Thanks as usual 🚀
Hi @abarone-btf, could you give the latest versions of each layer a try (extension layer 78, Node.js DD layer 125)? We recently merged a change to the extension layer that should be more compatible with how we send data from the LLMObs SDK. I would make sure
- `DD_LLMOBS_ENABLED=true`
- `DD_LLMOBS_ML_APP=<YOUR_ML_APP>`
- `DD_LLMOBS_AGENTLESS_ENABLED=false`
are set. Let me know if those work any better for you, and definitely feel free to follow up if not!
Hi @abarone-btf, just following up on whether my last comment about the updates we made to the serverless layers helped out at all. If so, or if there's no response, I'll close out this issue for now, but if there's still a problem, feel free to comment or re-open the issue if it gets closed!