Events lost when sending them to Application Insights using azure-monitor-opentelemetry==1.8.3
Last week, we migrated to azure-monitor-opentelemetry==1.8.3.
In our project, we use it to connect to Application Insights from a Databricks job. We also use azure-monitor-events-extension==0.1.0 to send events to the customEvents table.
We install these two libraries using an init script in the job cluster.
This is the code we use to send events to Application Insights.
from azure.monitor.events.extension import track_event
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry._logs import get_logger_provider


def create_azure_connection():
    connection_string = dbutils.secrets.get('nonprod-keyvault-scope', 'application-insights-connection-string')
    configure_azure_monitor(connection_string=connection_string)


def send_custom_event(event_name, message_dict):
    create_azure_connection()
    track_event(event_name, message_dict)
    get_logger_provider().force_flush()
    print(f"Event '{event_name}' tracked")
We send one event at the start of the Databricks job and another at the end.
send_custom_event('Start Cloud ETL Process', attributes)
send_custom_event('Finish Cloud ETL Process', attributes)
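For reference, attributes is a flat dict of custom dimensions. A minimal illustrative sketch using a few of the keys visible in the customEvents export further down (not our full set):

# Illustrative subset of the attributes dict passed to send_custom_event();
# the keys mirror some of the customDimensions in the customEvents rows shared below.
attributes = {
    "DWH": "sqldb-ana-dev-neu-demo",
    "JOB_ID": "469100572173435",
    "RUN_ID": "121606667000485",
    "ETL_TEMPLATE": "databricks-etl-demo.json",
    "Environment": "dev",
}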
We have several ETLs running, and some events have not arrived in Application Insights, even though the send_custom_event() function was executed.
In one of the ETLs, we have received all the Start events but not all the Finish events, despite seeing the print ‘Event “Finish Cloud ETL Process” tracked’. In another ETL, we received the Finish message but not the Start message. So there is no clear pattern as to when events are ‘lost’.
This is the output of running pip freeze on a cluster that uses that init script:
annotated-types==0.7.0
asgiref==3.11.0
asttokens==2.0.5
astunparse==1.6.3
azure-core==1.31.0
azure-core-tracing-opentelemetry==1.0.0b12
azure-identity==1.25.1
azure-monitor-events-extension==0.1.0
azure-monitor-opentelemetry==1.8.3
azure-monitor-opentelemetry-exporter==1.0.0b46
azure-storage-blob==12.23.0
azure-storage-file-datalake==12.17.0
black==24.4.2
blinker==1.7.0
boto3==1.34.69
botocore==1.34.69
cachetools==5.3.3
certifi==2024.6.2
cffi==1.16.0
chardet==4.0.0
charset-normalizer==2.0.4
click==8.1.7
cloudpickle==2.2.1
comm==0.2.1
contourpy==1.2.0
cryptography==42.0.5
cycler==0.11.0
Cython==3.0.11
databricks-sdk==0.30.0
dbus-python==1.3.2
debugpy==1.6.7
decorator==5.1.1
Deprecated==1.2.14
distlib==0.3.8
distro==1.9.0
distro-info==1.7+build1
docstring-to-markdown==0.11
executing==0.8.3
facets-overview==1.1.1
filelock==3.15.4
findspark==2.0.1
fonttools==4.51.0
gitdb==4.0.11
GitPython==3.1.37
google-api-core==2.20.0
google-auth==2.35.0
google-cloud-core==2.4.1
google-cloud-storage==2.18.2
google-crc32c==1.6.0
google-resumable-media==2.7.2
googleapis-common-protos==1.65.0
grpcio==1.60.0
grpcio-status==1.60.0
httplib2==0.20.4
idna==3.7
importlib-metadata==6.0.0
ipyflow-core==0.0.201
ipykernel==6.28.0
ipython==8.25.0
ipython-genutils==0.2.0
ipywidgets @ file:///databricks/.virtualenv-def/ipywidgets-7.7.2-2databricksnojsdeps-py2.py3-none-any.whl#sha256=903ead20c8d40de671853515fcad2f34b43ebf3eff80e4df3f876b8dd64c903b
isodate==0.6.1
jedi==0.19.1
jmespath==1.0.1
joblib==1.4.2
jupyter_client==8.6.0
jupyter_core==5.7.2
kiwisolver==1.4.4
launchpadlib==1.11.0
lazr.restfulclient==0.14.6
lazr.uri==1.0.6
matplotlib==3.8.4
matplotlib-inline==0.1.6
mccabe==0.7.0
mlflow-skinny==2.19.0
msal==1.34.0
msal-extensions==1.3.1
msrest==0.7.1
mypy==1.10.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
nodeenv==1.9.1
numpy==1.26.4
oauthlib==3.2.2
opentelemetry-api==1.39.0
opentelemetry-instrumentation==0.60b0
opentelemetry-instrumentation-asgi==0.60b0
opentelemetry-instrumentation-dbapi==0.60b0
opentelemetry-instrumentation-django==0.60b0
opentelemetry-instrumentation-fastapi==0.60b0
opentelemetry-instrumentation-flask==0.60b0
opentelemetry-instrumentation-psycopg2==0.60b0
opentelemetry-instrumentation-requests==0.60b0
opentelemetry-instrumentation-urllib==0.60b0
opentelemetry-instrumentation-urllib3==0.60b0
opentelemetry-instrumentation-wsgi==0.60b0
opentelemetry-resource-detector-azure==0.1.5
opentelemetry-sdk==1.39.0
opentelemetry-semantic-conventions==0.60b0
opentelemetry-util-http==0.60b0
packaging==24.1
pandas==1.5.3
parso==0.8.3
pathspec==0.10.3
patsy==0.5.6
pexpect==4.8.0
pillow==10.3.0
platformdirs==3.10.0
plotly==5.22.0
pluggy==1.0.0
prompt-toolkit==3.0.43
proto-plus==1.24.0
protobuf==4.24.1
psutil==5.9.0
psycopg2==2.9.3
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyccolo==0.0.65
pycparser==2.21
pydantic==2.8.2
pydantic_core==2.20.1
pyflakes==3.2.0
Pygments==2.15.1
PyGObject==3.48.2
PyJWT==2.7.0
pyodbc==5.0.1
pyparsing==3.0.9
pyright==1.1.294
python-apt==2.7.7+ubuntu5
python-dateutil==2.9.0.post0
python-lsp-jsonrpc==1.1.2
python-lsp-server==1.10.0
pytoolconfig==1.2.6
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
requests==2.32.2
requests-oauthlib==2.0.0
rope==1.12.0
rsa==4.9
s3transfer==0.10.2
scikit-learn==1.4.2
scipy==1.13.1
seaborn==0.13.2
setuptools==74.0.0
six==1.16.0
smmap==5.0.0
sqlparse==0.5.1
ssh-import-id==5.11
stack-data==0.2.0
statsmodels==0.14.2
tenacity==8.2.2
threadpoolctl==2.2.0
tokenize-rt==4.2.1
tomli==2.0.1
tornado==6.4.1
traitlets==5.14.3
types-protobuf==3.20.3
types-psutil==5.9.0
types-pytz==2023.3.1.1
types-PyYAML==6.0.0
types-requests==2.31.0.0
types-setuptools==68.0.0.0
types-six==1.16.0
types-urllib3==1.26.25.14
typing_extensions==4.11.0
ujson==5.10.0
unattended-upgrades==0.1
urllib3==1.26.16
virtualenv==20.26.2
wadllib==1.3.6
wcwidth==0.2.5
whatthepatch==1.0.2
wheel==0.43.0
wrapt==1.14.1
yapf==0.33.0
zipp==3.17.0
Do you know if it's because of the new version of azure-monitor-opentelemetry? Is anyone else experiencing this?
Before migrating from version 1.8.1 to 1.8.3 of azure-monitor-opentelemetry, we were receiving all events correctly.
@leireroman12 Could you please address the following questions -
- Which version of azure-monitor-opentelemetry, azure-monitor-opentelemetry-exporter were you using before you upgraded to the latest version?
- Did you update any other packages as well?
- Have you set any environment variables?
- For these ETLs, do you see any exceptions or error messages in the traces?
- When you received the print message ‘Event “Finish Cloud ETL Process” tracked’, you did not see the logs for those in Application Insights, correct? Similarly, for the Start Cloud event, did you observe the same behavior, i.e. seeing the print message but not the logs in AI?
- Do you mind sharing the traces from the transaction search for any of the ETL runs?
- After you upgraded to the latest version of azure-monitor-opentelemetry, did you restart your application/process?
Hi @leireroman12. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.
@leireroman12 Could you please address the following questions -
- Which version of azure-monitor-opentelemetry, azure-monitor-opentelemetry-exporter were you using before you upgraded to the latest version?
We were using azure-monitor-opentelemetry==1.8.1, which used azure-monitor-opentelemetry-exporter==1.0.0b45 underneath.
This is a screenshot of the stdout from the Databricks cluster job where an ETL was run before the upgrade:
- Did you update any other packages as well?
No, I only updated azure-monitor-opentelemetry.
- Have you set any environment variables?
No
- For these ETLs, do you see any exceptions or error messages in the traces?
No
- When you received the print message ‘Event “Finish Cloud ETL Process” tracked’, you did not see the logs for those in Application Insights, correct? Similarly, for the Start Cloud event, did you observe the same behavior, i.e. seeing the print message but not the logs in AI?
In all ETLs, I see the two messages: “Event ‘Start Cloud ETL Process’ tracked” and “Event ‘Finish Cloud ETL Process’ tracked”. However, some events have not reached Application Insights, despite our seeing the message in Databricks, as I mentioned.
Today, for example, we received all events correctly in Application Insights. We don't know if it was a coincidence, because the only thing we changed was adding the Databricks notification for when the job finishes successfully, which has nothing to do with the OpenTelemetry library or sending events to AI.
- Do you mind sharing the traces from the transaction search for any of the ETL runs?
Here is an example of the two events from one ETL run in the customEvents table in AI (I guess this is what you are asking for):
"timestamp [UTC]",name,itemType,customDimensions,customMeasurements,"operation_Name","operation_Id","operation_ParentId","operation_SyntheticSource","session_Id","user_Id","user_AuthenticatedId","user_AccountId","application_Version","client_Type","client_Model","client_OS","client_IP","client_City","client_StateOrProvince","client_CountryOrRegion","client_Browser","cloud_RoleName","cloud_RoleInstance",appId,appName,iKey,sdkVersion,itemId,itemCount,"_ResourceId"
"12/11/2025, 4:30:12.255 AM","Start Cloud ETL Process",customEvent,"{""DWH"":""sqldb-ana-dev-neu-demo"",""DWH_ID"":""4d22214d-f83a-4ca7-9c82-c22f49ae9856"",""DWH_size"":""GP_S_Gen5_1"",""JOB_ID"":""469100572173435"",""RUN_ID"":""121606667000485"",""ETL_TEMPLATE"":""databricks-etl-demo.json"",""Environment"":""dev"",""code.file.path"":""/databricks/python/lib/python3.12/site-packages/azure/monitor/events/extension/_events.py"",""code.function.name"":""track_event"",""code.line.number"":""36"",""size_files_processed"":""0.00 KiB"",""files_processed"":""()"",""unexpected_files"":""()"",""list_plants_expected"":""()"",""list_plants_processed"":""{}"",""list_plants_not_processed"":""()"",""node_type_id"":""Standard_D4ads_v5"",""min_workers"":""0"",""max_workers"":""0""}",,,00000000000000000000000000000000,0000000000000000,,,,,,,PC,Other,Linux,"98.71.61.183",Dublin,Dublin,Ireland,Other,"unknown_service","4e37842b-732e-4699-8151-b1524a04a626","f082d345-b893-4d5b-87fa-24392fd8159a","/subscriptions/a003f0d1-aede-4610-9a71-f162b4a39f33/resourcegroups/rg-ana-nonprod/providers/microsoft.insights/components/appi-ana-nonprod-neu","60cf0e61-a9ec-4b80-8aae-c1a5832500a6","ulm_py3.12.3:otel1.39.0:ext1.0.0b46","2036079e-d64a-11f0-9c4c-6045bdde6a46",1,"/subscriptions/a003f0d1-aede-4610-9a71-f162b4a39f33/resourcegroups/rg-ana-nonprod/providers/microsoft.insights/components/appi-ana-nonprod-neu"
"12/11/2025, 4:39:50.465 AM","Finish Cloud ETL Process",customEvent,"{""DWH"":""sqldb-ana-dev-neu-demo"",""DWH_ID"":""4d22214d-f83a-4ca7-9c82-c22f49ae9856"",""ETL_duration"":""0:09:52.668904"",""DWH_size"":""GP_S_Gen5_1"",""JOB_ID"":""469100572173435"",""RUN_ID"":""121606667000485"",""ETL_TEMPLATE"":""databricks-etl-demo.json"",""Environment"":""dev"",""ETL_duration_seconds"":""592.668904"",""ETL_return_code"":""0"",""Database_connection_time"":""40.646049"",""update_dates"":""460.943"",""start_etl"":""30.573508"",""index_maintenance"":""35.551"",""update_statistics"":""57.843"",""end"":""10.188"",""init_cluster"":""285.472492"",""code.file.path"":""/databricks/python/lib/python3.12/site-packages/azure/monitor/events/extension/_events.py"",""code.function.name"":""track_event"",""code.line.number"":""36""}",,,00000000000000000000000000000000,0000000000000000,,,,,,,PC,Other,Linux,"98.71.61.183",Dublin,Dublin,Ireland,Other,"unknown_service","4e37842b-732e-4699-8151-b1524a04a626","f082d345-b893-4d5b-87fa-24392fd8159a","/subscriptions/a003f0d1-aede-4610-9a71-f162b4a39f33/resourcegroups/rg-ana-nonprod/providers/microsoft.insights/components/appi-ana-nonprod-neu","60cf0e61-a9ec-4b80-8aae-c1a5832500a6","ulm_py3.12.3:otel1.39.0:ext1.0.0b46","77aa2927-d64b-11f0-89f4-7c1e5275b5f5",1,"/subscriptions/a003f0d1-aede-4610-9a71-f162b4a39f33/resourcegroups/rg-ana-nonprod/providers/microsoft.insights/components/appi-ana-nonprod-neu"
- After you upgraded to the latest version of azure-monitor-opentelemetry, did you restart your application/process?
ETLs run on a Databricks job cluster, so libraries are installed using pip in a new, clean environment for each run.
@leireroman12 Thank you very much for this information. You mentioned that today you received all the events in Application Insights; has that behavior been consistent since then, or did you lose more events in a subsequent run?
I have tried to reproduce the issue using the same setup and code as yours, and I always see the events land in Application Insights. The fact that send_custom_event() is being executed confirms that the code is working and that the events are being generated.
You mentioned that you added the Databricks notification for when the job finishes successfully and that kind of coincided with all events appearing in AI. Now was this notification enabled before you upgraded azure-monitor-opentelemetry?
You also mentioned that ETLs run on a Databricks job cluster, so libraries are installed using pip in a new, clean environment for each run. Does that override any settings you may have set, for example something similar to adding the Databricks notification?
Are there any debug logs, error logs, or logs of any kind on the Databricks side that we can look through? If not, is it possible to enable logs that capture the activity of future runs?
There is a Failures tile in Application Insights; do you mind checking it for the time window when the events were expected?
Did you try refreshing the page where the traces appear? Sometimes it takes a while for them to show up.
Hi @rads-1996,
Receiving all events that day was not consistent behaviour; over the following days, events have been lost again.
I added the notification after upgrading azure-monitor-opentelemetry. Notifications are configured in the Databricks job definition YAML, so they have nothing to do with, and do not override, how the libraries are installed via the init script in the job cluster.
In Databricks, the only logs I know we can see are the ones I mentioned earlier (stdout), and I haven't seen any errors there...
In Failures, I don't see any errors in the last 3 days.
Our ETLs run every day at midnight, and I check if the events have arrived a few hours later, so I don't think the ones we've lost over the last few days will show up.
In your tests, are you also using the azure-monitor-events-extension library to send events to the customEvents table? Could there be some incompatibility with that library after the latest changes? Although, as I said, I don't see any error messages...
Honestly, I'm a bit lost with this and I don't know where the fault might be (the latest update of azure-monitor-opentelemetry, the azure-monitor-events-extension library, Application Insights itself, etc.).
@leireroman12 I am using the exact same code as you and see all the events land in Application Insights. If there were an incompatibility with the latest update, the events would not have been generated and you would have received errors from the script itself. It seems the events are being generated, but some of them are not landing in AI.
Do you mind modifying the logging level in your code? You can do it by adding the following statements to your existing code.
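For example, a minimal sketch using Python's built-in logging module (adjust it to your setup):

import logging

# Turn on DEBUG-level logging for the whole process so that any telemetry that
# is rejected or dropped by the exporter shows up in the job's stdout.
logging.basicConfig(level=logging.DEBUG)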
Or you can add it however you prefer; just set the logging level to DEBUG so that we can see if the events are being rejected or dropped for some reason. Note: you might see an influx of traces, but that is expected since we are setting the logging level to DEBUG.
Can you share those traces with us? You can remove the logging statement after you have collected the traces to avoid the extra verbosity.