Error while trying to send logs through elasticsearch
Apache Airflow version
2.10.3
If "Other Airflow 2 version" selected, which one?
No response
What happened?
Hello all, hope you are doing well.
Sending logs to Elasticsearch directly through the Elasticsearch adapter (configured in the Airflow conf) does not work. Today we go through an Azure file share mounted as a PV inside K8s, then a Logstash pipeline, but it costs a lot per year.
I think the adapter is broken: even when testing the connection directly in the webserver, we get an error:
'ESConnection' object has no attribute 'close'
Then, with remote logging enabled, Airflow does not try to send logs to Elasticsearch, and it cannot connect to Elasticsearch when trying to read logs back.
Anyway, when looking at the logs of a DAG we also get this error:
elasticsearch.AuthenticationException: AuthenticationException(401, 'security_exception', 'missing authentication credentials for REST request [/airflow-logs-*/_count]')
airflow-webserver-1 | 172.18.0.1 - - [06/Dec/2024:11:26:46 +0000] "GET /api/v1/dags/debug_airflow_to_elastic/dagRuns/manual__2024-12-06T11:26:37.042642+00:00/taskInstances/print_debug_message/logs/1?full_content=false HTTP/1.1" 500 1588 "http://localhost:8080/dags/debug_airflow_to_elastic/grid?dag_run_id=manual__2024-12-06T11%3A26%3A37.042642%2B00%3A00&task_id=print_debug_message&tab=logs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
I tried with an API key, with a user and password, and even with both, but could not get rid of it.
I think there is a bug here, or did we do something wrong?
Many thanks!
Benjamin
What you think should happen instead?
No response
How to reproduce
Add an Elasticsearch connection and test it.
Then enable remote logging in the conf.
Airflow does not try to send logs, and it cannot connect to Elasticsearch.
Operating System
Kubernetes and Docker Compose (neither works).
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
[core]
sensitive_var_conn_names = key,login,secret,pass,auth
hide_sensitive_var_conn_fields = True
max_map_length = 16396
expose_config = non-sensitive-only
load_examples = False
test_connection = Enabled
[webserver]
show_trigger_form_if_no_params = True
allow_testing_connections = Enabled
[logging]
remote_logging = True
remote_log_conn_id = elasticsearch_default
logging_level = INFO
[elasticsearch]
host = ************************************
write_stdout = True
json_format = True
index_patterns = airflow-logs-*
[elasticsearch_config]
verify_certs=False
Anything else?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
cc @Owen-CH-Leung wdyt?
From your error log, it seems that the elasticsearch cluster has a security setup to prevent unauthorised access from your k8s cluster. The AuthenticationException is a clear indication.
I'd advise starting by testing connectivity outside of Airflow to narrow down the root cause. For example, try running a standalone Python script inside the same Kubernetes cluster that hosts your Airflow environment. In that script, use the official [elasticsearch-py](https://github.com/elastic/elasticsearch-py) client library to connect to your Elasticsearch cluster and try something like es_client.ping(). Make sure to experiment with SSL-related parameters such as verify_certs and ca_certs until you can reliably connect.
Once you've confirmed that your Python script can successfully interact with Elasticsearch, you can mirror those working configurations in your airflow.cfg (e.g., adjusting the Elasticsearch configuration sections) and restart Airflow.
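A minimal sketch of such a standalone check, assuming elasticsearch-py 8.x (where basic credentials are passed as `basic_auth`; 7.x uses `http_auth` instead). The `ES_*` environment variable names and the helper names are made up for illustration. The `count` call targets the same `airflow-logs-*` pattern that appears in the 401 error above, so it exercises the request Airflow was making:

```python
import os


def build_client_kwargs(env):
    """Translate (hypothetical) ES_* environment variables into client kwargs."""
    kwargs = {"hosts": [env["ES_HOST"]]}
    # Only pass TLS options that were explicitly set.
    if "ES_VERIFY_CERTS" in env:
        kwargs["verify_certs"] = env["ES_VERIFY_CERTS"].lower() == "true"
    if env.get("ES_CA_CERTS"):
        kwargs["ca_certs"] = env["ES_CA_CERTS"]
    # Prefer an API key; fall back to user/password if one is given.
    if env.get("ES_API_KEY"):
        kwargs["api_key"] = env["ES_API_KEY"]
    elif env.get("ES_USER"):
        kwargs["basic_auth"] = (env["ES_USER"], env.get("ES_PASSWORD", ""))
    return kwargs


def make_client(env=os.environ):
    """Create the real client; imported lazily so the helpers work without it."""
    from elasticsearch import Elasticsearch

    return Elasticsearch(**build_client_kwargs(env))


def check_connection(client, index_pattern="airflow-logs-*"):
    """Ping the cluster, then count documents in the log indices."""
    if not client.ping():
        raise ConnectionError("ping failed: cluster unreachable or request rejected")
    # With missing or wrong credentials, this is the call that raises a 401.
    return client.count(index=index_pattern)["count"]
```

Running `print(check_connection(make_client()))` inside a pod in the same cluster, with the variables set, should either print a document count or raise the same authentication error seen in the webserver.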
Hello @Owen-CH-Leung,
Thank you for your response.
The cluster is accessible from the outside, and some pipelines already successfully send data to Elasticsearch using elasticsearch-py.
However, the error logs suggest that Airflow is not transmitting the authentication parameters:
missing authentication credentials for REST request
The elasticsearch_default connection has already been created, so I’m wondering if there might be a workaround to ensure Airflow sends the authentication details to Elasticsearch properly?
Thank you
You can define the credentials in the elasticsearch_configs section of your airflow.cfg.
https://airflow.apache.org/docs/apache-airflow-providers-elasticsearch/stable/configurations-ref.html#elasticsearch-configs
In the elasticsearch_configs section, you can pass in any parameters that the elasticsearch client accepts. Example:
[elasticsearch_configs]
http_compress = True
ca_certs = /root/ca.pem
api_key = "SOMEAPIKEY"
verify_certs = True
All the params you define will be passed into the elasticsearch python library like elasticsearch.Elasticsearch(**kwargs)
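To illustrate the idea of a config section becoming keyword arguments, here is a rough sketch using only the standard-library `configparser`. It is not Airflow's actual loading code; the `section_to_kwargs` helper and its boolean handling are assumptions made for the example. Note that the standard-library parser keeps quotes around a value as part of the string, so the sample below leaves them off; Airflow's own loader may behave differently.

```python
import configparser

# Same keys as the example above, written without quotes around the API key.
SAMPLE_CFG = """
[elasticsearch_configs]
http_compress = True
ca_certs = /root/ca.pem
api_key = SOMEAPIKEY
verify_certs = True
"""


def section_to_kwargs(cfg_text, section="elasticsearch_configs"):
    """Read one INI section and return it as a kwargs dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    kwargs = {}
    for key, raw in parser.items(section):
        value = raw.strip()
        # Turn "True"/"False" strings into real booleans; keep the rest as text.
        if value.lower() in ("true", "false"):
            kwargs[key] = value.lower() == "true"
        else:
            kwargs[key] = value
    return kwargs


kwargs = section_to_kwargs(SAMPLE_CFG)
# The resulting dict is what would be unpacked as elasticsearch.Elasticsearch(**kwargs).
print(kwargs)
```

Every key in the section ends up as one keyword argument, which is why credentials such as `api_key` belong in this section when the client is built from it.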
Thanks, that solved the issue, but we are still blocked after that: the data isn't pushed, although we can see logs being pulled. We will create another issue for that. Thank you.
The doc seems to be misleading: the title is "Writing logs to Elasticsearch", but it doesn't write anything to Elasticsearch, it only reads.
Actually, not really. It's about both writing and then reading the logs. If you read the first paragraph (that's the first time I see the docs), the docs say that you can get the logs from stdout and forward them (write) to Elasticsearch with fluentd, logstash or others.
Airflow can be configured to read task logs from Elasticsearch and optionally write logs to stdout in standard or json format. These logs can later be collected and forwarded to the Elasticsearch cluster using tools like fluentd, logstash or others.
Are you doing that, @julienlagorsse-loreal? Maybe the problem is that you are not forwarding the stdout logs to Elasticsearch?
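To make the "forwarding" step concrete: in practice a collector such as fluentd or logstash does this, but the core of it can be sketched in a few lines. The example below takes JSON lines as a task would print them to stdout and builds a newline-delimited body in the shape the Elasticsearch `_bulk` endpoint expects. The helper name, the sample field values, and the target index name are all hypothetical.

```python
import json


def to_bulk_body(stdout_lines, index):
    """Build an NDJSON bulk body from JSON log lines, skipping anything else."""
    parts = []
    for line in stdout_lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            # Plain-text output mixed into stdout is ignored in this sketch.
            continue
        if not isinstance(doc, dict):
            continue
        # Each document is preceded by an action line naming the target index.
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(parts) + "\n" if parts else ""


sample_stdout = [
    '{"levelname": "INFO", "message": "hello from the task"}',
    "this line is not JSON and is skipped",
]
print(to_bulk_body(sample_stdout, "airflow-logs-example"))
```

Without some component doing this, the logs stay on stdout and nothing reaches the cluster, which would match seeing reads succeed while no data is written.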
We have issues with the file share on K8s. Anyway, that's not related to the bug, but the title is clearly misleading: it says "write logs to Elastic", not "read logs from Elastic" or "write logs to stdout". But I agree the doc paragraph is not misleading.
(Email reply to Jarek Potiuk's message of Friday, December 20, 2024, 8:52:13 PM, on apache/airflow issue #44724.)
Can you please propose an update to the page? It's as simple as clicking "Suggest a change on this page", which will open a Pull Request where you can propose a change that will remove the confusion.
Can we count on it @julienlagorsse-loreal ?