Self-Signed Certificate Verification Error
Describe the bug
I am trying to add a Tableau server as an ingestion source. The server is hosted on an internal network and serves HTTPS with a self-signed certificate. The DataHub instance is the Quickstart container running locally.
Running the ingestion fails with the following error:
Unable to login (check your Tableau connection and credentials): HTTPSConnectionPool(host='${TABLEAU_HOST}', port=443): Max retries exceeded with url: /api/2.4/auth/signin (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
To Reproduce
Ingestion recipe:
source:
    type: tableau
    config:
        connect_uri: 'https://${TABLEAU_HOST}'
        stateful_ingestion:
            enabled: true
        ingest_owner: true
        ingest_tags: true
        username: cbosco
        password: '${tableau_cbosco}'
sink:
    type: datahub-rest
    config:
        server: 'http://datahub-gms:8080'
- Go to Ingestion
- Create a source from the recipe above
- Execute
Desktop (please complete the following information):
- OS: macOS Sonoma 14.5
- Browser: Chrome
- DataHub version: v0.13.3rc1
Additional context
Full logs are here:
Execution finished with errors.
{'exec_id': 'f595a55a-0169-4c3e-9864-639debce590e',
'infos': ['2024-06-26 14:57:20.044957 INFO: Starting execution for task with name=RUN_INGEST',
"2024-06-26 14:57:30.118957 INFO: Failed to execute 'datahub ingest', exit code 1",
'2024-06-26 14:57:30.119043 INFO: Caught exception EXECUTING task_id=f595a55a-0169-4c3e-9864-639debce590e, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 140, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 282, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Report ~~~~
{
"cli": {
"cli_version": "0.13.3rc1",
"cli_entry_location": "/tmp/datahub/ingest/venv-tableau-03575587e416950c/lib/python3.10/site-packages/datahub/__init__.py",
"models_version": "bundled",
"py_version": "3.10.13 (main, Jan 17 2024, 05:40:33) [GCC 12.2.0]",
"py_exec_path": "/tmp/datahub/ingest/venv-tableau-03575587e416950c/bin/python3",
"os_details": "Linux-6.4.16-linuxkit-aarch64-with-glibc2.36",
"mem_info": "72.92 MB",
"peak_memory_usage": "72.92 MB",
"disk_info": {
"total": "62.67 GB",
"used": "24.88 GB",
"used_initally": "24.88 GB",
"free": "34.58 GB"
},
"peak_disk_usage": "24.88 GB",
"thread_count": 1,
"peak_thread_count": 1
},
"source": {
"type": "tableau",
"report": {
"events_produced": 0,
"events_produced_per_sec": 0,
"entities": {},
"aspects": {},
"aspect_urn_samples": {},
"warnings": {},
"failures": {
"tableau-login": [
"Unable to login (check your Tableau connection and credentials): HTTPSConnectionPool(host='${TABLEAU_HOST}', port=443): Max retries exceeded with url: /api/2.4/auth/signin (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))"
]
},
"soft_deleted_stale_entities": [],
"start_time": "2024-06-26 14:57:21.563443 (6.6 seconds ago)",
"running_time": "6.6 seconds"
}
},
"sink": {
"type": "datahub-rest",
"report": {
"total_records_written": 0,
"records_written_per_second": 0,
"warnings": [],
"failures": [],
"start_time": "2024-06-26 14:57:21.446230 (6.72 seconds ago)",
"current_time": "2024-06-26 14:57:28.161440 (now)",
"total_duration_in_seconds": 6.72,
"max_threads": 15,
"gms_version": "v0.13.3rc1",
"pending_requests": 0
}
}
}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv is already set up
venv setup time = 0 sec
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/f595a55a-0169-4c3e-9864-639debce590e/recipe.yml --report-to /tmp/datahub/ingest/f595a55a-0169-4c3e-9864-639debce590e/ingestion_report.json
[2024-06-26 14:57:21,442] INFO {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.13.3rc1
[2024-06-26 14:57:21,448] INFO {datahub.ingestion.run.pipeline:254} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2024-06-26 14:57:21,683] INFO {tableauserverclient.server.server:178} - Could not get version info from server: <class 'requests.exceptions.SSLError'>HTTPSConnectionPool(host='${TABLEAU_HOST}', port=443): Max retries exceeded with url: /api/2.4/serverInfo (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
[2024-06-26 14:57:21,684] INFO {tableauserverclient.server.server:180} - versions: None, 2.4
[2024-06-26 14:57:28,153] ERROR {datahub.ingestion.source.tableau:796} - tableau-login => Unable to login (check your Tableau connection and credentials): HTTPSConnectionPool(host='${TABLEAU_HOST}', port=443): Max retries exceeded with url: /api/2.4/auth/signin (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
[2024-06-26 14:57:28,153] INFO {datahub.ingestion.run.pipeline:276} - Source configured successfully.
[2024-06-26 14:57:28,153] INFO {datahub.cli.ingest_cli:128} - Starting metadata ingestion
[2024-06-26 14:57:28,160] INFO {datahub.ingestion.run.pipeline:529} - Processing commit request for DatahubIngestionCheckpointingProvider. Commit policy = CommitPolicy.ALWAYS, has_errors=True, has_warnings=False
[2024-06-26 14:57:28,160] WARNING {datahub.ingestion.source.state_provider.datahub_ingestion_checkpointing_provider:95} - No state available to commit for DatahubIngestionCheckpointingProvider
[2024-06-26 14:57:28,160] INFO {datahub.ingestion.run.pipeline:549} - Successfully committed changes for DatahubIngestionCheckpointingProvider.
[2024-06-26 14:57:28,161] INFO {datahub.ingestion.reporting.file_reporter:54} - Wrote FAILURE report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/f595a55a-0169-4c3e-9864-639debce590e/ingestion_report.json' mode='w' encoding='UTF-8'>
[2024-06-26 14:57:28,161] INFO {datahub.cli.ingest_cli:141} - Finished metadata ingestion
Cli report:
{'cli_version': '0.13.3rc1',
'cli_entry_location': '/tmp/datahub/ingest/venv-tableau-03575587e416950c/lib/python3.10/site-packages/datahub/__init__.py',
'models_version': 'bundled',
'py_version': '3.10.13 (main, Jan 17 2024, 05:40:33) [GCC 12.2.0]',
'py_exec_path': '/tmp/datahub/ingest/venv-tableau-03575587e416950c/bin/python3',
'os_details': 'Linux-6.4.16-linuxkit-aarch64-with-glibc2.36',
'mem_info': '72.92 MB',
'peak_memory_usage': '72.92 MB',
'disk_info': {'total': '62.67 GB', 'used': '24.88 GB', 'used_initally': '24.88 GB', 'free': '34.57 GB'},
'peak_disk_usage': '24.88 GB',
'thread_count': 1,
'peak_thread_count': 1}
Source (tableau) report:
{'events_produced': 0,
'events_produced_per_sec': 0,
'entities': {},
'aspects': {},
'aspect_urn_samples': {},
'warnings': {},
'failures': {'tableau-login': ["Unable to login (check your Tableau connection and credentials): HTTPSConnectionPool(host='${TABLEAU_HOST}', port=443): Max retries exceeded with url: /api/2.4/auth/signin (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))"]},
'soft_deleted_stale_entities': [],
'start_time': '2024-06-26 14:57:21.563443 (6.82 seconds ago)',
'running_time': '6.82 seconds'}
Sink (datahub-rest) report:
{'total_records_written': 0,
'records_written_per_second': 0,
'warnings': [],
'failures': [],
'start_time': '2024-06-26 14:57:21.446230 (6.94 seconds ago)',
'current_time': '2024-06-26 14:57:28.382283 (now)',
'total_duration_in_seconds': 6.94,
'max_threads': 15,
'gms_version': 'v0.13.3rc1',
'pending_requests': 0}
Pipeline finished with at least 1 failures; produced 0 events in 6.82 seconds.
I know that when I use tableauserverclient in Python, I have to add the following option to get the connection to work:
import os
import tableauserverclient as TSC

server = TSC.Server(os.getenv("TABLEAU_SERVER"), use_server_version=True)
# Disable TLS certificate verification for requests made by this client
server.add_http_options({"verify": False})
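As a rough illustration of what a `verify` option like this controls, here is a sketch using only Python's standard `ssl` module. It is not how requests or tableauserverclient implement it internally, and the helper name `make_context` is hypothetical. It mirrors the three usual meanings of a requests-style `verify` value: `True` (verify against the default trust store, which is what fails for a self-signed chain), `False` (no verification at all), or a path to a CA bundle containing the internal CA.

```python
import ssl


def make_context(verify):
    """Build an SSLContext for a requests-style `verify` value (hypothetical helper).

    verify=True  -> verify the server certificate against the default trust store
    verify=False -> skip certificate and hostname verification (insecure)
    verify=<str> -> verify against the CA bundle at that path
    """
    if verify is True:
        return ssl.create_default_context()
    if verify is False:
        ctx = ssl.create_default_context()
        # check_hostname must be disabled before verify_mode can be CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    # Otherwise treat the value as a path to a PEM file with the trusted CA(s)
    return ssl.create_default_context(cafile=verify)


strict = make_context(True)
insecure = make_context(False)
print("strict verifies certificates:", strict.verify_mode == ssl.CERT_REQUIRED)
print("insecure verifies certificates:", insecure.verify_mode == ssl.CERT_REQUIRED)
```

Passing a CA bundle path instead of `False` keeps verification on while trusting the internal certificate, which is usually the safer choice outside of local testing.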
@craigbosco Have you tried using the ssl_verify config option? See https://datahubproject.io/docs/generated/ingestion/sources/tableau/
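A minimal sketch of what that change might look like, with the recipe from the report written as a Python dict. In the YAML recipe this would correspond to an `ssl_verify` key under `source.config`. My understanding of the linked docs is that `ssl_verify` accepts either a boolean or a path to a certificate bundle, but treat the accepted values as an assumption and check the page. The helper name `with_ssl_verify` and the bundle path are hypothetical.

```python
import copy
import json

# Recipe from the report, expressed as a dict (values copied as-is)
recipe = {
    "source": {
        "type": "tableau",
        "config": {
            "connect_uri": "https://${TABLEAU_HOST}",
            "stateful_ingestion": {"enabled": True},
            "ingest_owner": True,
            "ingest_tags": True,
            "username": "cbosco",
            "password": "${tableau_cbosco}",
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://datahub-gms:8080"},
    },
}


def with_ssl_verify(base_recipe, ssl_verify):
    """Return a copy of the recipe with source.config.ssl_verify set (hypothetical helper)."""
    patched = copy.deepcopy(base_recipe)
    patched["source"]["config"]["ssl_verify"] = ssl_verify
    return patched


# Option 1: turn verification off (quick to test, but insecure)
insecure = with_ssl_verify(recipe, False)

# Option 2: keep verification on and trust the internal CA
# (placeholder path; it must be readable where the ingestion actually runs)
trusted = with_ssl_verify(recipe, "/path/to/internal-ca.pem")

print(json.dumps(insecure["source"]["config"], indent=2))
```

Since the ingestion here runs inside the Quickstart container, a certificate bundle path would presumably need to exist inside that container rather than on the host.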