facing ingestion issue with delta lake source

Open kirantodekar123 opened this issue 2 years ago • 3 comments

After ingesting the config file in DataHub, I am facing the issue below. Please suggest a fix.

    [2022-08-04 15:30:53,279] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080/
    [2022-08-04 15:30:53,797] ERROR {logger:26} - Please set env variable SPARK_VERSION
    [2022-08-04 15:30:53,934] ERROR {datahub.ingestion.run.pipeline:127} - 's3'
    [2022-08-04 15:30:53,935] INFO {datahub.cli.ingest_cli:119} - Starting metadata ingestion
    [2022-08-04 15:30:53,936] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion
    [2022-08-04 15:30:54,309] ERROR {datahub.entrypoints:188} - Command failed with 'Pipeline' object has no attribute 'source'. Run with --debug to get full trace
    [2022-08-04 15:30:54,310] INFO {datahub.entrypoints:191} - DataHub CLI version: 0.8.42 at /tmp/datahub/ingest/venv-delta-lake-0.8.42/lib/python3.9/site-packages/datahub/__init__.py
    2022-08-04 15:30:54.582911 [exec_id=692d63dc-ff19-44de-a420-2ce1839da529] INFO: Failed to execute 'datahub ingest'
    2022-08-04 15:30:54.585438 [exec_id=692d63dc-ff19-44de-a420-2ce1839da529] INFO: Caught exception EXECUTING task_id=692d63dc-ff19-44de-a420-2ce1839da529, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task
        self.event_loop.run_until_complete(task_future)
      File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
        return f.result()
      File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
        raise self._exception
      File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
        result = coro.send(None)
      File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 112, in execute
        raise TaskError("Failed to execute 'datahub ingest'")
    acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'

Execution finished with errors.

kirantodekar123 avatar Aug 04 '22 19:08 kirantodekar123

What does your config file look like?

shirshanka avatar Aug 08 '22 05:08 shirshanka

source: type: delta-lake config: env: DEV base_path: "s3://dt.lakehouse.uevents/eventsData/us-west-1"

    s3:
        aws_config:
            aws_region: "us-west-1"
            aws_access_key_id: ""
            aws_secret_access_key: ""
            
            

sink: type: "datahub-rest" config: server: "http://localhost:8080"

kirantodekar123 avatar Aug 08 '22 07:08 kirantodekar123

Could you paste it correctly formatted?

shirshanka avatar Aug 09 '22 06:08 shirshanka

You can close this. It has been resolved.

kirantodekar123 avatar Aug 17 '22 05:08 kirantodekar123

Can you advise on what the resolution is?

jag959 avatar Aug 20 '22 20:08 jag959
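
For readers landing here with a similar recipe, the config pasted earlier in this thread might look like the sketch below once its indentation is restored. The nesting is an assumption inferred from the indented fragment in the paste (`s3` under `config`, `aws_config` under `s3`); it is not the confirmed resolution, which was never posted, and the empty credential strings are kept exactly as pasted.

```yaml
# Sketch only: indentation reconstructed from the flattened paste above.
# The nesting of `s3` under `config` is an assumption, not a confirmed fix.
source:
  type: delta-lake
  config:
    env: DEV
    base_path: "s3://dt.lakehouse.uevents/eventsData/us-west-1"
    s3:
      aws_config:
        aws_region: "us-west-1"
        aws_access_key_id: ""       # left empty in the original paste
        aws_secret_access_key: ""   # left empty in the original paste

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```

YAML is indentation-sensitive, so pasting a recipe inside a code block keeps the structure visible to whoever is helping debug it.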