
Pip packages are not installed for Iceberg source plugin with Hive type

usmanovbf opened this issue 1 year ago • 4 comments

Describe the bug

Pip packages are not installed for the Iceberg source plugin when the catalog type is hive.

To Reproduce

Steps to reproduce the behavior:

  1. Create an Iceberg ingestion source from the recipe below, but without type: hive
  2. Click Save & Run
  3. Observe an error about the required type field
  4. Add type: hive to the recipe
  5. Click Save & Run again
  6. See the errors (full logs are attached under Additional context):
    1. ModuleNotFoundError: No module named 'thrift'
    2. pyiceberg.exceptions.NotInstalledError: Apache Hive support not installed: pip install 'pyiceberg[hive]'
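Both errors in step 6 appear to point to the same missing dependency: pyiceberg's Hive catalog needs the thrift module, which the pyiceberg[hive] extra is meant to pull in. A minimal diagnostic sketch, assuming you can run Python inside the same environment the ingestion executor uses (which may differ from your local shell), checks whether the module is importable without actually importing it:

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that are not installed here.

    importlib.util.find_spec returns None for a top-level module that cannot
    be found, so this only inspects the environment and imports nothing.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumption: 'thrift' is the module named in the ModuleNotFoundError above.
missing = find_missing_modules(["thrift"])
if missing:
    print("Missing modules:", ", ".join(missing))
    print("Try: pip install 'pyiceberg[hive]'")
else:
    print("All checked Hive catalog dependencies are importable.")
```

If this reports thrift as missing in the executor's environment, the recipe will fail exactly as described regardless of what is installed elsewhere.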

Expected behavior

All required PyPI packages ('pyiceberg[hive]' and thrift) should be installed.

Solution

Run pip install for the required packages before every recipe execution.
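The proposed solution can be approximated for CLI-based ingestion with the sketch below. It assumes the datahub CLI is on PATH and installed in the same environment as the current interpreter; the recipe filename iceberg_recipe.yml is hypothetical, and the UI executor's own virtualenv handling is not modeled here. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_commands(packages, recipe_path):
    """Return two argv lists: a pip install, then a DataHub ingestion run."""
    # Use the current interpreter's pip so packages land in this environment.
    pip_cmd = [sys.executable, "-m", "pip", "install", *packages]
    # 'datahub ingest -c <recipe>' runs a recipe with the DataHub CLI.
    ingest_cmd = ["datahub", "ingest", "-c", recipe_path]
    return [pip_cmd, ingest_cmd]


def install_then_ingest(packages, recipe_path, dry_run=True):
    """Install packages, then run the recipe; only print commands when dry_run."""
    for cmd in build_commands(packages, recipe_path):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises if pip fails, so ingestion never starts
            # against an environment that is still missing dependencies.
            subprocess.run(cmd, check=True)


# Hypothetical recipe path; packages taken from the expected behavior above.
install_then_ingest(["pyiceberg[hive]", "thrift"], "iceberg_recipe.yml")
```

Installing before every run trades some startup time for never running against a stale environment, which is the trade-off the proposed solution accepts.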


Desktop (please complete the following information):

  • OS: macOS Sonoma (arm64)
  • Browser: Chrome
  • Version: 122.0.6261.112

Additional context

  1. Recipe:
source:
    type: iceberg
    config:
        env: PROD
        catalog:
            name: iceberg-catalog
            type: hive
            config:
                uri: 'https://hostname1:9083'
                s3.endpoint: 'https://hostname2'
                s3.access-key-id: '${secret1}'
                s3.secret-access-key: '${secret2}'
        table_pattern:
            allow:
                - 'test.*'
        profiling:
            enabled: false
  2. Error logs: exec-urn_li_dataHubExecutionRequest_1d2b870e-81e9-477a-8869-39505a9f2b3d.log
  3. Adding the packages under Extra Pip Libraries does not help either.
  4. DataHub version: 0.12.1
  5. As far as I can see, this has not been fixed between 0.12.1 and 0.13.0: https://github.com/datahub-project/datahub/commits/v0.12.1/metadata-ingestion/src/datahub/ingestion/source/iceberg
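Step 3 of the reproduction shows that the catalog type field is required, and the failure only appears once it is set to hive. The sketch below, assuming the recipe layout shown above (catalog with name, type and config keys, as used in 0.12.1), derives the extra packages from the catalog type. The CATALOG_EXTRAS mapping is hypothetical and not part of DataHub; only the hive entry is grounded in this issue's error messages.

```python
# Hypothetical mapping from catalog.type to the extra pip packages it needs.
# Only "hive" is grounded here (pyiceberg's NotInstalledError message and the
# missing 'thrift' module); other catalog types are intentionally omitted.
CATALOG_EXTRAS = {
    "hive": ["pyiceberg[hive]", "thrift"],
}


def required_packages(recipe):
    """Return extra pip packages for the Iceberg catalog in a recipe dict."""
    source = recipe.get("source", {})
    if source.get("type") != "iceberg":
        return []
    catalog = source.get("config", {}).get("catalog", {})
    # A missing or unknown catalog type maps to no extra packages.
    return list(CATALOG_EXTRAS.get(catalog.get("type"), []))


# The recipe from this issue as a Python dict (secrets omitted).
recipe = {
    "source": {
        "type": "iceberg",
        "config": {
            "env": "PROD",
            "catalog": {
                "name": "iceberg-catalog",
                "type": "hive",
                "config": {"uri": "https://hostname1:9083"},
            },
        },
    }
}
print(required_packages(recipe))
```

For this recipe the result is the same package list the issue expects to be installed automatically.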

usmanovbf · Mar 14 '24 22:03

@usmanovbf would you be open to sending a PR for this?

hsheth2 · Mar 19 '24 01:03

@hsheth2 Sorry, I don't have time right now. I hope you or one of your teammates can find time to fix it.

usmanovbf · Mar 20 '24 10:03

Might be related to #10289.

igorvoltaic · Apr 15 '24 15:04

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] · May 16 '24 01:05

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] · Jun 15 '24 01:06