HIVE-26400: Provide docker images for Hive

dengzhhu653 opened this issue 2 years ago • 7 comments

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

dengzhhu653, Jul 18 '22 02:07

Is there any scope to run it with a local version of Hive/Hadoop/Tez, or do we always need a released version for this?

The quick answer is yes, but there are some places to modify in order to run a specified version:

  • change the version in docker-compose.yml: https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/docker-compose.yml#L51-L52
  • change the download URL in the Dockerfile: https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/Dockerfile#L37-L43
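For illustration, the download-URL change can be thought of as a function of the three versions. The sketch below is a hypothetical Python helper, not code from the PR. It assumes the usual archive.apache.org directory layout for Hadoop, Tez and Hive release tarballs. The URLs the Dockerfile actually uses may differ.

```python
# Hypothetical helper: build download URLs for the versions configured in the image.
# Assumes the standard archive.apache.org layout; the PR's Dockerfile may use other mirrors.
ARCHIVE = "https://archive.apache.org/dist"


def download_urls(hadoop_version: str, tez_version: str, hive_version: str) -> dict[str, str]:
    """Return the release tarball URL for each component."""
    return {
        "hadoop": f"{ARCHIVE}/hadoop/common/hadoop-{hadoop_version}/hadoop-{hadoop_version}.tar.gz",
        "tez": f"{ARCHIVE}/tez/{tez_version}/apache-tez-{tez_version}-bin.tar.gz",
        "hive": f"{ARCHIVE}/hive/hive-{hive_version}/apache-hive-{hive_version}-bin.tar.gz",
    }


if __name__ == "__main__":
    # Example versions only; substitute the ones you want to build.
    for component, url in download_urls("3.3.1", "0.10.2", "3.1.3").items():
        print(f"{component}: {url}")
```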

I'm wondering whether we can build Hive directly from source; this still needs some feedback and investigation.

dengzhhu653, Jul 21 '22 07:07

The new changes add support for running with a local version of Hive:

sh deploy.sh --hadoop <hadoop version> --tez <tez version>

This command builds the image with the given Hadoop and Tez versions and the local packaging/target/apache-hive-${project.version}-bin.tar.gz built from source, then starts a cluster with HiveServer2, Metastore and MySQL.

We can also build the image with a specified Hive version by appending --hive <hive version> to the above command.

By default, the command reads the version info from the project pom.xml: the properties project.version, hadoop.version and tez.version are used as the Hive, Hadoop and Tez versions, respectively, when deploy.sh builds the image.
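Here is a minimal Python sketch of reading those three properties from a pom.xml. It assumes that project.version is the top-level <version> element and that hadoop.version and tez.version live under <properties>. deploy.sh itself may obtain them differently (for example through Maven), so treat this as an illustration only.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this default XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def read_versions(pom_xml: str) -> dict[str, str]:
    """Return the Hive, Hadoop and Tez versions declared in a pom.xml document."""
    root = ET.fromstring(pom_xml)
    versions = {
        # project.version: the <version> element directly under <project>
        "hive": root.findtext("m:version", namespaces=POM_NS),
        # hadoop.version and tez.version: entries under <properties>
        "hadoop": root.findtext("m:properties/m:hadoop.version", namespaces=POM_NS),
        "tez": root.findtext("m:properties/m:tez.version", namespaces=POM_NS),
    }
    missing = [name for name, value in versions.items() if value is None]
    if missing:
        raise ValueError(f"pom.xml does not declare: {', '.join(missing)}")
    return {name: value.strip() for name, value in versions.items()}
```

Matching only direct children of <project> means a <parent><version> element is ignored, which is the intended behavior for project.version.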

Besides, we can start only a standalone HiveServer2 with an embedded Metastore:

sh deploy.sh --hiveserver2

or just start a standalone Metastore backed by Derby:

sh deploy.sh --metastore
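The three modes described above can be summarized as a hypothetical Python re-implementation of the documented options. It is not the actual deploy.sh. In particular, treating --hiveserver2 and --metastore as mutually exclusive is an assumption.

```python
import argparse

# Services each documented mode starts, per the description above.
SERVICES = {
    "cluster": ["HiveServer2", "Metastore", "MySQL"],
    "hiveserver2": ["HiveServer2 (embedded Metastore)"],
    "metastore": ["Metastore (Derby)"],
}


def parse_args(argv: list[str]) -> argparse.Namespace:
    # allow_abbrev=False so "--hive" is never confused with "--hiveserver2"
    parser = argparse.ArgumentParser(prog="deploy.sh", allow_abbrev=False)
    parser.add_argument("--hadoop", help="Hadoop version (default: hadoop.version from pom.xml)")
    parser.add_argument("--tez", help="Tez version (default: tez.version from pom.xml)")
    parser.add_argument("--hive", help="released Hive version; omit to use the local tarball")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--hiveserver2", action="store_true")
    mode.add_argument("--metastore", action="store_true")
    return parser.parse_args(argv)


def selected_mode(args: argparse.Namespace) -> str:
    """Map the parsed flags to one of the keys in SERVICES."""
    if args.hiveserver2:
        return "hiveserver2"
    if args.metastore:
        return "metastore"
    return "cluster"
```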

dengzhhu653, Jul 22 '22 05:07

@kgyrtkirk @abstractdog @pvary also any ideas or suggestions? Thank you!

dengzhhu653, Jul 25 '22 01:07

Very nice work @dengzhhu653!

Could we add BeeLine to the examples, so the user could start to run queries immediately?
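Here is a sketch of what such a BeeLine example might involve. The hypothetical Python helpers below wait until HiveServer2 accepts TCP connections and then build a BeeLine command line. Port 10000 is HiveServer2's default binary-transport port. The host name and the query are placeholders.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)  # service not listening yet; retry shortly
    return False


def beeline_command(host: str = "localhost", port: int = 10000,
                    query: str = "SHOW DATABASES;") -> list[str]:
    """Command line that runs one query through BeeLine against HiveServer2's JDBC URL."""
    return ["beeline", "-u", f"jdbc:hive2://{host}:{port}/", "-e", query]
```

Once the port is reachable, the command could be run with subprocess.run(beeline_command(), check=True), assuming beeline is on the PATH (for example inside the container).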

Will we officially provide an already built image for the Apache Hive versions? It would be good to update https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27362090#content/view/27362090 for using these images.

Thanks, Peter

pvary, Jul 25 '22 05:07

Is there a chance that we can build this docker image + run simple queries at precommit time? Otherwise, I'm afraid we cannot guarantee the stability of this feature.

abstractdog, Jul 25 '22 06:07

Regarding an officially built image, it seems we should create the official remote repository first. Any suggestions on the repository name, such as apache/hive-hiveserver2, apache/hive-metastore, or just apache/hive? Also cc @nrg4878, @abstractdog, @ayushtkn

Thanks, Zhihua

dengzhhu653, Jul 25 '22 09:07

I've just realized that the docker image isn't built from the latest master but from an already released Hive version. After thinking this over again, I guess we don't need a docker image build + test at precommit time (my original idea was to automatically check for every single Hive commit whether it breaks the image, like a smoke test).

UPDATE: just found this one in the earlier comment: the command builds the image from the local packaging/target/apache-hive-${project.version}-bin.tar.gz built from source and starts a cluster with HiveServer2, Metastore and MySQL.

I guess in this case we can do a follow-up jira to track precommit efforts: build the Hive image and run some simple queries after Hive has been successfully built (and fresh jars are present under packaging/target/).
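A precommit smoke test like this would first need to locate the freshly built tarball. Here is a minimal Python sketch that assumes the apache-hive-${project.version}-bin.tar.gz naming mentioned in this thread. The helper name is hypothetical.

```python
import re
from pathlib import Path

# Matches apache-hive-${project.version}-bin.tar.gz and captures the version.
TARBALL_RE = re.compile(r"^apache-hive-(?P<version>.+)-bin\.tar\.gz$")


def find_local_tarball(repo_root: Path) -> tuple[Path, str]:
    """Return the freshly built Hive binary tarball under packaging/target and its version."""
    target = repo_root / "packaging" / "target"
    for candidate in sorted(target.glob("apache-hive-*-bin.tar.gz")):
        match = TARBALL_RE.match(candidate.name)
        if match:
            return candidate, match.group("version")
    raise FileNotFoundError(f"no apache-hive-*-bin.tar.gz under {target}; build Hive first")
```

Matching on the -bin suffix skips a source tarball if one sits in the same directory. Failing loudly when nothing is found makes a missing build an obvious precommit error rather than a silent skip.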

abstractdog, Jul 25 '22 10:07

Why is this closed? I cannot see the patch on master. I would be sad to see this forgotten; how can we proceed with this?

abstractdog, Oct 03 '22 07:10

@abstractdog Agreed. When the PR has no activity, it automatically gets closed. I am re-opening the PR.

nrg4878, Oct 05 '22 03:10

Kudos, SonarCloud Quality Gate passed!

  • Bugs: A (0 Bugs)
  • Vulnerabilities: A (0 Vulnerabilities)
  • Security Hotspots: A (0 Security Hotspots)
  • Code Smells: A (0 Code Smells)

No Coverage information
No Duplication information

sonarqubecloud[bot], Nov 03 '22 09:11

@zabetak @abstractdog what remains to be done here? This seems like a very useful and easier way to run Hive services out of the box than the dev-box setup, though less powerful. As I understand it, the dev-box setup is not part of the Hive codebase. What do we want to complete before this can be merged? Thank you

nrg4878, Nov 14 '22 14:11

I haven't picked up the context here yet, so I need to check what we have at the moment. I'll try out what was implemented here. My expectation is that I can run different Hive components very conveniently on my local machine by following a README (also included in the repo); if so, this PR is good to be merged. Let me check next week.

abstractdog, Nov 18 '22 14:11

@deniskuzZ any thoughts about the PR? Thank you in advance!

dengzhhu653, Jan 17 '23 02:01

I had some comments earlier. If you can confirm you addressed those, @dengzhhu653, I'll take a second look and approve; I believe it's time to merge this. Please create an umbrella ticket for Hive Docker improvements and add this one as the first sub-jira, so we can track further improvements there.

abstractdog, Jan 18 '23 14:01

I've merged the two images into one (apache/hive) and created a parent jira to track the improvements. Please take a look if you have time, thank you!

dengzhhu653, Jan 19 '23 06:01

Kudos, SonarCloud Quality Gate passed!

  • Bugs: A (0 Bugs)
  • Vulnerabilities: A (0 Vulnerabilities)
  • Security Hotspots: A (0 Security Hotspots)
  • Code Smells: A (1 Code Smell)

No Coverage information
No Duplication information

sonarqubecloud[bot], Feb 03 '23 07:02

I love this initiative. Can we get more eyes on it?

I have 2 comments about it:

  1. HMS can work together with MySQL, but too many times we have found bugs with MySQL that gave us a lot of headaches. Is it possible to switch to PostgreSQL?
  2. I think we should ask for a Docker account so we can push the image to the repository whenever we have a new build or new release.

What remains to be done to make this happen?

aturoczy, Mar 27 '23 11:03

Thank you @TuroczyX for the comments.

  1. I have changed the backing DB to Postgres or embedded Derby;
  2. This is the remaining part; I want to track it as a follow-up after this task is finished.
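For context, switching the backing DB mostly changes the Metastore's JDBC connection settings and the schema initialization. The Python sketch below renders a minimal hive-site.xml fragment using the standard javax.jdo.option.* property names and builds a schematool command. The PostgreSQL host name postgres, the database name metastore_db and the exact set of properties are assumptions, and a real setup also needs credentials.

```python
from xml.sax.saxutils import escape

# Hypothetical connection settings per backing DB. The property names are the
# standard Hive Metastore JDO keys; host, port and database names are placeholders.
METASTORE_DB = {
    "postgres": {
        "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
        "javax.jdo.option.ConnectionURL": "jdbc:postgresql://postgres:5432/metastore_db",
    },
    "derby": {
        "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.EmbeddedDriver",
        "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true",
    },
}


def hive_site_xml(db_type: str) -> str:
    """Render a minimal hive-site.xml <configuration> for the chosen backing DB."""
    properties = "\n".join(
        f"  <property>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <value>{escape(value)}</value>\n"
        f"  </property>"
        for name, value in METASTORE_DB[db_type].items()
    )
    return f"<configuration>\n{properties}\n</configuration>"


def schematool_command(db_type: str) -> list[str]:
    """Command that initializes the Metastore schema in the chosen DB."""
    return ["schematool", "-dbType", db_type, "-initSchema"]
```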

dengzhhu653, Mar 27 '23 13:03

Excellent! We started a conversation today about how we can regularly publish Hive to Docker Hub. It would be cool if the daily build or the release version could be easily played with.

Can't wait to play with it :)

-Attila

aturoczy, Mar 28 '23 08:03

Should this initiative also include creating a Docker image for a standalone Hive Metastore?

Something like this: https://techjogging.com/standalone-hive-metastore-presto-docker.html

https://github.com/arempter/hive-metastore-docker/blob/master/Dockerfile

https://github.com/aws-samples/hive-emr-on-eks/blob/main/docker/Dockerfile

Thanks

jtvmatos, Mar 28 '23 13:03

It is a good point. @dengzhhu653 @deniskuzZ @ayushtkn @abstractdog What do you think about it?

aturoczy, Mar 28 '23 13:03

The image can serve both HS2 and Metastore, as you can see in the README: https://github.com/apache/hive/pull/3448/files#diff-75345b4702a737ff955983bea3daeac9243e26ef1d2dc0398a31ef28380da9cb. Separating them would require another build, which makes it harder to maintain in the public repo.
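One common way for a single image to serve both roles is to choose the service when the container starts. Here is a hypothetical Python sketch of that dispatch. It assumes an environment variable named SERVICE_NAME (an illustrative choice, not confirmed by this thread) and uses the standard hive --service launcher.

```python
from collections.abc import Mapping

# Launcher command per role; the image would ship both and pick one at start-up.
COMMANDS = {
    "hiveserver2": ["hive", "--service", "hiveserver2"],
    "metastore": ["hive", "--service", "metastore"],
}


def service_command(env: Mapping[str, str]) -> list[str]:
    """Pick the command for the role named by the (hypothetical) SERVICE_NAME variable."""
    service = env.get("SERVICE_NAME", "hiveserver2")
    try:
        return COMMANDS[service]
    except KeyError:
        raise ValueError(
            f"unknown SERVICE_NAME {service!r}; expected one of {sorted(COMMANDS)}"
        ) from None

# A real entrypoint would then replace itself with the chosen process, for example
# cmd = service_command(os.environ); os.execvp(cmd[0], cmd)
```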

dengzhhu653, Mar 29 '23 04:03

Understandable.

aturoczy, Mar 29 '23 16:03

Seems like the build is broken. @deniskuzZ Could you please re-start?

aturoczy, Mar 29 '23 16:03

A fork of this gets a green run: https://github.com/apache/hive/pull/4133. I think the broken build may be due to some time-consuming tests running in the same split.

dengzhhu653, Mar 31 '23 03:03

Do you need any help?

aturoczy, Apr 03 '23 18:04

Thank you @TuroczyX. The build would get a green run if I opened another jira. I think the change doesn't touch any code or the build, so it's safe to go into master if the change itself looks fine; we can ignore the build failure in this case.

dengzhhu653, Apr 04 '23 01:04

Yes, it should be unrelated. But it would also be great to have a green build, to follow the industry standard (even if it is logically unrelated).

Btw, as a community we need to think about pushing Hive images to Docker Hub. That would be super cool.

aturoczy, Apr 04 '23 09:04

Any update on this? :)

aturoczy, Apr 14 '23 19:04

I think it would be a good idea to add ports for development / debugging like these: -p9866:9866 -p10000:10000 -p10001:10001 -p9000:9000 -p8000:8000 -p3306:3306 -p50070:50070 -p50030:50030
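The suggested flags can be generated from a port list. The annotations below record the service that typically listens on each port by default. They are assumptions based on common Hadoop/Hive defaults and should be checked against the actual configuration. Here is a minimal Python sketch:

```python
from collections.abc import Iterable

# Ports suggested for development/debugging, with the service that usually owns
# each one by default (assumed; verify against the image's configuration).
DEV_PORTS = {
    9866: "HDFS DataNode data transfer (Hadoop 3 default)",
    10000: "HiveServer2 Thrift, binary transport",
    10001: "HiveServer2 Thrift, HTTP transport",
    9000: "HDFS NameNode RPC (common fs.defaultFS setting)",
    8000: "often used for a JDWP remote-debugging agent",
    3306: "MySQL",
    50070: "NameNode web UI (Hadoop 2 default)",
    50030: "JobTracker web UI (MapReduce v1)",
}


def publish_flags(ports: Iterable[int]) -> list[str]:
    """Expand ports into `docker run` -p flags mapping each host port to the same container port."""
    return [f"-p{port}:{port}" for port in ports]
```

Joining the result with spaces, " ".join(publish_flags(DEV_PORTS)), reproduces the flag string suggested above. Keeping the port table in one place makes it easy to drop ports that only apply to older Hadoop versions.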

zratkai, Apr 17 '23 09:04

Kudos, SonarCloud Quality Gate passed!

  • Bugs: A (0 Bugs)
  • Vulnerabilities: A (0 Vulnerabilities)
  • Security Hotspots: A (0 Security Hotspots)
  • Code Smells: A (0 Code Smells)

No Coverage information
No Duplication information

sonarqubecloud[bot], Apr 17 '23 11:04