HIVE-26400: Provide docker images for Hive
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?
Is there any way to run it with a local version of Hive/Hadoop/Tez, or do we always need a released version for this?
The quick answer is yes, but a few places need to be modified in order to run a specific version:
- change the version in docker-compose.yml: https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/docker-compose.yml#L51-L52
- change the download URL in the Dockerfile: https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/Dockerfile#L37-L43
I'm wondering whether we can build Hive from source directly; this still needs some feedback and investigation.
The new changes add support for running with a local version of Hive:
sh deploy.sh --hadoop <hadoop version> --tez <tez version>
This command builds the image with the given Hadoop and Tez versions and the local packaging/target/apache-hive-${project.version}-bin.tar.gz built from source, then starts a cluster with HiveServer2, Metastore and MySQL.
We can also build the image with a specified Hive version: just append --hive <hive version> to the above command.
By default, the command reads the version info from the project pom.xml: the project.version, hadoop.version and tez.version properties are used as the Hive, Hadoop and Tez versions when deploy.sh builds the image.
Besides, we can start just a standalone HiveServer2 with an embedded Metastore,
sh deploy.sh --hiveserver2
or just a standalone Metastore with Derby,
sh deploy.sh --metastore
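To make the flag handling described above concrete, here is a minimal sketch of how options like these could be resolved against defaults. It is not the actual deploy.sh from this PR: the function name resolve_options, the DEFAULT_* variables (standing in for values read from pom.xml) and the "cluster" mode label are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not the real deploy.sh.
# DEFAULT_* are assumed to hold project.version, hadoop.version and
# tez.version as read from pom.xml by some earlier step.
resolve_options() {
  mode="cluster"                        # assumed label for the full cluster
  hive="${DEFAULT_HIVE_VERSION:-}"
  hadoop="${DEFAULT_HADOOP_VERSION:-}"
  tez="${DEFAULT_TEZ_VERSION:-}"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --hive|--hadoop|--tez)
        # Each version flag needs a value after it.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --hive)   hive="$2" ;;
          --hadoop) hadoop="$2" ;;
          --tez)    tez="$2" ;;
        esac
        shift 2 ;;
      --hiveserver2|--metastore)
        # Standalone modes: strip the leading "--" to get the mode name.
        mode="${1#--}"
        shift ;;
      *)
        echo "unknown option: $1" >&2
        return 1 ;;
    esac
  done
  echo "mode=$mode hive=$hive hadoop=$hadoop tez=$tez"
}
```

One possible way to fill the defaults is Maven's help plugin, for example mvn help:evaluate -Dexpression=hadoop.version -q -DforceStdout, though whether deploy.sh does it this way is an assumption.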
@kgyrtkirk @abstractdog @pvary also any ideas or suggestions? Thank you!
Very nice work @dengzhhu653!
Could we add BeeLine to the examples, so the user could start to run queries immediately?
Will we officially provide an already built image for the Apache Hive versions? It would be good to update https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27362090#content/view/27362090 for using these images.
Thanks, Peter
is there a chance that we can build this docker image + run simple queries in precommit time? otherwise, I'm afraid we cannot guarantee the stability of this feature
Seems we should create the remote official repository first. Any suggestions on the repository name, such as apache/hive-hiveserver2, apache/hive-metastore, or just apache/hive? Also cc @nrg4878, @abstractdog, @ayushtkn
Thanks, Zhihua
I've just realized that the docker image isn't built from the latest master, but instead from an already released hive version, so after thinking this over again, I guess we don't need a docker image build + test in precommit time (my original idea was to automatically check for every single hive commit if it breaks the image...like a smoke test)
UPDATE: just found this one:
this command will build the image with given Hadoop and Tez version, and the local packaging/target/apache-hive-${project.version}-bin.tar.gz built from source, a cluster with HiveServer2, Metastore and MySQL would be started.
I guess in this case we can do a follow-up jira to track precommit efforts: build the hive image and run some simple queries after hive was successfully built (and fresh jars are present under packaging/target/)
Why is this closed? I cannot see the patch on master. I would be sad to see this forgotten. How can we proceed with this?
@abstractdog Agreed. When a PR has no activity, it automatically gets closed. I am re-opening the PR.
@zabetak @abstractdog what is left to be done here? This seems like a very useful and easier way to run hive services out of the box than the dev-box setup, though less powerful. The dev-box setup is not part of the hive codebase as I understand it. What do we want to complete before this can be merged? Thank you
I haven't picked up the context here yet and need to check what we have at the moment. I'll try out what was implemented here. My expectation is that I can run different hive components very conveniently on my local machine according to a readme (also included in the repo), and if so, this PR is good to be merged in. Let me check next week.
@deniskuzZ any thoughts about the PR? Thank you in advance!
I had some comments earlier. If you can confirm you addressed those @dengzhhu653, I'll take a second look and approve; it's time to merge this, I believe. Please create an umbrella ticket for hive docker improvements and add this one as the first sub-jira, and let's track further improvements there.
I've merged the two images into a single one (apache/hive) and created a parent jira to track the improvements. Please take a look if you have time, thank you!
I love this initiative. Can we get more eyes on it?
I have 2 comments about it:
- HMS can work together with MySQL, but too many times we have found bugs with MySQL which gave us a lot of headaches. Is it possible to change to Postgres?
- I think we should ask for a Docker account so we can push the image to the repository whenever we have a new build or a new release.
What is the remaining part of this task to make it happen?
Thank you @TuroczyX for the comments.
- I have changed the backing DB to Postgres or embedded Derby;
- This is the remaining part; I'd like to track it separately once this task has finished.
Excellent! We started a conversation today about how we can regularly publish Hive to Docker Hub. It would be cool if the daily build or the release version could be easily played with.
Can't wait to play with it :)
-Attila
Should this initiative also include creating a Docker image for the standalone Hive Metastore?
Something like this: https://techjogging.com/standalone-hive-metastore-presto-docker.html
https://github.com/arempter/hive-metastore-docker/blob/master/Dockerfile
https://github.com/aws-samples/hive-emr-on-eks/blob/main/docker/Dockerfile
Thanks
It is a good point. @dengzhhu653 @deniskuzZ @ayushtkn @abstractdog What do you think about it?
The image can serve both HS2 and Metastore, as you can see in the README: https://github.com/apache/hive/pull/3448/files#diff-75345b4702a737ff955983bea3daeac9243e26ef1d2dc0398a31ef28380da9cb. Separating them needs another build, which makes it a bit harder to maintain in the public repo.
Understandable.
Seems like the build is broken. @deniskuzZ Could you please re-start?
A fork of this gets a green run: https://github.com/apache/hive/pull/4133. I think the broken build may be due to some time-consuming tests running in the same split.
Do you need any help?
Thank you @TuroczyX. The build would get a green run if I opened another jira. I think the fix changes neither the code nor the build, so it's safe to go into master if the change itself looks fine; we can ignore the build failure in such a case.
Yes, it should be unrelated. But it would also be great to have a green build, to follow the industry standard (even if the failure is logically unrelated).
Btw, as a community we need to think about pushing Hive images to Docker Hub. That would be super cool.
Any update on this? :)
I think it would be a good idea to add ports for development / debugging, like these: -p9866:9866 -p10000:10000 -p10001:10001 -p9000:9000 -p8000:8000 -p3306:3306 -p50070:50070 -p50030:50030
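As a rough illustration of that suggestion, a small helper could generate the -p flags from a port list so they do not have to be typed by hand. This is only a sketch: publish_flags and the IMAGE placeholder are hypothetical names, not something defined in this PR.

```shell
#!/bin/sh
# Sketch only: turn a list of ports into "-p PORT:PORT" flags for docker run.
publish_flags() {
  flags=""
  for port in "$@"; do
    # Prefix with the existing flags and a space only when flags is non-empty.
    flags="${flags:+$flags }-p ${port}:${port}"
  done
  printf '%s\n' "$flags"
}

# Example (not executed here); IMAGE is a placeholder, not a confirmed tag:
#   docker run -d $(publish_flags 9866 10000 10001 9000 8000 3306 50070 50030) "$IMAGE"
```

Each port is mapped to the same number on the host, which matches the list above. Publishing fewer ports by default and adding the rest only for debugging may be the safer choice.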