presto
Bump up hadoop-apache2 version to 2.7.4-11
== NO RELEASE NOTE ==
@tdcmeehan @rschlussel I have bumped the hadoop-apache2 version (related to https://github.com/prestodb/presto-hadoop-apache2/pull/49)
@imjalpreet it looks like there are test failures
@tdcmeehan I will have a look and update the PR
@tdcmeehan I checked, and another PR (https://github.com/prestodb/presto-hadoop-apache2/pull/47) that was merged into hadoop-apache2 is causing these failures.
It looks like the config removed in that PR is still required for the current tests to pass. What do you suggest?
Hey @tdcmeehan, any suggestions on Jalpreet's question?
@imjalpreet and I discussed on Slack and he will be figuring out a path forward.
Because of PR https://github.com/prestodb/presto-hadoop-apache2/pull/47, some of the current tests fail with the Hadoop 2.x dependency. After some research, I realised that the config removed in that PR is not required with Hadoop 3.x but is still needed with Hadoop 2.x.
We have two options to resolve this: either revert that PR and bring in the remaining changes, or upgrade to Hadoop 3.x.
After a discussion with @tdcmeehan, we decided it would be better in the long run if we work on upgrading to Hadoop 3.x since it has been pending for a long time.
I was looking into this and saw that presto-hadoop-apache2 has a branch, https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.x, that was created a couple of years ago. @tdcmeehan Do you know why it was never merged into master and released? We can build on top of that branch unless there were blockers that prevented its release.
I landed on this PR while checking whether we have any plans for a Hadoop upgrade. Is there a plan to upgrade Hadoop to 3.2.x? I see there is this PR https://github.com/prestodb/presto-hadoop-apache2/pull/57 to get it updated to 3.2.3. I am not sure whether more work is required for the Hadoop upgrade. @imjalpreet Do you know if there are any prerequisites for this upgrade? I see there were some test failures with earlier Hadoop version upgrades.
After the Hadoop version upgrade, we can also add the dependency required to support the Azure Data Lake file system. It is available from Hadoop 2.8.0 onward: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure-datalake
> I see there is this PR https://github.com/prestodb/presto-hadoop-apache2/pull/57 to get it updated to 3.2.3.
That PR was auto-generated and is not enough; significant changes are needed to upgrade to Hadoop 3.
I worked with Rajat to get the Hadoop 3.2.x upgrade changes into https://github.com/prestodb/presto-hadoop-apache2/tree/3.2.x a few months ago. That branch should have almost all of the changes needed on the Hadoop dependency side, but we also need to update the Docker images used in the CI pipelines, since they are currently based on Hadoop 2.7.4. Some Presto code changes may be needed as well.
> After the Hadoop version upgrade, we can also add the dependency required to support the Azure Data Lake file system. It is available from Hadoop 2.8.0 onward: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-azure-datalake
Yes, that was one of the aims of this effort when I started looking into it last year, but it has been delayed by other priorities. We wanted to upgrade to 3.2.x because ADLS Gen2 support was added in Hadoop 3, and there have been a few requests for it as well.
We should have a discussion on the plan and proceed from there.
Merged with #21483