hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-27884: LLAP: Reuse FileSystem objects from cache across different tasks in the same LLAP daemon

Open abstractdog opened this issue 2 years ago • 3 comments

What changes were proposed in this pull request?

Close the filesystem for a specific ugi once per query + daemon instead of every task. In particular this patch:

  1. moves UGI creation from task running logic (TaskRunnerCallable) to the factory
  2. introduces file system closure in the same factory
  3. removes unnecessary null checks

Why are the changes needed?

Performance. Creating a new filesystem is costly, and closing it too frequently completely disables hadoop's FileSystem CACHE

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

Tested on LLAP cluster.

  1. ran multiple tpcds queries
  2. checked logs (only 1 ugi/fs creation happened for a query, every other tasks fell into reusing the same ugi)
  3. filesystem is closed after query

abstractdog avatar Nov 20 '23 12:11 abstractdog

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 10 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17. Read more here

sonarqubecloud[bot] avatar Nov 27 '23 09:11 sonarqubecloud[bot]

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Jan 27 '24 00:01 github-actions[bot]

Is the credentials fix - the one Kokila pointed out going in before further merge? Otherwise looks good

yes, tried to clarify this in code comments, but let me add links here for more clarity:

     * Regarding vertex user: LlapTaskCommunicator has a single "user" field,
     * which is passed into the SignableVertexSpec.

https://github.com/apache/hive/blob/e0bd9eac32851b27976a78cd83584cb867b83182/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java#L125

https://github.com/apache/hive/blob/e0bd9eac32851b27976a78cd83584cb867b83182/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java#L919

     * Regarding credentials: LlapTaskCommunicator creates SubmitWorkRequestProto instances,
     * into which dag-level credentials are passed.

https://github.com/apache/hive/blob/e0bd9eac32851b27976a78cd83584cb867b83182/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskCommunicator.java#L918

so assuming that the user is constant and credentials are DAG-level ones, we can make sure that the UserGroupInformation and the contained Credentials are not supposed to change withing a dag

abstractdog avatar Sep 06 '24 12:09 abstractdog

@deniskuzZ: your comments were addressed in https://github.com/apache/hive/pull/4882/commits/c84f536c64cf6b298109473c363434d7acbf816e

abstractdog avatar Sep 13 '24 16:09 abstractdog

LGTM, some minor stuff

thanks! addressed in https://github.com/apache/hive/pull/4882/commits/506c55d07c7e6f761c1295bfcdd6e6ec4d1c1f1e

abstractdog avatar Sep 15 '24 13:09 abstractdog

ready to merge, I checked the refactored patch on cluster, everything works as expected, only 1 ugi was created for a query:

query-executor <14>1 2024-09-16T08:28:43.780Z query-executor-0-0 query-executor 1 12b073f6-5537-44d0-8aca-578096c88c8d [mdc@38374 class="llap.LlapUgiManager" dagId="dag_1726475155829_0000_2" fragmentId="1726475155829_0000_2_01_000000_0" level="INFO" queryId="hive_20240916082842_806d9e6f-0a82-43cc-95ba-30b947b7fe0f" thread="IPC Server handler 4 on 25000"] Created ugi hive (auth:SIMPLE) for queryIdentifier 'QueryIdentifier{appIdentifier='application_1726475155829_0000', dagIdentifier=2}', current ugis #0 

abstractdog avatar Sep 16 '24 08:09 abstractdog