feat: Enhance GCS connector docs for distcp with HNS
This PR improves the Cloud Storage connector documentation to better support users performing distcp operations with Hierarchical Namespace (HNS) enabled buckets in self-managed Hadoop environments.
This change directly addresses customer issues observed in Salesforce case [56459963] and Buganizer report [389061732], where users experienced intermittent `distcp` failures, often manifesting as `DEADLINE_EXCEEDED` errors or as a generic `SSH operator error: exit status = 25`.
Key changes include:
- `gcs/CONFIGURATION.md`:
  - Clarified guidance on `fs.gs.http.read-timeout` and `fs.gs.hierarchical.namespace.folders.enable` to address `DEADLINE_EXCEEDED` errors and ensure proper HNS interaction.
  - Added troubleshooting tips for generic exit codes and recommendations for using shaded JARs to resolve dependency conflicts.
- `gcs/INSTALL.md`:
  - Expanded the "Troubleshooting the installation" section with more detailed advice on diagnosing dependency conflicts and enabling verbose logging, specifically highlighting its utility for `DEADLINE_EXCEEDED` errors.
- `gcs/README.md`:
  - Updated the "Configuring the connector" section to prominently guide users facing `distcp` and HNS issues, including `DEADLINE_EXCEEDED` errors, to the more detailed `CONFIGURATION.md`.
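For reviewers reproducing the scenario, here is a minimal Python sketch of how the two `fs.gs.*` properties named in the `CONFIGURATION.md` changes could be passed as per-job overrides to `distcp`, with Hadoop client logging raised to DEBUG via `HADOOP_ROOT_LOGGER`. Assumptions: the `30s` timeout value, the source and destination paths, and the helper names `build_distcp_command` and `debug_env` are hypothetical and for illustration only; check `CONFIGURATION.md` for the value format and default your connector version expects.

```python
import shlex

# Hypothetical, illustrative values: "30s" is an assumption, not a setting
# recommended by CONFIGURATION.md; the value format depends on connector version.
GCS_OVERRIDES = {
    "fs.gs.http.read-timeout": "30s",
    "fs.gs.hierarchical.namespace.folders.enable": "true",
}


def build_distcp_command(src: str, dst: str, overrides: dict) -> list:
    """Return an argv list for `hadoop distcp` with per-job -D overrides.

    Generic options such as -D are parsed by Hadoop's GenericOptionsParser
    and must come before the DistCp source and destination arguments.
    """
    argv = ["hadoop", "distcp"]
    for key, value in sorted(overrides.items()):
        argv += ["-D", f"{key}={value}"]
    argv += [src, dst]
    return argv


def debug_env() -> dict:
    """Environment variables that raise Hadoop client logging to DEBUG on the console."""
    return {"HADOOP_ROOT_LOGGER": "DEBUG,console"}


if __name__ == "__main__":
    # Hypothetical paths; replace with the real source and HNS-enabled bucket.
    cmd = build_distcp_command("hdfs:///data/src", "gs://example-bucket/dst", GCS_OVERRIDES)
    env = " ".join(f"{k}={v}" for k, v in debug_env().items())
    print(env, shlex.join(cmd))
```

Passing the properties with `-D` scopes them to a single job, which is convenient while diagnosing; once a value is confirmed, it can be moved into `core-site.xml`.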
These updates aim to provide clearer instructions and troubleshooting steps, reducing the need for support engagement for these common problems in non-Dataproc Hadoop deployments.
Self link: go/ghgcd/hadoop-connectors/pull/1374
Related CL: cl/767194879
Addresses support issues:
- go/sf/55915396 (case)
- go/sf/56459963 (consult)
Addresses GitHub issue #1375
Addresses bug: b/389061732
Solves issue #1375