Consider using provided scope for hadoop dependencies
I am working on cleaning up IIS dependencies and realised we have one special exclusion case (among 2-3 others): we exclude Hadoop libraries from the CoAnSys dependencies used in IIS to prevent them from appearing in the packages uploaded to the cluster.
Since most of the Hadoop dependencies are already available on the CDH5 cluster, we don't need to include them explicitly in the uploaded package. This is how we operate in IIS, and it allows us to make Oozie packages significantly smaller.
Have you considered using the provided scope for the CDH5 Hadoop dependencies? This would allow us to drop all exclusion sections for the citation-matching-core-code and document-similarity-oap-uberworkflow dependencies defined in IIS. You would also get smaller packages.
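To illustrate what I mean, here is a minimal sketch of a Hadoop dependency declared with the provided scope in a CoAnSys pom. The choice of `hadoop-common` and the `${hadoop.version}` property are my assumptions for illustration only; the actual artifacts and version handling in CoAnSys may differ.

```xml
<!-- Sketch only: the artifact chosen and the version property are illustrative assumptions -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <!-- hypothetical property, expected to resolve to the CDH5 build in use -->
  <version>${hadoop.version}</version>
  <!-- available at compile and test time, but not packaged and not passed on transitively -->
  <scope>provided</scope>
</dependency>
```

Because provided dependencies are not transitive, projects depending on these modules (such as IIS) would no longer pull in the Hadoop libraries and would not need the corresponding exclusions.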
We intend to simplify dependencies in the CoAnSys project. This will allow us to set some dependencies as "provided". For the moment, it is not possible.