KYLO-3266 Fix dependency conflict issue
Hi, in kylo-0.10.0 (kylo-0.10.0\core\file-metadata\file-metadata-core module), there are multiple versions of org.apache.poi:poi:jar on the dependency graph. Because the version is managed (the dependency tree reports "version managed from 3.17"), Maven resolves only org.apache.poi:poi:jar:3.15, and org.apache.poi:poi:jar:3.17 is shadowed.
As the dependency tree below shows, org.apache.tika:tika-parsers:jar:1.18 declares a dependency on org.apache.poi:poi:jar:3.17, but Maven actually puts org.apache.poi:poi:jar:3.15 on the classpath. As a result, org.apache.tika:tika-parsers:jar:1.18 invokes methods from a version it was not built against, which may cause inconsistent semantic behavior.
For instance, method <com.thinkbiganalytics.kylo.tika.detector.CSVDetector: org.apache.tika.mime.MediaType detect(java.io.InputStream,org.apache.tika.metadata.Metadata)> actually references method <org.apache.poi.poifs.macros.VBAMacroReader: protected void readMacros(DirectoryNode macroDir, ModuleMap modules)> in the unexpected version org.apache.poi:poi:jar:3.15 via the following invocation path:
<com.thinkbiganalytics.kylo.tika.detector.CSVDetector: org.apache.tika.mime.MediaType detect(java.io.InputStream,org.apache.tika.metadata.Metadata)> D:\testcase\NewProject3\kylo-0.10.0\core\file-metadata\file-metadata-core\target\classes
<com.thinkbiganalytics.file.parsers.util.ParserUtil: void <clinit>()> D:\cEnvironment\repository\com\thinkbiganalytics\kylo\kylo-file-metadata-util\0.10.0\kylo-file-metadata-util-0.10.0.jar
<org.slf4j.LoggerFactory: org.slf4j.Logger getLogger(java.lang.Class)> D:\cEnvironment\repository\org\slf4j\slf4j-api\1.7.12\slf4j-api-1.7.12.jar
<org.slf4j.LoggerFactory: org.slf4j.Logger getLogger(java.lang.String)> D:\cEnvironment\repository\org\slf4j\slf4j-api\1.7.12\slf4j-api-1.7.12.jar
<org.slf4j.impl.Log4jLoggerFactory: org.slf4j.Logger getLogger(java.lang.String)> D:\cEnvironment\repository\org\slf4j\slf4j-log4j12\1.7.10\slf4j-log4j12-1.7.10.jar
<org.apache.log4j.LogManager: void <clinit>()> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.helpers.OptionConverter: void selectAndConfigure(java.net.URL,java.lang.String,org.apache.log4j.spi.LoggerRepository)> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.PropertyConfigurator: void doConfigure(java.net.URL,org.apache.log4j.spi.LoggerRepository)> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.PropertyConfigurator: void doConfigure(java.util.Properties,org.apache.log4j.spi.LoggerRepository)> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.PropertyConfigurator: void parseCatsAndRenderers(java.util.Properties,org.apache.log4j.spi.LoggerRepository)> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.config.PropertySetter: void setProperties(java.util.Properties,java.lang.String)> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.config.PropertySetter: void activate()> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.varia.ExternallyRolledFileAppender: void activateOptions()> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<org.apache.log4j.varia.HUP: void run()> D:\cEnvironment\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar
<junit.extensions.ActiveTestSuite$1: void run()> D:\cEnvironment\repository\junit\junit\4.12\junit-4.12.jar
<junit.framework.JUnit4TestAdapter: void run(junit.framework.TestResult)> D:\cEnvironment\repository\junit\junit\4.12\junit-4.12.jar
<org.junit.internal.runners.JUnit4ClassRunner: void run(org.junit.runner.notification.RunNotifier)> D:\cEnvironment\repository\junit\junit\4.12\junit-4.12.jar
<org.junit.internal.runners.ClassRoadie: void runProtected()> D:\cEnvironment\repository\junit\junit\4.12\junit-4.12.jar
<org.junit.internal.runners.ClassRoadie: void runUnprotected()> D:\cEnvironment\repository\junit\junit\4.12\junit-4.12.jar
<org.apache.tika.parser.ParsingReader$ParsingTask: void run()> D:\cEnvironment\repository\org\apache\tika\tika-core\1.14\tika-core-1.14.jar
<org.apache.tika.parser.microsoft.OfficeParser: void parse(java.io.InputStream,org.xml.sax.ContentHandler,org.apache.tika.metadata.Metadata,org.apache.tika.parser.ParseContext)> D:\cEnvironment\repository\org\apache\tika\tika-parsers\1.18\tika-parsers-1.18.jar
<org.apache.tika.parser.microsoft.OfficeParser: void extractMacros(org.apache.poi.poifs.filesystem.NPOIFSFileSystem,org.xml.sax.ContentHandler,org.apache.tika.extractor.EmbeddedDocumentExtractor)> D:\cEnvironment\repository\org\apache\tika\tika-parsers\1.18\tika-parsers-1.18.jar
<org.apache.poi.poifs.macros.VBAMacroReader: public Map<String, String> readMacros()>
<org.apache.poi.poifs.macros.VBAMacroReader: protected void findMacros(DirectoryNode dir, ModuleMap modules)>
<org.apache.poi.poifs.macros.VBAMacroReader: protected void readMacros(DirectoryNode macroDir, ModuleMap modules)>
Further analysis shows that the expected callee <org.apache.poi.poifs.macros.VBAMacroReader: protected void readMacros(DirectoryNode macroDir, ModuleMap modules)> in org.apache.poi:poi:jar:3.17 has a different implementation from the actual callee with the same signature (same method name, same parameters) in the loaded version org.apache.poi:poi:jar:3.15, which leads to different behavior.
Solution: Upgrade the managed version of org.apache.poi:poi:jar to 3.17, so that the resolved version is consistent with the one org.apache.tika:tika-parsers:jar:1.18 expects.
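To confirm which POI jar is actually resolved at runtime (before and after the version change), a small check along the following lines can help. This is only a sketch: the class name `LoadedJarCheck` and the method `describe` are hypothetical helpers, not part of Kylo, and it relies solely on standard JDK reflection. It would need to be run with the module's runtime classpath for the POI lookup to succeed.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of Kylo): reports which jar a class is loaded from.
public class LoadedJarCheck {

    /** Returns a one-line description of where the named class was loaded from. */
    public static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> cls = Class.forName(className, false, LoadedJarCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            Package pkg = cls.getPackage();
            // Implementation-Version comes from the jar manifest and may be absent
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + (location == null ? "bootstrap/unknown" : location.toString())
                + " (Implementation-Version: " + (version == null ? "n/a" : version) + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        // The class at the end of the invocation path reported above
        System.out.println(describe("org.apache.poi.poifs.macros.VBAMacroReader"));
    }
}
```

If the printed location still points at a poi-3.15 jar after the change, the managed version is probably being overridden somewhere else in the build.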
Thanks! Best regards, Coco
Dependency Tree---
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ kylo-file-metadata-core ---
[INFO] com.thinkbiganalytics.kylo:kylo-file-metadata-core:jar:0.10.0
[INFO] +- com.thinkbiganalytics.kylo:kylo-file-metadata-model:jar:0.10.0:compile
[INFO] |  +- (javax.inject:javax.inject:jar:1:compile - omitted for duplicate)
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.12:compile - version managed from 1.7.10; omitted for duplicate)
[INFO] |  \- (org.slf4j:slf4j-log4j12:jar:1.7.10:compile - omitted for duplicate)
[INFO] +- com.thinkbiganalytics.kylo:kylo-file-metadata-util:jar:0.10.0:compile
[INFO] |  +- (com.thinkbiganalytics.kylo:kylo-file-metadata-model:jar:0.10.0:compile - omitted for duplicate)
[INFO] |  +- org.apache.commons:commons-csv:jar:1.4:compile
[INFO] |  +- (org.apache.commons:commons-lang3:jar:3.7:compile - omitted for duplicate)
[INFO] |  +- (commons-io:commons-io:jar:2.5:compile - omitted for duplicate)
[INFO] |  +- (org.slf4j:slf4j-ext:jar:1.7.12:compile - omitted for duplicate)
[INFO] |  \- (javax.inject:javax.inject:jar:1:compile - omitted for duplicate)
[INFO] +- org.apache.commons:commons-lang3:jar:3.7:compile
[INFO] +- commons-io:commons-io:jar:2.5:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.9.6:compile
[INFO] |  +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.6:compile - version managed from 2.9.0; omitted for duplicate)
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.9.6:compile
[INFO] +- com.fasterxml.jackson.datatype:jackson-datatype-joda:jar:2.9.6:compile
[INFO] |  +- (com.fasterxml.jackson.core:jackson-annotations:jar:2.9.6:compile - version managed from 2.9.0; omitted for duplicate)
[INFO] |  +- (com.fasterxml.jackson.core:jackson-core:jar:2.9.6:compile - omitted for duplicate)
[INFO] |  +- (com.fasterxml.jackson.core:jackson-databind:jar:2.9.6:compile - omitted for duplicate)
[INFO] |  \- joda-time:joda-time:jar:2.9.2:compile (version managed from 2.7)
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.6:compile
[INFO] +- org.apache.tika:tika-core:jar:1.18:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.7.12:provided (scope not updated to compile)
[INFO] +- org.slf4j:slf4j-ext:jar:1.7.12:compile
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.12:compile - version managed from 1.7.10; omitted for duplicate)
[INFO] |  \- ch.qos.cal10n:cal10n-api:jar:0.8.1:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.10:provided (scope not updated to compile)
[INFO] |  +- (org.slf4j:slf4j-api:jar:1.7.12:provided - version managed from 1.7.10; omitted for duplicate)
[INFO] |  \- log4j:log4j:jar:1.2.17:provided
[INFO] +- org.apache.tika:tika-parsers:jar:1.18:compile
[INFO] |  +- (org.apache.tika:tika-core:jar:1.14:compile - version managed from 1.18; omitted for conflict with 1.18)
[INFO] |  +- org.apache.poi:poi:jar:3.15:compile (version managed from 3.17)
[INFO] |  |  \- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] |  \- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] +- javax.inject:javax.inject:jar:1:compile
[INFO] \- junit:junit:jar:4.12:test
[INFO]    \- org.hamcrest:hamcrest-core:jar:1.3:test
Code snippet of <org.apache.poi.poifs.macros.VBAMacroReader: protected void readMacros(DirectoryNode macroDir, ModuleMap modules)> in org.apache.poi:poi:jar:3.15:

Code snippet of <org.apache.poi.poifs.macros.VBAMacroReader: protected void readMacros(DirectoryNode macroDir, ModuleMap modules)> in org.apache.poi:poi:jar:3.17:

The method <org.apache.poi.poifs.macros.VBAMacroReader: protected void readMacros(DirectoryNode macroDir, ModuleMap modules)> in the newer version org.apache.poi:poi:jar:3.17 handles more cases, which changes its control flow and data flow. Being forced to use the older version org.apache.poi:poi:jar:3.15 may therefore lead to inconsistent semantic behavior.
Thanks again.
The issue description is available at: https://kylo-io.atlassian.net/projects/KYLO/issues/KYLO-3266?filter=addedrecently
@scottreisdorf Could you help me to review this PR?