
Config management regression test

Open tuGithub opened this issue 9 years ago • 10 comments

tuGithub avatar Feb 24 '16 23:02 tuGithub

@chavdar , @sahilTakiar , please review it when you have time

tuGithub avatar Feb 25 '16 00:02 tuGithub

Can you add a description of what this PR entails?

sahilTakiar avatar Feb 29 '16 21:02 sahilTakiar

High Level Comment:

  • Is there any way we can get this to run via TestNG rather than via Azkaban?
    • The current implementation seems to require running a job via Azkaban, which means these tests won't run unless someone manually uploads the job to Azkaban and runs them.
    • However, although these are integration tests, I don't see anything that requires them to run on Azkaban or on a real HDFS cluster; they could simply run on top of the local file system (e.g. FileSystem.getLocal()).
  • The advantage would be that these tests get run during each build, which will help us catch any unexpected bugs earlier
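
To illustrate the suggestion above, here is a minimal sketch of a test-style check that exercises a file-backed config store against a temporary local directory instead of a cluster. It deliberately uses only java.nio so that it runs without Hadoop on the classpath; the real code would go through Hadoop's FileSystem API instead. The class LocalStoreSketch, the file name main.conf, the node path datasets/a, and the key=value layout are all assumptions made for illustration, not names taken from this PR.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical stand-in for a file-backed config store; this is not Gobblin's API.
public class LocalStoreSketch {

    // Reads <storeRoot>/<node>/main.conf as key=value pairs (assumed layout).
    public static Properties readNode(Path storeRoot, String node) throws IOException {
        Properties props = new Properties();
        Path confFile = storeRoot.resolve(node).resolve("main.conf");
        try (Reader in = Files.newBufferedReader(confFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props;
    }

    // Builds a throwaway store on the local disk, as a test setup method might.
    public static Path createTempStore() throws IOException {
        Path root = Files.createTempDirectory("config-store-test");
        Path node = Files.createDirectories(root.resolve("datasets/a"));
        Files.write(node.resolve("main.conf"),
                "retention.days=7\n".getBytes(StandardCharsets.UTF_8));
        return root;
    }

    public static void main(String[] args) throws IOException {
        Path root = createTempStore();
        Properties props = readNode(root, "datasets/a");
        System.out.println(props.getProperty("retention.days"));
    }
}
```

Because the store lives in a temporary directory, a check like this needs no manual deployment and could run as part of every build.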

sahilTakiar avatar Feb 29 '16 21:02 sahilTakiar

Added the design doc in the ticket: https://jira01.corp.linkedin.com:8443/browse/ETL-4050

If we only test through TestNG unit tests, we are not actually testing the HDFS store, especially access through the config client.

@chavdar , please advise.

Min

tuGithub avatar Mar 01 '16 18:03 tuGithub

In general, unit tests should be used for testing a single component in isolation. Dependencies should be mocked or use simplified test versions.

Integration tests should use multiple realistic components, potentially running on different Hadoop clusters. Eventually, we may have the tooling to set up one or more local Hadoop clusters for testing, but we don't have that right now. Loading a test on Azkaban and running it there should be a temporary solution.

What this PR is missing is a high-level description of how the integration test framework works. It seems like RegressionTest is the entry point, but I am still trying to figure this out.

chavdar avatar Mar 02 '16 00:03 chavdar

Do these tests rely on manual deployment of the config files on HDFS? If yes, we should integrate it with the config deployment code.

chavdar avatar Mar 02 '16 00:03 chavdar

@chavdar why are tests added to TestNG restricted to unit tests?

  • The current store implementations only require working with a FileSystem, so I'm not sure why we can't use the SimpleLocalHDFSConfigStoreFactory for this
  • We do have the ability to set up HDFS clusters using the MiniClusters; GobblinYarnAppLauncherTest is already doing this
    • This should also allow us to spawn multiple clusters, though I'm not sure why these integration tests need that
    • I've used MiniDFSCluster before and it works well
  • Getting this to run via TestNG also ensures the tests get run on each build
    • Correct me if I am wrong but we currently have to run the integration tests on Azkaban manually; and even if we did automate runs on Azkaban that would only happen internally

sahilTakiar avatar Mar 02 '16 20:03 sahilTakiar

High level structure:

  1. RegressionTest is the entry point of the test
  2. Based on the configuration, ExpectedResultBuilder will build all the expected results by parsing the json file "expected.conf" for each node
  3. There are multiple levels of validation:
    A. validate through SimpleHDFSConfigStore
    B. validate through the InMemoryTopology and InMemoryValueInspector
    C. validate through the ConfigClient
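
The structure above can be sketched roughly as follows: expected values are built per node, and each validation level is asked to resolve the same node so the results can be compared. This is only an illustration in plain Java; RegressionFlowSketch, validate, and the lambdas standing in for the store, topology, and client are hypothetical and do not reflect the actual classes or signatures in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the flow described above; none of these names come from the PR.
public class RegressionFlowSketch {

    // For every validation level and every node, compares the resolved values
    // with the expected ones and returns a readable list of mismatches.
    public static List<String> validate(
            Map<String, Map<String, String>> expectedByNode,
            Map<String, Function<String, Map<String, String>>> levels) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Function<String, Map<String, String>>> level : levels.entrySet()) {
            for (Map.Entry<String, Map<String, String>> node : expectedByNode.entrySet()) {
                Map<String, String> actual = level.getValue().apply(node.getKey());
                if (!node.getValue().equals(actual)) {
                    failures.add(level.getKey() + " @ " + node.getKey()
                            + ": expected " + node.getValue() + " but got " + actual);
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in for the per-node expected results (assumed to be parsed elsewhere).
        Map<String, Map<String, String>> expected = new LinkedHashMap<>();
        Map<String, String> nodeA = new LinkedHashMap<>();
        nodeA.put("retention.days", "7");
        expected.put("datasets/a", nodeA);

        // Each lambda stands in for one access path: store, in-memory topology, client.
        Map<String, Function<String, Map<String, String>>> levels = new LinkedHashMap<>();
        levels.put("store", node -> expected.get(node));
        levels.put("topology", node -> expected.get(node));
        levels.put("client", node -> new LinkedHashMap<String, String>()); // deliberately wrong

        validate(expected, levels).forEach(System.out::println);
    }
}
```

Collecting mismatches into a list, rather than failing on the first one, makes it easier to see whether a problem is specific to one access path or shared by all three.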

tuGithub avatar Mar 08 '16 22:03 tuGithub

@vasanthrajamani Can you please prioritize?

abti avatar Jan 12 '17 03:01 abti

https://issues.apache.org/jira/browse/GOBBLIN-134

Please update your PR title with the following prefix: [GOBBLIN-134]

abti avatar Jul 27 '17 18:07 abti