distributed-extraction-framework
Unit tests should work without Hadoop and Spark
IMHO, there are actually 2 issues:
- The extraction tests require Wikipedia dumps to be downloaded first. I think a small static dump should be stored somewhere in test/resources for the tests to use.
- The unit tests should run Spark in local mode against local files. That way neither developers nor the build machines need Hadoop and Spark installed.
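
To illustrate both points, here is a rough sketch of what such a test helper could look like. It is not code from this repository: `LocalSparkSketch`, `countTitles`, the app name, and the inline two-page dump are all made up for illustration, and the inline string only stands in for a small static file that would live under test/resources. It assumes Spark is available as an ordinary test dependency, so no separate Hadoop or Spark installation should be required.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper for illustration only; not part of the framework.
object LocalSparkSketch {

  // Stand-in for a small static dump that would be checked in under test/resources.
  private val tinyDump: String =
    """<mediawiki>
      |  <page><title>Alpha</title></page>
      |  <page><title>Beta</title></page>
      |</mediawiki>
      |""".stripMargin

  // Counts the lines of the tiny dump that contain a <title> element.
  def countTitles(): Long = {
    // Write the stand-in dump to a temporary local file.
    val dump: Path = Files.createTempFile("tiny-dump", ".xml")
    Files.write(dump, tinyDump.getBytes(StandardCharsets.UTF_8))

    // "local[2]" runs Spark inside the test JVM with two worker threads,
    // so no cluster is contacted.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("extraction-unit-test")
    val sc = new SparkContext(conf)
    try {
      // A file: URI makes Spark read from the local filesystem rather than HDFS.
      sc.textFile(dump.toUri.toString)
        .filter(_.contains("<title>"))
        .count()
    } finally {
      // Always release the context and clean up the temporary file.
      sc.stop()
      Files.deleteIfExists(dump)
    }
  }
}
```

A unit test could then assert on `LocalSparkSketch.countTitles()` directly. In the real tests, the temporary file would be replaced by the static dump loaded from test/resources.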