iis
Extract plaintexts from NLM records provided by SN as zip packages
We should find the most convenient way to read a batch of zip files from HDFS (ideally straight from S3) and build an Avro datastore of DocumentText records holding the plaintexts extracted from all NLM records.
Related Redmine ticket: #5291.
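
A rough sketch of the extraction step, to make the discussion concrete. It is not the proposed implementation: it assumes each zip package holds one NLM XML file per record, that a DocumentText record has `id` and `text` fields, and that using the zip entry name as the id is acceptable. The function names are hypothetical. Reading the zip bytes from HDFS or S3 and writing the Avro datastore are left out; the sketch only shows that a package can be processed entirely in memory once its bytes are available.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_plaintext(nlm_xml: bytes) -> str:
    """Return the whitespace-normalized text content of one NLM XML record.

    Assumption: the XML parses without resolving external DTD entities.
    Records that use undefined named entities would raise ParseError here.
    """
    root = ET.fromstring(nlm_xml)
    # Join all text nodes with a space, then collapse runs of whitespace.
    return " ".join(" ".join(root.itertext()).split())


def records_from_zip(zip_bytes: bytes):
    """Yield one DocumentText-like dict per XML entry in a zip package.

    Assumption: 'id' and 'text' mirror the DocumentText schema fields,
    and the entry name stands in for the real document identifier.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip manifests, PDFs and anything else that is not XML.
            if not name.lower().endswith(".xml"):
                continue
            yield {"id": name, "text": extract_plaintext(archive.read(name))}
```

Because `records_from_zip` takes plain bytes, the same function could be applied to whole-file binary input in a distributed job, so the storage backend (HDFS or S3) would not affect the extraction logic.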