Allow running replay without an explicit index
Now that we have a way to upload WARCs from the admin interface (#436), an index should no longer be mandatory at replay startup. The new replay CLI should behave like this:
- `ipwb replay` should start replay with a randomly generated empty index that only contains metadata
- `ipwb index some.warc | ipwb replay` should use the index received from the pipe for replay
- `ipwb replay sample.cdxj` should use `sample.cdxj` for replay
Related: #504
I started coding this, then realized I was resurfacing the old CDXJ /tmp/ generation code.
In __main__.py, in the else branch that reports that an index is required, we can call generateCDXJMetadata() to get a string, but replay.py's start() expects a path to the index. Rather than creating a temp file, perhaps we can modify start() to optionally take a string. However, the current function seems to rely heavily on the index existing somewhere on disk rather than being passed along.
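For discussion, here is a minimal sketch of the "optionally take a string" route. The helper name open_index and its parameters cdxj_path and cdxj_text are hypothetical and not part of ipwb; the idea is only that start() could read the index through one stream interface whether it comes from disk or from memory.

```python
import io
from contextlib import contextmanager


@contextmanager
def open_index(cdxj_path=None, cdxj_text=None):
    """Yield a readable text stream for a CDXJ index.

    Hypothetical helper: exactly one of cdxj_path (file on disk) or
    cdxj_text (index content already held in memory) must be given.
    """
    if (cdxj_path is None) == (cdxj_text is None):
        raise ValueError("provide exactly one of cdxj_path or cdxj_text")
    if cdxj_text is not None:
        # In-memory index, e.g. a metadata-only string.
        stream = io.StringIO(cdxj_text)
    else:
        stream = open(cdxj_path, "r", encoding="utf-8")
    try:
        yield stream
    finally:
        stream.close()
```

This only helps if every place that currently reads the index goes through such a stream. Anything that needs a real path (for example, code that appends to the index file later) would still require a file on disk.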
Which route do you want to go, @ibnesayeed?
The fix should be simple, and some code branches in checkArgs_replay might go away. Here is how I envision the new pseudocode:
def checkArgs_replay(args):
    if not args.index:  # irrespective of pipe or lack thereof
        args.index = generate_random_index_file_path()
        if pipe and pipe_data:
            write_pipe_data_to_index()
    # fix proxy as usual
    replay.start(cdxjFilePath=args.index, proxy=proxy)
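The pseudocode above could be fleshed out roughly as follows. This is a sketch under assumptions, not the actual ipwb code: resolve_index_path, its parameters, and the ".cdxj" suffix are made up for illustration, and metadata_line is a placeholder for whatever generateCDXJMetadata() returns.

```python
import os
import sys
import tempfile


def resolve_index_path(index=None, stdin=None, metadata_line="!meta {}\n"):
    """Return a path to a CDXJ index file that replay can be started with.

    index:         explicit index path from the CLI, or None
    stdin:         text stream that may carry a piped index (default: sys.stdin)
    metadata_line: placeholder for the output of generateCDXJMetadata()
    """
    if index:
        # An explicit index always wins; piped data is ignored.
        return index

    stdin = sys.stdin if stdin is None else stdin
    # Only read stdin when it is not an interactive terminal (i.e. a pipe).
    piped = "" if stdin.isatty() else stdin.read()

    # Create a uniquely named index file; the caller owns its cleanup.
    fd, path = tempfile.mkstemp(suffix=".cdxj")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # Piped index if present, otherwise a metadata-only index.
        fh.write(piped if piped.strip() else metadata_line)
    return path
```

With something like this, checkArgs_replay would reduce to setting args.index = resolve_index_path(args.index), fixing the proxy as usual, and calling replay.start(). Note that the sketch never deletes the temporary file, so whether it should persist across runs is still an open design question.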