Per Unneberg
Data output is now collected via rules, making heavy use of pandas. This functionality should also be present in standalone scripts, along the following lines:

```
collect_results.py --samples SAMPLE1 SAMPLE2......
```
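A minimal sketch of what such a standalone script could look like, built on argparse and pandas; the script name and the `--samples` flag come from the item above, while the per-sample file layout, the `--indir`/`--outfile` flags, and the metrics format are assumptions:

```python
#!/usr/bin/env python
"""Hypothetical sketch of collect_results.py."""
import argparse
import os
import pandas as pd

def collect(samples, indir):
    """Concatenate per-sample metrics tables into a single data frame."""
    frames = []
    for sample in samples:
        # Assumed layout: <indir>/<sample>/metrics.csv
        metrics = pd.read_csv(os.path.join(indir, sample, "metrics.csv"))
        metrics["sample"] = sample
        frames.append(metrics)
    return pd.concat(frames, ignore_index=True)

if __name__ == "__main__":
    p = argparse.ArgumentParser(description="Collect per-sample results into one table")
    p.add_argument("--samples", nargs="+", required=True)
    p.add_argument("--indir", default=".")
    p.add_argument("--outfile", default="results.csv")
    args = p.parse_args()
    collect(args.samples, args.indir).to_csv(args.outfile, index=False)
```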
The current rule is not adapted to the "file:" syntax, as options -b and -B duplicate the values of 'threeprime' and 'fiveprime'.
backend.**global_config** should be updated once all parameters have been updated correctly. This is currently neither working nor implemented.
(Long-term goal?) Integrate with SLURM/drmaa along the lines of luigi.hadoop and luigi.hadoop_jar. Currently, using the local scheduler on the nodes works well enough.
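A rough sketch of what the drmaa side could look like, assuming the drmaa Python bindings and a SLURM-enabled libdrmaa are available; how this would hook into a luigi job runner is left open, and the bwa example arguments are placeholders:

```python
import drmaa

def run_via_drmaa(command, args):
    """Submit one command through DRMAA and block until it finishes.

    Sketch only: native SLURM options (partition, time limit, log paths)
    and error handling are omitted.
    """
    session = drmaa.Session()
    session.initialize()
    try:
        jt = session.createJobTemplate()
        jt.remoteCommand = command
        jt.args = list(args)
        jobid = session.runJob(jt)
        retval = session.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        session.deleteJobTemplate(jt)
        return retval.exitStatus
    finally:
        session.exit()

# e.g. run_via_drmaa("bwa", ["aln", "-t", "4", "ref.fa", "reads.fq"])
```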
How should the number of workers/threads in use be controlled? An example best explains the issue: alignment with bwa aln can be done with multiple threads, whereas bwa sampe is single-threaded, and uses...
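One hedged way to express this asymmetry is luigi's resources mechanism, shown below with made-up task bodies; whether this actually answers the worker/thread question is exactly the open issue:

```python
import luigi

class BwaAln(luigi.Task):
    """Multi-threaded alignment step: claims 4 cpu units from the scheduler."""
    resources = {"cpu": 4}

    def run(self):
        pass  # would run: bwa aln -t 4 ...

class BwaSampe(luigi.Task):
    """Single-threaded pairing step: claims 1 cpu unit."""
    resources = {"cpu": 1}

    def run(self):
        pass  # would run: bwa sampe ...
```

The scheduler-wide cpu limit would then be set in the luigi configuration (a `[resources]` section with e.g. `cpu=8`), capping how many of these tasks run concurrently.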
Integrate with Hadoop. This may be extremely easy: set the job runner for the JobTasks via the config file; by default they use DefaultShellJobRunner, but they could also use a (customized...
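A sketch of config-driven runner selection; DefaultShellJobRunner is mentioned above, but the dotted-path convention, the module name, and the config key are hypothetical:

```python
import importlib

def get_job_runner(config):
    """Resolve a job runner class from a dotted path given in the config.

    Hypothetical config entry:
        job_runner: mypipeline.jobrunners.DefaultShellJobRunner
    """
    dotted = config.get("job_runner", "mypipeline.jobrunners.DefaultShellJobRunner")
    module_name, _, class_name = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()
```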
Implement options --restart and --restart-from that restart from scratch or from a given task. This would require calculating the target names between any two vertices in the dependency graph. The idea...
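For --restart-from, the vertices between two tasks could be computed as the intersection of one task's descendants and the other's ancestors; a sketch with networkx and a made-up graph (mapping vertices back to target names would be the missing piece):

```python
import networkx as nx

def tasks_between(dag, start, end):
    """Return every task that lies on some path from start to end
    in the dependency DAG, endpoints included."""
    between = nx.descendants(dag, start) & nx.ancestors(dag, end)
    return between | {start, end}

# Toy dependency graph: edges point from a task to the task that depends on it
dag = nx.DiGraph([("fastq", "aln"), ("aln", "sampe"),
                  ("sampe", "dedup"), ("fastq", "qc")])
print(tasks_between(dag, "fastq", "dedup"))  # {'fastq', 'aln', 'sampe', 'dedup'}
```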
Add a task for cleaning up intermediate output (related to the issue on pipes). Tmp files could be removed if is_tmp=True?
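A sketch of what such a cleanup task could do, assuming outputs carry the is_tmp flag speculated above; the Target stand-in and the file names are made up:

```python
import os

class Target:
    """Stand-in for a pipeline output; is_tmp mirrors the flag speculated above."""
    def __init__(self, path, is_tmp=False):
        self.path = path
        self.is_tmp = is_tmp

def cleanup(targets):
    """Remove every output flagged as temporary, leaving final results alone."""
    for target in targets:
        if target.is_tmp and os.path.exists(target.path):
            os.remove(target.path)

# Example: the intermediate .sai file goes, the final .bam stays
cleanup([Target("sample.sai", is_tmp=True), Target("sample.bam")])
```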
I have added start_time and end_time to BaseJobTask, but currently the times don't get submitted to the graph/table interface. This would allow monitoring execution times and identifying pipeline bottlenecks.
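A sketch of how the recorded times could feed the table interface; start_time/end_time mirror the BaseJobTask attributes mentioned above, while the wrapper and the DataFrame layout are assumptions:

```python
import time
import pandas as pd

records = []

def record_task_times(task_name, fn):
    """Wrap a task body, recording start/end times like BaseJobTask would."""
    start_time = time.time()
    fn()
    end_time = time.time()
    records.append({"task": task_name, "start_time": start_time,
                    "end_time": end_time, "duration": end_time - start_time})

record_task_times("bwa_aln", lambda: time.sleep(0.1))
record_task_times("bwa_sampe", lambda: time.sleep(0.05))

# The table interface could then surface the slowest tasks first:
df = pd.DataFrame(records)
print(df.sort_values("duration", ascending=False))
```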