Refactoring ideas
Go best-practice recommendations (such as those in this and this talk) suggest using small interfaces, with as little as one method each. This often lets a method be more specific and flexible about what it accepts, while larger interfaces can always be composed from smaller ones.
This could probably simplify a number of code pieces, for example around the ports and their connections, the sanity checks (which could be made optional), etc.
It would also be useful to think about whether we could use interfaces as a strategy for creating typed ports, to provide compile-time checks of the sanity of workflow connectivity.
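
As a rough illustration of the small-interface idea, here is a minimal sketch. All names in it (`Namer`, `ConnectionChecker`, `Port`, `inPort`, `unconnected`, `describe`) are hypothetical and not taken from the existing code; the point is only that each function asks for the smallest interface it needs, and that the larger `Port` interface is composed from the small ones.

```go
package main

import "fmt"

// Namer and ConnectionChecker are hypothetical single-method interfaces.
type Namer interface {
	Name() string
}

type ConnectionChecker interface {
	IsConnected() bool
}

// Port is a larger interface composed from the two small ones.
type Port interface {
	Namer
	ConnectionChecker
}

// inPort is a toy port type (an assumption for this sketch) that satisfies Port.
type inPort struct {
	name      string
	connected bool
}

func (p *inPort) Name() string      { return p.name }
func (p *inPort) IsConnected() bool { return p.connected }

// describe only needs a name, so it accepts the small Namer interface.
func describe(n Namer) string { return "port " + n.Name() }

// unconnected is an optional sanity check: it returns the names of all
// ports that are not yet connected.
func unconnected(ports ...Port) []string {
	var names []string
	for _, p := range ports {
		if !p.IsConnected() {
			names = append(names, p.Name())
		}
	}
	return names
}

func main() {
	reads := &inPort{name: "reads"}
	bam := &inPort{name: "bam", connected: true}
	fmt.Println(describe(reads))
	fmt.Println(unconnected(reads, bam))
}
```

Because `unconnected` is an ordinary function over an interface, a workflow could call it or skip it, which is one way the sanity checks might be made optional.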
Refactoring ideas list (updated continuously)
- [x] Make a BaseProcess struct type, with maps to hold the different types of ports, port accessor methods, name, connected checks etc. This could be embedded both by custom components and by the default process type, to reduce code duplication considerably. (Started in 2c5ab163f13d906e29a3660958e7e19a96e8839a)
- [x] Remove the EmptyWorkflowProcess type
- [x] Convert FileReader to use BaseProcess
- [x] Convert FileSplitter to use BaseProcess
- [x] Convert MapToKeys to use BaseProcess
- [x] Convert StreamToSubstream to use BaseProcess
- [x] Move `Process.receiveInputs()` and `.receiveParams()` to `BaseProcess`, possibly with a better name, and perhaps create other schemes (than one IP per in-port) to receive IPs.
- [ ] [On hold] Use the same IP (at least the same interface) for both files and parameters. This would allow implementing the `receiveInputs()` method with a merge into a single `chan map[string]*IP` that any `process.Run` method would iterate over to generate tasks (or run inline). It would dramatically simplify the current `Process.createTasks()` method(!). It would also allow us to keep a full audit trail of where certain parameters are coming from, not only files.
- [ ] ~~Make the IPs contain a hierarchical tree of all their upstream IPs, instead of just the AuditInfos doing that. That will allow us to dynamically do more interesting stuff, and it will be a more natural thing for custom components to implement (so that we don't get gaps in the provenance line on custom components).~~
- [ ] Make a generic IP interface, and a BaseIP struct with the basics. Then create different implementations, for writing to file, storing in memory, writing to object storage etc.
- [ ] Implement sub-streams as slices of IPs instead, and make those slices implement a "generic" IP interface (which we'll create), by creating the appropriate methods with the slice as receiver. (Think `Path()`, which would return the paths of all the IPs concatenated, separated by spaces, etc). Should be useful for concatenation, globbing, etc, which take or receive collections of files.
- [ ] Well, so make a generic interface and base struct + variants, for all the core concepts: Workflow, Process, Task, IP ... and perhaps Executor (different ways of executing a particular type of Tasks - ShellTasks).
- [ ] Perhaps we can do that even for the ports, to at least reduce some code duplication, for the port name, Process, and other navigation code.
- [ ] Although we can't have channels of an interface type (UPDATE: We can), we can still have typed ports, internally working with the concrete IP type (since the container maps of ports can be of interface type). A BamInPort would only connect to a BamOutPort, etc.
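
Two of the items above (sub-streams as slices implementing a generic IP interface, and typed ports) can be sketched together as follows. This is only a sketch under assumptions: the `IP` interface is reduced to a single `Path()` method, and `FileIP`, `SubStream`, `BamIP`, `BamOutPort`, `BamInPort` and `From` are hypothetical names, not the existing API.

```go
package main

import (
	"fmt"
	"strings"
)

// IP is a hypothetical minimal "generic" IP interface.
type IP interface {
	Path() string
}

// FileIP is a toy file-backed IP (an assumption for this sketch).
type FileIP struct{ path string }

func (ip *FileIP) Path() string { return ip.path }

// SubStream is a slice of IPs that itself satisfies the IP interface,
// by defining the method with the slice as receiver.
type SubStream []IP

// Path returns the paths of all contained IPs, separated by spaces.
func (s SubStream) Path() string {
	paths := make([]string, len(s))
	for i, ip := range s {
		paths[i] = ip.Path()
	}
	return strings.Join(paths, " ")
}

// BamIP is a concrete IP type used to illustrate typed ports.
type BamIP struct{ FileIP }

// BamOutPort and BamInPort work internally with the concrete *BamIP type.
type BamOutPort struct{ ch chan *BamIP }
type BamInPort struct{ ch chan *BamIP }

// From connects the in-port to an out-port. Passing any other port type
// is rejected by the compiler, not at run time.
func (in *BamInPort) From(out *BamOutPort) {
	if out.ch == nil {
		out.ch = make(chan *BamIP, 1)
	}
	in.ch = out.ch
}

func main() {
	sub := SubStream{&FileIP{"a.txt"}, &FileIP{"b.txt"}}
	fmt.Println(sub.Path())

	// Channels of an interface type are allowed, and can carry a SubStream.
	ch := make(chan IP, 1)
	ch <- sub
	fmt.Println((<-ch).Path())

	out, in := &BamOutPort{}, &BamInPort{}
	in.From(out)
	out.ch <- &BamIP{FileIP{"x.bam"}}
	fmt.Println((<-in.ch).Path())
}
```

With this shape, connecting a `BamInPort` to some other out-port type would fail to compile, which is the kind of connectivity check the typed-port idea is after.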
Minor clean-up ideas
- [ ] Change the `CheckErr` method to use a proper error wrapping technique instead.
- [x] Remove the `utils.go` file in components (which just forwards the error check method)
Via some of the articles here, especially the ones by Will, I have come to realize that struct types are another area where we could improve things. So I'll use this issue to gather a list of refactoring ideas for both interfaces and struct types.
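
On the struct-type side, the BaseProcess item in the list above could look roughly like the sketch below. The field and method names (`inPorts`, `outPorts`, `InitOutPort`, `OutPort`, `Connected`) and the simplified `Port` and `FileReader` types are assumptions made for illustration, not the actual implementation.

```go
package main

import "fmt"

// Port is a simplified, hypothetical stand-in for the real port types.
type Port struct {
	name      string
	connected bool
}

// BaseProcess holds what every process shares: a name, maps of ports,
// port accessors, and a connectivity check.
type BaseProcess struct {
	name     string
	inPorts  map[string]*Port
	outPorts map[string]*Port
}

func NewBaseProcess(name string) BaseProcess {
	return BaseProcess{
		name:     name,
		inPorts:  map[string]*Port{},
		outPorts: map[string]*Port{},
	}
}

func (p *BaseProcess) Name() string { return p.name }

func (p *BaseProcess) InitInPort(name string)  { p.inPorts[name] = &Port{name: name} }
func (p *BaseProcess) InitOutPort(name string) { p.outPorts[name] = &Port{name: name} }

func (p *BaseProcess) InPort(name string) *Port  { return p.inPorts[name] }
func (p *BaseProcess) OutPort(name string) *Port { return p.outPorts[name] }

// Connected reports whether every in-port and out-port is connected.
func (p *BaseProcess) Connected() bool {
	for _, port := range p.inPorts {
		if !port.connected {
			return false
		}
	}
	for _, port := range p.outPorts {
		if !port.connected {
			return false
		}
	}
	return true
}

// FileReader is a custom component that embeds BaseProcess, so it gets
// the name, ports and checks without duplicating any of that code.
type FileReader struct {
	BaseProcess
	path string
}

func NewFileReader(name, path string) *FileReader {
	fr := &FileReader{BaseProcess: NewBaseProcess(name), path: path}
	fr.InitOutPort("out")
	return fr
}

func main() {
	fr := NewFileReader("reader", "data.txt")
	fmt.Println(fr.Name(), fr.Connected())
	fr.OutPort("out").connected = true
	fmt.Println(fr.Name(), fr.Connected())
}
```

The component only adds what is specific to it (here a `path` field and one out-port); everything else is promoted from the embedded struct.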