d6tflow
Potential Typo in the Docs - Define Upstream Dependency Tasks
https://d6tflow.readthedocs.io/en/latest/tasks.html
The following code defines a single-output task and uses it as a dependency of other tasks, yet TaskSingleOutput1 and TaskSingleOutput2 are not defined anywhere on this page.
# quick save one output
class TaskSingleOutput(d6tflow.tasks.TaskPqPandas):
    def run(self):
        self.save(data_output)

# no dependency
class TaskSingleInput(d6tflow.tasks.TaskPqPandas):
    #[...]

# single dependency
@d6tflow.requires(TaskSingleOutput)
class TaskSingleInput(d6tflow.tasks.TaskPqPandas):
    #[...]

# multiple dependencies
@d6tflow.requires({'input1':TaskSingleOutput1, 'input2':TaskSingleOutput2})
class TaskMultipleInput(d6tflow.tasks.TaskPqPandas):
    #[...]
Also, an example like the following should make it clear that the child (inner) keys are the labels listed in the upstream task's persist attribute, while the parent (outer) keys are the ones defined in the dependency call.
# multiple dependencies, single & multiple outputs
@d6tflow.requires({'input1':TaskSingleOutput, 'input2':TaskMultipleOutput})
class TaskMultipleInput(d6tflow.tasks.TaskPqPandas):
    def run(self):
        data = self.inputLoad(as_dict=True)
        data1a = data['input1']  # We reference the key defined in the dependency call
        data2a = data['input2']['output1']  # 'output1' is a persist label defined in TaskMultipleOutput
        data2b = data['input2']['output2']
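
To illustrate the parent/child key distinction, here is a rough sketch of the nested shape that the example above indexes into. It makes no d6tflow calls: plain strings stand in for DataFrames, and the `persist_labels` list is a hypothetical stand-in for whatever labels TaskMultipleOutput declares, since that task is not shown on the page.

```python
# Plain-Python stand-in for the dict that the example above reads from.
# Assumption: this only mimics the shape implied by the indexing in the example;
# it does not use d6tflow, and strings stand in for DataFrames.

# Hypothetical persist labels, as TaskMultipleOutput would declare them.
persist_labels = ['output1', 'output2']

# Parent (outer) keys mirror the dict passed to the dependency call.
# A single-output task maps straight to its data; a multiple-output task
# maps to an inner dict whose child keys are its persist labels.
data = {
    'input1': 'data from TaskSingleOutput',
    'input2': {label: f'TaskMultipleOutput:{label}' for label in persist_labels},
}

data1a = data['input1']             # parent key only
data2a = data['input2']['output1']  # parent key, then child (persist) key
data2b = data['input2']['output2']
```

Seeing the two levels side by side like this in the docs would make it obvious which task each key comes from.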