
[Feature Request] Don't save Task

Open vaaaaanquish opened this issue 2 years ago • 2 comments

I'd like to create a function-like task whose result is not saved.

For example:

import gokart

class Pipeline(gokart.TaskOnKart):
    def requires(self):
        data = LoadData()
        features = [MakeFeatureA(data=data), MakeFeatureB(data=data), MakeFeatureC(data=data)]

        # `Flatten` is a Task, but we don't want to dump result because the data will be too large :(
        feature = Flatten(features=features, axis=1)

        model = TrainModel(feature=feature)
        return model

vaaaaanquish • Nov 05 '21 14:11

I'm thinking about adding a gokart.Function class:

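import pandas as pd
import gokart

# NOTE: gokart.Function is the class proposed in this issue; it does not exist yet.
class FlattenFunction(gokart.Function):
    def process(self):
        # Load the outputs of the required feature tasks and join them column-wise.
        df_list = self.load()
        df = pd.concat(df_list, axis=1)
        return df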

A Function's result would not be dumped to TASK_WORKSPACE; it would only be stored temporarily in a tmp file. On a second run that file no longer exists, but the Function would be skipped when checking whether a task has already been executed.
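
To make the intended behavior concrete, here is a minimal standard-library sketch of the idea. It does not use gokart at all: `Task`, `Function`, `run`, `run_pipeline`, the `persist` flag, and the toy feature classes are all hypothetical names invented for illustration, and the real design could look quite different.

```python
import os
import pickle
import tempfile


class Task:
    """Toy stand-in for a persisted task: its result is pickled into the workspace."""
    persist = True

    def __init__(self, workspace, *deps):
        self.workspace = workspace
        self.deps = deps

    def output_path(self):
        return os.path.join(self.workspace, f"{type(self).__name__}.pkl")

    def process(self, inputs):
        raise NotImplementedError


class Function(Task):
    """Toy stand-in for the proposed Function: never written to the workspace."""
    persist = False


class FeatureA(Task):
    def process(self, inputs):
        return [1, 2]


class FeatureB(Task):
    def process(self, inputs):
        return [3, 4]


class Flatten(Function):
    def process(self, inputs):
        # Concatenate all upstream results into one list.
        return [x for part in inputs for x in part]


class TrainModel(Task):
    def process(self, inputs):
        return sum(inputs[0])


def run(task, tmp_dir, log):
    # Only persisted tasks take part in the "already executed?" check.
    if task.persist and os.path.exists(task.output_path()):
        with open(task.output_path(), "rb") as f:
            return pickle.load(f)
    inputs = [run(dep, tmp_dir, log) for dep in task.deps]
    result = task.process(inputs)
    log.append(type(task).__name__)
    # Functions go to a throwaway tmp file instead of the workspace.
    if task.persist:
        path = task.output_path()
    else:
        path = os.path.join(tmp_dir, f"{type(task).__name__}.pkl")
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


def run_pipeline(workspace):
    log = []  # names of the tasks that were actually executed
    with tempfile.TemporaryDirectory() as tmp_dir:
        features = Flatten(workspace, FeatureA(workspace), FeatureB(workspace))
        model = TrainModel(workspace, features)
        result = run(model, tmp_dir, log)
    return result, log
```

Calling `run_pipeline` twice with the same workspace executes every task on the first call and nothing on the second, because the final task's output already exists. The `Flatten` result never appears in the workspace. If only the final output were deleted, `Flatten` would be recomputed from the cached feature outputs.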

vaaaaanquish • Nov 05 '21 14:11

This is still just an idea. Please comment :)

vaaaaanquish • Nov 05 '21 14:11