Enable one to mark a function that should be recreated for each parallel block

Open: skrawcz opened this issue 2 years ago • 1 comment

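Is your feature request related to a problem? Please describe. The S3 client below is created outside the parallel block, so its result has to be shipped to each task, and it will not serialize with Ray (and probably not with other task executors).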


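import boto3
import pandas as pd
from hamilton.htypes import Collect, Parallelizable

# `boto.client` and `img` are placeholder types in this sketch.
def s3_client() -> boto.client:
    return boto3.client("s3")  # something like that

def task(file_path: str) -> Parallelizable[pd.Series]:
    # One parallel task per row of the CSV.
    for idx, row in pd.read_csv(file_path).iterrows():
        yield row

def image(task: pd.Series, s3_client: boto.client) -> img:
    _image = s3_client.get(task.url)  # pseudocode for fetching the object
    return _image

def finalize(image: Collect[img]) -> ...:
    return ...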

We should probably be able to mark the function so that Hamilton knows its result shouldn't be serialized for parallelization.
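For context, the failure mode can be reproduced without Ray or boto3. The sketch below rests on stated assumptions: `FakeClient` is a hypothetical class that holds a `threading.Lock`, much as real network clients tend to hold locks, sockets, or connection pools, and the standard-library `pickle` stands in for whatever serializer the task executor uses. The URL is made up.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a client object such as an S3 client."""

    def __init__(self) -> None:
        # Locks cannot be pickled, so any object holding one fails to serialize.
        self._lock = threading.Lock()


def can_pickle(obj: object) -> bool:
    """Return True if pickle.dumps accepts `obj`, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


row = {"url": "s3://bucket/key"}  # plain data: safe to ship to a worker
client = FakeClient()             # live resource: not safe to ship

print(can_pickle(row))     # True
print(can_pickle(client))  # False
```

The row crosses the process boundary without trouble; the client does not, which is why it has to be built on the worker side.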

Describe the solution you'd like: We should be able to mark the function so that Hamilton knows it should become part of the graph within each task, i.e. be recreated inside every parallel block instead of being computed once and serialized:


@unserializeable  # proposed decorator; it does not exist yet
def s3_client() -> boto.client:
    return boto3.client("s3")  # something like that

def task(file_path: str) -> Parallelizable[pd.Series]:
    for idx, row in pd.read_csv(file_path).iterrows():
        yield row

def image(task: pd.Series, s3_client: boto.client) -> img:
    _image = s3_client.get(task.url)  # pseudocode for fetching the object
    return _image

def finalize(image: Collect[img]) -> ...:
    return ...
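One way to picture the requested behavior, as a minimal sketch and not Hamilton's implementation: the decorator only tags the function, and the executor re-invokes tagged functions inside every task instead of shipping their results. Everything here is hypothetical, including the `_recreate_per_task` attribute, `run_parallel_block`, and the `make_client` factory; the plain loop stands in for a real task executor.

```python
from typing import Any, Callable, Dict, Iterable, List


def unserializeable(fn: Callable[[], Any]) -> Callable[[], Any]:
    """Tag `fn` so a (hypothetical) executor rebuilds its result per task."""
    fn._recreate_per_task = True  # hypothetical marker attribute
    return fn


calls: Dict[str, int] = {"make_client": 0}


@unserializeable
def make_client() -> object:
    """Hypothetical factory for a resource that cannot be serialized."""
    calls["make_client"] += 1
    return object()


def image(task: str, client: object) -> str:
    """Pretend to fetch something for `task` using `client`."""
    return f"fetched {task}"


def run_parallel_block(
    tasks: Iterable[str],
    dependency: Callable[[], Any],
    body: Callable[[str, Any], str],
) -> List[str]:
    """Toy executor: the loop body represents work done on a remote worker."""
    recreate = getattr(dependency, "_recreate_per_task", False)
    # Untagged dependencies are computed once and would be serialized to workers.
    shared = None if recreate else dependency()
    results = []
    for task in tasks:
        # Tagged dependencies are rebuilt inside each task instead.
        dep = dependency() if recreate else shared
        results.append(body(task, dep))
    return results


results = run_parallel_block(["a.png", "b.png", "c.png"], make_client, image)
print(results)
print(calls["make_client"])  # 3: one client per task
```

The design choice in this sketch is that only the task inputs would ever cross a process boundary; the tagged function itself is cheap to reference and is simply called again where the work happens.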

Describe alternatives you've considered: Make the client part of the parallel block by having it depend on the task:

def task(file_path: str) -> Parallelizable[pd.Series]:
    for idx, row in pd.read_csv(file_path).iterrows():
        yield row

def s3_client(task: pd.Series) -> boto.client:
    """Do nothing with the input; depending on `task` places this node inside the parallel block."""
    return boto3.client("s3")  # something like that

def image(task: pd.Series, s3_client: boto.client) -> img:
    _image = s3_client.get(task.url)  # pseudocode for fetching the object
    return _image

def finalize(image: Collect[img]) -> ...:
    return ...

Additional context: Not a burning need, but it would look cleaner than the workaround above.

skrawcz, Nov 17 '23 05:11

This is pretty similar to this issue: https://github.com/DAGWorks-Inc/hamilton/issues/90

elijahbenizzy, Nov 17 '23 05:11