
s3_pickle_io_manager does not work with dynamic outputs

Open sryza opened this issue 3 years ago • 1 comments

https://dagster.slack.com/archives/C01U954MEER/p1642163479382400:

Hi, I'm experiencing a bug when trying to write an ALS model from pyspark.ml.recommendation to S3 and read it back in when this takes place within a dynamically executed graph (i.e. via dynamic mapping). I wrote a custom IO manager whose _uri_for_key uses the pattern f's3a://{self.s3_bucket}/{key}', similar to the one currently implemented in PickledObjectS3IOManager. Because the step identifiers for dynamically generated steps contain square brackets [ and ], these end up in the S3 URI when an object is written. Even though I can clearly see that the model was saved to this path in S3, I get an error when the downstream op tries to load the model, something like:

py4j.protocol.Py4JJavaError: An error occurred while calling o44.load.
: org.apache.hadoop.mapred.InvalidInputException: Input Pattern s3a://....fit_model[...]/model/metadata matches 0 files

When I replace/remove the square brackets in _uri_for_key, this works fine:

f's3a://{self.s3_bucket}/{key}'.replace('[', '_').replace(']', '')

sryza avatar Jan 18 '22 16:01 sryza