
Support mixing remote accessory references

Open • jeremyh opened this issue 2 years ago • 4 comments

Reported by Belle and Toktam

warnings.warn(
Traceback (most recent call last):
  File "le_lccs_odc.py", line 74, in <module>
    gridded_classification.run_classification(
  File "/home/jovyan/livingearth_lccs/le_lccs/le_utils/gridded_classification.py", line 366, in run_classification
    export_obj.write_xarray(l4_out_classification_array, **product.config())
  File "/home/jovyan/livingearth_lccs/le_lccs/le_export/gridded_export.py", line 187, in write_xarray
    p.add_accessory_file("lineage:static", data_xarray.attrs.get("accessories"))
  File "/home/jovyan/eo-datasets/eodatasets3/assemble.py", line 1025, in add_accessory_file
    self.note_accessory_file(*args, **kwargs)
  File "/home/jovyan/eo-datasets/eodatasets3/assemble.py", line 1584, in note_accessory_file
    self._checksum.add_file(Path(path))
  File "/home/jovyan/eo-datasets/eodatasets3/verify.py", line 97, in add_file
    hash_ = self._checksum(file_path)
  File "/home/jovyan/eo-datasets/eodatasets3/verify.py", line 115, in _checksum
    hash_ = calculate_file_hash(file_path)
  File "/home/jovyan/eo-datasets/eodatasets3/verify.py", line 42, in calculate_file_hash
    with Path(filename).open("rb") as f:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 's3:/dea-public-data/projects/LCCS/urban_mask.tif'

This happens because DatasetAssembler writes its package locally and assumes all file references are local paths.
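The single slash in the error message (`s3:/dea-public-data/...`) is consistent with the URL having been passed through `pathlib`, as the `Path(path)` call in the traceback suggests. A minimal sketch of that effect, using only the standard library (`PurePosixPath` is used here so the result does not depend on the host OS):

```python
from pathlib import PurePosixPath

# The accessory reference from the traceback, written as a proper S3 URL.
url = "s3://dea-public-data/projects/LCCS/urban_mask.tif"

# pathlib treats the string as a relative POSIX path: "s3:" becomes an
# ordinary directory name and the empty component between the two
# slashes is dropped.
as_path = PurePosixPath(url)

print(as_path)                # s3:/dea-public-data/projects/LCCS/urban_mask.tif
print(as_path.is_absolute())  # False
```

Opening that path then looks for a local directory literally named `s3:`, which is why the failure surfaces as `FileNotFoundError` rather than anything S3-related.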

These derivative processing systems write local files but are trying to reference remote accessory files.
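One possible direction, sketched under assumptions: detect remote references before they reach `pathlib`, and only checksum files that are actually local. The helpers `is_remote_reference` and `describe_accessory` below are hypothetical names for illustration and are not part of the eodatasets3 API; the choice of SHA-1 and the returned dictionary layout are likewise assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional
from urllib.parse import urlsplit


def is_remote_reference(ref: str) -> bool:
    """Hypothetical check: does this reference look like a non-file URL?"""
    scheme = urlsplit(ref).scheme
    # A one-letter "scheme" is most likely a Windows drive letter (C:\...),
    # so only multi-character schemes other than file:// count as remote.
    return len(scheme) > 1 and scheme != "file"


def describe_accessory(name: str, ref: str) -> Dict[str, Optional[str]]:
    """Hypothetical record for an accessory; not the eodatasets3 format."""
    if is_remote_reference(ref):
        # Keep the URL string untouched and skip checksumming, since the
        # file is not available on the local filesystem.
        return {"name": name, "path": ref, "checksum": None}

    # Local file: hash it in chunks (SHA-1 is an assumption for this sketch).
    digest = hashlib.sha1()
    with Path(ref).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return {"name": name, "path": ref, "checksum": digest.hexdigest()}
```

With this split, a call such as `describe_accessory("lineage:static", "s3://dea-public-data/projects/LCCS/urban_mask.tif")` would return the URL unchanged with no checksum, while local paths would still be hashed. Whether unchecksummed remote accessories are acceptable is exactly the design question raised in this issue.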

I'm not certain if this is what we want to do (accessories were originally references to non-measurement extra files included in the package), but we should decide on the preferred way for people to do this.

jeremyh • Apr 11 '22 03:04

I think we need to add support for accessory files stored on s3 but I'm keen to know what @omad and @SpacemanPaul think about this.

tebadi • Apr 11 '22 05:04

Toktam's point makes sense to me. At a minimum, supporting accessory files stored remotely seems a reasonable use case given the community's increasing reliance on cloud storage.

SpacemanPaul • Apr 11 '22 05:04

@jeremyh Happy for me to add support for this?

tebadi • Apr 11 '22 23:04

Yep!

jeremyh • Apr 21 '22 04:04