refactor AlternativeImage selection logic out of Workspace into a stateless function (without downloading)

Open • bertsky opened this issue 11 months ago • 1 comment

We need to know whether refactoring the AlternativeImage selection logic out of Workspace.image_from_* into a stateless function (without any download_file behaviour) would break any existing API, and hence whether it must be done before 3.0 or can be done later.

@kba I don't think we need to break anything here in the future. The methods Workspace.image_from_page and Workspace.image_from_segment could be re-implemented as follows:

  • delegate to new generateDS user methods PageType.get_image and [*Region|TextLine|Word]Type.get_image,
    but pass, as a new kwarg resolve, a function with the following definition:
    def resolve(image_url):
        try:
            f = next(self.mets.find_files(local_filename=str(image_url)))
            return f.local_filename
        except StopIteration:
            try:
                f = next(self.mets.find_files(url=str(image_url)))
                return self.download_file(f).local_filename
            except StopIteration:
                with download_temporary_file(image_url) as f:
                    return f.name
    
  • replace calls to resolve_image_exif with calls to exif_from_filename directly,
    but allow overriding filename via resolve
  • replace calls to resolve_image_as_pil with calls to a new function image_from_filename,
    which contains only the parts that do Image.open() and .load() (to release the file descriptor)
    plus the array conversion for the poorly supported color modes I and F,
    but allow overriding filename via resolve
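
To illustrate the split proposed above, here is a minimal sketch of how the stateful lookups could be injected as a resolve callback while the selection side stays stateless. This is not OCR-D code: make_resolver, image_filename and the four injected callables are hypothetical names that stand in for mets.find_files(local_filename=...), mets.find_files(url=...), Workspace.download_file and download_temporary_file, and the toy sets replace a real METS.

```python
from typing import Callable, Iterable, Optional

Resolver = Callable[[str], str]


def make_resolver(
    find_local: Callable[[str], Iterable[str]],
    find_remote: Callable[[str], Iterable[str]],
    download: Callable[[str], str],
    fetch_temporary: Callable[[str], str],
) -> Resolver:
    """Build a resolve(image_url) callback from injected lookups.

    All four parameters are hypothetical stand-ins for the Workspace
    behaviour described in the issue, not real OCR-D signatures.
    """
    def resolve(image_url) -> str:
        url = str(image_url)
        # 1. Prefer a file that is already registered under a local filename.
        local = next(iter(find_local(url)), None)
        if local is not None:
            return local
        # 2. Otherwise download a file that is registered by URL.
        remote = next(iter(find_remote(url)), None)
        if remote is not None:
            return download(remote)
        # 3. Fall back to fetching an unregistered URL into a temporary file.
        return fetch_temporary(url)

    return resolve


def image_filename(image_url, resolve: Optional[Resolver] = None) -> str:
    """Stateless side: pick the filename to open, never download here."""
    return str(image_url) if resolve is None else resolve(image_url)


# Toy stand-ins for a workspace: plain sets instead of a METS file index.
local_files = {"OCR-D-IMG/page1.png"}
remote_files = {"https://example.org/page2.png"}

resolve = make_resolver(
    find_local=lambda url: [url] if url in local_files else [],
    find_remote=lambda url: [url] if url in remote_files else [],
    download=lambda url: "downloaded/" + url.rsplit("/", 1)[-1],
    fetch_temporary=lambda url: "tmp/" + url.rsplit("/", 1)[-1],
)

print(image_filename("OCR-D-IMG/page1.png", resolve=resolve))
print(image_filename("https://example.org/page2.png", resolve=resolve))
print(image_filename("https://example.org/other.png", resolve=resolve))
```

Without a resolve argument, image_filename returns the reference unchanged, so callers that only work on local files need no Workspace at all.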

Originally posted by @bertsky in https://github.com/bertsky/core/issues/21#issuecomment-2593375060

bertsky • Jan 20 '25 15:01

related: https://github.com/OCR-D/core/issues/264

bertsky • Jan 20 '25 15:01