workspace.download_file - not downloading transitive files
Noticed while fixing the broken tests in https://github.com/OCR-D/ocrd_kraken/pull/42:
Here, we use Resolver.workspace_from_url without download, which copies the mets.xml and nothing else.
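```python
import os
import shutil

import pytest
from ocrd import Resolver
# `assets` is the test-asset helper of the test suite (import omitted here)

@pytest.fixture()
def workspace(tmpdir):
    # start from a clean destination directory
    if os.path.exists(tmpdir):
        shutil.rmtree(tmpdir)
    # no download requested: only mets.xml is copied into dst_dir
    workspace = Resolver().workspace_from_url(
        assets.path_to('kant_aufklaerung_1784/data/mets.xml'),
        dst_dir=tmpdir
    )
    return workspace
```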
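A minimal sketch, assuming plain local paths, of what a baseurl mechanism has to achieve for a workspace created like this: since only mets.xml ends up in `dst_dir`, a relative href must fall back to the directory the original mets.xml came from. `resolve_href` is a hypothetical helper for illustration, not part of core.

```python
import os

def resolve_href(href, src_baseurl, dst_dir):
    """Hypothetical helper: locate the file a relative METS href points to.

    Prefers a copy inside the workspace directory (dst_dir); otherwise falls back
    to src_baseurl, assumed to be the directory the original mets.xml came from.
    """
    local = os.path.join(dst_dir, href)
    if os.path.exists(local):
        return local
    remote = os.path.join(src_baseurl, href)
    if os.path.exists(remote):
        return remote
    raise FileNotFoundError(f"{href!r} found neither in {dst_dir!r} nor in {src_baseurl!r}")
```

If nothing applies this fallback for the image referenced from the PAGE-XML, the lookup ends in `dst_dir`, where the file was never copied.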
In the processors, the PAGE-XML is downloaded via
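```python
from ocrd_modelfactory import page_from_file

# inside the processor: fetch the PAGE-XML and read the image it references
pcgts = page_from_file(self.workspace.download_file(input_file))
image_url = pcgts.get_Page().imageFilename
# [...]
image = self.workspace.resolve_image_as_pil(image_url)
```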
This apparently breaks: the image file is never downloaded, so the tests fail.
So either I debug this properly to find out why the baseurl mechanism does not work here, or we finally get rid of the long-deprecated resolve_image_as_pil altogether.
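For reference, a "transitive" download in the sense of the title could look like the following stdlib-only sketch: copy the PAGE-XML, read `imageFilename` from its `Page` element, and fetch that image too. It assumes local directories and hrefs relative to a source directory; `download_page_with_image` is hypothetical and not how core's `download_file` works.

```python
import os
import shutil
import xml.etree.ElementTree as ET

def download_page_with_image(src_baseurl, page_href, dst_dir):
    """Hypothetical sketch: copy a PAGE-XML file and, transitively, its image.

    Assumes src_baseurl is a local directory and both hrefs are relative to it.
    """
    src_page = os.path.join(src_baseurl, page_href)
    dst_page = os.path.join(dst_dir, page_href)
    os.makedirs(os.path.dirname(dst_page), exist_ok=True)
    shutil.copyfile(src_page, dst_page)

    # find the Page element regardless of the PAGE namespace version
    root = ET.parse(dst_page).getroot()
    page = next(el for el in root.iter() if el.tag.rsplit('}', 1)[-1] == 'Page')
    image_href = page.get('imageFilename')

    # the transitive step: also fetch the referenced image unless already present
    dst_image = os.path.join(dst_dir, image_href)
    if not os.path.exists(dst_image):
        os.makedirs(os.path.dirname(dst_image), exist_ok=True)
        shutil.copyfile(os.path.join(src_baseurl, image_href), dst_image)
    return dst_page, dst_image
```

Absolute image URLs would need an actual download step instead of `shutil.copyfile`; the sketch leaves that out.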