layout-parser
AssertError: Checkpoint /home/ec2-user/.torch/iopath_cache/s/dgy9c10wykk4lq4/model_final.pth not found!
I get the above assert when loading any config from the model zoo. The above example was thrown from the code below
```python
model = lp.Detectron2LayoutModel(
    config_path='lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})
```
The reason for the assert is that the local file is actually called "model_final.pth?dl=1":

```shell
$ ls ~/.torch/iopath_cache/s/dgy9c10wykk4lq4
model_final.pth?dl=1  model_final.pth?dl=1.lock
```
I am using the latest version from `pip install layoutparser` and am running on an Amazon CentOS instance.
Any suggestions for a workaround would be greatly appreciated :-)
I can't copy the stack text, but attached is a screenshot

Additional info: The bug does not occur on my Mac laptop. It seems to be a Linux-specific problem.
I have the same problem!
Same here
So I didn't solve the problem with the file name, but I found a way to work around it. If you download model_final.pth manually, it is saved with the correct filename. So you can download model_final.pth and the config.yaml file manually and pass them directly in your script:

```python
model = lp.Detectron2LayoutModel(
    "./config.yaml",
    "./model_final.pth",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)
```

I found this by searching through other issues on this GitHub page. I don't have time right now, but I can post the link later.
The problem is that this project uses Dropbox links (https://github.com/Layout-Parser/layout-parser/blob/f230971f7695229ace0c2d68039e466b036101de/src/layoutparser/models/detectron2/catalog.py#L19-L24) in all places. These links have to end with "?dl=1" to be successfully downloaded, so the downloaded files don't get the clean filename that libraries can recognize. It's lucky that this "worked" in the past, but it was never meant to be supported.
To address this, this project needs to either:
- use a different service (not Dropbox) that can provide download links ending with a clean filename, OR
- implement and register a subclass of HTTPURLHandler that can download Dropbox links correctly. This can be done similar to https://github.com/facebookresearch/iopath/blob/09f3bdc0468b0ad921c99b7c206ea2a560fb8ca7/iopath/common/file_io.py#L868-L909. The handler should make sure get_local_path returns a filename without ?dl=1 and that the file exists on local disk.
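For anyone attempting the second option, the key step is deriving a clean local filename from the Dropbox URL before caching. A minimal sketch of just that mapping (not the full iopath handler, which also has to handle downloading, locking, and caching; the function name here is my own):

```python
from urllib.parse import urlparse

def clean_local_filename(url: str) -> str:
    """Return the filename a handler should cache under,
    with query parameters like ?dl=1 stripped."""
    # urlparse separates the URL path from the ?dl=1 query string
    return urlparse(url).path.rsplit("/", 1)[-1]

print(clean_local_filename(
    "https://www.dropbox.com/s/dgy9c10wykk4lq4/model_final.pth?dl=1"
))  # model_final.pth
```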
I managed to get past this by tweaking the detectron2 lib: I renamed the DetectionCheckpointer's load method to something else.
It is defined in ./detectron2/checkpoints/detection_checkpoint.py at line 33:

```python
def load(self, path, *args, **kwargs):
```

and called in ./detectron2/engine/defaults.py at line 293:

```python
checkpointer.load(cfg.MODEL.WEIGHTS)
```

I was checking for module name collisions, but I'm not sure if that's what actually fixed it.
If you actually look at the directory it complains about (in this case /home/ec2-user/.torch/iopath_cache/s/dgy9c10wykk4lq4/), you might find a file called model_final.pth?dl=1. Just rename the file to remove the ?dl=1 and that should do the trick. This is more of a workaround until we get a proper solution.
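In shell terms the rename is a single mv. The sketch below uses a temporary directory to stand in for the iopath cache; substitute the real cache path from your error message:

```shell
# Stand-in for ~/.torch/iopath_cache/s/dgy9c10wykk4lq4 (adjust to your path)
cache=$(mktemp -d)
touch "$cache/model_final.pth?dl=1"

# Quote the filename so the shell doesn't treat '?' as a glob character
mv "$cache/model_final.pth?dl=1" "$cache/model_final.pth"
ls "$cache"  # model_final.pth
```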
Update: I have submitted a PR to iopath. While it's being reviewed, you can install iopath from my fork:

```shell
pip install -U 'git+https://github.com/nikhilweee/iopath'
```
Does anyone know how to resolve this issue in Google Colab?
@nikhilweee's pip command above works for now.
Quick fix: change the model config 'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config' to something else, for example 'lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config', and run the code; then change it back and it will re-download the model.
> If you actually look at the directory it complains about (in this case /home/ec2-user/.torch/iopath_cache/s/dgy9c10wykk4lq4/), you might find a file called model_final.pth?dl=1. Just rename the file and remove the ?dl=1 and that should do the trick.

I tried this, but it just downloaded another model_final.pth?dl=1 to the same location. Then when I try to load the Detectron2 model I get the following error:

```
File detection_checkpoint.py, line 108, in _load_file
    raise ValueError(
ValueError: Unsupported query remaining: f{'dl': ['1']}, orginal filename: /Users/bill.mcneill/.torch/iopath_cache/s/dgy9c10wykk4lq4/model_final.pth?dl=1
```
Downgrading to detectron2 v0.5 (along with Pillow==9.5.0 to fix another issue, thanks to @gsanch170 for his fix) solved this problem for me. Using the fixed iopath fork did not work for me, but I may have done something wrong there.
> Just rename the file and remove the ?dl=1 ... I tried this, but it just downloaded another model_final.pth?dl=1 to the same location.

Were you able to fix this?
Same issue
As @ppwwyyxx states, the problem is in the file fetching from Dropbox.
On my Linux machine it saves the file with the query tail, model_final.pth?dl=1. Then it tries to load the file from the same destination without this tail, model_final.pth, and fails. Hence:

AssertError: Checkpoint $HOME/.torch/iopath_cache/s/dgy9c10wykk4lq4/model_final.pth not found!

If you are working on a local machine (no future auto-deployments using CI/CD or the like), simply rename the files created under $HOME/.torch/ (if you use the UI, it is a hidden folder) by removing the ?dl=1. The files under the s folder follow the same tree structure as on Dropbox.
If you are building an auto-deployment module, I suggest writing a function that parses the config_path, downloads the model_final.pth and the config.yml into a predefined folder, and overrides config_path and model_path to be your new paths.
In the following example, I override the folder to be model, under the project tree.
It is not the most "generalised" function, but I'm sure you'll get the point.
```python
def load_model(
    config_path: str = 'lp://<dataset_name>/<model_name>/config',
):
    config_path_split = config_path.split('/')
    dataset_name = config_path_split[-3]
    model_name = config_path_split[-2]
    # get the URLs from the MODEL_CATALOG and the CONFIG_CATALOG
    model_url = Detectron2LayoutModel.MODEL_CATALOG[dataset_name][model_name]
    config_url = detectron2.catalog.CONFIG_CATALOG[dataset_name][model_name]
    # override folder destination:
    if 'model' not in os.listdir():
        os.mkdir('model')
    config_file_path, model_file_path = None, None
    for url in [model_url, config_url]:
        filename = url.split('/')[-1].split('?')[0]
        save_to_path = "model/" + filename
        if 'config' in filename:
            config_file_path = copy.deepcopy(save_to_path)
        if 'model_final' in filename:
            model_file_path = copy.deepcopy(save_to_path)
        # skip if the file already exists in path
        if filename in os.listdir("model"):
            continue
        # download file from URL
        r = requests.get(url, stream=True, headers={'user-agent': 'Wget/1.16 (linux-gnu)'})
        with open(save_to_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=4096):
                if chunk:
                    f.write(chunk)
    return Detectron2LayoutModel(
        config_path=config_file_path,
        model_path=model_file_path,
    )
```
AttributeError: module 'detectron2' has no attribute 'catalog'
I slightly adapted the load_model() function of @m1cha3lya1r to access the catalog globals correctly. This works for me:
```python
import layoutparser as lp
from layoutparser.models.detectron2 import catalog
import copy
import os
import requests


def load_model(
    config_path: str = 'lp://<dataset_name>/<model_name>/config',
):
    config_path_split = config_path.split('/')
    dataset_name = config_path_split[-3]
    model_name = config_path_split[-2]
    # get the URLs from the MODEL_CATALOG and the CONFIG_CATALOG
    # (global variables in .../layoutparser/models/detectron2/catalog.py)
    model_url = catalog.MODEL_CATALOG[dataset_name][model_name]
    config_url = catalog.CONFIG_CATALOG[dataset_name][model_name]
    # override folder destination:
    if 'model' not in os.listdir():
        os.mkdir('model')
    config_file_path, model_file_path = None, None
    for url in [model_url, config_url]:
        filename = url.split('/')[-1].split('?')[0]
        save_to_path = "model/" + filename
        if 'config' in filename:
            config_file_path = copy.deepcopy(save_to_path)
        if 'model_final' in filename:
            model_file_path = copy.deepcopy(save_to_path)
        # skip if the file already exists in path
        if filename in os.listdir("model"):
            continue
        # download file from URL
        r = requests.get(url, stream=True, headers={'user-agent': 'Wget/1.16 (linux-gnu)'})
        with open(save_to_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=4096):
                if chunk:
                    f.write(chunk)
    # load the label map
    label_map = catalog.LABEL_MAP_CATALOG[dataset_name]
    return lp.models.Detectron2LayoutModel(
        config_path=config_file_path,
        model_path=model_file_path,
        label_map=label_map,
    )


model = load_model('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config')
```
Same issue here!
I refactored my code into a class which inherits from the original Detectron2LayoutModel class.
Note that I changed the path the model is saved to and loaded from: it is now a model folder in the same directory the script is executed from.
```python
from layoutparser.models import Detectron2LayoutModel, detectron2
import requests
import copy
import os


class ExtractLayout(Detectron2LayoutModel):
    def __init__(self,
                 config_path: str = 'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
                 *args,
                 **kwargs):
        """
        The following modified __init__ is to solve this issue:
        https://github.com/Layout-Parser/layout-parser/issues/168
        :param config_path: A path to the config file
        """
        config_path_split = config_path.split('/')
        dataset_name = config_path_split[-3]
        model_name = config_path_split[-2]
        model_url = Detectron2LayoutModel.MODEL_CATALOG[dataset_name][model_name]
        config_url = detectron2.catalog.CONFIG_CATALOG[dataset_name][model_name]
        if 'model' not in os.listdir():
            os.mkdir('model')
        config_file_path, model_file_path = None, None
        for url in [model_url, config_url]:
            filename = url.split('/')[-1].split('?')[0]
            save_to_path = "model/" + filename
            if 'config' in filename:
                config_file_path = copy.deepcopy(save_to_path)
            if 'model_final' in filename:
                model_file_path = copy.deepcopy(save_to_path)
            if filename in os.listdir("model"):
                continue
            r = requests.get(url, stream=True, headers={'user-agent': 'Wget/1.16 (linux-gnu)'})
            with open(save_to_path, "wb") as f:
                for chunk in r.iter_content(chunk_size=4096):
                    if chunk:
                        f.write(chunk)
        super().__init__(
            config_path=config_file_path,
            model_path=model_file_path,
            *args,
            **kwargs
        )
```
This way, if you have other parameters to change that are not relevant to my use case, you can still pass and modify them according to the Detectron2LayoutModel docs, within the ExtractLayout class. Usage example:

```python
# assuming the class is in a file named extract_layout.py
from .extract_layout import ExtractLayout

model = ExtractLayout(
    config_path='lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',  # only this part is explicitly expected
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.75]
)
```
> You can download the model_final.pth and the config.yaml file manually and call them directly in your script.

This is what worked for me. You can download the .pth and .yaml files to some folder on your computer and then modify the paths as in that comment. The links to the files are here: https://github.com/Layout-Parser/layout-parser/blob/main/src/layoutparser/models/detectron2/catalog.py. You can also use wget -O config.yaml https://...... or wget -O model_final.pth https://...... to save the files under the names they should have.
+1