Grounded-Segment-Anything

To run offline, which parts of the configuration need to change, and which files need to be downloaded?

Open xiaowenhe opened this issue 1 year ago • 18 comments

To run offline, which parts of the configuration need to change? Files such as config.json apparently need to be downloaded — where should they go? On a server without internet access, running the code fails with: OSError: We couldn't connect to https://huggingface.co to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.

Thanks!

xiaowenhe avatar Apr 11 '23 11:04 xiaowenhe

Take a look at the interfaces in the code — it can be pointed directly at a config file and a pretrained-model path. Download them manually and put them in a suitable location.

SlongLiu avatar Apr 11 '23 11:04 SlongLiu

I'm not clear on the overall workflow. Could you explain exactly how the code specifies the config and pretrained-model paths? Thanks!

xiaowenhe avatar Apr 11 '23 12:04 xiaowenhe

Taking grounded_sam_demo.py as an example: use --grounded_checkpoint to set the pretrained-model path and --config to set the config path.
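For example, using the flags above together with the checkpoints and demo image referenced elsewhere in this thread (paths are illustrative — adjust to your checkout):

```shell
python grounded_sam_demo.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/demo1.jpg \
  --output_dir outputs \
  --text_prompt "bear"
```

Note that these flags only cover the GroundingDINO and SAM checkpoints; the BERT text encoder discussed below is still fetched separately via transformers.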

SlongLiu avatar Apr 11 '23 14:04 SlongLiu

Sorry, I may not have been clear — my question isn't how to pass a config when running the code. The problem is that partway through execution, at line 17 of GroundingDINO/groundingdino/util/get_tokenlizer.py, tokenizer = AutoTokenizer.from_pretrained(text_encoder_type) needs to download files from the internet, which is what I asked at the start (the error says config.json and other files need to be downloaded — where should they go? On a server without internet access it raises: OSError: We couldn't connect to https://huggingface.co to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.) Thanks!

xiaowenhe avatar Apr 12 '23 01:04 xiaowenhe

You can first run the huggingface-related code once to download the files; such calls usually look like xx.from_pretrained().
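As a sketch of that pre-download step on a machine that does have internet access (assuming transformers is installed; the calls are shown commented out so only the cache location is live code):

```python
import os

# Running the from_pretrained() calls once on an online machine populates the
# local Hugging Face cache, which can then be copied to the offline server:
#
#   from transformers import AutoTokenizer, BertModel
#   AutoTokenizer.from_pretrained("bert-base-uncased")
#   BertModel.from_pretrained("bert-base-uncased")
#
# By default the cache lives under:
cache_dir = os.path.expanduser("~/.cache/huggingface")
print(cache_dir)
```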

Andy1621 avatar Apr 12 '23 02:04 Andy1621

Sorry, I may not have been clear — my question isn't how to pass a config when running the code. The problem is that partway through execution, at line 17 of GroundingDINO/groundingdino/util/get_tokenlizer.py, tokenizer = AutoTokenizer.from_pretrained(text_encoder_type) needs to download files from the internet, which is what I asked at the start (the error says config.json and other files need to be downloaded — where should they go? On a server without internet access it raises: OSError: We couldn't connect to https://huggingface.co to load this file, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a file named config.json.) Thanks!

Indeed, I've run into this too — because every run reaches out to the remote huggingface servers, connection failures are common. I've downloaded the model files I need but haven't yet worked out how to make the code load the local copies.

LeanFly avatar Apr 12 '23 07:04 LeanFly

Once you have downloaded the model, it will be loaded from the cache automatically.

Andy1621 avatar Apr 12 '23 09:04 Andy1621

Maybe you need this link to download the whole model by installing huggingface_hub, then replace the input parameter of from_pretrained with your local model directory.
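Concretely, that could look like the following sketch (assuming huggingface_hub is installed; it needs network access once, and the snapshot_download call returns the local snapshot directory):

```python
from huggingface_hub import snapshot_download

# Mirror the whole "bert-base-uncased" repo into the local cache, then hand
# the resulting directory to from_pretrained() instead of the hub id.
local_dir = snapshot_download(repo_id="bert-base-uncased")
# tokenizer = AutoTokenizer.from_pretrained(local_dir)
```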

Zalberth avatar Apr 12 '23 13:04 Zalberth

Here is my workaround to run the model without connecting to huggingface:

  • Step 1: download necessary files listed in huggingface-bert-base-uncased, including config.json, flax_model.msgpack, pytorch_model.bin, tf_model.h5, tokenizer.json, tokenizer_config.json, vocab.txt
  • Step 2: put downloaded files (Step 1) into your local folder. For example, the local folder could be Grounded-Segment-Anything/huggingface/bert-base-uncased
  • Step 3: modify text_encoder_type in get_tokenlizer.py#L17 and get_tokenlizer.py#L23 to your local folder (defined in Step 2)
  • Step 4: run the model and enjoy it

cxliu0 avatar Apr 14 '23 09:04 cxliu0

Here is my workaround to run the model without connecting to huggingface:

  • Step 1: download necessary files listed in huggingface-bert-base-uncased, including config.json, flax_model.msgpack, pytorch_model.bin, tf_model.h5, tokenizer.json, tokenizer_config.json, vocab.txt
  • Step 2: put downloaded files (Step 1) into your local folder. For example, the local folder could be Grounded-Segment-Anything/huggingface/bert-base-uncased
  • Step 3: modify text_encoder_type in get_tokenlizer.py#L17 and get_tokenlizer.py#L23 to your local folder (defined in Step 2)
  • Step 4: run the model and enjoy it

That's rather cumbersome... mainly because the last step requires editing the source code... How do you stay in sync with master afterwards? (Though further changes there are probably unlikely.)

levylll avatar May 08 '23 09:05 levylll

@SlongLiu could you add a configuration option for this? Give us a way to pass the local directory through at initialization.

levylll avatar May 08 '23 10:05 levylll

Find the load_model_hf method, add a debug statement of your choice to print the local path of cache_file, copy the model file somewhere convenient, then comment out the cache_file line and point cache_file at the local path.
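In code, the described patch amounts to something like this (the helper name and local path are assumptions for illustration):

```python
# Bypass hf_hub_download and point cache_file at a manually copied checkpoint.
LOCAL_CHECKPOINT = "/data/models/groundingdino/pytorch_model.bin"  # assumed path

def resolve_checkpoint(repo_id, filename):
    # Original line inside load_model_hf (needs network):
    #   cache_file = hf_hub_download(repo_id=repo_id, filename=filename)
    return LOCAL_CHECKPOINT  # offline: the file you copied by hand
```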

[screenshot: load_model_hf patched to use a local cache_file path]

LeanFly avatar May 10 '23 02:05 LeanFly

Here is my workaround to run the model without connecting to huggingface:

  • Step 1: download necessary files listed in huggingface-bert-base-uncased, including config.json, flax_model.msgpack, pytorch_model.bin, tf_model.h5, tokenizer.json, tokenizer_config.json, vocab.txt
  • Step 2: put downloaded files (Step 1) into your local folder. For example, the local folder could be Grounded-Segment-Anything/huggingface/bert-base-uncased
  • Step 3: modify text_encoder_type in get_tokenlizer.py#L17 and get_tokenlizer.py#L23 to your local folder (defined in Step 2)
  • Step 4: run the model and enjoy it

We will highlight this in our issue! Thanks for your solution; we will refine the code in a future release.

rentainhe avatar May 31 '23 07:05 rentainhe

Here is my workaround to run the model without connecting to huggingface:

  • Step 1: download necessary files listed in huggingface-bert-base-uncased, including config.json, flax_model.msgpack, pytorch_model.bin, tf_model.h5, tokenizer.json, tokenizer_config.json, vocab.txt
  • Step 2: put downloaded files (Step 1) into your local folder. For example, the local folder could be Grounded-Segment-Anything/huggingface/bert-base-uncased
  • Step 3: modify text_encoder_type in get_tokenlizer.py#L17 and get_tokenlizer.py#L23 to your local folder (defined in Step 2)
  • Step 4: run the model and enjoy it

Good job! Thanks.

zzh805780186 avatar Jun 16 '23 01:06 zzh805780186

Here is my modified code:

    from transformers import AutoTokenizer, BertModel, RobertaModel, RobertaTokenizerFast, BertTokenizer

    def get_tokenlizer(text_encoder_type):
        if not isinstance(text_encoder_type, str):
            if hasattr(text_encoder_type, "text_encoder_type"):
                text_encoder_type = text_encoder_type.text_encoder_type
            elif text_encoder_type.get("text_encoder_type", False):
                text_encoder_type = text_encoder_type.get("text_encoder_type")
            else:
                raise ValueError(
                    "Unknown type of text_encoder_type: {}".format(type(text_encoder_type))
                )
        print("final text_encoder_type: {}".format(text_encoder_type))

        tokenizer_path = "Grounded-Segment-Anything/huggingface/bert-base-uncased"
        tokenizer = BertTokenizer.from_pretrained(tokenizer_path, use_fast=False)
        return tokenizer

    def get_pretrained_language_model(text_encoder_type):
        if text_encoder_type == "bert-base-uncased":
            model_path = "Grounded-Segment-Anything/huggingface/bert-base-uncased/pytorch_model.bin"
            return BertModel.from_pretrained(model_path)
        if text_encoder_type == "roberta-base":
            return RobertaModel.from_pretrained(text_encoder_type)
        raise ValueError("Unknown text_encoder_type {}".format(text_encoder_type))

But I still get an error:

    (gsa) D:\forwork\Grounded-Segment-Anything>python grounded_sam_demo.py --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py --grounded_checkpoint groundingdino_swint_ogc.pth --sam_checkpoint sam_vit_h_4b8939.pth --input_image assets/demo1.jpg --output_dir "outputs" --box_threshold 0.3 --text_threshold 0.25 --text_prompt "bear" --device "cuda"
    D:\Anaconda3\envs\gsa\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3191.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    final text_encoder_type: bert-base-uncased
    Traceback (most recent call last):
      File "grounded_sam_demo.py", line 181, in <module>
        model = load_model(config_file, grounded_checkpoint, device=device)
      File "grounded_sam_demo.py", line 46, in load_model
        model = build_model(args)
      File "D:\forwork\Grounded-Segment-Anything\GroundingDINO\groundingdino\models\__init__.py", line 17, in build_model
        model = build_func(args)
      File "D:\forwork\Grounded-Segment-Anything\GroundingDINO\groundingdino\models\GroundingDINO\groundingdino.py", line 372, in build_groundingdino
        model = GroundingDINO(
      File "D:\forwork\Grounded-Segment-Anything\GroundingDINO\groundingdino\models\GroundingDINO\groundingdino.py", line 107, in __init__
        self.tokenizer = get_tokenlizer.get_tokenlizer(text_encoder_type)
      File "d:\forwork\grounded-segment-anything\groundingdino\groundingdino\util\get_tokenlizer.py", line 45, in get_tokenlizer
        tokenizer = BertTokenizer.from_pretrained(tokenizer_path, use_fast=False)
      File "D:\Anaconda3\envs\gsa\lib\site-packages\transformers\tokenization_utils_base.py", line 1654, in from_pretrained
        fast_tokenizer_file = get_fast_tokenizer_file(
      File "D:\Anaconda3\envs\gsa\lib\site-packages\transformers\tokenization_utils_base.py", line 3486, in get_fast_tokenizer_file
        all_files = get_list_of_files(
      File "D:\Anaconda3\envs\gsa\lib\site-packages\transformers\file_utils.py", line 2103, in get_list_of_files
        return list_repo_files(path_or_repo, revision=revision, token=token)
      File "D:\Anaconda3\envs\gsa\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 103, in inner_f
        return f(*args, **kwargs)
      File "D:\Anaconda3\envs\gsa\lib\site-packages\huggingface_hub\utils\_validators.py", line 110, in _inner_fn
        validate_repo_id(arg_value)
      File "D:\Anaconda3\envs\gsa\lib\site-packages\huggingface_hub\utils\_validators.py", line 158, in validate_repo_id
        raise HFValidationError(
    huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'Grounded-Segment-Anything/huggingface/bert-base-uncased'. Use repo_type argument if needed.

Is there any good solution?
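Two things in that traceback seem worth checking (hedged guesses, not verified against that exact transformers version): from_pretrained() expects a directory rather than the pytorch_model.bin file itself, and a relative path shaped like namespace/repo_name can be mistaken for a hub repo id — which is exactly what the HFValidationError complains about. A sketch:

```python
import os

# 1) Pass the *directory* (containing config.json, vocab.txt, ...) to
#    from_pretrained(), not the pytorch_model.bin file inside it.
# 2) Use an absolute path so it cannot be parsed as "namespace/repo_name".
model_dir = os.path.abspath("huggingface/bert-base-uncased")
# tokenizer = BertTokenizer.from_pretrained(model_dir, use_fast=False)
# model = BertModel.from_pretrained(model_dir)
print(model_dir)
```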

nomoneyExpection avatar Jul 07 '23 02:07 nomoneyExpection

export http_proxy="http://192.168.30.127:4780"
export https_proxy="http://192.168.30.127:4780"

littleanapple avatar Oct 29 '23 06:10 littleanapple

Set the proxy in Docker.

littleanapple avatar Oct 29 '23 07:10 littleanapple