GoGoJoestar comments

Results: 19 comments of GoGoJoestar

[Feature request] Support Llama3

KeyError: 'pi'

`pi` is added to `keys` when a parameter name ends with `vision_model.embeddings.position_ids`. The parameter in question, `model.vle.vision_model.vision_model.embeddings.position_ids`, is a buffer registered in the class `transformers.models.clip.modeling_clip.CLIPVisionEmbeddings`; you can check if...

No gradients during backpropagation

Could you explain what exactly you did when you say "`set the gradient in BERT's modelling_bert`"?

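No gradients during backpropagation

Suppose the image, after being encoded by the vision model (ViT), has shape `[batch_size, vision_length, hidden_size]`. The second dimension, `vision_length`, is the length of the image feature sequence, which consists of the global image feature concatenated with the feature of each patch. Hence `length = 1 + number of patches`. For example, with an image size of `384*384` and a patch size of `16*16`, the number of patches is `(384 / 16) ^ 2 = 576`, so `vision_length = 1 + 576 = 577`.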
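The arithmetic above can be checked with a short sketch. The helper name `vision_length` is made up for illustration and is not part of the VLE code; it assumes a square image whose side is an exact multiple of the patch size.

```python
def vision_length(image_size: int, patch_size: int) -> int:
    """Sequence length of the ViT output: 1 global feature + one feature per patch.

    Hypothetical helper for illustration; assumes a square image and square patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return 1 + num_patches


print(vision_length(384, 16))  # 577
```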

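No gradients during backpropagation

We did not modify the internals of cross attention. If you need its query and key, consider overriding Hugging Face's `BertAttention` and the related code in `models/VLE/modeling_vle.py`.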

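No gradients during backpropagation

The output of VLEModel contains the final visual features. You can refer to the code below:

```python
model = VLEModel.from_pretrained(model_name)
# `inputs` is assumed to be the dict of tensors produced by the processor
model_outputs = model(**inputs)
# Final image representation
model_outputs.image_embeds
# Final text representation
model_outputs.text_embeds
```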

No gradients during backpropagation

We have not tried this. You could try manually specifying in device_map which GPU each module of the model is placed on. Note that using device_map may conflict with distributed training.

```python
device_map = {
    "vision_model": 0,
    "text_model": 0,
    "text_projection_layer": 1,
    "image_projection_layer": 1,
    "token_type_embeddings": 1,
    "cross_modal_image_layers": 1,
    "cross_modal_text_layers": 1,
    "cross_modal_image_pooler": 1,
    "cross_modal_text_pooler": 1,
}
```

No gradients during backpropagation

If you use the VLEForVQA model, the module names in device_map need to be adjusted:

```python
device_map = {
    "vle.vision_model": 0,
    "vle.text_model": 0,
    "vle.text_projection_layer": 1,
    "vle.image_projection_layer": 1,
    "vle.token_type_embeddings": 1,
    "vle.cross_modal_image_layers": 1,
    "vle.cross_modal_text_layers": 1,
    "vle.cross_modal_image_pooler": 1,
    "vle.cross_modal_text_pooler": 1,
    "vqa_classifier": 1,
}
```

Pass the device_map argument when loading the model: `model...`
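The relationship between the two maps (the VLEModel one and the VLEForVQA one) is mechanical: every module name gains a `vle.` prefix, and `vqa_classifier` is added. A minimal sketch of that renaming follows; `to_vqa_device_map` is a hypothetical helper written for this illustration, not a function from the VLE repository.

```python
def to_vqa_device_map(base_map: dict, classifier_device: int) -> dict:
    """Derive a VLEForVQA-style device_map from a VLEModel-style one.

    Hypothetical helper: prefixes each module name with "vle." and adds
    an entry for the "vqa_classifier" head.
    """
    vqa_map = {f"vle.{name}": device for name, device in base_map.items()}
    vqa_map["vqa_classifier"] = classifier_device
    return vqa_map


# Module names and GPU indices copied from the comments above.
base_map = {
    "vision_model": 0,
    "text_model": 0,
    "text_projection_layer": 1,
    "image_projection_layer": 1,
    "token_type_embeddings": 1,
    "cross_modal_image_layers": 1,
    "cross_modal_text_layers": 1,
    "cross_modal_image_pooler": 1,
    "cross_modal_text_pooler": 1,
}

vqa_map = to_vqa_device_map(base_map, classifier_device=1)
```

Moving a module to another GPU is then a one-line override on the resulting dict, for example `vqa_map["vle.vision_model"] = 1`.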

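No gradients during backpropagation

You could try the following to see whether they reduce GPU memory usage:

1. Adjust the device_map allocation so that memory is spread more evenly.
2. Use torch.no_grad.
3. Reduce the image size of the model and the processor. Specifically, change image_size in the model's config.json, and crop_size and size in preprocessor_config.json.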
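For the third suggestion, the two JSON files can be edited by hand or with a small script. The sketch below is only an illustration under assumptions: `set_image_size` is a hypothetical helper, it replaces every `image_size` key it finds in config.json (including nested ones), and it assumes `crop_size` and `size` are top-level keys of preprocessor_config.json that hold either an integer or a dict of integers. Check the actual files before relying on it.

```python
import json
from pathlib import Path


def _replace_key(node, key, value):
    """Recursively set every occurrence of `key` in a nested JSON structure."""
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
            else:
                _replace_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            _replace_key(item, key, value)


def set_image_size(model_dir, new_size: int) -> None:
    """Hypothetical helper: rewrite the image size in both config files."""
    model_dir = Path(model_dir)

    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text())
    _replace_key(config, "image_size", new_size)
    config_path.write_text(json.dumps(config, indent=2))

    pre_path = model_dir / "preprocessor_config.json"
    pre = json.loads(pre_path.read_text())
    for key in ("crop_size", "size"):
        old = pre.get(key)
        # Keep the existing shape: a dict of sizes stays a dict, an int stays an int.
        pre[key] = {k: new_size for k in old} if isinstance(old, dict) else new_size
    pre_path.write_text(json.dumps(pre, indent=2))


# Usage (the path is a placeholder): set_image_size("path/to/model_dir", 224)
```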

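No gradients during backpropagation

1. The device_map can be adjusted according to the actual memory usage on the two GPUs by changing which modules go on GPU 0 and which on GPU 1, for example by also setting vle.vision_model to 1 (you may need to change the device of the image inputs accordingly). We do not have a 2080 Ti, so we cannot give more specific settings.
2. If gradients are needed, memory usage does increase considerably. Do you need all of the gradients? Have you set requires_grad=False for the modules that do not need gradients?
3. After changing size in config.json and preprocessor_config.json in the model directory, the model does run. What error exactly is reported?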