
Results 388 comments of NielsRogge

CLIP can also be used for image-text matching, by just encoding the image, encoding the text, and computing a cosine similarity score between the respective embeddings.
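The matching step described above reduces to a dot product between L2-normalized vectors. A minimal sketch, using random vectors as stand-ins for the embeddings a CLIP model would produce (in practice these would come from the model's image and text encoders):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity = dot product of the L2-normalized vectors.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-ins for CLIP embeddings; a real pipeline would obtain these
# from the model's image and text encoders.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_emb = rng.normal(size=512)

score = cosine_similarity(image_emb, text_emb)
print(score)  # value in [-1, 1]; higher means a better image-text match
```

For ranking several candidate captions against one image, the same score is computed per caption and the highest wins.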

> With regards to testing, you should indeed add a tester. While the FeatureExtractors were replaced by ImageProcessors, I think we are still using the test_feature_extraction_... We can call them...

Hi @zinengtang, could you push a commit to trigger the CI? It seems not all tests are being run, and many are failing. After that, I'll assign one team member for...

@jegork really cool work! As a next step, could you try to make the CI as green as possible? Currently there are many failing checks (10 failing and 9 successful)....

Sorry for the late reply here, I've assigned @amyeroberts to review the PR.

Not yet, but it would be straightforward to add. Marking this as a good first issue.

Hi @atturaioe, awesome. So in [this folder](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining), one could add a `run_mim_no_trainer.py` script, similar to the other `no_trainer.py` scripts in the examples folder.

I've seen other people reporting wrong behaviour with unusual characters as well. The logic to go from word-level labels to token-level labels is [here](https://github.com/huggingface/transformers/blob/3b309818e794cf6ff7fa79f34ea3e7b2386156da/src/transformers/models/layoutlmv3/tokenization_layoutlmv3_fast.py#L635-L660), might be worth looking at this...
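The word-to-token alignment referenced above typically works by labeling the first subtoken of each word and masking the rest with -100 so the loss ignores them. A standalone sketch of that logic (`align_labels` is a hypothetical helper; `word_ids` mimics what a fast tokenizer's `word_ids()` method returns):

```python
def align_labels(word_labels, word_ids, label_all_tokens=False):
    """Map word-level labels to token-level labels.

    word_labels: one label per word.
    word_ids: for each token, the index of the word it came from,
              or None for special tokens ([CLS], [SEP], padding).
    """
    token_labels = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens: -100 tells the loss function to ignore them.
            token_labels.append(-100)
        elif word_id != previous_word:
            # First subtoken of a word gets the word's label.
            token_labels.append(word_labels[word_id])
        else:
            # Later subtokens: repeat the label or ignore, depending on strategy.
            token_labels.append(word_labels[word_id] if label_all_tokens else -100)
        previous_word = word_id
    return token_labels

# Word 0 split into two subtokens, with special tokens at both ends.
print(align_labels([1, 2], [None, 0, 0, 1, None]))
# [-100, 1, -100, 2, -100]
```

Unusual characters can cause a word to be split (or merged) differently than expected, which is exactly where this mapping can go wrong.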

Hi, thanks for replying, this issue was fixed so I'll close it. Feel free to take a look at other good first issues.