Soonhwan-Kwon

Results 12 comments of Soonhwan-Kwon

> You can try the conversion script in the `tf_to_pytorch` directory.

It seems that leads to an error saying it can't find the missing keys.

@authman @chenshen03 I finally made it work with tensorflow==1.14.0, and it works perfectly. Thank you.

Thank you for finding this error. We had no time to confirm it until now. You have helped our project very much, thank you! I was also able to confirm it today, again...

I'm also interested in the deberta-mt implementation, and there are grey areas in the unilm implementation, for example how we can implement disentangled attention, and how the author dealt with relative position bias...

I encountered the same situation with a customized model, and I feel stuck, because when you turn on apex amp as the fp16 backend you can't use ZeRO.

![panda](https://user-images.githubusercontent.com/7395166/209418331-d9186064-f279-4f67-8cca-639fa85580ab.jpg) test result 'giant panda , chengdu , china '

![1](https://user-images.githubusercontent.com/7395166/209422564-01e68586-e213-47dc-84db-5391ca6c5737.PNG) ![2](https://user-images.githubusercontent.com/7395166/209422565-3c09c5f6-1a9e-4998-98fd-86057cd8d4b1.PNG) ![3](https://user-images.githubusercontent.com/7395166/209422566-290ec694-0e33-4702-a790-15c6d2c16c11.PNG) ![4](https://user-images.githubusercontent.com/7395166/209422569-b98e26a2-ac55-499d-a0f5-d60853bd6a62.PNG)

Some are good but some are bad, and it needs to be fine-tuned on the COCO dataset, as in the CoCa paper, for better results. I'm evaluating the scores on COCO...

It is a much slower implementation because it runs without past_key_values, but I expect it to be much faster with past_key_values. I wanted to move on step by step, because...
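To illustrate why a key/value cache should help, here is a minimal sketch in plain Python. It assumes `past_key_values` refers to the usual per-step cache of attention keys and values. The `project`, `attend`, `decode_without_cache`, and `decode_with_cache` helpers are hypothetical toy stand-ins, not code from the implementation discussed here. Both decoders produce the same outputs, but the uncached one re-projects the whole prefix at every step.

```python
import math


def project(token):
    """Toy stand-in for the key/value projections of a decoder layer (hypothetical)."""
    key = [2.0 * x for x in token]
    value = [x + 1.0 for x in token]
    return key, value


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def decode_without_cache(tokens):
    """Re-project the entire prefix at every step (quadratic number of projections)."""
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        keys, values = [], []
        for token in tokens[:step]:
            key, value = project(token)
            projections += 1
            keys.append(key)
            values.append(value)
        outputs.append(attend(tokens[step - 1], keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    """Project only the newest token and reuse cached keys/values (linear number)."""
    outputs, projections = [], 0
    cached_keys, cached_values = [], []  # plays the role of past_key_values
    for token in tokens:
        key, value = project(token)
        projections += 1
        cached_keys.append(key)
        cached_values.append(value)
        outputs.append(attend(token, cached_keys, cached_values))
    return outputs, projections


if __name__ == "__main__":
    toy_tokens = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.7]]
    _, slow_count = decode_without_cache(toy_tokens)
    _, fast_count = decode_with_cache(toy_tokens)
    print("projections without cache:", slow_count)
    print("projections with cache:", fast_count)
```

In a real decoder the saving applies to every layer and head, so the gap grows with the generated sequence length. The actual speedup depends on the model and hardware.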

> It would be really cool if you could make finetuning called for plugging image embeddings into the coca text decoder and train only the decoder :)

It sounds very...