
Can you update finetune_training for selfsupervised_kmeans?

Open machoangha opened this issue 1 year ago • 2 comments

Hi,

I have fine-tuned my model using supervised mode on my custom data. However, when I switch to selfsupervised_kmeans and add the mask file, I notice that the output shapes of the data from the train_data_loader_iter.next() method are inconsistent with those from the supervised mode.

Observations:

  • Supervised Mode Output:

    • First value of item: Size: torch.Size([3, 32, 128])
    • Second value of item: Size: torch.Size([1, 25])
  • Self-Supervised KMeans Mode Output:

    • First value of item: Size: torch.Size([3, 3, 32, 128])
    • Second value of item: Size: torch.Size([32, 128])
    • Third value of item: Size: torch.Size([3, 3])

Context: The size of each mode is printed in the training script at this line: https://github.com/TongkunGuan/CCD/blob/543109a1e1d9acd15080abb3e4e72d68588ba493/train_finetune.py#L269.
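For reference, here is a minimal sketch reproducing the two batch layouts with dummy tensors. The shapes come from the observations above; the tuple structure is my assumption based on those shapes, not the repository's actual data-loader code:

```python
import torch

# Supervised mode: one item is (image, label) with the reported shapes.
supervised_item = (
    torch.zeros(3, 32, 128),                 # image tensor
    torch.zeros(1, 25, dtype=torch.long),    # label token ids
)

# selfsupervised_kmeans mode: (stack of 3 views, clustering mask, affine matrix).
selfsup_item = (
    torch.zeros(3, 3, 32, 128),  # original + 2 augmented views, each 3x32x128
    torch.zeros(32, 128),        # k-means mask
    torch.eye(3),                # affine transform matrix
)

views, mask, affine = selfsup_item
original, aug1, aug2 = views  # iterating dim 0 yields three 3x32x128 images
print(original.shape)         # torch.Size([3, 32, 128])
```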

Questions:

  1. The paper appears to describe Self-Supervised learning as creating 2 additional augmented images that, together with the original, form a stack of 3 views (torch.Size([3, 3, 32, 128])); the second item is the mask (torch.Size([32, 128])) and the third is the affine matrix (torch.Size([3, 3])). Therefore, I believe this output format is not compatible with the current training script.
  2. Could you please provide the fine-tuning code for selfsupervised mode?

Thank you!

machoangha avatar Oct 31 '24 04:10 machoangha

We used only the supervised mode in the fine-tuning file; you can modify this file to suit your needs.
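One possible adaptation (a hedged sketch only, not the repository's actual code): where the supervised loop unpacks `(images, labels)`, a self-supervised branch would first need to unpack the three-view tensor, mask, and affine matrix. The `mode` strings and tuple layouts below are assumptions based on the shapes reported in this issue:

```python
import torch

def unpack_batch(item, mode):
    """Unpack one data-loader batch depending on the training mode.

    NOTE: the mode names and tuple layouts are assumptions inferred
    from the shapes reported in this issue, not the project's API.
    """
    if mode == "supervised":
        images, labels = item           # [B,3,32,128], [B,1,25]
        return images, labels
    elif mode == "selfsupervised_kmeans":
        views, mask, affine = item      # [B,3,3,32,128], [B,32,128], [B,3,3]
        images = views[:, 0]            # original view, [B,3,32,128]
        return images, mask, affine
    raise ValueError(f"unknown mode: {mode}")

# Dummy batched tensors (batch size 2) with the reported per-sample shapes:
item = (torch.zeros(2, 3, 3, 32, 128), torch.zeros(2, 32, 128), torch.zeros(2, 3, 3))
images, mask, affine = unpack_batch(item, "selfsupervised_kmeans")
print(images.shape)  # torch.Size([2, 3, 32, 128])
```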

TongkunGuan avatar Oct 31 '24 11:10 TongkunGuan

Hi,

I would like to confirm whether my understanding is correct: during the pretraining phase of CCD, the model uses the self-supervised mode, but when fine-tuning the model for a specific task like text recognition, you switch to the supervised mode. Is that correct?

Thank you!

machoangha avatar Nov 04 '24 17:11 machoangha