
Can you update finetune_training for selfsupervised_kmeans?

Open machoangha opened this issue 1 year ago • 2 comments

Hi,

I have fine-tuned my model using supervised mode on my custom data. However, when I switch to selfsupervised_kmeans and add the mask file, I notice that the output shapes of the data from the train_data_loader_iter.next() method are inconsistent with those from the supervised mode.

Observations:

  • Supervised Mode Output:

    • First value of item: Size: torch.Size([3, 32, 128])
    • Second value of item: Size: torch.Size([1, 25])
  • Self-Supervised KMeans Mode Output:

    • First value of item: Size: torch.Size([3, 3, 32, 128])
    • Second value of item: Size: torch.Size([32, 128])
    • Third value of item: Size: torch.Size([3, 3])

Context: The size of each mode is printed in the training script at this line: https://github.com/TongkunGuan/CCD/blob/543109a1e1d9acd15080abb3e4e72d68588ba493/train_finetune.py#L269.
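For reference, here is a minimal sketch reproducing the two batch layouts with dummy tensors. The shapes come from the observations above; the tuple structure is my assumption based on those shapes, not the repository's actual data-loader code:

```python
import torch

# Supervised mode: one item is (image, label) with the reported shapes.
supervised_item = (
    torch.zeros(3, 32, 128),                 # image tensor
    torch.zeros(1, 25, dtype=torch.long),    # label token ids
)

# selfsupervised_kmeans mode: (stack of 3 views, clustering mask, affine matrix).
selfsup_item = (
    torch.zeros(3, 3, 32, 128),  # original + 2 augmented views, each 3x32x128
    torch.zeros(32, 128),        # k-means mask
    torch.eye(3),                # affine transform matrix
)

views, mask, affine = selfsup_item
original, aug1, aug2 = views  # iterating dim 0 yields three 3x32x128 images
print(original.shape)         # torch.Size([3, 32, 128])
```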

Questions:

  1. The paper appears to describe Self-Supervised learning as creating 2 additional augmented images that, together with the original, form a stack of 3 views (torch.Size([3, 3, 32, 128])); the second item is the mask (torch.Size([32, 128])) and the third is the affine matrix (torch.Size([3, 3])). Therefore, I believe this output format is not compatible with the current training script.
  2. Could you please provide the fine-tuning code for selfsupervised mode?

Thank you!

machoangha avatar Oct 31 '24 04:10 machoangha

We used only the supervised mode in the fine-tuning file; you can modify this file to suit your needs.
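One possible adaptation (a hedged sketch only, not the repository's actual code): where the supervised loop unpacks `(images, labels)`, a self-supervised branch would first need to unpack the three-view tensor, mask, and affine matrix. The `mode` strings and tuple layouts below are assumptions based on the shapes reported in this issue:

```python
import torch

def unpack_batch(item, mode):
    """Unpack one data-loader batch depending on the training mode.

    NOTE: the mode names and tuple layouts are assumptions inferred
    from the shapes reported in this issue, not the project's API.
    """
    if mode == "supervised":
        images, labels = item           # [B,3,32,128], [B,1,25]
        return images, labels
    elif mode == "selfsupervised_kmeans":
        views, mask, affine = item      # [B,3,3,32,128], [B,32,128], [B,3,3]
        images = views[:, 0]            # original view, [B,3,32,128]
        return images, mask, affine
    raise ValueError(f"unknown mode: {mode}")

# Dummy batched tensors (batch size 2) with the reported per-sample shapes:
item = (torch.zeros(2, 3, 3, 32, 128), torch.zeros(2, 32, 128), torch.zeros(2, 3, 3))
images, mask, affine = unpack_batch(item, "selfsupervised_kmeans")
print(images.shape)  # torch.Size([2, 3, 32, 128])
```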

TongkunGuan avatar Oct 31 '24 11:10 TongkunGuan

Hi,

I would like to confirm whether my understanding is correct: during the pretraining phase of CCD, the model uses the self-supervised mode, but when fine-tuning the model for a specific task like text recognition, you switch to the supervised mode. Is that correct?

Thank you!

machoangha avatar Nov 04 '24 17:11 machoangha