Can you update `finetune_training` for `selfsupervised_kmeans`?
Hi,
I have fine-tuned my model using `supervised` mode on my custom data. However, when I switch to `selfsupervised_kmeans` and add the mask file, I notice that the output shapes of the data from the `train_data_loader_iter.next()` method are inconsistent with those from the supervised mode.
Observations:

Supervised mode output:
- First value of item: `torch.Size([3, 32, 128])`
- Second value of item: `torch.Size([1, 25])`

Self-supervised k-means mode output:
- First value of item: `torch.Size([3, 3, 32, 128])`
- Second value of item: `torch.Size([32, 128])`
- Third value of item: `torch.Size([3, 3])`
Context: The size of each mode is printed in the training script at this line: https://github.com/TongkunGuan/CCD/blob/543109a1e1d9acd15080abb3e4e72d68588ba493/train_finetune.py#L269.
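For concreteness, here is a minimal sketch of how the two batch layouts reported above could be told apart. Plain shape tuples stand in for the actual tensors, and all names (`describe_batch`, `supervised_item`, `selfsup_item`) are hypothetical, not from the CCD code:

```python
# Placeholder "items" mirroring the printed shapes; in the real script these
# would be torch.Tensors returned by train_data_loader_iter.next().
supervised_item = (
    (3, 32, 128),  # image: C x H x W
    (1, 25),       # label: token ids
)
selfsup_item = (
    (3, 3, 32, 128),  # 3 augmented views, each C x H x W
    (32, 128),        # k-means mask, H x W
    (3, 3),           # affine matrix relating the views
)

def describe_batch(item):
    """Dispatch on the number of fields to tell the two loader modes apart."""
    if len(item) == 2:
        image, label = item
        return f"supervised: image{image}, label{label}"
    if len(item) == 3:
        views, mask, affine = item
        return f"selfsupervised_kmeans: views{views}, mask{mask}, affine{affine}"
    raise ValueError(f"unexpected batch layout with {len(item)} fields")

print(describe_batch(supervised_item))
print(describe_batch(selfsup_item))
```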
Questions:
- The paper seems to describe self-supervised learning as creating 2 additional augmented images to form a set of 3 (`torch.Size([3, 3, 32, 128])`), where the second value is the mask (`torch.Size([32, 128])`) and the final one is the affine matrix (`torch.Size([3, 3])`). I therefore believe this output is not compatible with the current training script.
- Could you please provide the fine-tuning code for `selfsupervised` mode?
Thank you!
We used only the supervised mode in the fine-tuning file; you can modify this file to suit your needs.
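If you do adapt the supervised loop yourself, one common pattern for multi-view self-supervised batches is to fold the per-image view dimension into the batch dimension before the forward pass (e.g. `tensor.view(-1, C, H, W)` in PyTorch). A hedged sketch of the shape bookkeeping, using plain tuples instead of tensors; `fold_views` is my name for this step, not something from the CCD code:

```python
def fold_views(batch_shape):
    """Fold a leading views dimension into the batch dimension:
    (B, V, C, H, W) -> (B * V, C, H, W).

    After the DataLoader collates items of shape (3, 3, 32, 128)
    (3 views of a 3x32x128 image) with batch size B, the batch is
    (B, 3, 3, 32, 128); the model's forward pass typically expects
    a 4-D (N, C, H, W) input.
    """
    b, v, *rest = batch_shape
    return (b * v, *rest)

# With batch size 8 and the 3 views per image observed above:
print(fold_views((8, 3, 3, 32, 128)))
```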
Hi,
I would like to confirm whether my understanding is correct: during the pretraining phase of CCD, the model uses the self-supervised mode, but when fine-tuning for a specific task like text recognition, you switch to the supervised mode. Is that correct?
Thank you!