DANN_py3

Size of target domain

Open jayanthsiddamsetty opened this issue 3 years ago

I have a training set with 50k source images and 1k target images. Is DANN a good approach for this use case? If not, what is your recommendation?

jayanthsiddamsetty · Feb 04 '22 15:02

It's decided not only by the numbers, but also by the similarity between the source and target images: the bigger the difference, the more data is needed.

fungtion · May 05 '22 01:05

All the works I've seen applying this UDA technique consider roughly the same amount of data in the source and target domains. Now I am working on a project in which the number of source images is much greater than the number of target images. I am not sure if this is a problem, though.

The only thing is that, by setting num_batches = min(len(train_loader), len(target_loader)) and looping over num_batches as:

for epoch in range(NUM_EPOCHS):
    for batch_index in range(num_batches):
        # forward pass on one source batch and one target batch
        # backward pass and optimizer step

it would require many "epochs" (maybe calling them "iterations" would be better) to go through the entire training set.

I think it is possible to loop over the entire training set (i.e., num_batches = len(train_loader)), but force the target data to repeat itself multiple times within a given "epoch". To do that, you can use the cycle function from itertools, like target_loader = cycle(iter(target_loader)). Then, you could use some data augmentation technique to get around the problem of repeating the target data, as sketched below. Does it make sense?
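
A minimal sketch of that pattern, assuming the loader names from the snippet above (note that cycle caches the batches from its first pass over the loader and then replays them in the same order, which is why on-the-fly augmentation helps avoid identical repeats):

from itertools import cycle

# Assumed names from the snippet above: train_loader (source),
# target_loader (target), NUM_EPOCHS.
target_iter = cycle(target_loader)  # caches the first pass, then replays it

for epoch in range(NUM_EPOCHS):
    for source_batch in train_loader:      # one full pass over the source set
        target_batch = next(target_iter)   # target data repeats as needed
        # forward / backward on the (source_batch, target_batch) pair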

rfmiotto · Sep 21 '22 20:09

Does it make sense?

Yes, thanks!

jayanthsiddamsetty · Sep 21 '22 20:09