imaginaire UNIT/MUNIT training get stuck after FID computation

Open asheroin opened this issue 3 years ago • 2 comments

Following the official settings but with my own dataset, I found that training gets stuck right after the FID computation. I also found that replacing the None return value of the FID function for non-master processes (https://github.com/NVlabs/imaginaire/blob/c6f74845c699c58975fd12b778c375b72eb00e8d/imaginaire/evaluation/fid.py#L66) with a fixed float works as a hot-fix, whereas a Long value such as -1984 does not. According to the error information, it seems that there is a reducer that sums up the FID values from all processes and returns their average.
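To make the suspected failure mode concrete, here is a toy single-process simulation of a sum-then-average reduction. It is only a sketch: it does not use torch.distributed and is not imaginaire's code. The helper name `average_over_ranks`, the rule that every rank must contribute a float, and the FID value 12.5 are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def average_over_ranks(per_rank_values: Sequence[Optional[float]]) -> float:
    """Toy stand-in for a sum-then-average reducer (hypothetical helper)."""
    for rank, value in enumerate(per_rank_values):
        # Assumption: the reducer needs a float from every rank.
        if not isinstance(value, float):
            raise TypeError(
                f"rank {rank} contributed {value!r}; expected a float")
    return sum(per_rank_values) / len(per_rank_values)


# Every rank contributes a float (placeholder value): the average is defined.
all_ranks = average_over_ranks([12.5, 12.5, 12.5, 12.5])

# Only rank 0 has a value and the others return None: the reduction fails.
try:
    average_over_ranks([12.5, None, None, None])
    master_only_ok = True
except TypeError:
    master_only_ok = False
```

In a real multi-process job, a mismatch of this kind may show up as a hang rather than an exception, because the other processes keep waiting for the collective call to complete.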

Currently, I let every process compute the FID, like this:

if is_master() or True:
    fid = _calculate_frechet_distance(
        fake_act, real_act)["FID"]
    if return_act:
        return fid, real_act, fake_act
    else:
        return fid
elif return_act:
    return None, None, None
else:
    return None

Would this cause any problems? By the way, my environment is Python 3.6 + torch 1.7.0.

Feb 14 '22 06:02 asheroin

Have you found the solution? I encountered the same problem.

Jun 28 '22 20:06 SecureSheII

Have you found the solution? I encountered the same problem.

I changed the torch version to 1.8.1 and that solved it. Otherwise, just try the hot-fix code above.

Jun 29 '22 03:06 asheroin
