
Error occurs when the UDA target dataset is too big

Open darcyzhc opened this issue 3 years ago • 8 comments

During UDA training, my target dataset's training split has 79,188 images and my machine has 128 GB of memory, but training failed with the PyTorch RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0, i.e. it ran out of memory. So I located the most memory-hungry part of the code, rewrote it with NumPy operations instead of torch.Tensor operations, and split the whole computation into several chunks (this assumes import numpy as np and import math at the top of the file). In spcl_train_uda.py:

        # select & cluster images as training set of this epochs
        pseudo_labels = cluster.fit_predict(rerank_dist) # shape: num_imgs
        pseudo_labels_tight = cluster_tight.fit_predict(rerank_dist)
        pseudo_labels_loose = cluster_loose.fit_predict(rerank_dist)
        num_ids = len(set(pseudo_labels)) - (1 if -1 in pseudo_labels else 0)
        num_ids_tight = len(set(pseudo_labels_tight)) - (1 if -1 in pseudo_labels_tight else 0)
        num_ids_loose = len(set(pseudo_labels_loose)) - (1 if -1 in pseudo_labels_loose else 0)

        # generate new dataset and calculate cluster centers
        def generate_pseudo_labels(cluster_id, num):
            labels = []
            outliers = 0
            for i, ((fname, _, cid), id) in enumerate(zip(sorted(dataset_target.train), cluster_id)):
                if id!=-1:
                    labels.append(source_classes+id)
                else:
                    labels.append(source_classes+num+outliers)
                    outliers += 1
            return torch.Tensor(labels).long()

        pseudo_labels = generate_pseudo_labels(pseudo_labels, num_ids)
        pseudo_labels_tight = generate_pseudo_labels(pseudo_labels_tight, num_ids_tight)
        pseudo_labels_loose = generate_pseudo_labels(pseudo_labels_loose, num_ids_loose)
# above code is not changed

        # compute_R_old is the old method to compute R_comp and R_indep
        def compute_R_old(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose):
            def convert2tensor(label):
                if(isinstance(label, torch.Tensor)):
                    return label
                else:
                    return torch.from_numpy(label)
            pseudo_labels = convert2tensor(pseudo_labels)
            pseudo_labels_tight = convert2tensor(pseudo_labels_tight)
            pseudo_labels_loose = convert2tensor(pseudo_labels_loose)
            # compute R_indep and R_comp
            N = pseudo_labels.size(0)
            label_sim = pseudo_labels.expand(N, N).eq(pseudo_labels.expand(N, N).t()).float() # shape:[num_imgs, num_imgs]
            label_sim_tight = pseudo_labels_tight.expand(N, N).eq(pseudo_labels_tight.expand(N, N).t()).float()
            label_sim_loose = pseudo_labels_loose.expand(N, N).eq(pseudo_labels_loose.expand(N, N).t()).float()

            R_comp = 1-torch.min(label_sim, label_sim_tight).sum(-1)/torch.max(label_sim, label_sim_tight).sum(-1) # shape: num_imgs
            R_indep = 1-torch.min(label_sim, label_sim_loose).sum(-1)/torch.max(label_sim, label_sim_loose).sum(-1)
            assert((R_comp.min()>=0) and (R_comp.max()<=1))
            assert((R_indep.min()>=0) and (R_indep.max()<=1))
            return R_comp, R_indep

        # compute_R_divide is my divided method to compute R_comp and R_indep
        def compute_R_divide(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose):
            # compute in divided numpy to avoid error '[enforce fail at CPUAllocator.cpp:56]'
            def convert2numpy(label):
                if(isinstance(label, np.ndarray)):
                    return label
                else:
                    return label.numpy().astype(np.int32) # if is torch.Tensor
            def get_sub_label_sim(label_np, start, end):
                label_sim = np.expand_dims(label_np, 0).repeat(end - start, axis=0)
                label_sim_T = np.expand_dims(label_np[start:end], 0).repeat(len(label_np), axis=0).T
                return (label_sim == label_sim_T).astype(np.int32)

            pseudo_labels = convert2numpy(pseudo_labels)
            pseudo_labels_tight = convert2numpy(pseudo_labels_tight)
            pseudo_labels_loose = convert2numpy(pseudo_labels_loose)
            N = pseudo_labels.shape[0]
            divide_base = 15000  # this factor is determined intuitively
            divide = max(int((N/divide_base) * (N/divide_base)), 1)
            num_each = math.ceil(N / divide)
            for i in range(divide):
                start = i*num_each
                end = min((i+1)*num_each, N)
                label_sim_np = get_sub_label_sim(pseudo_labels, start, end)
                label_sim_tight_np = get_sub_label_sim(pseudo_labels_tight, start, end)
                label_sim_loose_np = get_sub_label_sim(pseudo_labels_loose, start, end)
                R_comp_np = 1 - (label_sim_np & label_sim_tight_np).sum(-1)/(label_sim_np | label_sim_tight_np).sum(-1)
                R_indep_np = 1 - (label_sim_np & label_sim_loose_np).sum(-1) / (label_sim_np | label_sim_loose_np).sum(-1)
                if(i==0):
                    R_COMP_np = R_comp_np
                    R_INDEP_np = R_indep_np
                else:
                    R_COMP_np = np.concatenate((R_COMP_np, R_comp_np), axis=-1)
                    R_INDEP_np = np.concatenate((R_INDEP_np, R_indep_np), axis=-1)

            R_comp = torch.from_numpy(R_COMP_np).float()
            R_indep = torch.from_numpy(R_INDEP_np).float()
            assert ((R_comp.min() >= 0) and (R_comp.max() <= 1))
            assert ((R_indep.min() >= 0) and (R_indep.max() <= 1))
            return R_comp, R_indep

        R_comp, R_indep = compute_R_divide(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose)


This is just a workaround for the problem; a better solution would be welcome.
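
For reference, below is a minimal sketch of the same chunked computation that relies on NumPy broadcasting instead of expand_dims/repeat, so the repeated label arrays are never materialized; the function name and the default chunk size are my own choices, not part of SpCL.

import numpy as np
import torch

def compute_R_chunked(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose, chunk=5000):
    # accept the 1-D integer labels as torch.Tensor or np.ndarray
    def to_np(x):
        return x.numpy() if isinstance(x, torch.Tensor) else np.asarray(x)
    p, p_tight, p_loose = map(to_np, (pseudo_labels, pseudo_labels_tight, pseudo_labels_loose))
    N = p.shape[0]
    R_comp_parts, R_indep_parts = [], []
    for start in range(0, N, chunk):
        end = min(start + chunk, N)
        # broadcasting builds (end-start, N) boolean matrices, one chunk at a time
        sim = p[start:end, None] == p[None, :]
        sim_tight = p_tight[start:end, None] == p_tight[None, :]
        sim_loose = p_loose[start:end, None] == p_loose[None, :]
        # IoU-style agreement between the clustering results, as in compute_R_old
        R_comp_parts.append(1 - (sim & sim_tight).sum(-1) / (sim | sim_tight).sum(-1))
        R_indep_parts.append(1 - (sim & sim_loose).sum(-1) / (sim | sim_loose).sum(-1))
    R_comp = torch.from_numpy(np.concatenate(R_comp_parts)).float()
    R_indep = torch.from_numpy(np.concatenate(R_indep_parts)).float()
    return R_comp, R_indep

With chunk=5000 on the 79,188-image split, each boolean matrix is roughly 5,000 x 79,188 instead of the full 79,188 x 79,188.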

darcyzhc avatar Jul 09 '20 06:07 darcyzhc

Have you tried using rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True) at https://github.com/yxgeee/SpCL/blob/master/examples/spcl_train_uda.py#L175 ? (search_option=3 could save GPU memory by running the faiss search on the CPU, and use_float16=True could save CPU memory by using numpy.float16.)
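
For reference, the change at that line would look roughly like this (just a sketch; the argument values follow the suggestion above):

rerank_dist = compute_jaccard_distance(target_features,
                                       k1=args.k1, k2=args.k2,
                                       search_option=3,   # run the faiss search on the CPU to save GPU memory
                                       use_float16=True)  # store intermediate arrays as numpy.float16 to save CPU memory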

yxgeee avatar Jul 09 '20 06:07 yxgeee

It's a pity that rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True) does not work. A server with 384 GB of memory still hits the same error when the training set is very large.

WangWenhao0716 avatar Sep 14 '20 05:09 WangWenhao0716

@WangWenhao0716 Yes, it runs out of memory when training with large-scale datasets. I am still working on solving this problem. Any useful suggestions are also welcome!

yxgeee avatar Sep 16 '20 06:09 yxgeee

Thanks for your reply~


WangWenhao0716 avatar Sep 16 '20 06:09 WangWenhao0716

I also run out of memory because of DBSCAN's high memory usage. It is a pity that DBSCAN limits the wider adoption of SpCL. Looking forward to any good news.

Jie2World avatar Dec 08 '20 08:12 Jie2World

One potential solution is to use cuml (https://github.com/rapidsai/cuml) instead of the sklearn package (I have clustered over 1 million samples with cuml). However, cuml does not support a pre-computed affinity matrix, which means it is hard to use the Jaccard distance with cuml (Euclidean distance is used by default). So it is recommended to use an adaptive eps hyper-parameter when using the DBSCAN from cuml. @Jie2World @darcyzhc @WangWenhao0716

Example:

import cudf
from cuml.cluster import DBSCAN
from torch.utils.dlpack import to_dlpack

# A toy example with eps=0.2; in practice eps should be self-adaptive for the current
# training epoch, e.g. based on the mean of the top-k affinities.
cluster_alg = DBSCAN(eps=0.2, min_samples=4, output_type='numpy')
cluster_alg.fit(cudf.from_dlpack(to_dlpack(memory.features)))
pseudo_labels = cluster_alg.labels_.tolist()
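
As a rough illustration of the self-adaptive eps mentioned in the comment above (the k-NN heuristic, the value of k, and the sub-sampling are my own assumptions, not part of SpCL or cuml):

import torch

# Hypothetical helper: set eps to the mean distance to the k-th nearest neighbour,
# estimated on a random subset of the features (Euclidean, matching cuml's default).
def adaptive_eps(features, k=20, num_samples=5000):
    idx = torch.randperm(features.size(0))[:num_samples]
    feats = features[idx]
    dist = torch.cdist(feats, feats)                 # pairwise Euclidean distances
    knn, _ = dist.topk(k + 1, dim=1, largest=False)  # column 0 is the point itself
    return knn[:, -1].mean().item()

# eps = adaptive_eps(memory.features.cpu())
# cluster_alg = DBSCAN(eps=eps, min_samples=4, output_type='numpy')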

yxgeee avatar Dec 08 '20 08:12 yxgeee

Thanks. cuml DBSCAN may support a pre-computed affinity matrix in the future (https://github.com/rapidsai/cuml/issues/3302). Hope it helps.

Jie2World avatar Dec 14 '20 09:12 Jie2World

Hahaha, that cuml issue was opened by me.

WangWenhao0716 avatar Dec 14 '20 09:12 WangWenhao0716