
Error occurs when the UDA target dataset is too big

Open darcyzhc opened this issue 3 years ago • 8 comments

During UDA training, my target dataset's training split has 79,188 images and my machine has 128 GB of memory, but training failed with the PyTorch RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0, i.e. it ran out of memory. So I located the most memory-hungry part of the code, rewrote it with NumPy operations instead of torch.Tensor operations, and split the whole computation into several chunks (this assumes import numpy as np and import math at the top of the file). In spcl_train_uda.py:

        # select & cluster images as training set of this epochs
        pseudo_labels = cluster.fit_predict(rerank_dist) # shape: num_imgs
        pseudo_labels_tight = cluster_tight.fit_predict(rerank_dist)
        pseudo_labels_loose = cluster_loose.fit_predict(rerank_dist)
        num_ids = len(set(pseudo_labels)) - (1 if -1 in pseudo_labels else 0)
        num_ids_tight = len(set(pseudo_labels_tight)) - (1 if -1 in pseudo_labels_tight else 0)
        num_ids_loose = len(set(pseudo_labels_loose)) - (1 if -1 in pseudo_labels_loose else 0)

        # generate new dataset and calculate cluster centers
        def generate_pseudo_labels(cluster_id, num):
            labels = []
            outliers = 0
            for i, ((fname, _, cid), id) in enumerate(zip(sorted(dataset_target.train), cluster_id)):
                if id!=-1:
                    labels.append(source_classes+id)
                else:
                    labels.append(source_classes+num+outliers)
                    outliers += 1
            return torch.Tensor(labels).long()

        pseudo_labels = generate_pseudo_labels(pseudo_labels, num_ids)
        pseudo_labels_tight = generate_pseudo_labels(pseudo_labels_tight, num_ids_tight)
        pseudo_labels_loose = generate_pseudo_labels(pseudo_labels_loose, num_ids_loose)
# above code is not changed

        # compute_R_old is the old method to compute R_comp and R_indep
        def compute_R_old(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose):
            def convert2tensor(label):
                if(isinstance(label, torch.Tensor)):
                    return label
                else:
                    return torch.from_numpy(label)
            pseudo_labels = convert2tensor(pseudo_labels)
            pseudo_labels_tight = convert2tensor(pseudo_labels_tight)
            pseudo_labels_loose = convert2tensor(pseudo_labels_loose)
            # compute R_indep and R_comp
            N = pseudo_labels.size(0)
            label_sim = pseudo_labels.expand(N, N).eq(pseudo_labels.expand(N, N).t()).float() # shape:[num_imgs, num_imgs]
            label_sim_tight = pseudo_labels_tight.expand(N, N).eq(pseudo_labels_tight.expand(N, N).t()).float()
            label_sim_loose = pseudo_labels_loose.expand(N, N).eq(pseudo_labels_loose.expand(N, N).t()).float()

            R_comp = 1-torch.min(label_sim, label_sim_tight).sum(-1)/torch.max(label_sim, label_sim_tight).sum(-1) # shape: num_imgs
            R_indep = 1-torch.min(label_sim, label_sim_loose).sum(-1)/torch.max(label_sim, label_sim_loose).sum(-1)
            assert((R_comp.min()>=0) and (R_comp.max()<=1))
            assert((R_indep.min()>=0) and (R_indep.max()<=1))
            return R_comp, R_indep

        # compute_R_divide is my divided method to compute R_comp and R_indep
        def compute_R_divide(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose):
            # compute in divided numpy to avoid error '[enforce fail at CPUAllocator.cpp:56]'
            def convert2numpy(label):
                if(isinstance(label, np.ndarray)):
                    return label
                else:
                    return label.numpy().astype(np.int32) # if is torch.Tensor
            def get_sub_label_sim(label_np, start, end):
                label_sim = np.expand_dims(label_np, 0).repeat(end - start, axis=0)
                label_sim_T = np.expand_dims(label_np[start:end], 0).repeat(len(label_np), axis=0).T
                return (label_sim == label_sim_T).astype(np.int32)

            pseudo_labels = convert2numpy(pseudo_labels)
            pseudo_labels_tight = convert2numpy(pseudo_labels_tight)
            pseudo_labels_loose = convert2numpy(pseudo_labels_loose)
            N = pseudo_labels.shape[0]
            divide_base = 15000  # this factor is determined intuitively
            divide = max(int((N/divide_base) * (N/divide_base)), 1)
            num_each = math.ceil(N / divide)
            for i in range(divide):
                start = i*num_each
                end = min((i+1)*num_each, N)
                label_sim_np = get_sub_label_sim(pseudo_labels, start, end)
                label_sim_tight_np = get_sub_label_sim(pseudo_labels_tight, start, end)
                label_sim_loose_np = get_sub_label_sim(pseudo_labels_loose, start, end)
                R_comp_np = 1 - (label_sim_np & label_sim_tight_np).sum(-1)/(label_sim_np | label_sim_tight_np).sum(-1)
                R_indep_np = 1 - (label_sim_np & label_sim_loose_np).sum(-1) / (label_sim_np | label_sim_loose_np).sum(-1)
                if(i==0):
                    R_COMP_np = R_comp_np
                    R_INDEP_np = R_indep_np
                else:
                    R_COMP_np = np.concatenate((R_COMP_np, R_comp_np), axis=-1)
                    R_INDEP_np = np.concatenate((R_INDEP_np, R_indep_np), axis=-1)

            R_comp = torch.from_numpy(R_COMP_np).float()
            R_indep = torch.from_numpy(R_INDEP_np).float()
            assert ((R_comp.min() >= 0) and (R_comp.max() <= 1))
            assert ((R_indep.min() >= 0) and (R_indep.max() <= 1))
            return R_comp, R_indep

        R_comp, R_indep = compute_R_divide(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose)


This is just a workaround for the problem; a better solution would be welcome.
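
For reference, below is a minimal sketch of the same chunked computation that relies on NumPy broadcasting instead of expand_dims/repeat, so the repeated label arrays are never materialized; the function name and the default chunk size are my own choices, not part of SpCL.

import numpy as np
import torch

def compute_R_chunked(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose, chunk=5000):
    # accept the 1-D integer labels as torch.Tensor or np.ndarray
    def to_np(x):
        return x.numpy() if isinstance(x, torch.Tensor) else np.asarray(x)
    p, p_tight, p_loose = map(to_np, (pseudo_labels, pseudo_labels_tight, pseudo_labels_loose))
    N = p.shape[0]
    R_comp_parts, R_indep_parts = [], []
    for start in range(0, N, chunk):
        end = min(start + chunk, N)
        # broadcasting builds (end-start, N) boolean matrices, one chunk at a time
        sim = p[start:end, None] == p[None, :]
        sim_tight = p_tight[start:end, None] == p_tight[None, :]
        sim_loose = p_loose[start:end, None] == p_loose[None, :]
        # IoU-style agreement between the clustering results, as in compute_R_old
        R_comp_parts.append(1 - (sim & sim_tight).sum(-1) / (sim | sim_tight).sum(-1))
        R_indep_parts.append(1 - (sim & sim_loose).sum(-1) / (sim | sim_loose).sum(-1))
    R_comp = torch.from_numpy(np.concatenate(R_comp_parts)).float()
    R_indep = torch.from_numpy(np.concatenate(R_indep_parts)).float()
    return R_comp, R_indep

With chunk=5000 on the 79,188-image split, each boolean matrix is roughly 5,000 x 79,188 instead of the full 79,188 x 79,188.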

darcyzhc avatar Jul 09 '20 06:07 darcyzhc

Have you tried using rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True) at https://github.com/yxgeee/SpCL/blob/master/examples/spcl_train_uda.py#L175 ? (search_option=3 could save GPU memory by running the faiss search on the CPU, and use_float16=True could save CPU memory by using numpy.float16.)
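
For reference, the change at that line would look roughly like this (just a sketch; the argument values follow the suggestion above):

rerank_dist = compute_jaccard_distance(target_features,
                                       k1=args.k1, k2=args.k2,
                                       search_option=3,   # run the faiss search on the CPU to save GPU memory
                                       use_float16=True)  # store intermediate arrays as numpy.float16 to save CPU memory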

yxgeee avatar Jul 09 '20 06:07 yxgeee

It's a pity that rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True) does not work. A server with 384 GB of memory still hits the same error when the training set is very large.

WangWenhao0716 avatar Sep 14 '20 05:09 WangWenhao0716

@WangWenhao0716 Yes, it runs out of memory when training with large-scale datasets. I am still working on solving this problem. Any useful suggestions are also welcome!

yxgeee avatar Sep 16 '20 06:09 yxgeee

Thanks for your reply~


WangWenhao0716 avatar Sep 16 '20 06:09 WangWenhao0716

I also run out of memory because of DBSCAN's high memory usage. It is a pity that DBSCAN limits the wider adoption of SpCL. Looking forward to any good news.

Jie2World avatar Dec 08 '20 08:12 Jie2World

One potential solution is to use cuml (https://github.com/rapidsai/cuml) instead of the sklearn package (I have clustered over 1 million samples with cuml). However, cuml does not support a pre-computed affinity matrix, which means it is hard to use the Jaccard distance with cuml (Euclidean distance is used by default). So it is recommended to use an adaptive eps hyper-parameter when using the DBSCAN from cuml. @Jie2World @darcyzhc @WangWenhao0716

Example:

import cudf
from cuml.cluster import DBSCAN
from torch.utils.dlpack import to_dlpack

# A toy example with eps=0.2; in practice eps should be self-adaptive for the current
# training epoch, e.g. based on the mean of the top-k affinities.
cluster_alg = DBSCAN(eps=0.2, min_samples=4, output_type='numpy')
cluster_alg.fit(cudf.from_dlpack(to_dlpack(memory.features)))
pseudo_labels = cluster_alg.labels_.tolist()
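
As a rough illustration of the self-adaptive eps mentioned in the comment above (the k-NN heuristic, the value of k, and the sub-sampling are my own assumptions, not part of SpCL or cuml):

import torch

# Hypothetical helper: set eps to the mean distance to the k-th nearest neighbour,
# estimated on a random subset of the features (Euclidean, matching cuml's default).
def adaptive_eps(features, k=20, num_samples=5000):
    idx = torch.randperm(features.size(0))[:num_samples]
    feats = features[idx]
    dist = torch.cdist(feats, feats)                 # pairwise Euclidean distances
    knn, _ = dist.topk(k + 1, dim=1, largest=False)  # column 0 is the point itself
    return knn[:, -1].mean().item()

# eps = adaptive_eps(memory.features.cpu())
# cluster_alg = DBSCAN(eps=eps, min_samples=4, output_type='numpy')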

yxgeee avatar Dec 08 '20 08:12 yxgeee

Thanks. cuml DBSCAN may support a pre-computed affinity matrix in the future (https://github.com/rapidsai/cuml/issues/3302). Hope it helps.

Jie2World avatar Dec 14 '20 09:12 Jie2World

Hahaha, that cuml issue was opened by me.

WangWenhao0716 avatar Dec 14 '20 09:12 WangWenhao0716