SpCL
Error occurs when the UDA target dataset is too big
During UDA training, my target dataset's training split has 79,188 images and my server has 128 GB of memory, yet training failed with a PyTorch RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0, i.e. an out-of-memory error. So I located the most memory-hungry part of the code, rewrote it with NumPy operations instead of torch.Tensor operations, and split the whole computation into several chunks.
In spcl_train_uda.py (math, numpy as np, and torch are assumed to be imported at the top of the file):
# select & cluster images as training set of this epoch
pseudo_labels = cluster.fit_predict(rerank_dist)  # shape: [num_imgs]
pseudo_labels_tight = cluster_tight.fit_predict(rerank_dist)
pseudo_labels_loose = cluster_loose.fit_predict(rerank_dist)
num_ids = len(set(pseudo_labels)) - (1 if -1 in pseudo_labels else 0)
num_ids_tight = len(set(pseudo_labels_tight)) - (1 if -1 in pseudo_labels_tight else 0)
num_ids_loose = len(set(pseudo_labels_loose)) - (1 if -1 in pseudo_labels_loose else 0)

# generate new dataset and calculate cluster centers
def generate_pseudo_labels(cluster_id, num):
    labels = []
    outliers = 0
    for i, ((fname, _, cid), id) in enumerate(zip(sorted(dataset_target.train), cluster_id)):
        if id != -1:
            labels.append(source_classes + id)
        else:
            labels.append(source_classes + num + outliers)
            outliers += 1
    return torch.Tensor(labels).long()

pseudo_labels = generate_pseudo_labels(pseudo_labels, num_ids)
pseudo_labels_tight = generate_pseudo_labels(pseudo_labels_tight, num_ids_tight)
pseudo_labels_loose = generate_pseudo_labels(pseudo_labels_loose, num_ids_loose)
# the code above is unchanged
# compute_R_old is the original method to compute R_comp and R_indep
def compute_R_old(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose):
    def convert2tensor(label):
        if isinstance(label, torch.Tensor):
            return label
        else:
            return torch.from_numpy(label)

    pseudo_labels = convert2tensor(pseudo_labels)
    pseudo_labels_tight = convert2tensor(pseudo_labels_tight)
    pseudo_labels_loose = convert2tensor(pseudo_labels_loose)

    # compute R_indep and R_comp
    N = pseudo_labels.size(0)
    label_sim = pseudo_labels.expand(N, N).eq(pseudo_labels.expand(N, N).t()).float()  # shape: [num_imgs, num_imgs]
    label_sim_tight = pseudo_labels_tight.expand(N, N).eq(pseudo_labels_tight.expand(N, N).t()).float()
    label_sim_loose = pseudo_labels_loose.expand(N, N).eq(pseudo_labels_loose.expand(N, N).t()).float()

    R_comp = 1 - torch.min(label_sim, label_sim_tight).sum(-1) / torch.max(label_sim, label_sim_tight).sum(-1)  # shape: [num_imgs]
    R_indep = 1 - torch.min(label_sim, label_sim_loose).sum(-1) / torch.max(label_sim, label_sim_loose).sum(-1)
    assert (R_comp.min() >= 0) and (R_comp.max() <= 1)
    assert (R_indep.min() >= 0) and (R_indep.max() <= 1)
    return R_comp, R_indep
# compute_R_divide is my chunked method to compute R_comp and R_indep
def compute_R_divide(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose):
    # compute in NumPy chunks to avoid the '[enforce fail at CPUAllocator.cpp:56]' error
    def convert2numpy(label):
        if isinstance(label, np.ndarray):
            return label
        else:
            return label.numpy().astype(np.int32)  # label is a torch.Tensor

    def get_sub_label_sim(label_np, start, end):
        # rows [start:end] of the full [N, N] label-similarity matrix
        label_sim = np.expand_dims(label_np, 0).repeat(end - start, axis=0)
        label_sim_T = np.expand_dims(label_np[start:end], 0).repeat(len(label_np), axis=0).T
        return (label_sim == label_sim_T).astype(np.int32)

    pseudo_labels = convert2numpy(pseudo_labels)
    pseudo_labels_tight = convert2numpy(pseudo_labels_tight)
    pseudo_labels_loose = convert2numpy(pseudo_labels_loose)

    N = pseudo_labels.shape[0]
    divide_base = 15000  # this factor is chosen empirically
    divide = max(int((N / divide_base) * (N / divide_base)), 1)
    num_each = math.ceil(N / divide)
    for i in range(divide):
        start = i * num_each
        end = min((i + 1) * num_each, N)
        label_sim_np = get_sub_label_sim(pseudo_labels, start, end)
        label_sim_tight_np = get_sub_label_sim(pseudo_labels_tight, start, end)
        label_sim_loose_np = get_sub_label_sim(pseudo_labels_loose, start, end)
        # '&'/'|' on 0/1 arrays act as element-wise min/max
        R_comp_np = 1 - (label_sim_np & label_sim_tight_np).sum(-1) / (label_sim_np | label_sim_tight_np).sum(-1)
        R_indep_np = 1 - (label_sim_np & label_sim_loose_np).sum(-1) / (label_sim_np | label_sim_loose_np).sum(-1)
        if i == 0:
            R_COMP_np = R_comp_np
            R_INDEP_np = R_indep_np
        else:
            R_COMP_np = np.concatenate((R_COMP_np, R_comp_np), axis=-1)
            R_INDEP_np = np.concatenate((R_INDEP_np, R_indep_np), axis=-1)

    R_comp = torch.from_numpy(R_COMP_np).float()
    R_indep = torch.from_numpy(R_INDEP_np).float()
    assert (R_comp.min() >= 0) and (R_comp.max() <= 1)
    assert (R_indep.min() >= 0) and (R_indep.max() <= 1)
    return R_comp, R_indep

R_comp, R_indep = compute_R_divide(pseudo_labels, pseudo_labels_tight, pseudo_labels_loose)
This is just a workaround; if there is a better solution, it would be very welcome.
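For reference, a minimal sketch of an O(N)-memory alternative, not part of SpCL: row i of label_sim sums to the size of sample i's cluster, and the min/max row sums are the intersection/union sizes of sample i's clusters under the two criteria, so both can be derived from cluster sizes via np.unique without building any N x N matrix. The helper name iou_per_sample is my own:

import numpy as np
import torch

def iou_per_sample(a, b):
    # per-sample IoU between the cluster of each sample under labeling a
    # and its cluster under labeling b
    _, inv_a, cnt_a = np.unique(a, return_inverse=True, return_counts=True)
    _, inv_b, cnt_b = np.unique(b, return_inverse=True, return_counts=True)
    # encode each (a, b) label pair as one integer to count joint groups
    pair = inv_a.astype(np.int64) * (inv_b.max() + 1) + inv_b
    _, inv_p, cnt_p = np.unique(pair, return_inverse=True, return_counts=True)
    inter = cnt_p[inv_p]                         # |cluster_a(i) & cluster_b(i)|
    union = cnt_a[inv_a] + cnt_b[inv_b] - inter  # inclusion-exclusion
    return inter / union

labels = pseudo_labels.numpy()  # the torch tensors from generate_pseudo_labels
R_comp = torch.from_numpy(1 - iou_per_sample(labels, pseudo_labels_tight.numpy())).float()
R_indep = torch.from_numpy(1 - iou_per_sample(labels, pseudo_labels_loose.numpy())).float()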
Have you tried rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True) in https://github.com/yxgeee/SpCL/blob/master/examples/spcl_train_uda.py#L175? search_option=3 could save GPU memory by using faiss on CPU, and use_float16=True could save CPU memory by using numpy.float16.
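For reference, a sketch of that call in context; the inline comments just restate the explanation above:

# around L175 of examples/spcl_train_uda.py
rerank_dist = compute_jaccard_distance(
    target_features,
    k1=args.k1, k2=args.k2,
    search_option=3,   # run the k-NN search with faiss on CPU, saving GPU memory
    use_float16=True,  # keep intermediate arrays as numpy.float16, saving CPU memory
)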
"rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True" It's a pity that it does not work. A server with 384G memory is still facing the pointed error when the training set is very big.
"rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, search_option=3, use_float16=True" It's a pity that it does not work. A server with 384G memory is still facing the pointed error when the training set is very big.
@WangWenhao0716 Yes, it runs out of memory when training with large-scale datasets. I am still working on solving this problem. Any useful suggestions are welcome!
Thanks for your reply~
I also ran out of memory because of DBSCAN's high memory usage. It is a pity that DBSCAN limits the adoption of SpCL. Looking forward to any good news.
One potential solution is to use cuml (https://github.com/rapidsai/cuml) instead of the sklearn package (I have tried to cluster over 1 million data points with cuml). However, cuml does not support a pre-computed affinity matrix, which means it is hard to use the Jaccard distance with cuml (the Euclidean distance is used by default). So it is recommended to use an adaptive eps hyper-parameter when using the DBSCAN from cuml. @Jie2World @darcyzhc @WangWenhao0716
Example:

import cudf
from cuml.cluster import DBSCAN
from torch.utils.dlpack import to_dlpack

# a toy example with a fixed eps=0.2; in practice eps should be self-adaptive
# for the current training epoch, e.g. the mean of the top-k affinities
cluster_alg = DBSCAN(eps=0.2, min_samples=4, output_type='numpy')
cluster_alg.fit(cudf.from_dlpack(to_dlpack(memory.features)))
pseudo_labels = cluster_alg.labels_.tolist()
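As hinted in the comment above, eps could be made self-adaptive via the mean of each sample's top-k neighbor distances (1 - affinity). A hypothetical sketch, computed in chunks to keep memory bounded; adaptive_eps, k=30, and chunk=4096 are my own choices, not part of SpCL:

import torch

def adaptive_eps(features, k=30, chunk=4096):
    # features: L2-normalized [N, D] tensor, e.g. memory.features
    dists = []
    for i in range(0, features.size(0), chunk):
        sim = features[i:i + chunk] @ features.t()  # cosine affinities for one chunk
        # k+1 smallest distances per row; drop column 0, the self-distance
        d = (1 - sim).topk(k + 1, dim=1, largest=False).values[:, 1:]
        dists.append(d)
    return torch.cat(dists).mean().item()

eps = adaptive_eps(memory.features)
cluster_alg = DBSCAN(eps=eps, min_samples=4, output_type='numpy')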
Thanks. cuml's DBSCAN may support a pre-computed affinity matrix in the future (https://github.com/rapidsai/cuml/issues/3302). Hope it helps.
Haha, that cuml issue was opened by me.