xiaokening
@jwmueller Thanks for your reply, I will try your proposed method later. Below are the details of my data: data_for_train (pd.DataFrame): the dataset for the multilabel-classification algorithm; pred_probs (np.ndarray): this matrix is converted...
@jwmueller I'm not currently able to get the runtimes for each issue check on my data; I will share them with you later. But now I have another question here....
@jwmueller Below are the runtimes for each issue check on my data. near_duplicate: 27 hours and 20 minutes; non_iid_issue: 89 hours and 24 minutes; outlier_issue: 90 hours and 17 minutes;...
@jwmueller Below is the code.

```
import os
import logging
import warnings

import numpy as np
import pandas as pd

from pecos.utils import cli, logging_util, smat_util, torch_util
from cleanlab import ...
```
@jwmueller After I use cleanlab to find the near_duplicate issue in my data, I don't know how to handle the rows whose is_near_duplicate_issue is True (these rows are indeed near-duplicates) to...
@jwmueller Yes, I know about near_duplicate_sets, but I need to manually confirm which sample to keep, which is quite a tedious process. I want to use an automated algorithm to...
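One way to automate the keep-one-per-set decision is to treat the lowest index in each near-duplicate set as the representative and drop the rest. Below is a minimal sketch; the `issues` frame here is a toy stand-in for the issues table (with `is_near_duplicate_issue` and `near_duplicate_sets` columns) rather than output from a real cleanlab run, and the keep-lowest-index rule is just one possible policy (you could instead keep the row with the best quality score):

```python
import pandas as pd

# Toy stand-in for the near_duplicate issues table: each flagged row
# lists the indices of its near-duplicate partners.
issues = pd.DataFrame({
    "is_near_duplicate_issue": [True, True, False, True, True],
    "near_duplicate_sets": [[1], [0], [], [4], [3]],
})

def rows_to_drop(issues: pd.DataFrame) -> list:
    """Return row indices to drop, keeping the lowest index of each set."""
    to_drop = set()
    for idx, row in issues.iterrows():
        if not row["is_near_duplicate_issue"]:
            continue
        # The full set is the row itself plus its listed partners.
        group = sorted({idx, *row["near_duplicate_sets"]})
        to_drop.update(group[1:])  # keep group[0] as the representative
    return sorted(to_drop)

print(rows_to_drop(issues))  # -> [1, 4]
```

The deduplicated training set is then `data_for_train.drop(index=rows_to_drop(issues))`. Note this assumes the sets are consistent (each member of a set lists the others); if they only partially overlap, you would first merge overlapping sets (e.g. with a union-find pass) before picking representatives.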
thanks! @jiong-zhang
@jiong-zhang When I train an XTransformer model with pecos, the same training error occurs in the matcher stage. At first I thought that my data volume was too large, but when...