
Using FDS/LDS with a custom model and data

Open ttsesm opened this issue 3 years ago • 18 comments

Hi @YyzHarry,

I am trying to adapt the example from here https://github.com/YyzHarry/imbalanced-regression/tree/main/agedb-dir to my custom model and data. Thus, I would like to ask whether this is feasible and, if so, whether there is any example showing explicitly how to do it.

Thanks.

ttsesm avatar May 14 '21 10:05 ttsesm

Hi @YyzHarry I have a regression dataset as a CSV file containing only numeric values. Please let me know whether https://github.com/YyzHarry/imbalanced-regression/tree/main/imdb-wiki-dir will work for my use case? Thanks

snigdhasen avatar May 14 '21 14:05 snigdhasen

Hi @ttsesm Yes - that could serve as an example!

  • The core code for LDS is basically here: https://github.com/YyzHarry/imbalanced-regression/blob/main/agedb-dir/datasets.py#L55, where we estimate a weight for each sample based on the effective label density.
  • As for FDS, we have a separate file for its implementation: https://github.com/YyzHarry/imbalanced-regression/blob/main/agedb-dir/fds.py. It is conceptually similar to a BatchNorm implementation and can be incorporated into any network.

YyzHarry avatar May 15 '21 03:05 YyzHarry

Hi @snigdhasen Yes, I believe that is a complete codebase, and you might only need to modify the data loading part (and maybe the network you choose to use).

YyzHarry avatar May 15 '21 03:05 YyzHarry

@YyzHarry I found some time to go through the paper, your blog post, and the links you pointed me to, but I still do not get how to apply LDS/FDS distribution smoothing in practice.

So I would appreciate it if you could give a step-by-step guide on how this is done. I think this would be helpful for others as well.

For example, in my case I have a dataset of point clouds where each point has a feature vector, e.g.:

-0.471780000000000	0.702420000000000	0.291670000000000	156.716000000000	0.800000000000000	0.800000000000000	0.800000000000000	1	0	0	0.0111600000000000	0	0	0	8.47483000000000	0	0
-0.471780000000000	0.826370000000000	0.216670000000000	139.612000000000	0.800000000000000	0.800000000000000	0.800000000000000	1	0	0	0.0111600000000000	0	0	0	8.61834000000000	0	0
0.471780000000000	0.280970000000000	0.458330000000000	195.465000000000	0.800000000000000	0.800000000000000	0.800000000000000	-1	0	0	0.0111600000000000	0	0	0	8.56491000000000	0	0
0.206920000000000	-0.239650000000000	0	670.182010000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.63796000000000	0	0
0.455220000000000	0.727210000000000	0.500000000000000	107.883000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.65391000000000	0	0
-0.231750000000000	-0.801580000000000	0	250.761000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.37285000000000	0	0
0.471780000000000	0.760260000000000	0.0416700000000000	176.562000000000	0.800000000000000	0.800000000000000	0.800000000000000	-1	0	0	0.0111600000000000	0	0	0	8.35862000000000	0	0
-0.157260000000000	0.735470000000000	0.500000000000000	141.367000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.64104000000000	0	0
0.306240000000000	0.305760000000000	0	710.883970000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.81857000000000	0	0
0.355900000000000	0.280970000000000	0.500000000000000	235.098010000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.36165000000000	0	0
-0.281410000000000	0.314020000000000	0.500000000000000	208.985990000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.43708000000000	0	0
0.438670000000000	0.636310000000000	0.500000000000000	132.513000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.68539000000000	0	0
-0.471780000000000	0.925540000000000	0.308330000000000	108.584000000000	0.800000000000000	0.800000000000000	0.800000000000000	1	0	0	0.0111600000000000	0	0	0	8.79508000000000	0	0
0.389010000000000	0.909010000000000	0.500000000000000	96.3420000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.47030000000000	0	0
0.0827700000000000	-0.909010000000000	0	203.560000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.19117000000000	0	0
0.140710000000000	-0.677630000000000	0.500000000000000	199.156010000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.42757000000000	0	0
0.107600000000000	0.256180000000000	0	710.012020000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	9.49238000000000	0	0
-0.289690000000000	-0.834640000000000	0	236.399000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.34452000000000	0	0
0.430390000000000	-0.115690000000000	0	591.968990000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	9.08948000000000	0	0
-0.0910400000000000	0.925540000000000	0	152.154010000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.71381000000000	0	0
0.215200000000000	-0.942070000000000	0.0166700000000000	247.403000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	1	0	0.0111700000000000	0	0	0	8.14043000000000	0	0
0.339350000000000	-0.553670000000000	0.500000000000000	198.897000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.21610000000000	0	0
0.471780000000000	0.462770000000000	0.0916700000000000	399.609010000000	0.800000000000000	0.800000000000000	0.800000000000000	-1	0	0	0.0111600000000000	0	0	0	9.02757000000000	0	0
-0.240030000000000	-0.561930000000000	0.500000000000000	253.405000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.36224000000000	0	0
-0.314520000000000	-0.190070000000000	0	1255.18604000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	11.6615100000000	0	0
-0.430390000000000	0.165270000000000	0.500000000000000	219.422000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.10539000000000	0	0
-0.355900000000000	0.859430000000000	0	136.401990000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.59122000000000	0	0
-0.389010000000000	0.942070000000000	0.141670000000000	176.037000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	-1	0	0.0111700000000000	0	0	0	8.54202000000000	0	0
-0.306240000000000	-0.776790000000000	0.500000000000000	170.912990000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0	8.26907000000000	0	0
-0.00828000000000000	0.942070000000000	0.258330000000000	211.325000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	-1	0	0.0111700000000000	0	0	0	8.38170000000000	0	0
0.471780000000000	0.0909000000000000	0.366670000000000	405.196010000000	0.800000000000000	0.800000000000000	0.800000000000000	-1	0	0	0.0111600000000000	0	0	0	8.98865000000000	0	0
-0.157260000000000	-0.578460000000000	0	492.231990000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	9.21356000000000	0	0
0.0331100000000000	-0.859430000000000	0	226.514010000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.01525000000000	0	0
0.00828000000000000	0.752000000000000	0	214.614000000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	1	0.0110800000000000	0	0	0	8.42254000000000	0	0
0.471780000000000	-0.00826000000000000	0.0916700000000000	521.054990000000	0.800000000000000	0.800000000000000	0.800000000000000	-1	0	0	0.0111600000000000	0	0	0	9.74422000000000	0	0
0.264860000000000	-0.231380000000000	0.500000000000000	235.503010000000	0.800000000000000	0.800000000000000	0.800000000000000	0	0	-1	0.0110800000000000	0	0	0.0329700000000000	7.40915000000000	0.844320000000000	0
....
....
....
....

Now I want to regress the values of column 4, but these values are imbalanced and can vary over the range 0-10000. For example, for the sample above I have split my values into groups, grps = [0 100 250 500 750 1000 2000 5000 10000], and as you can see the majority of my values lie in the range 250-500:

[image: histogram of the column-4 values over the groups above]

Now the question is how to apply LDS/FDS based on the values in column 4. Is this done before you load the data into the data loader, or afterwards, while running training/testing?

Thanks.

P.S. I also attach an example of a point cloud with the corresponding complete feature vectors, in case it is useful: pcd1.txt

ttsesm avatar Jun 16 '21 10:06 ttsesm

Hi @YyzHarry, any feedback regarding my question above, and possibly a step-by-step guide on how to apply LDS/FDS?

ttsesm avatar Jun 28 '21 09:06 ttsesm

@ttsesm Sorry for the late reply!

Now the question is how to apply LDS/FDS based on the values in column 4. Is this done before you load the data into the data loader, or afterwards, while running training/testing?

This is done after you load the data. For LDS, you basically first compute the label histogram (as you show above), then apply smoothing to it to estimate an "effective" density. After this, LDS is typically used with loss re-weighting: each sample gets a weight that is used to balance the loss. In our case, the implementation of these steps can be found here.
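
As a rough, self-contained sketch of those steps (toy labels and plain inverse weighting only; the names here are just for illustration, not the exact code from the repo):

import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter1d

# toy integer labels standing in for the real targets (e.g. column 4, after binning)
labels = np.random.randint(0, 100, size=1000)
num_bins, kernel_size, sigma = 100, 5, 2

# 1) empirical label density: histogram over the label bins
bin_counts = np.bincount(labels, minlength=num_bins).astype(float)

# 2) LDS: convolve the histogram with a symmetric (here Gaussian) kernel window
#    to obtain the smoothed "effective" density
impulse = np.zeros(kernel_size)
impulse[kernel_size // 2] = 1.0
kernel_window = gaussian_filter1d(impulse, sigma=sigma)
kernel_window /= kernel_window.max()
effective_density = convolve1d(bin_counts, weights=kernel_window, mode='constant')

# 3) per-sample weight = inverse of the effective density of its label bin,
#    rescaled so that the weights average to 1
weights = 1.0 / effective_density[labels]
weights *= len(weights) / weights.sum()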

For FDS, it is applied during training: it is a module, much like BatchNorm, inserted into your neural network (see example here). After each training epoch, you update its running statistics and smoothed statistics (example here). FDS does not depend on how your labels are distributed (it does not need the histogram), but you do need to define the number of bins up front (see the initialization of the FDS module here; bucket_num is the number of bins you need).
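
To give a feel for what such a module does, below is a heavily simplified, self-contained sketch of FDS-style feature calibration. The class and method names are made up for this illustration; the actual implementation lives in fds.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCalibrator(nn.Module):
    # Minimal FDS-style feature calibration (illustration only, not the repo's FDS):
    # keep running per-bin feature statistics, smooth them across neighbouring bins,
    # and re-standardize each sample's features from its own bin's statistics to the
    # smoothed statistics.
    def __init__(self, feature_dim, bucket_num, momentum=0.9, eps=1e-5):
        super().__init__()
        self.momentum, self.eps = momentum, eps
        self.register_buffer('running_mean', torch.zeros(bucket_num, feature_dim))
        self.register_buffer('running_var', torch.ones(bucket_num, feature_dim))
        self.register_buffer('smoothed_mean', torch.zeros(bucket_num, feature_dim))
        self.register_buffer('smoothed_var', torch.ones(bucket_num, feature_dim))

    @torch.no_grad()
    def update_running_stats(self, features, bin_ids):
        # called once per epoch with the collected features and their label bins
        for b in bin_ids.unique():
            feats_b = features[bin_ids == b]
            self.running_mean[b].mul_(self.momentum).add_((1 - self.momentum) * feats_b.mean(0))
            self.running_var[b].mul_(self.momentum).add_((1 - self.momentum) * feats_b.var(0, unbiased=False))

    @torch.no_grad()
    def update_smoothed_stats(self, kernel_window):
        # smooth the per-bin statistics over neighbouring bins with a 1-D symmetric kernel
        k = torch.as_tensor(kernel_window, dtype=torch.float32)
        k = (k / k.sum()).view(1, 1, -1)
        pad = k.shape[-1] // 2
        for src, dst in [(self.running_mean, self.smoothed_mean),
                         (self.running_var, self.smoothed_var)]:
            dst.copy_(F.conv1d(src.t().unsqueeze(1), k, padding=pad).squeeze(1).t())

    def forward(self, features, bin_ids):
        # re-standardize each feature vector from its bin's running statistics
        # to the smoothed statistics (the calibration step used during training)
        mean, var = self.running_mean[bin_ids], self.running_var[bin_ids]
        s_mean, s_var = self.smoothed_mean[bin_ids], self.smoothed_var[bin_ids]
        return (features - mean) / (var + self.eps).sqrt() * (s_var + self.eps).sqrt() + s_mean

In a training loop you would pass the encoder features together with each sample's label bin through forward, and after every epoch call update_running_stats on the collected features, followed by update_smoothed_stats with an LDS-style kernel window.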

Hope these help. Let me know if you have further questions.

YyzHarry avatar Jun 28 '21 15:06 YyzHarry


@YyzHarry thanks a lot for the feedback, it was indeed helpful. So, as I understand it, with LDS you create a weight for each target (label) value, which you then use to balance the loss along the following lines (this is also what I got from the pseudocode in the supplementary material of the paper; below I use an L1 loss as an example):

    def forward(self, x, y, weights):
        # per-sample L1 error, re-weighted by each sample's LDS weight
        errors = torch.abs(x - y)
        return torch.mean(errors * weights)

I played a bit with LDS, based also on the link that you provided, and put together the following runnable toy example to obtain the weights:

import os

import numpy as np
import pandas as pd
from scipy.ndimage import convolve1d

from utils import get_lds_kernel_window  # helper from the repo


def _prepare_weights(labels, reweight, max_target=121, lds=False, lds_kernel='gaussian', lds_ks=5, lds_sigma=2):
    assert reweight in {'none', 'inverse', 'sqrt_inv'}
    assert reweight != 'none' if lds else True, \
        "Set reweight to \'sqrt_inv\' (default) or \'inverse\' when using LDS"

    # count the number of samples falling into each (integer) label bin
    value_dict = {x: 0 for x in range(max_target)}
    for label in labels:
        value_dict[min(max_target - 1, int(label))] += 1
    if reweight == 'sqrt_inv':
        value_dict = {k: np.sqrt(v) for k, v in value_dict.items()}
    elif reweight == 'inverse':
        value_dict = {k: np.clip(v, 5, 1000) for k, v in value_dict.items()}  # clip bin counts for inverse re-weighting
    num_per_label = [value_dict[min(max_target - 1, int(label))] for label in labels]
    if not len(num_per_label) or reweight == 'none':
        return None
    print(f"Using re-weighting: [{reweight.upper()}]")

    if lds:
        # LDS: smooth the empirical bin counts with the chosen kernel window
        lds_kernel_window = get_lds_kernel_window(lds_kernel, lds_ks, lds_sigma)
        print(f'Using LDS: [{lds_kernel.upper()}] ({lds_ks}/{lds_sigma})')
        smoothed_value = convolve1d(
            np.asarray([v for _, v in value_dict.items()]), weights=lds_kernel_window, mode='constant')
        num_per_label = [smoothed_value[min(max_target - 1, int(label))] for label in labels]

    # per-sample weight = inverse of the (smoothed) bin count, rescaled so the weights average to 1
    weights = [np.float32(1 / x) for x in num_per_label]
    scaling = len(weights) / np.sum(weights)
    weights = [scaling * x for x in weights]
    return weights


def main():
    # pcd1.txt: one point per row, features as columns (see the sample above)
    data = pd.read_csv("./pcd1.txt", header=None, delimiter=',', low_memory=False).to_numpy(dtype='float')

    labels = data[:, 3]  # column 4 holds the regression target

    weights = _prepare_weights(labels, reweight='sqrt_inv', lds=True, lds_kernel='gaussian', lds_ks=5, lds_sigma=2)

    return weights



if __name__ == '__main__':
    print('Start!!!!')

    main()
    print('End!!!!')
    os._exit(0)

which seems to work fine.

I have a couple of questions, though, for which I couldn't find the answer (or may have overlooked it):

  1. What is the difference between the two re-weighting options, i.e. inverse and sqrt_inv, and why should I choose one over the other?
  2. Besides the gaussian kernel, I noticed there are also triang and laplace options. Do these make any major difference to the calculated weights, and is there any specific reason to choose one over the other?
  3. What is the effect of the max_target hyperparameter, and why is 121 the default value?
  4. Is clipping necessary (this may be related to questions 2 and 3, since clipping is used only in the inverse option, as is the max_target parameter)? For example, in my case the target values can vary from 0 to 25000, and the number of values above 1500 is quite small. My guess is that the weights for these values would be quite high anyway, so clipping them to a lower value wouldn't have much of an effect, or would it?
  5. Regarding FDS and the number of bins, as I understand it this depends on the extremes of your values. Is that correct? For age, for example, you consider ages 0-99, so you have 100 bins. In my case, since my values vary from 0 up to 25000, I guess the number of bins should cover that range, right?

ttsesm avatar Jul 01 '21 23:07 ttsesm

What is the difference between the two re-weighting options, i.e. inverse and sqrt_inv, and why should I choose one over the other?

Actually, we use sqrt_inv by default for certain tasks (like age estimation). The details of these baselines can be found on page 6 of the paper. Both sqrt_inv and inverse belong to the category of cost-sensitive re-weighting methods; the reason to sometimes use square-root inverse is that after plain inverse re-weighting some weights can become very large (e.g., with 5,000 images for age 30 and only 1 image for age 100, the weight ratio after inverse re-weighting is extremely high), which can cause optimization problems. Again, the choice also depends on the task you are tackling.
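
To make that ratio concrete with the toy numbers above:

import numpy as np

counts = {30: 5000, 100: 1}                                    # samples per age bin
inv_w      = {k: 1.0 / v for k, v in counts.items()}           # plain inverse
sqrt_inv_w = {k: 1.0 / np.sqrt(v) for k, v in counts.items()}  # square-root inverse

print(inv_w[100] / inv_w[30])            # 5000.0 -> very extreme weight ratio
print(sqrt_inv_w[100] / sqrt_inv_w[30])  # ~70.7  -> much milder ratio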

Besides the gaussian kernel, I noticed there are also triang and laplace options. Do these make any major difference to the calculated weights, and is there any specific reason to choose one over the other?

These just provide more choices. In Appendix E.1 of our paper we study different kernel types. Overall they should give similar results, though some may work better for certain tasks.
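
For reference, the three window types can be built roughly like this (a paraphrased sketch of what the repo's get_lds_kernel_window helper provides, not the verbatim code):

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal.windows import triang

def lds_kernel_window(kernel='gaussian', ks=5, sigma=2):
    # symmetric window of (odd) length ks, peak normalized to 1
    half = (ks - 1) // 2
    if kernel == 'gaussian':
        impulse = np.zeros(ks)
        impulse[half] = 1.0
        window = gaussian_filter1d(impulse, sigma=sigma)
    elif kernel == 'triang':
        window = triang(ks)
    elif kernel == 'laplace':
        window = np.exp(-np.abs(np.arange(-half, half + 1)) / sigma)
    else:
        raise ValueError(f'unknown kernel: {kernel}')
    return window / window.max()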

What is the effect of the max_target hyperparameter, and why is 121 the default value?

The number is just based on the label distribution of this particular age dataset. Since the number of samples with age larger than 120 is very small, we can simply aggregate them and assign the same weight. The reason is, as you said, that when applying re-weighting we do not want any weight to become too high and cause optimization issues.

Is clipping necessary

Your understanding is correct. This is related to the above questions.

Regarding FDS and the number of bins, as I understand it this depends on the extremes of your values. Is that correct? For age, for example, you consider ages 0-99, so you have 100 bins. In my case, since my values vary from 0 up to 25000, I guess the number of bins should cover that range, right?

Yes, your understanding is correct. For your case, it also depends on the minimum resolution you care about (i.e., the bin size). For age, the minimum resolution we care about is 1 year, so there are 100 bins if we consider ages 0-99. If the minimum resolution that matters to you is 10, you could accordingly use 2,500 bins. A smaller number of bins makes the statistics estimation more accurate, since more samples fall into each bin. Again, the choice should depend on the task you are tackling.
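
For instance, with toy numbers matching the 0-25000 range mentioned above:

bin_size, max_target = 10, 25000                        # minimum resolution of 10 units
bucket_num = max_target // bin_size                     # -> 2500 bins for the FDS module
label = 1234.5                                          # an example target value
bin_id = int(min(label, max_target - 1) // bin_size)    # -> bin 123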

YyzHarry avatar Jul 05 '21 15:07 YyzHarry

Hi @YyzHarry, thanks for the feedback and your time. I will try to play a bit with the different settings and I will let you know if I have any further questions.

ttsesm avatar Jul 06 '21 08:07 ttsesm

Hi @YyzHarry, thanks for your GitHub link. The first image below shows the label/target distribution of my regression dataset. I took your Boston-dataset Colab notebook and applied it to my dataset, and I am getting the output shown in the second image. From this I understand that the MSE value is not gradually decreasing; in fact it fluctuates a lot. Please let me know whether I need to add some extra lines of code/customization.

[image: label/target distribution of the dataset]

[image: training log output]

snigdhasen avatar Jul 06 '21 15:07 snigdhasen

Hi @snigdhasen It seems the loss is gradually decreasing (though very slowly). I guess the value in the parentheses is the average value of the MSE/L1 loss.

YyzHarry avatar Jul 09 '21 23:07 YyzHarry

@YyzHarry Thanks. Yes, that's the average loss. But the MSE is too high, around 0.99. Can you suggest any customization to reduce the loss here? The L1 loss is OK, around 0.39.

snigdhasen avatar Jul 14 '21 13:07 snigdhasen

Hi @YyzHarry, I want to use LDS/FDS to estimate job processing times, but the range of the time distribution is relatively large and I care more about the samples with small times, so I would like to bin the targets by log(duration) rather than the raw duration. Can I do this? What does it imply for the requirement of a symmetric kernel, and what adjustments would need to be made to the hyperparameters? (Attached: PAE_0723 - Jupyter Notebook)

zhaosongyi avatar Aug 09 '21 01:08 zhaosongyi

Hi @zhaosongyi - this is an interesting point. In our work, we use a symmetric kernel since we assume the distance to an anchor point in the target space should not depend on the sign (e.g., for age estimation, a 10-year-old and a 14-year-old have the same distance to a 12-year-old). Another implicit benefit of symmetric kernels is that they are theoretically guaranteed to make the distribution "smoother" (i.e., it has a lower Lipschitz constant).

Going back to your case, when you apply a log transformation to the target labels (and if we still assume the distance for the original processing-time labels does not depend on the sign), I guess you might want to try an asymmetric kernel. A simple implementation with a Gaussian kernel could be a combination of two half Gaussians with different \sigma, where you have a larger \sigma for the left half and a smaller one for the right half.
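
For example, a quick sketch of such an asymmetric window (illustrative only, not code from the repo):

import numpy as np

def asymmetric_gaussian_window(ks=9, sigma_left=3.0, sigma_right=1.0):
    # two half Gaussians glued at the centre: wider on the left, narrower on the right
    half = (ks - 1) // 2
    x = np.arange(-half, half + 1, dtype=float)
    sigma = np.where(x < 0, sigma_left, sigma_right)
    window = np.exp(-0.5 * (x / sigma) ** 2)
    return window / window.max()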

YyzHarry avatar Aug 11 '21 03:08 YyzHarry

@YyzHarry Hi, I applied only LDS on my dataset but I am not seeing any improvement in training or validation. Do I need to apply both FDS and LDS on a Boston-like dataset? @ttsesm if this method worked for you, please ping me at [email protected]

maneeshsagar avatar Nov 29 '21 16:11 maneeshsagar

Hi @YyzHarry I have a question about using only LDS on my dataset; however, errors keep being raised when I run it. Are there specific format requirements for the input data? I'm dealing with spatiotemporal data with longitude and latitude, and I don't know whether that can work.

ytkmy5555 avatar Feb 04 '22 05:02 ytkmy5555

Hi, I have a question about training on my custom dataset. My dataset has target values in the range (0, 4), with a bin size of 0.1. The training loss seems fine, but the validation loss blows up. Debugging shows that the model output is very large, around 2e+15.

Can you give some idea about that?

[image: screenshot of the training/validation loss output]

thangnh0608 avatar Sep 25 '23 04:09 thangnh0608