open-metric-learning
Decrease memory footprint of computing FMR metric
Initial discussion is in https://github.com/OML-Team/open-metric-learning/pull/250
from @dapladoc :
I thought about algorithms that can find quantiles online, in a batch mode. I found that there is the P^2 algorithm (link), and here is a discussion about it (link). Then I found more articles about quantile estimation: see the references of this article (link). It looks like all these algorithms work only element-by-element, so there is no way to parallelize them. I'm not a specialist in this, though. Maybe it could be done efficiently in Cython. I'll try to implement P^2 (or some other algorithm) in Cython.
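For reference, here is a minimal pure-Python sketch of the P^2 estimator (following Jain & Chlamtac, 1985). The `P2Quantile` class and its API are made up for illustration and are not part of OML:

```python
class P2Quantile:
    """Streaming p-quantile estimate via the P^2 algorithm (Jain & Chlamtac, 1985).

    Keeps only 5 markers instead of the full sample, so memory is O(1),
    but observations must be fed in one by one.
    """

    def __init__(self, p: float):
        self.p = p
        self.q = []                                   # marker heights
        self.n = [0, 1, 2, 3, 4]                      # actual marker positions
        self.n_des = [0, 2 * p, 4 * p, 2 + 2 * p, 4]  # desired marker positions
        self.dn = [0, p / 2, p, (1 + p) / 2, 1]       # desired position increments

    def add(self, x: float) -> None:
        if len(self.q) < 5:                           # bootstrap on the first 5 points
            self.q.append(x)
            self.q.sort()
            return
        if x < self.q[0]:                             # new minimum
            self.q[0], k = x, 0
        elif x >= self.q[4]:                          # new maximum
            self.q[4], k = x, 3
        else:                                         # index of the cell containing x
            k = next(i for i in range(4) if self.q[i] <= x < self.q[i + 1])
        for i in range(k + 1, 5):                     # shift markers above the cell
            self.n[i] += 1
        for i in range(5):
            self.n_des[i] += self.dn[i]
        for i in (1, 2, 3):                           # nudge middle markers if they drifted
            d = self.n_des[i] - self.n[i]
            if (d >= 1 and self.n[i + 1] - self.n[i] > 1) or (d <= -1 and self.n[i - 1] - self.n[i] < -1):
                d = 1 if d > 0 else -1
                q_new = self._parabolic(i, d)
                if not self.q[i - 1] < q_new < self.q[i + 1]:
                    q_new = self._linear(i, d)        # fall back to linear interpolation
                self.q[i] = q_new
                self.n[i] += d

    def _parabolic(self, i: int, d: int) -> float:
        n, q = self.n, self.q
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i: int, d: int) -> float:
        n, q = self.n, self.q
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])

    @property
    def value(self) -> float:
        # the middle marker tracks the p-quantile (median of seen points until warmed up)
        return self.q[2] if len(self.q) == 5 else sorted(self.q)[len(self.q) // 2]
```

Since `add` consumes one element at a time, this has exactly the element-by-element constraint mentioned above.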
Another solution is to approximate the quantile (and the fnmr@fmr) by means of a histogram, which could be computed incrementally. I can try to estimate the accuracy of this approach on some synthetic data.
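To illustrate the histogram idea: accumulate a fixed-bin histogram batch by batch and read the quantile off its CDF. The `StreamingHistogramQuantile` class below is made up for illustration, and it assumes the distance range `[lo, hi]` is known up front (e.g., `[0, 2]` for cosine distances of L2-normalized embeddings):

```python
import torch

class StreamingHistogramQuantile:
    """Approximate quantiles from a fixed-bin histogram filled batch by batch."""

    def __init__(self, lo: float, hi: float, n_bins: int = 10_000):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.counts = torch.zeros(n_bins)

    def update(self, batch: torch.Tensor) -> None:
        # torch.histc buckets values into n_bins equal-width bins over [lo, hi]
        self.counts += torch.histc(batch.float(), bins=self.n_bins, min=self.lo, max=self.hi)

    def quantile(self, q: float) -> float:
        cdf = torch.cumsum(self.counts, dim=0) / self.counts.sum()
        idx = int(torch.searchsorted(cdf, torch.tensor(q)))
        # the approximation error is at most one bin width: (hi - lo) / n_bins
        return self.lo + (idx + 1) * (self.hi - self.lo) / self.n_bins
```

Unlike random subsampling, the error here is bounded by the bin width, so it is easy to control.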
At the moment I don't have other ideas, but I will keep thinking about it, and maybe something new will come up.
from me:
@dapladoc thank you for your thoughts and links! Please don't start working on a Cython implementation (unless you really want to). I am afraid that the possible profit is way less than the potential effort needed to implement this algorithm. Let's just discuss possibilities for now.
I like the idea of an approximation. I thought about an even simpler solution: we can consider only part of the pairwise distances if the matrix is too big, for example, randomly pick 10% of them.
It's still unclear what the actual problem is: the footprint of copying the distances, or the memory overhead inside the quantile algorithm.
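For context, `fnmr@fmr` boils down to one percentile of the negative distances plus one comparison over the positive ones. Roughly (this is only a sketch; the actual implementation is linked at the end of this thread):

```python
import numpy as np
import torch

def calc_fnmr_at_fmr_sketch(pos_dist: torch.Tensor, neg_dist: torch.Tensor, fmr_vals=(0.1, 1)):
    # threshold_i = the fmr_vals[i]-th percentile of the negative (impostor) distances
    thresholds = torch.from_numpy(np.percentile(neg_dist.numpy(), fmr_vals))
    # fnmr@fmr_i = % of positive (genuine) pairs whose distance exceeds that threshold
    return (pos_dist[None, :] >= thresholds[:, None]).float().mean(dim=1) * 100
```

So the percentile over the (potentially huge) vector of negative distances is where any memory optimization would enter the picture.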
> I like the idea of an approximation. I thought about an even simpler solution: we can consider only part of the pairwise distances if the matrix is too big, for example, randomly pick 10% of them.
I don't like this idea: it is a very uncontrollable approach. I think it could have very high variance.
Here is a small simulation. I drew `pos_dist` and `neg_dist` from the gamma distribution. The size of `pos_dist` is 1000, and the size of `neg_dist` is 1,000,000. Then, in a loop, I randomly take 10% of the negative distances, evaluate `fnmr@fmr`, and store it. Afterwards, I plot histograms of the `fnmr@fmr` approximations and the true `fnmr@fmr` for different `fmr_vals`.
The simulation of `fnmr@fmr` approximation on random subsets:
```python
import matplotlib.pyplot as plt
import numpy as np
import torch
from tqdm import tqdm

from oml.functional.metrics import calc_fnmr_at_fmr

pos_dist = torch.from_numpy(np.random.gamma(shape=1, scale=1, size=1000))
neg_dist = torch.from_numpy(np.random.gamma(shape=6, scale=1, size=1_000_000))
fmr_vals = (0.1, 0.5, 1, 5)

fnmr_true = calc_fnmr_at_fmr(pos_dist, neg_dist, fmr_vals).numpy()

fnmr_approx = []
for i in tqdm(range(10000)):
    # keep a random ~10% subset of the negative distances
    _neg_dist = neg_dist[torch.rand(neg_dist.shape) > 0.9]
    fnmr_approx.append(calc_fnmr_at_fmr(pos_dist, _neg_dist, fmr_vals).tolist())
fnmr_approx = np.array(fnmr_approx)

fig, ax = plt.subplots(len(fnmr_true))
for i in range(len(fnmr_true)):
    ax[i].hist(fnmr_approx[:, i], bins=100)
    ax[i].axvline(fnmr_true[i], color='r')
    ax[i].set_title(f'fmr_val = {fmr_vals[i]}')
    print(f'Std(fnmr@fmr({fmr_vals[i]})) = {fnmr_approx[:, i].std():.4e}')
    print(f'Bias(fnmr@fmr({fmr_vals[i]})) = {np.mean(np.abs(fnmr_approx[:, i] - fnmr_true[i])):.4e}')
fig.set_size_inches(6.4, 4.8 * 3)
```
And here is its output:

```
Std(fnmr@fmr(0.1)) = 5.5389e-01
Bias(fnmr@fmr(0.1)) = 4.0927e-01
Std(fnmr@fmr(0.5)) = 3.2004e-01
Bias(fnmr@fmr(0.5)) = 2.2448e-01
Std(fnmr@fmr(1)) = 1.5432e-01
Bias(fnmr@fmr(1)) = 6.6530e-02
Std(fnmr@fmr(5)) = 9.9451e-02
Bias(fnmr@fmr(5)) = 7.1290e-02
```
Here are histograms of the `fnmr@fmr` approximations. The red lines are the "ground truth". Usually we are interested in `fnmr@fmr` for small values of `fmr`, and that is exactly where the bias is the biggest, since the corresponding quantile sits in the tail of the negative distribution, where a random subsample has the fewest points.
The graphs above seem okay to me, since I consider fnmr@fmr only as an additional (auxiliary) metric. So, it's not super important whether 30% or 32% of positive distances are smaller than 10% of the negative ones. Of course, if I submitted these results somewhere, it would not be accurate enough.
If we want to work on this problem systematically, we first have to measure the following (for example, on the SOP dataset):
1. Memory needed to compute the 3 basic metrics (CMC, precision, AP).
2. Memory needed to do p.1 + copy the positive and negative distances.
3. Memory needed to do p.2 + compute the fnmr@fmr metric.

Then it would be clear what should be optimized and how (see the measurement sketch right after this list).
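One possible way to take these measurements on CPU (a sketch; `peak_rss_mb` is a hypothetical helper, and note that the units of `ru_maxrss` differ across platforms):

```python
import resource

def peak_rss_mb() -> float:
    # peak resident set size of the current process;
    # ru_maxrss is reported in KB on Linux (and in bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

# Run each variant in a separate process so the peaks don't mix:
#   p.1: compute CMC / precision / AP only,       then print(peak_rss_mb())
#   p.2: p.1 + copy positive/negative distances,  then print(peak_rss_mb())
#   p.3: p.2 + compute fnmr@fmr,                  then print(peak_rss_mb())
```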
PS. IMHO, we can postpone the problem for now (https://github.com/OML-Team/open-metric-learning/pull/250 is enough) and focus on the tasks with higher priorities. In the end, it's probably not optimal to optimize the FMR metric without (at least) having examples from the verification domain.
> The graphs above seem okay to me, since I consider fnmr@fmr only as an additional (auxiliary) metric. So, it's not super important whether 30% or 32% of positive distances are smaller than 10% of the negative ones. Of course, if I submitted these results somewhere, it would not be accurate enough.
It really depends on the domain. For security-related tasks (e.g., biometrics), 99.9% and 99.99% are noticeably different results.
> PS. IMHO, we can postpone the problem for now (#250 is enough) and focus on the tasks with higher priorities. In the end, it's probably not optimal to optimize the FMR metric without (at least) having examples from the verification domain.
I agree with you. More thorough memory measurements should be done in order to find possible ways to solve this issue.
Optimized it a bit along with the OML 3.0 release (removed unneeded conversions): https://github.com/OML-Team/open-metric-learning/blob/main/oml/functional/metrics.py#L375