top-k-mallows

Arbitrary precision Mallows Model under Hamming distance + solved numpy float type deprecation bug

Open alopezrivera opened this issue 9 months ago • 0 comments

Hi!

First of all thank you for your work! This package has been really useful to understand distance-based statistical models for permutations.

Arbitrary precision sampling for Mallows Model under Hamming distance

This PR implements arbitrary-precision arithmetic (using the Python mpmath library) for sampling permutations from the Mallows Model under the Hamming distance. This makes it possible to sample very long permutations without numerical overflow: with fixed-width numeric types, the number of permutations at large Hamming distances exceeds the representable range, and 0s or NaNs appear in the calculation.
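To make the overflow concrete, here is a minimal sketch that uses only the standard library (exact Python integers, not mpmath, and not the code in this PR). It relies on the standard counting identity that the number of permutations of n items at Hamming distance d from a fixed permutation is C(n, d) times the number of derangements of d items. The helper names `derangements` and `count_at_hamming_distance` are illustrative and are not part of the package.

```python
import math


def derangements(d):
    """Exact number of derangements of d items: D(k) = (k - 1) * (D(k-1) + D(k-2))."""
    prev, curr = 1, 0  # D(0) = 1, D(1) = 0
    if d == 0:
        return prev
    for k in range(2, d + 1):
        prev, curr = curr, (k - 1) * (curr + prev)
    return curr


def count_at_hamming_distance(n, d):
    """Permutations of n items that differ from a fixed one in exactly d positions."""
    return math.comb(n, d) * derangements(d)


n = 5000
exact = count_at_hamming_distance(n, n)  # exact, arbitrarily large Python int

# The same quantity does not fit in a 64-bit float.
try:
    as_float = float(exact)
except OverflowError:
    as_float = math.inf

print("digits (approx.):", int(math.log10(exact)) + 1)
print("as float64:", as_float)
```

The count has far more digits than a 64-bit float can represent (its maximum is about 1.8e308), which is why a higher-precision number type is needed once n reaches the thousands.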

I am working on large Traveling-Salesman-like problems (thousands of destinations) where it is useful to sample around known "good" solutions (see RAAN walks in multi-target trajectory optimization for spacecraft), so this capability has been quite useful.

I have tested permutations of up to 5000 elements. Sampling 10000 permutations of 5000 elements takes approximately 1 minute. You can see the resulting distance histogram and the code used to generate it below.

[image: distance histogram]

import numpy as np
import matplotlib.pyplot as plt

import mallows_hamming as mh

theta = 7
n_samples = 10000
problem_size = 5000

sample = mh.sample(n=problem_size, m=n_samples, theta=theta, precision=1000)
distances = np.array([mh.distance(perm) for perm in sample])

bins = np.arange(- 0.5, problem_size + 1, 1)
plt.hist(distances, bins=bins, alpha=0.25, label='Arbitrary precision')
plt.legend(loc='best')
plt.show()

As expected, the new sample function returns output identical to that of the original one for smaller permutations:

[image]

Bug fix: Solved NumPy float type deprecation bug

This PR replaces np.float with np.float64 in the code. The np.float alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so recent NumPy versions raise the following error:

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'cfloat'?
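As a small illustration of why this replacement is behavior-preserving, the sketch below checks that the builtin `float` and `np.float64` resolve to the same NumPy dtype. It is a standalone example, not code from the package; the variable names are illustrative.

```python
import numpy as np

# `np.float` used to be a plain alias for the builtin `float`, and NumPy maps
# the builtin `float` to the float64 dtype, so `np.float64` is a drop-in
# replacement wherever the alias was used as a dtype.
values = np.zeros(3, dtype=np.float64)
same_dtype = np.dtype(float) == np.dtype(np.float64)

# On NumPy >= 1.24 the alias no longer exists; older versions may still have it.
has_alias = hasattr(np, "float")

print("float and np.float64 give the same dtype:", same_dtype)
print("np.float alias still present:", has_alias)
```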

Cheers!

alopezrivera, May 01 '24 12:05