top-k-mallows
Arbitrary-precision Mallows Model under Hamming distance + fix for NumPy float type deprecation bug
Hi!
First of all thank you for your work! This package has been really useful to understand distance-based statistical models for permutations.
Arbitrary precision sampling for Mallows Model under Hamming distance
This PR implements arbitrary-precision arithmetic (using the Python mpmath library) in the sampling of permutations from the Mallows Model under the Hamming distance. This makes it possible to sample very long permutations without overflowing fixed-precision numeric types, which otherwise results in 0s or NaNs appearing in the calculation of the number of permutations at large Hamming distances.
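To give a sense of the magnitudes involved: the number of permutations at Hamming distance d from a reference permutation of n elements is C(n, d) times the number of derangements of d elements. The sketch below is only an illustration using Python's standard library, not the mpmath code in this PR, and the function names are hypothetical. It shows that this count already exceeds the range of a 64-bit float at n = 200, far below the sizes discussed here.

```python
import math


def derangements(d_max):
    """Return [!0, !1, ..., !d_max] as exact Python integers."""
    counts = [1, 0]
    for d in range(2, d_max + 1):
        # Standard recurrence: !d = (d - 1) * (!(d - 1) + !(d - 2))
        counts.append((d - 1) * (counts[d - 1] + counts[d - 2]))
    return counts[: d_max + 1]


def perms_at_hamming_distance(n, d, der):
    """Number of permutations of n elements with exactly d displaced positions."""
    return math.comb(n, d) * der[d]


n = 200
der = derangements(n)
count = perms_at_hamming_distance(n, n, der)  # derangements of 200 elements

try:
    as_float = float(count)  # what a fixed-precision float would have to hold
except OverflowError:
    as_float = None  # the exact count does not fit in a 64-bit float
```

Stored in fixed-precision floats, such counts would saturate or turn into NaNs once combined with other terms, which is consistent with the symptoms described above.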
I am working on large Traveling-Salesman-like problems (1000s of destinations) where it's interesting to sample around known "good" solutions (see RAAN walks in multi-target trajectory optimization for spacecraft), so this capability has been quite useful.
I have tested permutations of up to 5000 elements. Sampling 10000 permutations of 5000 elements takes approximately 1 minute. You can see the resulting distance histogram and the code used to generate it below.
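import numpy as np
import matplotlib.pyplot as plt
import mallows_hamming as mh

theta = 7             # dispersion parameter of the Mallows Model
n_samples = 10000     # number of permutations to draw
problem_size = 5000   # length of each permutation

# precision is the new argument introduced by this PR
sample = mh.sample(n=problem_size, m=n_samples, theta=theta, precision=1000)
# Hamming distance of each sampled permutation
distances = np.array([mh.distance(perm) for perm in sample])

# One bin per integer distance, centred on the integer
bins = np.arange(-0.5, problem_size + 1, 1)
plt.hist(distances, bins=bins, alpha=0.25, label='Arbitrary precision')
plt.legend(loc='best')
plt.show()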
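One way to sanity-check such a histogram is to compare it with the exact distribution of distances. The sketch below assumes the usual Mallows form, with probability proportional to exp(-theta * d), so that P(d) is proportional to C(n, d) * !d * exp(-theta * d). It uses the standard-library decimal module as a stand-in for mpmath; the function name and the digits parameter are hypothetical and not part of the package.

```python
from decimal import Decimal, getcontext
import math


def hamming_distance_pmf(n, theta, digits=50):
    """Exact-to-`digits` probabilities of each Hamming distance 0..n.

    Assumes P(perm) is proportional to exp(-theta * distance).
    """
    getcontext().prec = digits  # working precision in decimal digits
    # Derangement numbers !0..!n as exact integers
    der = [1, 0]
    for d in range(2, n + 1):
        der.append((d - 1) * (der[-1] + der[-2]))
    # Unnormalised weight of each distance: (number of permutations) * exp(-theta * d)
    weights = [
        Decimal(math.comb(n, d) * der[d]) * (Decimal(-theta) * d).exp()
        for d in range(n + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


pmf = hamming_distance_pmf(300, 7)
most_likely_distance = max(range(len(pmf)), key=lambda d: pmf[d])
```

Plotting n_samples * pmf[d] over the histogram would show whether the sampled distances follow the expected shape, under the stated assumption about the model's normalisation.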
As expected, the new sample function returns identical output to that of the original one for smaller permutations.
Bug fix: Solved NumPy float type deprecation bug
This PR replaces np.float with np.float64 in the code to resolve the following NumPy deprecation error:
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'cfloat'?
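As the message above states, np.float was only an alias for the builtin float, so the replacement does not change numerical behaviour. A minimal illustration (the array names are hypothetical and not taken from the package):

```python
import numpy as np

# Both spellings below produce the same 64-bit floating-point dtype;
# np.float64 is the explicit NumPy scalar type used in this PR.
weights_builtin = np.zeros(4, dtype=float)
weights_numpy = np.zeros(4, dtype=np.float64)

same_dtype = weights_builtin.dtype == weights_numpy.dtype
```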
Cheers!