Semi-deterministic output even though random_state is set
Hello everybody,
While adding some tests to a project of mine, I noticed some really weird behaviour. Two different instances initialised with the same parameters (including random_state) return different results from fit_transform within a single run of the program. But when I run the program again, the same pair of outputs is reproduced.
Am I missing something obvious? Or does anybody have an idea why this is happening? Thanks for looking into it.
Reproduction Steps
import umap
din = [[39.715797424316406, 5.328598499298096],
[40.119140625, 6.10653018951416],
[39.6290283203125, 6.134637832641602],
[39.19687271118164, 5.85951566696167],
[9.60939884185791, 9.586419105529785],
[-6.015710353851318, -11.25406265258789],
[9.012431144714355, 8.989534378051758],
[9.283456802368164, 9.261088371276855],
[-5.681527614593506, -10.919998168945312],
[-5.479494571685791, -10.71765422821045]]
a = umap.UMAP(random_state=42, n_neighbors=2, n_components=2).fit_transform(din).tolist()
b = umap.UMAP(random_state=42, n_neighbors=2, n_components=2).fit_transform(din).tolist()
print(a)
print(b)
assert a == b
with the output being:
/Users/op/.pyenv/versions/3.9.18/lib/python3.9/site-packages/umap/umap_.py:1945: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
/Users/op/.pyenv/versions/3.9.18/lib/python3.9/site-packages/umap/umap_.py:1945: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
[[20.164154052734375, 1.3494281768798828], [21.097431182861328, 0.2964009642601013], [20.777090072631836, 0.6684482097625732], [20.44692611694336, 1.0685573816299438], [11.67434310913086, 17.12160301208496], [-3.4501354694366455, 15.270648002624512], [11.07783031463623, 17.718942642211914], [11.350298881530762, 17.448848724365234], [-3.1171295642852783, 15.60583209991455], [-2.912529230117798, 15.80562973022461]]
[[4.337563514709473, 8.263677597045898], [3.3291709423065186, 7.28176212310791], [3.681276798248291, 7.623903751373291], [4.056679725646973, 7.981446743011475], [-2.7793023586273193, 16.567930221557617], [8.226690292358398, -1.6328837871551514], [-3.3761720657348633, 17.164905548095703], [-3.1042020320892334, 16.89451026916504], [7.8915557861328125, -1.2999401092529297], [7.691712379455566, -1.0954537391662598]]
Traceback (most recent call last):
  File "/Users/op/Documents/ETHZ/IVIA/umap-test/umap-lol.py", line 21, in <module>
    assert a == b
AssertionError
Versions
joblib==1.3.2
llvmlite==0.42.0
numba==0.59.1
numpy==1.26.4
pynndescent==0.5.12
scikit-learn==1.4.1.post1
scipy==1.12.0
threadpoolctl==3.4.0
tqdm==4.66.2
umap-learn==0.5.6
Note that umap is installed directly from GitHub, but the behaviour stays the same if it is installed via PyPI.
Hi, I get the same warning. Any ideas on resolving it?
I ran into this problem in my project recently, and the warning is exactly the same: "n_jobs value 1 overridden to 1". Issue #1081 seems to resolve this! Try this:
a = umap.UMAP(random_state=42, n_jobs=1, n_neighbors=2, n_components=2).fit_transform(din)
It runs with no warning, but I don't really understand why. "n_jobs value 1 overridden to 1"? Does the warning mean that the default n_jobs has some problem, for example that the original value 1 had the wrong type? Searching for the warning in the source code turns up:
if self.n_jobs != 1 and self.random_state is not None:
    self.n_jobs = 1
    warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
So the code overwrites the offending parameter before reporting it, which is why the message always prints 1 instead of the original value.
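To make the ordering problem concrete, here is a minimal sketch. FakeReducer is a hypothetical stand-in I wrote for illustration, not umap-learn code; it only mimics the n_jobs check quoted above, and the default n_jobs=-1 is an assumption. check_fixed shows one possible way to report the original value by saving it before the overwrite; I'm not claiming this is how the library does or should fix it.

```python
import warnings


class FakeReducer:
    """Hypothetical stand-in (not umap-learn); mimics only the n_jobs check."""

    def __init__(self, n_jobs=-1, random_state=None):
        self.n_jobs = n_jobs
        self.random_state = random_state

    def check_buggy(self):
        # Same order as the quoted snippet: overwrite first, then format the message.
        if self.n_jobs != 1 and self.random_state is not None:
            self.n_jobs = 1
            warnings.warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state.")

    def check_fixed(self):
        # Remember the original value so the message can report it.
        if self.n_jobs != 1 and self.random_state is not None:
            original = self.n_jobs
            self.n_jobs = 1
            warnings.warn(f"n_jobs value {original} overridden to 1 by setting random_state.")


def first_warning(method):
    """Call method() and return the text of the first warning it raised, or None."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        method()
    return str(caught[0].message) if caught else None


print(first_warning(FakeReducer(random_state=42).check_buggy))
print(first_warning(FakeReducer(random_state=42).check_fixed))
```

With an explicit n_jobs=1 the condition is false, so neither variant warns, which matches what I saw with the workaround above.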
Then I wondered how the default value is set: I found n_jobs=-1 in the constructor, and nothing else appears to change it.
All in all I find this interesting, so I'm having a look at the UMAP paper (2018) and the densMAP paper (2021).
Calling np.random.seed(42) solved my problem.
Hi all, I just wanted to bump this thread and mention that I found the results to be non-deterministic when using random_state=0 or random_state=42. So I believe this issue persists and should be addressed in a future release. My testing was performed using Python 3.9 on macOS + Apple Silicon with umap-learn==0.5.7.
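In case it helps others check their own setup, below is a small sketch of the kind of check I mean. outputs_identical is a hypothetical helper name, not part of umap-learn; it simply calls a function several times in the same process and compares the results with ==. The demo lines use Python's random module so the snippet runs without umap installed.

```python
import random


def outputs_identical(make_output, n_runs=3):
    """Call make_output() n_runs times; return True if every result equals the first."""
    results = [make_output() for _ in range(n_runs)]
    return all(r == results[0] for r in results[1:])


# Demo with the standard library: a freshly seeded generator repeats itself,
# while one shared generator keeps advancing and does not.
seeded = outputs_identical(lambda: random.Random(42).random())
shared_rng = random.Random(42)
shared = outputs_identical(lambda: shared_rng.random())
print(seeded, shared)

# With umap, reusing din from the first post, the call would look like this:
# outputs_identical(
#     lambda: umap.UMAP(random_state=42, n_neighbors=2, n_components=2)
#     .fit_transform(din).tolist()
# )
```

Exact equality is deliberate here, since the original report compares the lists with assert a == b.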