
macOS crash

Open · kevinjohncutler opened this issue 10 months ago · 2 comments

Just reporting that on Apple Silicon Macs, I need to set os.environ["PARLAY_NUM_THREADS"] = "1" to avoid a kernel crash. Any higher value causes instability when calling dbscan in quick succession. This is not necessary on x86 Linux.

P.S. This package is incredible; it speeds up Omnipose by nearly 2x. Will be sure to cite you.
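
For anyone else hitting this, here is a minimal sketch of the workaround. My understanding is that PARLAY_NUM_THREADS has to be set before dbscan initializes its thread pool, so set it before the import; the data and parameters below are just placeholders:

import os
os.environ["PARLAY_NUM_THREADS"] = "1"  # must be set before importing dbscan

import numpy as np
from dbscan import DBSCAN

# Placeholder data and parameters, just to show the ordering
X = np.random.rand(10000, 2)
labels, core_samples_mask = DBSCAN(X, eps=0.3, min_samples=10)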

kevinjohncutler · Jan 30 '25 09:01

I'm adding this comment just to say that I encountered the same error. I run dbscan in a Jupyter notebook, and executing the same cell in quick succession causes high CPU usage and long execution times. When everything works, dbscan returns labels in a few ms.
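
For reference, a cell along these lines surfaces the slowdown for me (data size, eps, and min_samples are arbitrary choices, just for illustration):

import time
import numpy as np
from dbscan import DBSCAN

X = np.random.rand(50000, 2)
for i in range(10):
    t0 = time.perf_counter()
    labels, core_samples_mask = DBSCAN(X, eps=0.05, min_samples=10)
    # After the first few runs, this jumps from a few ms to much longer
    print(f"run {i}: {(time.perf_counter() - t0) * 1000:.1f} ms")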

NorwegianGoat · Feb 08 '25 15:02

I tried running DBSCAN on an ARM64 platform. After a few loop iterations the program hangs (I have to use kill -9 to force-terminate the process).

sunplus@ubuntu ~/d/dbscan_comparison (main)> python check_memory_leaks.py
Initial memory usage: 82.21 MB
Memory usage after DBSCAN: 110.24 MB
Memory difference: 28.03 MB
Iteration 1: Memory usage: 123.95 MB
Iteration 2: Memory usage: 140.57 MB
Iteration 3: Memory usage: 140.57 MB
fish: Job 1, 'python check_memory_leaks.py' terminated by signal SIGKILL (Forced quit)

The cause of the problem may be the same: we are both testing on non-x86 platforms. sklearn's DBSCAN does not have this issue. The problem is easy to reproduce with the script below:

import os
import psutil
import numpy as np
import sklearn.cluster
from dbscan import DBSCAN

def get_memory_usage():
    # Resident set size (RSS) of the current process, in bytes
    process = psutil.Process(os.getpid())
    return process.memory_info().rss

initial_memory = get_memory_usage()
print(f"Initial memory usage: {initial_memory / 1024 / 1024:.2f} MB")

# 100k random 3D points (np.asarray is a no-op here; points is already an ndarray)
points = np.random.rand(100000, 3)
pcd_np = np.asarray(points)

# dbscan-python's functional API returns (labels, core_samples_mask)
labels_another, core_samples_mask = DBSCAN(pcd_np, eps=0.025, min_samples=10)

after_memory = get_memory_usage()
print(f"Memory usage after DBSCAN: {after_memory / 1024 / 1024:.2f} MB")

memory_diff = after_memory - initial_memory
print(f"Memory difference: {memory_diff / 1024 / 1024:.2f} MB")

num_iterations = 100

# Repeated calls in the same process: on ARM64 the program hangs after a few iterations
for i in range(num_iterations):
    points = np.random.rand(100000, 3)
    pcd_np = np.asarray(points)
    labels, core_samples = DBSCAN(pcd_np, eps=0.025, min_samples=10)
    # The sklearn equivalent completes all iterations without issue:
    # db = sklearn.cluster.DBSCAN(eps=0.025, min_samples=10, n_jobs=-1)
    # db.fit(pcd_np)
    current_memory = get_memory_usage()
    print(f"Iteration {i+1}: Memory usage: {current_memory / 1024 / 1024:.2f} MB")

wjxianjian · Apr 24 '25 07:04