cuml
[BUG] kNN Classifier accuracy deviating from scikit-learn
Describe the bug
I was comparing the kNN classification results of my work after converting it from scikit-learn to cuML. With cuML and a test size of 10%, my test accuracy crosses above my training accuracy at around k=100, but the same code run with plain scikit-learn keeps the two accuracy curves strictly separated, with no crossover. When I increase the test size to 20%, I get the opposite result: my cuML accuracy curves stay strictly separated, while my scikit-learn curves begin to cross at around k=60. I will include a screenshot in the attachments.
Steps/Code to reproduce bug
I have provided both versions of the code, one using cuML and one using scikit-learn.
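The reporter's actual scripts are in the attachment and are not reproduced in this thread. As a hedged illustration of the kind of sweep described above (training vs. test accuracy as k grows), here is a minimal CPU-only sketch in plain NumPy. The helper names knn_predict and accuracy_curve are hypothetical; this is neither the reporter's code nor the implementation used by scikit-learn or cuML, and its tie-breaking choices (stable sort for distance ties, smallest label for vote ties) are assumptions.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Hypothetical brute-force Euclidean kNN with majority vote (labels: non-negative ints)."""
    # Squared distances via ||q||^2 - 2 q.x + ||x||^2, shape (n_query, n_train)
    d2 = (
        (X_query ** 2).sum(axis=1)[:, None]
        - 2.0 * X_query @ X_train.T
        + (X_train ** 2).sum(axis=1)[None, :]
    )
    # k smallest distances per row; a stable sort keeps distance ties in index order
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    votes = y_train[nn]  # shape (n_query, k)
    n_classes = int(y_train.max()) + 1
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    # argmax returns the first (smallest) label when votes tie
    return counts.argmax(axis=1)


def accuracy_curve(X_tr, y_tr, X_te, y_te, ks):
    """Training and test accuracy for each k (training queries include the point itself)."""
    train_acc = np.array([(knn_predict(X_tr, y_tr, X_tr, k) == y_tr).mean() for k in ks])
    test_acc = np.array([(knn_predict(X_tr, y_tr, X_te, k) == y_te).mean() for k in ks])
    return train_acc, test_acc


# Usage on synthetic data (an assumption; the reporter's data set is in the attachment)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
perm = rng.permutation(len(X))
test_idx, train_idx = perm[:20], perm[20:]  # 10% test split
ks = [1, 5, 15, 45]
train_acc, test_acc = accuracy_curve(X[train_idx], y[train_idx], X[test_idx], y[test_idx], ks)
for k, tr, te in zip(ks, train_acc, test_acc):
    print(f"k={k:3d}  train={tr:.3f}  test={te:.3f}")
```

Running the same sweep with both libraries on identical arrays is what makes the two sets of curves comparable.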
Expected behavior
I would expect the accuracy to be roughly the same with cuML and scikit-learn; however, I am seeing deviations.
Environment details (please complete the following information):
- Environment location: home PC
- Linux Distro/Architecture: Pop!_OS 22.04 LTS x86_64
- GPU Model/Driver: NVIDIA GeForce RTX 4090
- CPU Model: Ryzen 9 7950X
- CUDA: inside the rapids environment, nvcc -V reports "Cuda compilation tools, release 11.5, V11.5.119, Build cuda_11.5.r11.5/compiler.30672275_0"; nvidia-smi reports "NVIDIA-SMI 545.29.06, Driver Version: 545.29.06, CUDA Version: 12.3".
aethyn@pop-os:~$ neofetch
aethyn@pop-os
OS: Pop!_OS 22.04 LTS x86_64
Kernel: 6.6.10-76060610-generic
Uptime: 5 hours, 9 mins
Packages: 1982 (dpkg), 25 (flatpak)
Shell: bash 5.1.16
Resolution: 3840x2160, 3840x2160, 3840x2160
DE: GNOME 42.5
WM: Mutter
WM Theme: Pop
Theme: Pop-dark [GTK2/3]
Icons: Pop [GTK2/3]
Terminal: gnome-terminal
CPU: AMD Ryzen 9 7950X (32) @ 5.881GHz
GPU: AMD ATI 6c:00.0 Device 164e
GPU: NVIDIA 01:00.0 NVIDIA Corporation Device 2684
Memory: 18801MiB / 63423MiB
- Method of cuDF & cuML install: miniconda3 (rapids-23.12)
aethyn@pop-os:~/PycharmProjects/pythonProject$ conda list
packages in environment at /home/aethyn/miniconda3/envs/rapids-23.12:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
absl-py 2.1.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.9.1 py310h2372a71_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.10 hd590300_0 conda-forge
annotated-types 0.6.0 pyhd8ed1ab_0 conda-forge
anyio 4.2.0 pyhd8ed1ab_0 conda-forge
aom 3.8.1 h59595ed_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
argon2-cffi 23.1.0 pyhd8ed1ab_0 conda-forge
argon2-cffi-bindings 21.2.0 py310h2372a71_4 conda-forge
arrow 1.3.0 pyhd8ed1ab_0 conda-forge
asttokens 2.4.1 pyhd8ed1ab_0 conda-forge
astunparse 1.6.3 pyhd8ed1ab_0 conda-forge
async-lru 2.0.4 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.3 pyhd8ed1ab_0 conda-forge
attr 2.5.1 h166bdaf_1 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
aws-c-auth 0.7.11 h0b4cabd_1 conda-forge
aws-c-cal 0.6.9 h14ec70c_3 conda-forge
aws-c-common 0.9.12 hd590300_0 conda-forge
aws-c-compression 0.2.17 h572eabf_8 conda-forge
aws-c-event-stream 0.4.1 h97bb272_2 conda-forge
aws-c-http 0.8.0 h9129f04_2 conda-forge
aws-c-io 0.14.0 hf8f278a_1 conda-forge
aws-c-mqtt 0.10.1 h2b97f5f_0 conda-forge
aws-c-s3 0.4.9 hca09fc5_0 conda-forge
aws-c-sdkutils 0.1.13 h572eabf_1 conda-forge
aws-checksums 0.1.17 h572eabf_7 conda-forge
aws-crt-cpp 0.26.0 h04327c0_8 conda-forge
aws-sdk-cpp 1.11.210 hba3e011_10 conda-forge
azure-core-cpp 1.10.3 h91d86a7_1 conda-forge
azure-storage-blobs-cpp 12.10.0 h00ab1b0_0 conda-forge
azure-storage-common-cpp 12.5.0 hb858b4b_2 conda-forge
babel 2.14.0 pyhd8ed1ab_0 conda-forge
beautifulsoup4 4.12.3 pyha770c72_0 conda-forge
bleach 6.1.0 pyhd8ed1ab_0 conda-forge
blinker 1.7.0 pyhd8ed1ab_0 conda-forge
blosc 1.21.5 h0f2a231_0 conda-forge
bokeh 3.3.4 pyhd8ed1ab_0 conda-forge
branca 0.7.1 pyhd8ed1ab_0 conda-forge
brotli 1.1.0 hd590300_1 conda-forge
brotli-bin 1.1.0 hd590300_1 conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
brunsli 0.1 h9c3ff4c_0 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.26.0 hd590300_0 conda-forge
c-blosc2 2.13.2 hb4ffafa_0 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.3.2 pyhd8ed1ab_0 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
certifi 2024.2.2 py310h06a4308_0
cffi 1.16.0 py310h2fee648_0 conda-forge
cfitsio 4.3.1 hbdc6101_0 conda-forge
charls 2.4.2 h59595ed_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
click-plugins 1.1.1 py_0 conda-forge
cligj 0.7.2 pyhd8ed1ab_1 conda-forge
cloudpickle 3.0.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
colorcet 3.0.1 pyhd8ed1ab_0 conda-forge
comm 0.2.1 pyhd8ed1ab_0 conda-forge
contourpy 1.2.0 py310hd41b1e2_0 conda-forge
cryptography 42.0.2 py310hb8475ec_0 conda-forge
cubinlinker 0.3.0 py310hfdf336d_0 rapidsai
cucim 23.12.01 cuda11_py310_231211_ga3445df_0 rapidsai
cuda-profiler-api 11.8.86 0 nvidia
cuda-python 11.8.3 py310h70a93da_0 conda-forge
cuda-version 11.5 h6c6c5af_2 conda-forge
cudatoolkit 11.5.2 hbdc67f6_13 conda-forge
cudf 23.12.01 cuda11_py310_231208_g2ce46216b5_0 rapidsai
cudf_kafka 23.12.01 cuda11_py310_231208_g2ce46216b5_0 rapidsai
cudnn 8.8.0.121 hcdd5f01_4 conda-forge
cugraph 23.12.00 cuda11_py310_231206_g1309813f_0 rapidsai
cuml 23.12.00 cuda11_py310_231206_gad2bd2b65_0 rapidsai
cuproj 23.12.01 cuda11_py310_231207_g16727064_0 rapidsai
cupy 13.0.0 py310h189a05f_3 conda-forge
cupy-core 13.0.0 py310h506062a_3 conda-forge
cuspatial 23.12.01 cuda11_py310_231207_g16727064_0 rapidsai
custreamz 23.12.01 cuda11_py310_231208_g2ce46216b5_0 rapidsai
cuxfilter 23.12.00 cuda11_py310_231206_g63dabeb_0 rapidsai
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
cyrus-sasl 2.1.27 h54b06d7_7 conda-forge
cytoolz 0.12.3 py310h2372a71_0 conda-forge
dash 2.15.0 pyhd8ed1ab_0 conda-forge
dask 2023.11.0 pyhd8ed1ab_0 conda-forge
dask-core 2023.11.0 pyhd8ed1ab_0 conda-forge
dask-cuda 23.12.00 py310_231206_ge1638ae_0 rapidsai
dask-cudf 23.12.01 cuda11_py310_231208_g2ce46216b5_0 rapidsai
dask-sql 2024.1.0 py310hac45122_0 conda-forge
datashader 0.16.0 pyhd8ed1ab_0 conda-forge
dav1d 1.2.1 hd590300_0 conda-forge
dbus 1.13.18 hb2f20db_0
debugpy 1.8.1 py310hc6cd4ac_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
distributed 2023.11.0 pyhd8ed1ab_0 conda-forge
dlpack 0.5 h9c3ff4c_0 conda-forge
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
executing 2.0.1 pyhd8ed1ab_0 conda-forge
expat 2.5.0 hcb278e6_1 conda-forge
fastapi 0.103.0 pyhd8ed1ab_0 conda-forge
fastrlock 0.8.2 py310hc6cd4ac_2 conda-forge
filelock 3.13.1 pyhd8ed1ab_0 conda-forge
fiona 1.9.5 py310h0a1e91f_2 conda-forge
flask 3.0.2 pyhd8ed1ab_0 conda-forge
flatbuffers 23.5.26 h59595ed_1 conda-forge
fmt 9.1.0 h924138e_0 conda-forge
folium 0.15.1 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_1 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.48.1 py310h2372a71_0 conda-forge
fqdn 1.5.1 pyhd8ed1ab_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
freexl 2.0.0 h743c826_0 conda-forge
frozenlist 1.4.1 py310h2372a71_0 conda-forge
fsspec 2024.2.0 pyhca7485f_0 conda-forge
gast 0.5.4 pyhd8ed1ab_0 conda-forge
gdal 3.8.1 py310haaa150b_3 conda-forge
gdk-pixbuf 2.42.10 h829c605_4 conda-forge
geopandas 0.14.3 pyhd8ed1ab_0 conda-forge
geopandas-base 0.14.3 pyha770c72_0 conda-forge
geos 3.12.1 h59595ed_0 conda-forge
geotiff 1.7.1 hf074850_14 conda-forge
gettext 0.21.1 h27087fc_0 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
giflib 5.2.1 h0b41bf4_3 conda-forge
glib 2.78.3 hfc55251_0 conda-forge
glib-tools 2.78.3 hfc55251_0 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
gmock 1.14.0 ha770c72_1 conda-forge
gmp 6.3.0 h59595ed_0 conda-forge
gmpy2 2.1.2 py310h3ec546c_1 conda-forge
google-auth 2.27.0 pyhca7485f_0 conda-forge
google-auth-oauthlib 1.2.0 pyhd8ed1ab_0 conda-forge
google-pasta 0.2.0 pyh8c360ce_0 conda-forge
graphistry 0.33.0 pyhd8ed1ab_0 conda-forge
graphite2 1.3.14 h295c915_1
grpcio 1.59.3 py310h1b8f574_0 conda-forge
gst-plugins-base 1.22.9 h8e1006c_0 conda-forge
gstreamer 1.22.9 h98fc4e7_0 conda-forge
gtest 1.14.0 h00ab1b0_1 conda-forge
h11 0.14.0 pyhd8ed1ab_0 conda-forge
h2 4.1.0 pyhd8ed1ab_0 conda-forge
h5py 3.10.0 nompi_py310h65828d5_101 conda-forge
harfbuzz 8.3.0 h3d44ed6_0 conda-forge
hdf4 4.2.15 h2a13503_7 conda-forge
hdf5 1.14.3 nompi_h4f84152_100 conda-forge
holoviews 1.18.2 pyhd8ed1ab_0 conda-forge
hpack 4.0.0 pyh9f0ad1d_0 conda-forge
httpcore 1.0.2 pyhd8ed1ab_0 conda-forge
httpx 0.26.0 pyhd8ed1ab_0 conda-forge
hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge
icu 73.2 h59595ed_0 conda-forge
idna 3.6 pyhd8ed1ab_0 conda-forge
imagecodecs 2024.1.1 py310h496a806_0 conda-forge
imageio 2.33.1 pyh8c1a49c_0 conda-forge
importlib-metadata 7.0.1 pyha770c72_0 conda-forge
importlib_metadata 7.0.1 hd8ed1ab_0 conda-forge
importlib_resources 6.1.1 pyhd8ed1ab_0 conda-forge
ipykernel 6.29.2 pyhd33586a_0 conda-forge
ipython 8.21.0 pyh707e725_0 conda-forge
ipywidgets 8.0.4 py310h06a4308_0
isoduration 20.11.0 pyhd8ed1ab_0 conda-forge
itsdangerous 2.1.2 pyhd8ed1ab_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.19.1 pyhd8ed1ab_0 conda-forge
jinja2 3.1.3 pyhd8ed1ab_0 conda-forge
joblib 1.3.2 pyhd8ed1ab_0 conda-forge
json-c 0.17 h7ab15ed_0 conda-forge
json5 0.9.14 pyhd8ed1ab_0 conda-forge
jsonpointer 2.4 py310hff52083_3 conda-forge
jsonschema 4.21.1 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.12.1 pyhd8ed1ab_0 conda-forge
jsonschema-with-format-nongpl 4.21.1 pyhd8ed1ab_0 conda-forge
jupyter 1.0.0 py310h06a4308_8
jupyter-lsp 2.2.2 pyhd8ed1ab_0 conda-forge
jupyter-server-proxy 4.1.0 pyhd8ed1ab_0 conda-forge
jupyter_client 8.6.0 pyhd8ed1ab_0 conda-forge
jupyter_console 6.6.3 py310h06a4308_0
jupyter_core 5.7.1 py310hff52083_0 conda-forge
jupyter_events 0.9.0 pyhd8ed1ab_0 conda-forge
jupyter_server 2.12.5 pyhd8ed1ab_0 conda-forge
jupyter_server_terminals 0.5.2 pyhd8ed1ab_0 conda-forge
jupyterlab 4.1.0 pyhd8ed1ab_0 conda-forge
jupyterlab_pygments 0.3.0 pyhd8ed1ab_1 conda-forge
jupyterlab_server 2.25.2 pyhd8ed1ab_0 conda-forge
jupyterlab_widgets 3.0.9 py310h06a4308_0
jxrlib 1.1 hd590300_3 conda-forge
kealib 1.5.3 h2f55d51_0 conda-forge
keras 2.15.0 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.5 py310hd41b1e2_1 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
lame 3.100 h7b6447c_0
lazy_loader 0.3 pyhd8ed1ab_0 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20230802.1 cxx17_h59595ed_0 conda-forge
libaec 1.1.2 h59595ed_1 conda-forge
libarchive 3.7.2 h2aa1ff5_1 conda-forge
libarrow 14.0.2 h84dd17c_3_cpu conda-forge
libarrow-acero 14.0.2 h59595ed_3_cpu conda-forge
libarrow-dataset 14.0.2 h59595ed_3_cpu conda-forge
libarrow-flight 14.0.2 h120cb0d_3_cpu conda-forge
libarrow-flight-sql 14.0.2 h61ff412_3_cpu conda-forge
libarrow-gandiva 14.0.2 hacb8726_3_cpu conda-forge
libarrow-substrait 14.0.2 h61ff412_3_cpu conda-forge
libavif16 1.0.4 h1dcd450_0 conda-forge
libblas 3.9.0 21_linux64_openblas conda-forge
libboost-headers 1.84.0 ha770c72_0 conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcap 2.69 h0f662aa_0 conda-forge
libcblas 3.9.0 21_linux64_openblas conda-forge
libclang 15.0.7 default_hb11cfb5_4 conda-forge
libclang13 15.0.7 default_ha2b6cf4_4 conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcublas 11.11.3.6 0 nvidia
libcublas-dev 11.11.3.6 0 nvidia
libcucim 23.12.01 cuda11_231211_ga3445df_0 rapidsai
libcudf 23.12.01 cuda11_231208_g2ce46216b5_0 rapidsai
libcudf_kafka 23.12.01 cuda11_231208_g2ce46216b5_0 rapidsai
libcufft 10.9.0.58 0 nvidia
libcufile 1.4.0.31 0 nvidia
libcufile-dev 1.4.0.31 0 nvidia
libcugraph 23.12.00 cuda11_231206_g1309813f_0 rapidsai
libcugraph_etl 23.12.00 cuda11_231206_g1309813f_0 rapidsai
libcugraphops 23.12.00 cuda11_231206_g42d08202_0 nvidia
libcuml 23.12.00 cuda11_231206_gad2bd2b65_0 rapidsai
libcumlprims 23.12.00 cuda11_231206_gc120fe0_0 nvidia
libcups 2.3.3 h4637d8d_4 conda-forge
libcurand 10.3.0.86 0 nvidia
libcurand-dev 10.3.0.86 0 nvidia
libcurl 8.5.0 hca28451_0 conda-forge
libcusolver 11.4.1.48 0 nvidia
libcusolver-dev 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libcusparse-dev 11.7.5.86 0 nvidia
libcuspatial 23.12.01 cuda11_231207_g16727064_0 rapidsai
libdeflate 1.19 hd590300_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.12 hf998b51_1 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libflac 1.4.3 h59595ed_0 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgcrypt 1.10.3 hd590300_0 conda-forge
libgdal 3.8.1 h4b8bffa_3 conda-forge
libgfortran-ng 13.2.0 h69a702a_5 conda-forge
libgfortran5 13.2.0 ha4646dd_5 conda-forge
libglib 2.78.3 h783c2da_0 conda-forge
libgoogle-cloud 2.12.0 h5206363_4 conda-forge
libgpg-error 1.47 h71f35ed_0 conda-forge
libgrpc 1.59.3 hd6c4280_0 conda-forge
libhwloc 2.9.3 default_h554bfaf_1009 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
libkml 1.3.0 h01aab08_1018 conda-forge
libkvikio 23.12.00 cuda11_231206_gf90bfbe_0 rapidsai
liblapack 3.9.0 21_linux64_openblas conda-forge
libllvm14 14.0.6 hcd5def8_4 conda-forge
libllvm15 15.0.7 hb3ce162_4 conda-forge
libmagma 2.7.2 h09159a4_2 conda-forge
libmagma_sparse 2.7.2 h09159a4_2 conda-forge
libnetcdf 4.9.2 nompi_h9612171_113 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnl 3.9.0 hd590300_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libntlm 1.4 h7f98852_1002 conda-forge
libnuma 2.0.16 h0b41bf4_1 conda-forge
libogg 1.3.5 h27cfd23_1
libopenblas 0.3.26 pthreads_h413a1c8_0 conda-forge
libopus 1.3.1 h7b6447c_0
libparquet 14.0.2 h352af49_3_cpu conda-forge
libpng 1.6.42 h2797004_0 conda-forge
libpq 16.2 h33b98f1_0 conda-forge
libprotobuf 4.24.4 hf27288f_0 conda-forge
libraft 23.12.00 cuda11_231206_g9e2d6277_0 rapidsai
libraft-headers 23.12.00 cuda11_231206_g9e2d6277_0 rapidsai
libraft-headers-only 23.12.00 cuda11_231206_g9e2d6277_0 rapidsai
librdkafka 1.9.2 ha5a0de0_2 conda-forge
libre2-11 2023.06.02 h7a70373_0 conda-forge
librmm 23.12.00 cuda11_231206_g2db5cbb3_0 rapidsai
librttopo 1.1.0 h8917695_15 conda-forge
libsndfile 1.2.2 hc60ed4a_1 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libspatialindex 1.9.3 h9c3ff4c_4 conda-forge
libspatialite 5.1.0 h72606ae_3 conda-forge
libsqlite 3.45.1 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_5 conda-forge
libsystemd0 255 h3516f8a_0 conda-forge
libthrift 0.19.0 hb90f79a_1 conda-forge
libtiff 4.6.0 ha9c0a0a_2 conda-forge
libtorch 2.1.2 cuda112_hce05544_300 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.46.0 hd590300_0 conda-forge
libvorbis 1.3.7 h7b6447c_0
libwebp 1.3.2 h658648e_1 conda-forge
libwebp-base 1.3.2 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxgboost 1.7.6 rapidsai_he275d05_7 rapidsai
libxkbcommon 1.6.0 hd429924_1 conda-forge
libxml2 2.12.5 h232c23b_0 conda-forge
libzip 1.10.1 h2629f0a_3 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
libzopfli 1.0.3 h9c3ff4c_0 conda-forge
linkify-it-py 2.0.3 pyhd8ed1ab_0 conda-forge
llvm-openmp 17.0.6 h4dfa4b3_0 conda-forge
llvmlite 0.40.1 py310h1b8f574_0 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lz4 4.3.3 py310h350c4a5_0 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
magma 2.7.2 h2cf16e7_2 conda-forge
mapclassify 2.6.1 pyhd8ed1ab_0 conda-forge
markdown 3.5.2 pyhd8ed1ab_0 conda-forge
markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.5 py310h2372a71_0 conda-forge
matplotlib-base 3.8.2 py310h62c0568_0 conda-forge
matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
mdit-py-plugins 0.4.0 pyhd8ed1ab_0 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
minizip 4.0.4 h0ab5242_0 conda-forge
mistune 3.0.2 pyhd8ed1ab_0 conda-forge
mkl 2023.2.0 h84fe81f_50496 conda-forge
ml_dtypes 0.2.0 py310hcc13569_2 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_0 conda-forge
mpg123 1.32.4 h59595ed_0 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
msgpack-python 1.0.7 py310hd41b1e2_0 conda-forge
multidict 6.0.5 py310h2372a71_0 conda-forge
multipledispatch 0.6.0 py_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.0.33 hf1915f5_6 conda-forge
mysql-libs 8.0.33 hca2cd23_6 conda-forge
nbclient 0.8.0 pyhd8ed1ab_0 conda-forge
nbconvert 7.16.0 pyhd8ed1ab_0 conda-forge
nbconvert-core 7.16.0 pyhd8ed1ab_0 conda-forge
nbconvert-pandoc 7.16.0 pyhd8ed1ab_0 conda-forge
nbformat 5.9.2 pyhd8ed1ab_0 conda-forge
nccl 2.19.4.1 h0800d71_0 conda-forge
ncurses 6.4 h59595ed_2 conda-forge
nest-asyncio 1.6.0 pyhd8ed1ab_0 conda-forge
networkx 3.2.1 pyhd8ed1ab_0 conda-forge
nodejs 20.9.0 hb753e55_0 conda-forge
noise 1.2.2 py310h2372a71_1005 conda-forge
notebook 7.0.6 py310h06a4308_0
notebook-shim 0.2.3 pyhd8ed1ab_0 conda-forge
nspr 4.35 h27087fc_0 conda-forge
nss 3.97 h1d7d5a4_0 conda-forge
numba 0.57.1 py310h0f6aa51_0 conda-forge
numpy 1.23.4 py310h53a5b5f_1 conda-forge
nvcomp 3.0.4 h838ba91_1 conda-forge
nvtx 0.2.8 py310h2372a71_1 conda-forge
oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge
openjpeg 2.5.0 h488ebb8_3 conda-forge
openslide 3.4.1 h58ba908_12 conda-forge
openssl 3.2.1 hd590300_0 conda-forge
opt_einsum 3.3.0 pyhc1e730c_2 conda-forge
orc 1.9.2 h4b38347_0 conda-forge
overrides 7.7.0 pyhd8ed1ab_0 conda-forge
packaging 23.2 pyhd8ed1ab_0 conda-forge
palettable 3.3.3 pyhd8ed1ab_0 conda-forge
pandas 1.5.3 py310h9b08913_1 conda-forge
pandoc 3.1.11.1 ha770c72_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
panel 1.3.8 pyhd8ed1ab_0 conda-forge
param 2.0.2 pyhca7485f_0 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
partd 1.4.1 pyhd8ed1ab_0 conda-forge
pcre2 10.42 hcad00b1_0 conda-forge
pexpect 4.9.0 pyhd8ed1ab_0 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 10.2.0 py310h01dd4db_0 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 4.2.0 pyhd8ed1ab_0 conda-forge
plotly 5.18.0 pyhd8ed1ab_0 conda-forge
ply 3.11 py310h06a4308_0
poppler 23.12.0 h590f24d_0 conda-forge
poppler-data 0.4.12 hd8ed1ab_0 conda-forge
postgresql 16.2 h7387d8b_0 conda-forge
proj 9.3.0 h1d62c97_2 conda-forge
prometheus_client 0.19.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.42 pyha770c72_0 conda-forge
prompt_toolkit 3.0.42 hd8ed1ab_0 conda-forge
protobuf 4.24.4 py310h620c231_0 conda-forge
psutil 5.9.8 py310h2372a71_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptxcompiler 0.8.1 py310h70a93da_2 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pulseaudio-client 16.1 hb77b528_5 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
py-xgboost 1.7.6 rapidsai_py310h4c2db5f_7 rapidsai
pyarrow 14.0.2 py310hf9e7431_3_cpu conda-forge
pyarrow-hotfix 0.6 pyhd8ed1ab_0 conda-forge
pyasn1 0.5.1 pyhd8ed1ab_0 conda-forge
pyasn1-modules 0.3.0 pyhd8ed1ab_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyct 0.5.0 py310h06a4308_0
pyct-core 0.5.0 pyhd8ed1ab_0 conda-forge
pydantic 2.6.1 pyhd8ed1ab_0 conda-forge
pydantic-core 2.16.2 py310hcb5633a_1 conda-forge
pyee 8.1.0 pyhd8ed1ab_0 conda-forge
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pyjwt 2.8.0 pyhd8ed1ab_1 conda-forge
pylibcugraph 23.12.00 cuda11_py310_231206_g1309813f_0 rapidsai
pylibraft 23.12.00 cuda11_py310_231206_g9e2d6277_0 rapidsai
pynvml 11.4.1 pyhd8ed1ab_0 conda-forge
pyopenssl 24.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.1.1 pyhd8ed1ab_0 conda-forge
pyppeteer 1.0.2 pyhd8ed1ab_0 conda-forge
pyproj 3.6.1 py310h32c33b7_4 conda-forge
pyqt 5.15.10 py310h6a678d5_0
pyqt5-sip 12.13.0 py310h5eee18b_0
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.13 hd12c33a_1_cpython conda-forge
python-confluent-kafka 1.9.2 py310h5764c6d_2 conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-fastjsonschema 2.19.1 pyhd8ed1ab_0 conda-forge
python-flatbuffers 23.5.26 pyhd8ed1ab_0 conda-forge
python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge
python_abi 3.10 4_cp310 conda-forge
pytorch 2.1.2 cuda112_py310hce1e03f_300 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyviz_comms 3.0.0 pyhd8ed1ab_0 conda-forge
pywavelets 1.4.1 py310h1f7b6fc_1 conda-forge
pyyaml 6.0.1 py310h2372a71_1 conda-forge
pyzmq 25.1.2 py310h795f18f_0 conda-forge
qt-main 5.15.8 h450f30e_18 conda-forge
qtconsole 5.5.0 py310h06a4308_0
qtpy 2.4.1 py310h06a4308_0
raft-dask 23.12.00 cuda11_py310_231206_g9e2d6277_0 rapidsai
rapids 23.12.00 cuda11_py310_231206_g1d8bed4_0 rapidsai
rapids-dask-dependency 23.12.01 0 rapidsai
rapids-xgboost 23.12.00 cuda11_py310_231206_g1d8bed4_0 rapidsai
rav1e 0.6.6 he8a937b_2 conda-forge
rdma-core 50.0 hd3aeb46_0 conda-forge
re2 2023.06.02 h2873b5e_0 conda-forge
readline 8.2 h8228510_1 conda-forge
referencing 0.33.0 pyhd8ed1ab_0 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge
retrying 1.3.3 py_2 conda-forge
rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge
rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge
rich 13.7.0 pyhd8ed1ab_0 conda-forge
rmm 23.12.00 cuda11_py310_231206_g2db5cbb3_0 rapidsai
rpds-py 0.17.1 py310hcb5633a_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
rtree 1.2.0 py310hbdcdc62_0 conda-forge
s2n 1.4.1 h06160fa_0 conda-forge
scikit-image 0.21.0 py310hc6cd4ac_0 conda-forge
scikit-learn 1.4.0 py310h1fdf081_0 conda-forge
scipy 1.12.0 py310hb13e2d6_2 conda-forge
seaborn 0.12.2 py310h06a4308_0
send2trash 1.8.2 pyh41d4057_0 conda-forge
setuptools 69.0.3 pyhd8ed1ab_0 conda-forge
shapely 2.0.2 py310hc3e127f_1 conda-forge
simpervisor 1.0.0 pyhd8ed1ab_0 conda-forge
sip 6.7.12 py310h6a678d5_0
six 1.16.0 pyh6c4a22f_0 conda-forge
sleef 3.5.1 h9b69904_2 conda-forge
snappy 1.1.10 h9fff704_0 conda-forge
sniffio 1.3.0 pyhd8ed1ab_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
soupsieve 2.5 pyhd8ed1ab_1 conda-forge
spdlog 1.11.0 h9b3ece8_1 conda-forge
sqlite 3.45.1 h2c6b66d_0 conda-forge
squarify 0.4.3 py_0 conda-forge
stack_data 0.6.2 pyhd8ed1ab_0 conda-forge
starlette 0.27.0 pyhd8ed1ab_0 conda-forge
streamz 0.6.4 pyh6c4a22f_0 conda-forge
svt-av1 1.8.0 h59595ed_0 conda-forge
sympy 1.12 pypyh9d50eac_103 conda-forge
tabulate 0.9.0 pyhd8ed1ab_1 conda-forge
tbb 2021.11.0 h00ab1b0_1 conda-forge
tblib 3.0.0 pyhd8ed1ab_0 conda-forge
tenacity 8.2.3 pyhd8ed1ab_0 conda-forge
tensorboard 2.15.2 pyhd8ed1ab_0 conda-forge
tensorboard-data-server 0.7.0 py310h75e40e8_1 conda-forge
tensorflow 2.15.0 cpu_py310h7825f03_2 conda-forge
tensorflow-base 2.15.0 cpu_py310h7e4d085_2 conda-forge
tensorflow-estimator 2.15.0 cpu_py310haacee6a_2 conda-forge
termcolor 2.4.0 pyhd8ed1ab_0 conda-forge
terminado 0.18.0 pyh0d859eb_0 conda-forge
threadpoolctl 3.2.0 pyha21a80b_0 conda-forge
tifffile 2024.1.30 pyhd8ed1ab_0 conda-forge
tiledb 2.18.4 h4386cac_0 conda-forge
tinycss2 1.2.1 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
toolz 0.12.1 pyhd8ed1ab_0 conda-forge
tornado 6.3.3 py310h2372a71_1 conda-forge
tqdm 4.66.2 pyhd8ed1ab_0 conda-forge
traitlets 5.14.1 pyhd8ed1ab_0 conda-forge
treelite 3.9.1 py310h4a6579d_0 conda-forge
treelite-runtime 3.9.1 pypi_0 pypi
types-python-dateutil 2.8.19.20240106 pyhd8ed1ab_0 conda-forge
typing-extensions 4.9.0 hd8ed1ab_0 conda-forge
typing_extensions 4.9.0 pyha770c72_0 conda-forge
typing_utils 0.1.0 pyhd8ed1ab_0 conda-forge
tzcode 2024a h3f72095_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
tzlocal 5.2 py310hff52083_0 conda-forge
uc-micro-py 1.0.3 pyhd8ed1ab_0 conda-forge
ucx 1.15.0 h75e419f_3 conda-forge
ucx-proc 1.0.0 gpu rapidsai
ucx-py 0.35.00 py310_231206_gb5f60ca_0 rapidsai
unicodedata2 15.1.0 py310h2372a71_0 conda-forge
uri-template 1.3.0 pyhd8ed1ab_0 conda-forge
uriparser 0.9.7 hcb278e6_1 conda-forge
urllib3 1.26.18 pyhd8ed1ab_0 conda-forge
uvicorn 0.27.1 py310hff52083_0 conda-forge
wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
webcolors 1.13 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 pyhd8ed1ab_2 conda-forge
websocket-client 1.7.0 pyhd8ed1ab_0 conda-forge
websockets 10.4 py310h5764c6d_1 conda-forge
werkzeug 3.0.1 pyhd8ed1ab_0 conda-forge
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
widgetsnbextension 4.0.5 py310h06a4308_0
wrapt 1.14.1 py310h5764c6d_1 conda-forge
xarray 2024.1.1 pyhd8ed1ab_0 conda-forge
xarray-spatial 0.3.7 pyhd8ed1ab_0 conda-forge
xcb-util 0.4.0 hd590300_1 conda-forge
xcb-util-image 0.4.0 h8ee46fc_1 conda-forge
xcb-util-keysyms 0.4.0 h8ee46fc_1 conda-forge
xcb-util-renderutil 0.3.9 hd590300_1 conda-forge
xcb-util-wm 0.4.1 h8ee46fc_1 conda-forge
xerces-c 3.2.5 hac6953d_0 conda-forge
xgboost 1.7.6 rapidsai_py310h4c2db5f_7 rapidsai
xkeyboard-config 2.41 hd590300_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.7 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xf86vidmodeproto 2.3.1 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xyzservices 2023.10.1 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.4 py310h2372a71_0 conda-forge
zeromq 4.3.5 h59595ed_0 conda-forge
zfp 1.0.1 h59595ed_0 conda-forge
zict 3.0.0 pyhd8ed1ab_0 conda-forge
zipp 3.17.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zlib-ng 2.0.7 h0b41bf4_0 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
(rapids-23.12) aethyn@pop-os:~/PycharmProjects/pythonProject$
Additional context
Attachment: duplicate_this.zip
@evanhowington Thanks for the issue! You mentioned on Slack that the zip file with your data wasn't uploaded. Can you try that again? There is a 25 MB file size limit for zip files, so you may need to split up the data (you mentioned the size was a few megabytes).
@bdice I updated the original post to include the zip file at the bottom of it under "Additional Context".
I did some digging, and it appears that scikit-learn uses a NumPy random state instance, while cuML uses a CuPy random state instance by default, with the option of using a NumPy random state instance instead. See https://scikit-learn.org/stable/glossary.html#term-random_state and https://docs.rapids.ai/api/cuml/stable/api/#preprocessing-metrics-and-utilities
I have not had a chance to test the NumPy random state instance with cuML yet; I'm still trying to figure out how to invoke it. Is it just a matter of passing a NumPy RandomState to cuML, as follows: random_state = numpy.random.RandomState(42)?
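To illustrate why the source of randomness matters at all: scikit-learn turns an integer random_state into a NumPy RandomState, so the shuffle depends on the generator algorithm and not only on the seed. The sketch below uses NumPy's legacy RandomState and its newer default_rng (PCG64) as a stand-in for "two different generators", since CuPy needs a GPU; the helper names legacy_shuffle and pcg64_shuffle are hypothetical. It also shows RandomState being instantiated with a seed, which is what gets passed as random_state, rather than the bare class.

```python
import numpy as np

SEED = 42
N_SAMPLES = 20


def legacy_shuffle(n, seed):
    # Mersenne Twister via a RandomState *instance* (not the bare class)
    return np.random.RandomState(seed).permutation(n)


def pcg64_shuffle(n, seed):
    # A different algorithm (PCG64) seeded with the same integer
    return np.random.default_rng(seed).permutation(n)


a = legacy_shuffle(N_SAMPLES, SEED)
b = pcg64_shuffle(N_SAMPLES, SEED)
print("RandomState(42):", a[:5])
print("default_rng(42):", b[:5])
# Same seed, different generator: the row orders (and hence any split cut from them) differ
print("same order:", np.array_equal(a, b))
# Same seed, same generator: the order is reproducible
print("reproducible:", np.array_equal(a, legacy_shuffle(N_SAMPLES, SEED)))
```

If cuML's splitter draws from CuPy for an integer seed, as described above, then a cuML run and a scikit-learn run with random_state=42 would most likely be evaluating different train/test splits.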
If the random state is causing the discrepancy, perhaps train_test_split could take something like train_test_split(X, y, test_size=0.1, random_state=42, random_state_environment="numpy"), where random_state_environment ("cupy" or "numpy") specifies where to pull the random state from. Also, maybe the default could be NumPy, so that the results would match those of someone running the same code with scikit-learn, with the option to choose CuPy. I only suggest that because, if the goal is for both libraries to produce equivalent results out of the box with cuML offering a speedup, we have to recognize that scikit-learn can't use a CuPy random state on every device, so cuML could default to a NumPy random state for the sake of reproducible results.
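Independently of whether such a parameter is added, one way to take the split out of the comparison is to shuffle once on the CPU with a NumPy RandomState and hand identical arrays to both libraries. A minimal sketch, assuming NumPy inputs; numpy_split is a hypothetical helper, not part of scikit-learn or cuML, and it does not reproduce train_test_split's exact row selection.

```python
import numpy as np


def numpy_split(X, y, test_size=0.1, seed=42):
    """Hypothetical helper: one NumPy-seeded shuffle shared by every library under test."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[0]
    n_test = int(np.ceil(test_size * n))  # round the test count up
    perm = np.random.RandomState(seed).permutation(n)
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Usage with toy data: row i is [2i, 2i + 1] and its label is i % 2
X = np.arange(40, dtype=np.float32).reshape(20, 2)
y = np.arange(20) % 2
X_tr, X_te, y_tr, y_te = numpy_split(X, y, test_size=0.25, seed=42)
print(X_tr.shape, X_te.shape)  # (15, 2) (5, 2)
```

The same X_tr, X_te, y_tr and y_te can then be passed to sklearn.neighbors.KNeighborsClassifier and cuml.neighbors.KNeighborsClassifier (cuML accepts NumPy arrays as input), so any remaining gap between the accuracy curves comes from the classifiers rather than from the split.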
Thanks for the issue, @evanhowington! I had written a response but closed my tab before submitting :(.
The issue is very likely not coming from the random state, whether it comes from NumPy or CuPy. I haven't tested it myself yet, but given the differences in the parallel CUDA code, it might just be an inherent difference between the implementations.
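One concrete mechanism that would fit this explanation (an assumption, not something confirmed in this thread): floating-point addition is not associative, so code that accumulates distance terms in a different order or precision, such as a parallel GPU reduction versus a sequential CPU loop, can produce slightly different distances. When two training points are almost equidistant from a query, that can swap their ranks and change which points fall within the k nearest neighbors. The sketch only demonstrates the arithmetic effect in float32 with NumPy; sequential_sum is a hypothetical helper, and the code does not model either library's kernels.

```python
import numpy as np


def sequential_sum(values):
    """Accumulate float32 values left to right, rounding after every addition."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + v)
    return total


terms = np.array([1e8, 1.0, -1e8], dtype=np.float32)
forward = sequential_sum(terms)               # (1e8 + 1) - 1e8: the 1 is lost to rounding
reordered = sequential_sum(terms[[0, 2, 1]])  # (1e8 - 1e8) + 1
print(forward, reordered)  # 0.0 1.0
```

Near 1e8 the spacing between adjacent float32 values is 8, so adding 1 first rounds it away, while cancelling the large terms first preserves it.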