PHATE
s_gd2 typeerror
TypeError Traceback (most recent call last)
<ipython-input-1-9418f70a3d50> in <module>
1 import phate
----> 2 Y = phate.PHATE(knn_dist='precomputed').fit_transform(A)
/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/phate.py in fit_transform(self, X, **kwargs)
939 with _logger.task("PHATE"):
940 self.fit(X)
--> 941 embedding = self.transform(**kwargs)
942 return embedding
943
/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/phate.py in transform(self, X, t_max, plot_optimal_t, ax)
908 n_jobs=self.n_jobs,
909 seed=self.random_state,
--> 910 verbose=max(self.verbose - 1, 0),
911 )
912 if isinstance(self.graph, graphtools.graphs.LandmarkGraph):
/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/mds.py in embed_MDS(X, ndim, how, distance_metric, solver, n_jobs, seed, verbose)
228 try:
229 # use sgd2 if it is available
--> 230 Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
231 if np.any(~np.isfinite(Y)):
232 _logger.warning("Using SMACOF because SGD returned NaN")
</mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/lib/python3.7/site-packages/decorator.py:decorator-gen-157> in sgd(D, n_components, random_state, init)
/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/scprep/utils.py in _with_pkg(fun, pkg, min_version, *args, **kwargs)
81 check_version(pkg, min_version=min_version)
82 __imported_pkgs.add((pkg, min_version))
---> 83 return fun(*args, **kwargs)
84
85
/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/phate/mds.py in sgd(D, n_components, random_state, init)
82 D = squareform(D)
83 # Metric MDS from s_gd2
---> 84 Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
85 return Y
86
/mnt/eider_environments/EiderPython/local/apollo/env/EiderPython/python3.7/lib/python3.7/site-packages/s_gd2/s_gd2.py in mds_direct(n, d, w, etas, num_dimensions, random_seed, init)
82
83 # do mds
---> 84 cpp.mds_direct(X, d, w, etas, random_seed)
85 return X
86
TypeError: Array of type 'double' required. A 'unknown type' was given
Is there a resolution to this error? I keep running into this problem. I've been using pandas dataframes and I've tried changing data types with the same result.
Thanks!
Could you post the data and code you're using that produces the error? I'm having a hard time reproducing it.
In the meantime, you can avoid the error by using mds_solver='smacof'.
I can't post all the data, but I've included a small print out of the data below.
data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)
data_phate = phate_op.fit_transform(data)
Here is the output of this code, ending with the error.
002 003 004 005 006 007 008 009 010 ... 44786754 44786774 44786872 44787062 44816559 45771331 46234829 46235085 46235338
sample_id ...
1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 82.975610 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 91.886364 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 85.580645 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 89.466667 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
96 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 97.828571 0.0 0.0 0.0
97 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 97.408163 0.0 0.0 0.0
98 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 97.040816 0.0 0.0 0.0
99 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 94.113924 0.0 0.0 0.0
100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 88.694444 0.0 0.0 0.0
[100 rows x 3335 columns]
Calculating PHATE...
Running PHATE on 100 observations and 3335 variables.
Calculating graph and diffusion operator...
/data/users/trberg/anaconda3/lib/python3.7/site-packages/graphtools/graphs.py:121: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
UserWarning,
Calculating KNN search...
Calculated KNN search in 0.11 seconds.
Calculating affinities...
Calculated graph and diffusion operator in 0.20 seconds.
Calculating optimal t...
Automatically selected t = 9
Calculated optimal t in 0.04 seconds.
Calculating diffusion potential...
Calculating metric MDS...
Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.26 seconds.
Traceback (most recent call last):
File "feature_reduction.py", line 74, in <module>
data_phate = phate_op.fit_transform(data)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 941, in fit_transform
embedding = self.transform(**kwargs)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/phate.py", line 910, in transform
verbose=max(self.verbose - 1, 0),
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 230, in embed_MDS
Y = sgd(X_dist, n_components=ndim, random_state=seed, init=Y_classic)
File "</data/users/trberg/anaconda3/lib/python3.7/site-packages/decorator.py:decorator-gen-146>", line 2, in sgd
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/scprep/utils.py", line 83, in _with_pkg
return fun(*args, **kwargs)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/phate/mds.py", line 84, in sgd
Y = s_gd2.mds_direct(N, D, init=init, random_seed=random_state)
File "/data/users/trberg/anaconda3/lib/python3.7/site-packages/s_gd2/s_gd2.py", line 84, in mds_direct
cpp.mds_direct(X, d, w, etas, random_seed)
TypeError: Array of type 'double' required. A 'unknown type' was given
Could you please run the following:
data = pd.read_csv("path/to/data.csv", nrows=100)
data = data.set_index("sample_id")
data = data.astype(np.float64)
data.to_pickle("data.pickle.gz")
and then drag data.pickle.gz into your reply? That should be small enough to post.
The issue isn't the size of the data. It's sensitive biomedical data that I don't have permission to upload, even in part.
But what you're seeing in my comment above is pretty much what it looks like.
Unfortunately if I'm unable to view the data it's going to be difficult to diagnose. I tried to replicate data like yours and it runs fine.
>>> import numpy as np
>>> import pandas as pd
>>> import phate
>>> data = pd.DataFrame(np.random.normal(0, 1, (100, 3335)))
>>> data.index.name = "sample_id"
>>> data = data.astype(np.float64)
>>> phate_op = phate.PHATE()
>>> data_phate = phate_op.fit_transform(data)
Calculating PHATE...
Running PHATE on 100 observations and 3335 variables.
Calculating graph and diffusion operator...
/home/scottgigante/.local/lib/python3.8/site-packages/graphtools/graphs.py:118: UserWarning: Building a kNNGraph on data of shape (100, 3335) is expensive. Consider setting n_pca.
warnings.warn(
Calculating KNN search...
Calculated KNN search in 0.08 seconds.
Calculating affinities...
Calculated affinities in 0.01 seconds.
Calculated graph and diffusion operator in 0.10 seconds.
Calculating optimal t...
Automatically selected t = 3
Calculated optimal t in 0.02 seconds.
Calculating diffusion potential...
Calculating metric MDS...
Calculated metric MDS in 0.01 seconds.
Calculated PHATE in 0.14 seconds.
Some diagnostics that might help:
import phate
import s_gd2
print(phate.__version__)
print(s_gd2.__version__)
print(np.all([d == np.dtype('float64') for d in data.dtypes]))
print(data.sum(axis=0).tolist())
print(data.sum(axis=1).tolist())
print(np.all(np.isfinite(data)))
So here are some results from this code.
print(phate.__version__)  # 1.0.4
print(s_gd2.__version__)  # 1.7
print(np.all([d == np.dtype('float64') for d in data.dtypes]))  # True
print(np.all(np.isfinite(data)))  # True
print(data.values.min(), data.values.max())  # 0.0 10000000.0
First thing I would do is upgrade both of those packages and try again. If you're still having trouble, you could send me just the PHATE kernel which wouldn't contain any identifying information from your original data:
import pickle
import gzip
with gzip.open('kernel.pickle.gz', 'wb') as f:
pickle.dump(phate_op.graph.kernel, f)
So the update didn't fix the issue and when I ran the zipping and pickling code, I got this error.
Traceback (most recent call last):
File "feature_reduction.py", line 94, in <module>
get_phate_transform(data)
File "feature_reduction.py", line 62, in get_phate_transform
pickle.dump(phate_op.graph.kernel, f)
AttributeError: 'NoneType' object has no attribute 'kernel'
Oops, sorry -- you'll need to run phate_op.fit(data) first.
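The AttributeError above suggests that `phate_op.graph` is still `None` until the operator has been fitted. Below is a minimal sketch of a guarded export that fails with a clearer message in that case. It assumes only that the operator exposes a `graph` attribute carrying a `kernel`, as in the snippet above; `dump_kernel` is a hypothetical helper name and not part of PHATE.

```python
import gzip
import pickle


def dump_kernel(phate_op, path="kernel.pickle.gz"):
    """Pickle the operator's kernel into a gzip file (hypothetical helper)."""
    graph = getattr(phate_op, "graph", None)
    if graph is None:
        # Assumption: an unfitted operator has graph == None, as the
        # traceback above indicates.
        raise RuntimeError(
            "No graph found: call phate_op.fit(data) before exporting the kernel."
        )
    with gzip.open(path, "wb") as f:
        pickle.dump(graph.kernel, f)
    return path
```

Called as `dump_kernel(phate_op)` after `phate_op.fit(data)`, this should write the same file as the two-line version above.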
Here is the kernel. kernel.pickle.gz
I've tested this on Python 3.6 on Windows Subsystem for Linux, Python 3.7 (Anaconda) on Windows, and Python 3.8 on Arch Linux. All work fine.
>>> import phate
>>> import pickle
>>> import gzip
>>> with gzip.open("kernel.pickle.gz") as f:
... K = pickle.load(f)
>>> phate_op = phate.PHATE(knn_dist='precomputed_affinity')
>>> phate_op.fit_transform(K)
Can you check the versions of the following packages? (You'll need to run this in PowerShell and double the slashes if on Windows.)
python -VV
pip freeze | grep "^\(cycler\|decorator\|Deprecated\|future\|graphtools\|joblib\|kiwisolver\|matplotlib\|numpy\|packaging\|pandas\|phate\|Pillow\|PyGSP\|pyparsing\|python\-dateutil\|pytz\|s\-gd2\|scikit\-learn\|scipy\|scprep\|six\|tasklogger\|threadpoolctl\|wrapt\)=="
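If the shell pipeline is awkward to run on a given platform, the same information can probably be collected from inside Python. The sketch below uses the standard-library `importlib.metadata` module, which requires Python 3.8 or later (on Python 3.7 the `importlib_metadata` backport would be needed instead). The package list is copied from the grep pattern above; `installed_versions` is a hypothetical helper name.

```python
import sys
from importlib import metadata  # standard library from Python 3.8

# Distribution names taken from the grep pattern above.
PACKAGES = [
    "cycler", "decorator", "Deprecated", "future", "graphtools", "joblib",
    "kiwisolver", "matplotlib", "numpy", "packaging", "pandas", "phate",
    "Pillow", "PyGSP", "pyparsing", "python-dateutil", "pytz", "s-gd2",
    "scikit-learn", "scipy", "scprep", "six", "tasklogger", "threadpoolctl",
    "wrapt",
]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(sys.version)  # roughly what `python -VV` reports
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name} (not installed)")
```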
My versions, for reference:
On Arch virtualenv:
Python 3.8.5 (default, Sep 5 2020, 10:50:12)
[GCC 10.2.0]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.11
future==0.18.2
graphtools==1.5.2
joblib==1.0.0
kiwisolver==1.3.1
matplotlib==3.3.4
numpy==1.20.0
packaging==20.9
pandas==1.2.1
phate==1.0.6
Pillow==8.1.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
s-gd2==1.8
scikit-learn==0.24.1
scipy==1.6.0
scprep==1.0.12
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1
On Arch:
Python 3.8.5 (default, Sep 5 2020, 10:50:12)
[GCC 10.2.0]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.2
joblib==0.16.0
kiwisolver==1.2.0
matplotlib==3.3.1
numpy==1.19.4
packaging==20.4
pandas==1.1.2
phate==1.0.4
Pillow==7.2.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
s-gd2==1.7
scikit-learn==0.23.2
scipy==1.5.2
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1
On WSL:
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.2
joblib==0.16.0
kiwisolver==1.2.0
matplotlib==3.3.0
numpy==1.19.4
packaging==20.4
pandas==1.0.5
phate==1.0.4
Pillow==7.2.0
PyGSP==0.5.1
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
s-gd2==1.8
scikit-learn==0.23.1
scipy==1.5.2
scprep==1.0.10
six==1.15.0
tasklogger==1.0.0
threadpoolctl==2.1.0
wrapt==1.12.1
On Windows:
Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
cycler==0.10.0
decorator==4.4.2
Deprecated==1.2.10
future==0.18.2
graphtools==1.5.1
joblib==0.14.1
kiwisolver==1.1.0
matplotlib==3.2.1
numpy==1.18.1
packaging==20.3
pandas==1.0.3
phate==1.0.4
Pillow==7.0.0
PyGSP==0.5.1
pyparsing==2.4.6
python-dateutil==2.8.1
pytz==2019.3
s-gd2==1.7
scikit-learn==0.22.2.post1
scipy==1.4.1
scprep==1.0.4
six==1.14.0
tasklogger==1.0.0
wrapt==1.12.1
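Comparing listings like these by eye is error-prone, so a small diff can help narrow down which package differs between a working and a failing environment. The following is a sketch only: `parse_freeze` and `diff_freeze` are hypothetical helper names, and the two sample strings are a subset copied from the "Arch virtualenv" and "Windows" listings above.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {package: version}, skipping other lines."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:
            versions[name.lower()] = version
    return versions


def diff_freeze(a, b):
    """Return {package: (version_in_a, version_in_b)} for entries that differ."""
    pa, pb = parse_freeze(a), parse_freeze(b)
    return {
        name: (pa.get(name), pb.get(name))
        for name in sorted(set(pa) | set(pb))
        if pa.get(name) != pb.get(name)
    }


# Subset of the "Arch virtualenv" listing above.
arch_venv = """numpy==1.20.0
phate==1.0.6
s-gd2==1.8
scprep==1.0.12
threadpoolctl==2.1.0"""

# Subset of the "Windows" listing above (threadpoolctl is absent there).
windows = """numpy==1.18.1
phate==1.0.4
s-gd2==1.7
scprep==1.0.4"""

for name, (left, right) in diff_freeze(arch_venv, windows).items():
    print(f"{name}: {left} vs {right}")
```

Pasting the full output of the failing environment as the second argument would list every package whose version differs from a known-good setup.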