Clustering with NaN values

Open sundahlm opened this issue 4 years ago • 4 comments

Hello,

I get an error when I run the clusterEvents tool as follows:

python suppa.py clusterEvents --dpsi AL_Ts_dPSI_TableForClustering.dpsi --psivec AL_Ts_PSIvec_TableForClustering.psivec --groups 1-5,6-10,11-14,15-18 -o AL_Ts_

ERROR:lib.cluster_tools:Unknown error: (<class 'ValueError'>, ValueError("Input contains NaN, infinity or a value too large for dtype('float64').",), <traceback object at 0x7f6080a71f48>)

My input .dpsi and .psivec files contain many NaN values.

Any help will be greatly appreciated.

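AL_Ts_dPSI_TableForClustering.dpsi.txt https://github.com/comprna/SUPPA/files/4549706/AL_Ts_dPSI_TableForClustering.dpsi.txt
AL_Ts_PSIvec_TableForClustering.psivec.txt https://github.com/comprna/SUPPA/files/4549707/AL_Ts_PSIvec_TableForClustering.psivec.txt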

sundahlm, Apr 29 '20 03:04

Hi,

Thanks for the message. Have you tried removing the events that have only NaN values in the input file?
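For reference, a minimal sketch of that filtering step, using only the Python standard library. It assumes the table is tab-separated, with a header line of sample names followed by one line per event (event ID first, then one value per sample), and that missing values are written as `nan`. The helper names `parse_table` and `drop_all_nan_events` are hypothetical and are not part of SUPPA.

```python
import math


def parse_table(text):
    """Parse a tab-separated table (assumed layout, see above).

    Returns the list of sample names and a dict mapping each
    event ID to its list of float values.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    samples = lines[0].split("\t")
    events = {}
    for ln in lines[1:]:
        fields = ln.split("\t")
        # float("nan") parses the literal string "nan" into a NaN value
        events[fields[0]] = [float(v) for v in fields[1:]]
    return samples, events


def drop_all_nan_events(events):
    """Keep only events that have at least one non-NaN value."""
    return {
        eid: vals
        for eid, vals in events.items()
        if not all(math.isnan(v) for v in vals)
    }


# Toy input: evB has no usable values and is dropped.
text = "s1\ts2\nevA\t0.1\tnan\nevB\tnan\tnan\n"
samples, events = parse_table(text)
kept = drop_all_nan_events(events)
print(sorted(kept))  # ['evA']
```

To keep the .dpsi and .psivec files consistent, the same filter would be applied to both tables and only the event IDs that survive in both would be written back out.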

I hope this helps

E.

-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

EduEyras, Apr 29 '20 06:04

Hi Eduardo,

Thank you very much for your quick reply.

I have removed the events that have only NaN values in either the PSIvec file or the dPSI file, so the resulting PSIvec and dPSI files contain the same events. Unfortunately, I still get an error:

python suppa.py clusterEvents --dpsi AL_Ts_dPSI_TableForClustering.dpsi --psivec AL_Ts_PSIvec_TableForClustering.psivec --groups 1-5,6-10,11-14,15-18 -o AL_Ts_

ERROR:lib.cluster_tools:Unknown error: (<class 'ValueError'>, ValueError("Input contains NaN, infinity or a value too large for dtype('float64').",), <traceback object at 0x7f5ea2d03ac8>)

Thanks!

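AL_Ts_dPSI_TableForClustering.dpsi.txt https://github.com/comprna/SUPPA/files/4553743/AL_Ts_dPSI_TableForClustering.dpsi.txt
AL_Ts_PSIvec_TableForClustering.psivec.txt https://github.com/comprna/SUPPA/files/4553744/AL_Ts_PSIvec_TableForClustering.psivec.txt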

sundahlm, Apr 29 '20 18:04

This seems to come from the Python library used for clustering. I am wondering whether there are still some events with all NaN values in a given group of samples, or whether the library simply cannot handle these cases.
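One way to narrow this down is to list every event that still contains a non-finite value. The error text matches the message that scikit-learn's input validation raises when any value in the array it receives is NaN or infinite, so a partially filled row may be enough if it reaches the library unchanged. Whether SUPPA passes the values through unchanged is not established here. The sketch below is an assumption-laden illustration: `non_finite_events` is a hypothetical helper, and `events` is assumed to map each event ID to its list of values.

```python
import math


def non_finite_events(events):
    """Map each event ID to the 0-based positions of its NaN/inf values.

    Events whose values are all finite are left out of the result.
    """
    report = {}
    for eid, vals in events.items():
        bad = [i for i, v in enumerate(vals) if not math.isfinite(v)]
        if bad:
            report[eid] = bad
    return report


# Toy data: evA has one NaN, evB is clean, evC has an infinite value.
events = {
    "evA": [0.1, float("nan")],
    "evB": [0.2, 0.3],
    "evC": [float("inf"), 0.0],
}
report = non_finite_events(events)
print(report)  # {'evA': [1], 'evC': [0]}
```

An empty report on both input tables would point away from the raw values and toward something computed from them downstream.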

I cc JL and JC in case they can provide some insight about why this might not be working.

E.

EduEyras, Apr 30 '20 02:04

Hi Eduardo,

Thanks again for the quick reply.

As you mentioned, maybe there are events with all NaN values within one group but with one or more numbers in samples from other groups. I will check this and get back to you.
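A rough sketch of that check, for anyone following along. It assumes the `--groups` ranges are 1-based, inclusive column positions over the sample columns of the PSIvec table, and that `events` maps each event ID to its list of values in column order. `parse_groups` and `events_with_empty_group` are hypothetical helpers, not SUPPA functions.

```python
import math


def parse_groups(spec):
    """Turn a spec such as '1-5,6-10' into [(1, 5), (6, 10)]."""
    groups = []
    for part in spec.split(","):
        start, end = part.split("-")
        groups.append((int(start), int(end)))
    return groups


def events_with_empty_group(events, groups):
    """Map each event ID to the 0-based indices of groups that are all NaN.

    Events with at least one number in every group are left out.
    """
    flagged = {}
    for eid, vals in events.items():
        empty = [
            g
            for g, (start, end) in enumerate(groups)
            # ranges are assumed 1-based and inclusive, hence start - 1
            if all(math.isnan(v) for v in vals[start - 1:end])
        ]
        if empty:
            flagged[eid] = empty
    return flagged


nan = float("nan")
groups = parse_groups("1-2,3-4")
events = {
    "evA": [0.1, nan, nan, nan],  # second group has no numbers
    "evB": [0.1, 0.2, 0.3, nan],  # every group has at least one number
}
flagged = events_with_empty_group(events, groups)
print(flagged)  # {'evA': [1]}
```

With the command above, the spec would be `1-5,6-10,11-14,15-18`, and any flagged event could be removed from both files before rerunning clusterEvents.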

Best, Carla

sundahlm, Apr 30 '20 02:04