DP_GP_cluster

Size of input file

Open kanekalla opened this issue 6 years ago • 5 comments

Hello Ian,

Is there a size limitation for the input data? I am using a 100000 × 1300 matrix. After running for a long time, I get neither an error nor any output. Is there a solution?

kanekalla, Mar 09 '18 18:03

Hi Kishore,

There is no hard limit on input size, but that is far too big. In practice, my algorithm works well with on the order of hundreds to a couple thousand genes and on the order of 10 time points. I would recommend using another algorithm.

Best, Ian

-- Ian McDowell Bioinformatician II Duke University

IanMcDowell, Mar 09 '18 18:03

Thanks, Ian. One final question: should the time_t in the matrix be continuous, or can it also be a factor?

kanekalla, Mar 09 '18 18:03

Hi Ian,

Just to be sure: when you say "a couple thousand", what is the maximum number you recommend?

Thanks!!

cartal, Mar 12 '18 12:03

It will depend on the number of time points as well as the characteristics of the gene expression responses themselves, so there truly is no hard-and-fast rule. You can subset your genes based on the degree of differential expression to reduce the demand placed on the algorithm. You could also just run the algorithm with the --fast option and monitor progress, which is printed to stdout. (Note that the chain will speed up after burn-in.)
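As a rough illustration of the subsetting idea, here is a minimal sketch that is not part of DP_GP_cluster itself. It assumes a tab-delimited matrix with a header row of time points followed by one row per gene, and it ranks genes by variance across time points, which is only a crude stand-in for a proper differential expression analysis. The function names `read_expression` and `top_variable_genes` and the toy values are hypothetical.

```python
import csv
import io
import statistics


def read_expression(handle):
    """Parse a tab-delimited matrix: a header row, then one row per gene.

    Assumed layout (hypothetical): first column is the gene name,
    remaining columns are expression values, one per time point.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = {r[0]: [float(x) for x in r[1:]] for r in reader if r}
    return header, rows


def top_variable_genes(rows, n_keep):
    """Return the n_keep gene names with the largest variance over time.

    Variance across time points is used here only as a simple proxy
    for the degree of differential expression.
    """
    ranked = sorted(rows, key=lambda g: statistics.pvariance(rows[g]), reverse=True)
    return ranked[:n_keep]


# Toy data with made-up values, standing in for a real expression file.
toy = (
    "gene\t0\t1\t2\n"
    "geneA\t1.0\t1.1\t0.9\n"
    "geneB\t0.0\t2.0\t4.0\n"
    "geneC\t5.0\t5.0\t5.0\n"
)
header, rows = read_expression(io.StringIO(toy))
kept = top_variable_genes(rows, n_keep=2)
print(kept)
```

With a real file, one would open it in place of the `io.StringIO` object, choose `n_keep` in the range of hundreds to a couple thousand genes, and write the kept rows back out under the same header before clustering.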


IanMcDowell, Mar 12 '18 14:03