FRASER maxing out memory on sge cluster
Hello!
We are attempting to run FRASER on a cohort of 400 samples. The job keeps failing when we submit it to our SGE queue, even with 1.5 TB of memory allocated. It seems that the parallelization of the PSI calculation (`fds <- calculatePSIValues(fds, BPPARAM=BPPARAM)`) is pushing the job over our `h_vmem` allocation. We have tried forcing FRASER to run serially (`BPPARAM=SerialParam()`), but we hit the same problem of maxing out the memory.
Is it possible that FRASER is ignoring the serial setting?
Is there a quick fix for this? Or does a solution similar to the link below have to be implemented: https://github.com/gagneurlab/OUTRIDER/issues/11
Thank you for the tool, we've really enjoyed running it on some previous cohorts.
Hi @Jessen-Erik,
thanks for trying out FRASER!
Regarding your problem, can you check the dimensions of the `fds` object that you use as input for this step? As we typically run this step before filtering, I suspect that you have a quite large `fds` object and that this is causing the problem rather than the parallelization itself. If this is indeed the case, you could try applying the `minExpressionInOneSample` filter before the PSI calculation step (we provide the option to do this as part of the `countRNAData` function), as this typically reduces the number of junctions inside the `fds` object a lot.
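A minimal sketch of that suggestion, for illustration only. It assumes FRASER and BiocParallel are installed and that `fds` is a FraserDataSet that has not been counted yet. The wrapper `count_then_psi` and its `min_reads` argument are hypothetical names. The `filter` and `minExpressionInOneSample` arguments are written as I understand the `countRNAData` documentation, so please check `?countRNAData` for your installed version.

```r
# Sketch: count with the expression filter enabled, then compute PSI values.
# Assumes FRASER and BiocParallel are installed and that `fds` is a
# FraserDataSet whose reads have not been counted yet.
# `count_then_psi` and `min_reads` are hypothetical names for this example.
count_then_psi <- function(fds, min_reads = 1,
                           bpparam = BiocParallel::SerialParam()) {
  # Apply the minExpressionInOneSample filter during counting, so that
  # fewer junctions reach the PSI step (argument names are an assumption
  # based on the FRASER documentation).
  fds <- FRASER::countRNAData(
    fds,
    filter = TRUE,
    minExpressionInOneSample = min_reads,
    BPPARAM = bpparam
  )

  # Report the object size before the memory-heavy step.
  message("Dimensions after counting: ", paste(dim(fds), collapse = " x "))

  FRASER::calculatePSIValues(fds, BPPARAM = bpparam)
}
```

With an uncounted `fds` at hand, this would be called as `fds <- count_then_psi(fds)`. The serial backend is the default here only to mirror the setup described in the question.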
I checked the dimensions and size of the file:
dim(fds)
[1] 3605173 406
object.size(fds)
65659480 bytes
What is the default minExpressionInOneSample? Just 1 read?
(Sent by email in reply to Ines Scheller's comment of Thursday, September 30, 2021, on c-mertes/FRASER issue #29.)
Sorry that we did not follow up on this. Thanks for sharing the dimensions. Roughly 3.6 million junctions is large, but it should not require 1.5 TB of memory. For this purpose, 1 read is enough for `minExpressionInOneSample`, as the filter is only used to remove random alignments.
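As a rough sanity check on the memory estimate above, the arithmetic can be done in base R with the dimensions reported in this thread. This is a back-of-the-envelope sketch: it assumes a single dense matrix of 8-byte doubles, and it does not model how FRASER actually stores or copies its assays. All variable names are illustrative.

```r
# Dimensions reported in the thread: junctions x samples.
n_junctions <- 3605173
n_samples   <- 406

# Number of entries in one dense junction-by-sample matrix.
n_entries <- n_junctions * n_samples

# Size of one such matrix if stored as 8-byte doubles (assumption).
bytes_double <- n_entries * 8
gb_double    <- bytes_double / 1e9

# How many matrices of that size would fit into 1.5 TB.
matrices_in_budget <- 1.5e12 / bytes_double

cat(sprintf("One dense double matrix: %.1f GB\n", gb_double))
cat(sprintf("Matrices of that size in 1.5 TB: %.0f\n", matrices_in_budget))
```

Under these assumptions one matrix is on the order of 12 GB, so exhausting 1.5 TB would take more than a hundred simultaneous copies of that size.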
Since there was no further response, I assume that the `minExpressionInOneSample` filter step helped here.