
Potentially incorrect use of minclureads and minreads

Open anikethjr opened this issue 2 years ago • 4 comments

Hi,

I was trying to use LeafCutter to process some junction reads and noticed a potential bug in leafcutter_cluster.py and leafcutter_cluster_regtools.py:

  1. When the clusters are refined using the refine_clusters function, leafcutter_cluster.py passes the minreads parameter to refine_cluster. I think this leads to some clusters with fewer than minclureads reads being included in further steps. There should be a check to filter out clusters with fewer than minclureads reads, possibly by providing it as another argument to refine_cluster.

  2. In leafcutter_cluster_regtools.py, there is no minreads parameter and the minclureads parameter is passed to refine_cluster. This leads to a stricter threshold being applied to the number of reads for each intron. I am not sure why the minreads parameter was removed, but fixes similar to those for leafcutter_cluster.py could be applied here too.
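
To make the two points above concrete, here is a minimal sketch of the two-stage filtering being suggested. It is not LeafCutter's actual code: `filter_cluster`, its arguments, and the use of `>=` for both cutoffs are assumptions made for illustration only.

```python
from typing import Dict, Tuple

Intron = Tuple[int, int]  # (start, end); hypothetical representation


def filter_cluster(
    counts: Dict[Intron, int], min_reads: int, min_clu_reads: int
) -> Dict[Intron, int]:
    """Hypothetical helper: per-intron cutoff first, then cluster-level cutoff.

    min_reads     -> threshold on each intron's own read count
    min_clu_reads -> threshold on the summed reads of the surviving introns
    """
    # Stage 1: drop introns that individually have too few reads.
    kept = {intron: n for intron, n in counts.items() if n >= min_reads}
    # Stage 2: drop the whole cluster if its remaining total is too low.
    if sum(kept.values()) < min_clu_reads:
        return {}
    return kept


cluster = {(100, 200): 30, (100, 300): 40, (150, 300): 5}

# Intended behaviour: per-intron cutoff 10, cluster cutoff 50.
intended = filter_cluster(cluster, min_reads=10, min_clu_reads=50)

# Passing the cluster cutoff as the per-intron cutoff (point 2) is stricter:
# no single intron reaches 50 reads, so nothing survives.
stricter = filter_cluster(cluster, min_reads=50, min_clu_reads=50)
```

In this toy cluster, the intended call keeps the two introns with 30 and 40 reads, while the stricter call returns an empty cluster even though the cluster total is well above 50.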

Please let me know if I am missing something!

Thank you, Aniketh

anikethjr, Dec 18 '22 23:12

Thanks for pointing this out. We are working on a newer version of LeafCutter with these fixes.

Best, Yang

View it on GitHub: https://github.com/davidaknowles/leafcutter/issues/222

goldenflaw, Dec 19 '22 02:12

Great, thank you! Looking forward to the new release.

anikethjr, Dec 28 '22 10:12

Hi,

Sorry to disturb! I think I'm running into a problem similar to the one you described. When I run leafcutter_cluster_regtools.py with --minclureads set to 50, it requires every intron in a cluster to have more than 50 reads, instead of applying the threshold to the summed reads of the cluster.

My Python and English aren't good enough for me to follow the source code well, so is this because "there is no minreads parameter and the minclureads parameter is passed to refine_cluster", which makes the threshold for each intron stricter?

But why 🧐? Isn't the --minclureads parameter meant for the cluster, so how does it affect individual introns? Or is it because leafcutter_cluster_regtools.py was changed recently and the earlier version doesn't have this issue?

Any reply is appreciated!
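
One rough way to check which behaviour an output reflects is to compare, for each cluster, the summed reads and the smallest single-intron count against the threshold. The sketch below assumes the counts have already been loaded into a plain dict; `classify_threshold` and the cluster names are hypothetical and not part of LeafCutter.

```python
from typing import Dict, List, Tuple


def classify_threshold(
    clusters: Dict[str, List[int]], threshold: int
) -> Dict[str, Tuple[bool, bool]]:
    """Hypothetical check: for each cluster, return
    (passes if threshold applies to the cluster total,
     passes if threshold applies to every intron)."""
    result = {}
    for name, counts in clusters.items():
        passes_total = sum(counts) >= threshold
        passes_each = all(n >= threshold for n in counts)
        result[name] = (passes_total, passes_each)
    return result


# Toy read counts per intron, made up for illustration.
clusters = {
    "clu_1": [30, 40],  # total 70, but no intron reaches 50
    "clu_2": [60, 75],  # every intron reaches 50
    "clu_3": [10, 20],  # total 30, fails either way
}
report = classify_threshold(clusters, threshold=50)
```

If clusters like clu_1 (total above the threshold, individual introns below it) are missing or trimmed in the output, that suggests the threshold was applied per intron rather than to the cluster total.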

Thank you, Jeep

CuteGold0407, Apr 12 '23 16:04

Thank you, this has been noted and we will fix it, along with other requests, in an upcoming LeafCutter release.

Best, Yang

goldenflaw, Apr 12 '23 19:04