ABC-Enhancer-Gene-Prediction

RefSeq gene list

Open yussufhajjaj opened this issue 3 years ago • 3 comments

Dear All,

I want to use the Ensembl gene list instead of RefSeq with the hg38 assembly. Is there a way to create a file similar to yours, in which each gene's entry (and its TSS) is defined from the largest region that contains all possible isoforms of that gene? Thanks in advance.

Best Regards,

Yussuf

yussufhajjaj, Oct 14 '21 13:10

The key principle they used is to look at the most common TSS among the several transcripts of a gene; that position becomes the gene's TSS. I tried to do the same thing with the same RefSeq version, but could not reproduce 100% of what they have, using chr22 as an example. Some entries are missed because there is not always a single most common TSS: two or even three positions can be tied. This matters, because you have to choose one, and they did not mention how to break the tie. Hope my comment helps you a little.
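A minimal sketch of the "most common TSS" idea described above, which also reports ties instead of silently picking one. This is not the ABC authors' code: the function name `most_common_tss`, the input tuple layout, and the example genes and coordinates are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def most_common_tss(transcripts):
    """Return, per gene, the TSS position(s) shared by the most transcripts.

    `transcripts` is an iterable of (gene, chrom, strand, tx_start, tx_end)
    tuples. This layout is hypothetical. The TSS is taken as tx_start on
    the + strand and tx_end on the - strand.
    """
    counts = defaultdict(Counter)
    for gene, chrom, strand, tx_start, tx_end in transcripts:
        tss = tx_start if strand == "+" else tx_end
        counts[gene][tss] += 1

    result = {}
    for gene, counter in counts.items():
        top = max(counter.values())
        # Keep every position tied for the highest count; more than one
        # entry means a tie that needs an explicit tie-breaking rule.
        result[gene] = sorted(pos for pos, n in counter.items() if n == top)
    return result

# Made-up example transcripts (not real annotations).
example_transcripts = [
    ("GENE_A", "chr22", "+", 100, 500),
    ("GENE_A", "chr22", "+", 100, 800),
    ("GENE_A", "chr22", "+", 150, 800),
    ("GENE_B", "chr22", "-", 1000, 2000),
    ("GENE_B", "chr22", "-", 1200, 2500),
]

tss_by_gene = most_common_tss(example_transcripts)
for gene, positions in tss_by_gene.items():
    status = "tie" if len(positions) > 1 else "unique"
    print(gene, positions, status)
```

Genes whose list has more than one position are exactly the cases where a tie-breaking rule is needed, so listing them may help when comparing against the provided file.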

nttg8100, Mar 18 '22 08:03

Hi Thah,

I ended up using the Ensembl canonical track, which provides one TSS per gene, taken from the transcript that is the most conserved and most highly expressed for that gene across tissues. It works well for me; you might need to remove some gene types, such as pseudogenes. It was nice of you to remind me to drop a comment here, I totally forgot to update the issue. Thanks.
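A rough sketch of one way to get one TSS per gene from an Ensembl GTF, under these assumptions: canonical transcripts carry the attribute `tag "Ensembl_canonical"`, the biotype is stored in `gene_biotype` (GENCODE files use `gene_type` instead), and the TSS is the transcript start on the + strand and the transcript end on the - strand. The function name `canonical_tss` and the example records are made up for illustration.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def canonical_tss(gtf_lines, skip_biotype_substring="pseudogene"):
    """Map gene_id -> (chrom, tss, strand, gene_name) for canonical transcripts.

    Coordinates are kept 1-based, as in GTF. Genes whose biotype contains
    `skip_biotype_substring` are dropped.
    """
    out = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        attrs = ATTR_RE.findall(fields[8])
        # "tag" can occur several times, so collect all of its values.
        tags = {value for key, value in attrs if key == "tag"}
        if "Ensembl_canonical" not in tags:
            continue
        info = dict(attrs)
        if skip_biotype_substring in info.get("gene_biotype", ""):
            continue
        tss = start if strand == "+" else end
        out[info["gene_id"]] = (chrom, tss, strand, info.get("gene_name", ""))
    return out

# Made-up records (gene IDs and coordinates are not real annotations).
example_gtf = [
    '22\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "AAA"; gene_biotype "protein_coding"; tag "basic"; tag "Ensembl_canonical";',
    '22\thavana\ttranscript\t150\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "AAA"; gene_biotype "protein_coding"; tag "basic";',
    '22\thavana\ttranscript\t2000\t3000\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; gene_name "BBB"; gene_biotype "lncRNA"; tag "Ensembl_canonical";',
    '22\thavana\ttranscript\t5000\t6000\t.\t+\t.\tgene_id "G3"; transcript_id "T4"; gene_name "CCC"; gene_biotype "processed_pseudogene"; tag "Ensembl_canonical";',
]

tss_table = canonical_tss(example_gtf)
for gene_id, record in tss_table.items():
    print(gene_id, record)
```

For a real file, the lines could be read from the opened GTF instead of the in-memory list; which other gene types to drop besides pseudogenes is left to the user.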

yussufhajjaj, Mar 18 '22 09:03

Hi, in the canonical track, is the TSS position the start position of the transcript labeled as Ensembl_canonical in the GTF file?

jxcao98, Dec 19 '23 01:12