create_cisTarget_databases

How to check which genes are in the database?

Open linxy29 opened this issue 2 years ago • 4 comments

Hi,

I'm using pySCENIC to analyze human iPSC data. We are interested in some genes and have the following questions:

  1. We cannot find TBXT in pySCENIC. Instead, we found T, which is another name for TBXT. Can we regard T as TBXT?
  2. We are interested in HOPX and SLIT2, but we cannot find any information on these two genes either. I found a related issue: https://github.com/aertslab/create_cisTarget_databases/issues/21. Is there any way to get gene regulatory information about these two genes, or would we not get meaningful information even if we added them to the database?

The databases we used are 'hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather', 'hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather', and 'motifs-v9-nr.hgnc-m0.001-o0.0.tbl'.

Thank you for your help.

Best

linxy29 avatar Aug 25 '22 03:08 linxy29

You can use:

# cd create_cisTarget_Databases

import feather_v1_or_v2


all_columns_in_ctx_db = get_all_column_names_from_feather(feather_file="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")

all_columns_in_ctx_db

Gene names for hg38 are HGNC symbols as linked to RefSeq r80.

ghuls avatar Sep 19 '22 12:09 ghuls

Hi @ghuls,

Thank you very much for your help. I'm still having some trouble getting what I want.

  1. I went into the 'create_cisTarget_databases' folder and ran the code you posted. I got the error: NameError: name 'get_all_column_names_from_feather' is not defined.

  2. Then I tried to install create_cisTarget_databases by following the installation guide. I got the error: ld returned 1 exit status. I tried several things to debug it, but I still could not install the create_cisTarget_databases module.

  3. I googled HGNC and RefSeq r80, but I still have no idea whether TBXT, HOPX, and SLIT2 are in the database.

I checked the website 'https://resources.aertslab.org/cistarget/' and found a tf_lists/allTFs_hg38.txt file.

I'm wondering 1) whether this 'allTFs_hg38.txt' file contains all the genes in the database, or 2) what I should do to make the 'get_all_column_names_from_feather' function work.

Thank you for your help.

linxy29 avatar Oct 12 '22 05:10 linxy29

You don't need to compile Cluster-Buster to be able to check the feather databases. You just need to create a conda environment with the python dependencies and then when you are in this cloned repo, import feather_v1_or_v2.
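The NameError reported above is what Python raises when a function is called by its bare name after a plain `import module` statement, since that statement binds only the module name. A minimal sketch of the behaviour, using the standard-library `math` module as a stand-in for `feather_v1_or_v2` (this assumes `get_all_column_names_from_feather` is defined in that module, as the reply implies):

```python
import math  # stand-in for `import feather_v1_or_v2`

# Calling the function by its bare name fails, because `import math`
# only makes the name `math` available, not the names defined inside it.
try:
    sqrt(4.0)  # analogous to calling get_all_column_names_from_feather(...) directly
    unqualified_ok = True
except NameError:
    unqualified_ok = False

# Qualifying the call with the module name works.
qualified_result = math.sqrt(4.0)

print(unqualified_ok, qualified_result)
```

By analogy, the call would presumably need to be written as `feather_v1_or_v2.get_all_column_names_from_feather(...)`, run from inside the cloned repository.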

You can even just load the whole feather database with pandas in the worst case:

import pandas as pd

df = pd.read_feather("hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")

df.columns
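Once the column names are available, checking for specific genes is a set-membership test. A minimal sketch, assuming gene symbols appear as column names as described above; `check_genes` and the toy column list are hypothetical, made up for illustration, and are not part of create_cisTarget_databases or the real database contents:

```python
from typing import Dict, Iterable, List


def check_genes(column_names: Iterable[str], genes_of_interest: List[str]) -> Dict[str, bool]:
    """Report, for each gene symbol, whether it is among the given column names."""
    available = set(column_names)
    return {gene: gene in available for gene in genes_of_interest}


# Toy column names standing in for df.columns (illustrative values only).
example_columns = ["T", "SOX2", "GATA4"]

result = check_genes(example_columns, ["TBXT", "T", "HOPX"])
print(result)
```

With a real database, `df.columns` from the snippet above would be passed in place of the toy list. The match is exact and case-sensitive, so a gene stored under an alias is reported as missing under its other name.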

ghuls avatar Dec 15 '22 08:12 ghuls

Hello, I would like to ask how to obtain the gene IDs from the .feather file on a Linux terminal. I would greatly appreciate any suggestions.

ChenJH-scau avatar Jul 29 '23 15:07 ChenJH-scau