
Hi, I have the following three questions for CreateNetwork:

Open hyBio opened this issue 2 months ago • 1 comments

Hi, I have the following three questions for CreateNetwork:
  1. Are the two columns in the motif2gene_mapping.txt file supposed to be motif name \t gene name, or gene name \t gene product name as shown in the image below? This is very confusing to me. [image]
  2. Does the first column of motif2gene_mapping.txt need to match the fourth column of the TFBS file, and do I need to adjust it accordingly if I customize the motif names?
  3. For a non-model species, how should I get the motifs and their regulated gene sets? Can I just use the motif2gene_mapping.txt from the test data? Looking forward to your reply, thanks a lot.

Originally posted by @hyBio in https://github.com/loosolab/TOBIAS/issues/260#issuecomment-2067085858

hyBio, Apr 20 '24 04:04

Hey @hyBio,

thank you for using TOBIAS.

  1. The image you refer to from the wiki does not depict the motif2gene_mapping.txt file, but rather how CreateNetwork builds TF binding networks. Basically, you need to tell the tool which gene encodes which TF. The tool then checks which TFBS identified by BINDetect are associated with the expression of these genes. That way, we can see whether one TF influences the expression of another TF further down the network, because we know which gene encodes it and which TFs bind to the promoter of that TF's gene. The figure your image stems from depicts this concept, not the format of the --origin file, though I can see why this can be confusing. So, to answer the question about the columns of the motif2gene_mapping.txt file: the first column is the TF/motif name, the second column is the ID of the gene that encodes it.
  2. Yes, they have to match. Otherwise, the tool cannot find any TFBS associated with your given TFs and genes. If you adjusted the motif names beforehand, you have to use the adjusted names in the --origin file as well. Perhaps the --naming parameter of TOBIAS BINDetect is of interest to you in this context.
  3. The motif2gene_mapping.txt file in test_data is for human genes, so it will not work for other species. This issue describes how to create the file from scratch. You can take the genes.gtf file for your organism and keep all lines where gene_name matches one of the TF motif names from your JASPAR file. Each remaining line contains both the gene_name and the gene_id, which you can then use to fill the two columns of your --origin file.
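
As a rough illustration of point 3, the filtering could be sketched in Python as shown below. This is only a sketch under a few assumptions, not part of TOBIAS: the function names (`parse_gtf_attributes`, `build_motif2gene`, `format_mapping`) are hypothetical, the GTF is assumed to carry `gene_id` and `gene_name` attributes in the usual `key "value";` form, motif names are compared to `gene_name` case-insensitively (an assumption that may not suit every organism), and the example GTF lines and gene IDs are made up.

```python
import re

# Matches GTF attribute pairs such as: gene_id "GENE0001"; gene_name "Ctcf";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTRIBUTE_PATTERN.findall(field))


def build_motif2gene(gtf_lines, motif_names):
    """Map each motif name to the gene_id of the gene with a matching gene_name.

    Assumption: names are compared case-insensitively. The motif name is
    written out exactly as given, so that it can match the TFBS names.
    """
    wanted = {name.upper(): name for name in motif_names}
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        attributes = parse_gtf_attributes(columns[8])
        gene_id = attributes.get("gene_id")
        motif = wanted.get(attributes.get("gene_name", "").upper())
        if motif and gene_id:
            # A gene appears on many lines (gene, transcript, exon, ...);
            # keep the first gene_id seen for each motif.
            mapping.setdefault(motif, gene_id)
    return mapping


def format_mapping(mapping):
    """Return tab-separated 'motif<TAB>gene_id' lines, sorted by motif name."""
    return [f"{motif}\t{gene_id}" for motif, gene_id in sorted(mapping.items())]


# Made-up example records; a real run would iterate over the lines of genes.gtf.
example_gtf = [
    "#!example header\n",
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "Ctcf";\n',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "GENE0001"; gene_name "Ctcf";\n',
    'chr2\tsrc\tgene\t50\t700\t.\t-\t.\tgene_id "GENE0002"; gene_name "Sox2";\n',
    'chr3\tsrc\tgene\t10\t90\t.\t+\t.\tgene_id "GENE0003"; gene_name "Actb";\n',
]
example_motifs = ["CTCF", "SOX2", "GATA1"]

mapping = build_motif2gene(example_gtf, example_motifs)
unmatched = [name for name in example_motifs if name not in mapping]

for row in format_mapping(mapping):
    print(row)
print("motifs without a matching gene_name:", unmatched)
```

Motifs that end up without a match would need to be resolved by hand, for example when the motif name differs from the gene symbol. As noted in point 2, the names written to the first column still have to be identical to the names in the fourth column of your TFBS files.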

I hope this clears up your questions. If you are in need of further assistance, let me know!

Best regards, Moritz

mohobein, Apr 22 '24 12:04