TOBIAS
Hi, I have the following three questions for CreateNetwork:
- The two columns in the motif2gene_mapping.txt file are supposed to be the motif name \t gene name or the gene name \t gene product name as shown below, which is very confusing to me.
- Does the first column of motif2gene_mapping.txt need to match the fourth column of TFBS, and do I need to adjust it accordingly if I customize the motif names?
- For a non-model species, how should I get the motifs and their regulated gene sets? Can I just use the motif2gene_mapping.txt from the test data?

Looking forward to your reply, thanks a lot.
Originally posted by @hyBio in https://github.com/loosolab/TOBIAS/issues/260#issuecomment-2067085858
Hey @hyBio,
thank you for using TOBIAS.
- The image you refer to from the wiki does not depict the motif2gene_mapping.txt file, but rather how CreateNetwork operates to build TF binding networks. Basically, you need to tell the tool which TF is expressed by which gene. Then, the tool checks which TFBS identified by BINDetect are associated with the expression of these genes. That way, we see if one TF is influencing the expression of another TF further down the network, because we know which gene expresses it and which TFs bind to the promoter of this TF's gene. The figure your image is taken from depicts this concept, not the format of the `--origin` file, though I can see why this can be confusing. So to answer the question about the columns of the motif2gene_mapping.txt file: the first column is the TF/motif name, the second column is the ID of the gene that encodes it.
- Yes, they have to match. Otherwise, the tool cannot find any TFBS associated with your given TFs and genes. If you adjusted the motif names beforehand, you have to use these adjusted names in the `--origin` file as well. Perhaps the `--naming` parameter of `TOBIAS BINDetect` is of interest to you in this context.
- The motif2gene_mapping.txt file in test_data is for human genes, so it will not work for other species. This issue describes how to create the file from scratch. You can just take the genes.gtf file for your organism and get all lines where `gene_name` matches one of your TF motif names from your JASPAR file. Each remaining line contains both the `gene_name` and the `gene_id`, which you can then use to fill the two columns of your `--origin` file.
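As a rough illustration of the GTF approach described above, here is a minimal Python sketch. It is not part of TOBIAS: the function names (`gene_names_to_ids`, `format_origin`, `names_missing_from_mapping`) are made up for this example. It assumes your GTF stores attributes in the usual `key "value";` form and contains both `gene_name` and `gene_id`, and that your motif names are spelled exactly like the gene names (matching is case-sensitive). Some annotations for non-model species lack `gene_name`, in which case this will find nothing.

```python
import re

# Matches GTF attributes such as: gene_id "XYZ"; gene_name "CTCF";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gene_names_to_ids(gtf_lines, tf_names):
    """Map each TF name to the gene_id of a GTF record with the same gene_name."""
    wanted = set(tf_names)
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a complete GTF record
            continue
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        name = attributes.get("gene_name")
        gene_id = attributes.get("gene_id")
        if name in wanted and gene_id:
            # Keep the first gene_id seen for each name (assumption: one gene per TF).
            mapping.setdefault(name, gene_id)
    return mapping


def format_origin(mapping):
    """Render the mapping as two tab-separated columns: TF/motif name, gene ID."""
    return "".join(f"{name}\t{gene_id}\n" for name, gene_id in sorted(mapping.items()))


def names_missing_from_mapping(tfbs_lines, mapping):
    """Return names from the fourth column of the TFBS file that have no mapping entry."""
    names = set()
    for line in tfbs_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return sorted(names - set(mapping))
```

You could pass an open `genes.gtf` file handle as `gtf_lines`, write the result of `format_origin` to your mapping file, and then run `names_missing_from_mapping` on your TFBS file to see which names in its fourth column still have no entry.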
I hope this clears up your questions. If you are in need of further assistance, let me know!
Best regards, Moritz