create_cisTarget_databases
Will zebrafish cisTarget databases be available in the future?
Thank you for developing this amazing tool! I am wondering if zebrafish cisTarget databases will be available in the future, like for human/mouse/drosophila?
Creating databases for species we don't use in the lab probably needs to be a community effort, as validating that the input regions used to create the database work properly is quite some work.
There was an effort to create a zebrafish database a while back: https://github.com/aertslab/create_cisTarget_databases/issues/8
Recently I was contacted by a group that was trying to make a zebrafish database. They were willing to share it once they feel confident enough that it works properly.
For creating a cisTarget database for zebrafish I used:
- a FASTA file containing genomic regions 5 kb upstream of all genes, downloaded from UCSC: http://www.ensembl.org/biomart/martview/75f230e7207d8d0863f73e15049b71af
- TF binding motifs from the JASPAR CORE database, downloaded from https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_non-redundant_pfms_jaspar.txt. I then converted the motifs to the required Cluster-Buster format and created the motifs list as described by tropfenameimer (comment of Apr 20, 2021): https://github.com/aertslab/create_cisTarget_databases/issues/4#issuecomment-823132619
- the flags -c "cbust" -g '#?$'. The -g pattern searches for 0 or 1 "#" at the end of the sequence names (it will always be 0), which forces all sequence names to be treated as genes instead of regions: extract_gene_id_from_region_id_replace="#?$"
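To illustrate why the '#?$' pattern leaves the sequence names untouched, here is a minimal sketch. It assumes the -g pattern is applied as a plain regex substitution that deletes whatever it matches; the helper name gene_id_from_region_id and the sequence names are hypothetical, and the second pattern ('#[0-9]+$') is only an illustrative contrast for names of the form "gene#number".

```python
import re


def gene_id_from_region_id(region_id, pattern):
    """Hypothetical helper: strip the part of a region ID matched by `pattern`.

    Assumption: the -g pattern is used as a regex whose match is removed,
    so that only the gene ID remains.
    """
    return re.sub(pattern, "", region_id)


# Sequence names that are already plain gene names (no trailing "#"):
# '#?$' only matches the empty string at the end, so nothing is removed.
plain_names = ["geneA", "geneB"]
unchanged = [gene_id_from_region_id(name, r"#?$") for name in plain_names]

# Illustrative contrast: a name with a "#<number>" suffix and a pattern
# that removes that suffix, collapsing several regions onto one gene.
collapsed = gene_id_from_region_id("geneA#1", r"#[0-9]+$")

print(unchanged)
print(collapsed)
```

With names that carry no suffix, every sequence keeps its own name and is therefore scored as its own gene.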
Thank you @ghuls @Mesi395 ! I will give it a try.
Once you have managed to construct the .feather file, you will have to convert the gene names from JASPAR into the gene symbols of your genome. To achieve this, we used several different databases:
- Ensembl BioMart: first convert the JASPAR gene names to Ensembl gene names, then use the orthology databases to convert them to the corresponding zebrafish genes
- Alliance database: https://www.alliancegenome.org/downloads#orthology
- OMA database: https://omabrowser.org/oma/home/
The output file should look like the mouse/human files on the aertslab website: https://resources.aertslab.org/cistarget/motif2tf/. You could also take the mouse file from the website and directly convert the mouse gene names to zebrafish gene names. This gave us the highest yield of motifs per TF; however, we were not entirely sure how the aertslab .tbl file was constructed by @ghuls.
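The gene-name conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure we actually ran: the two-column orthology table, the simplified motif table with a gene_name column, and all gene and motif names are toy placeholders, and the function names are hypothetical. A real motif2tf .tbl file has more columns, which this sketch would carry along unchanged.

```python
import csv
import io


def load_orthologs(handle):
    """Read a two-column TSV (source symbol, zebrafish symbol) into a dict of sets."""
    mapping = {}
    for source, target in csv.reader(handle, delimiter="\t"):
        mapping.setdefault(source, set()).add(target)
    return mapping


def translate_rows(rows, orthologs, gene_column="gene_name"):
    """Yield one copy of each row per zebrafish ortholog.

    Rows whose gene has no ortholog in the table are dropped.
    """
    for row in rows:
        for zf_gene in sorted(orthologs.get(row[gene_column], ())):
            new_row = dict(row)
            new_row[gene_column] = zf_gene
            yield new_row


# Toy stand-ins for the motif2tf table and for an orthology export
# (for example from Ensembl BioMart, Alliance or OMA).
motif_tbl = io.StringIO(
    "motif_id\tgene_name\n"
    "motif1\tGENE1\n"
    "motif2\tGENE2\n"
    "motif3\tGENE3\n"
)
orthologs_tsv = io.StringIO(
    "GENE1\tgene1\n"
    "GENE2\tgene2a\n"
    "GENE2\tgene2b\n"
)

orthologs = load_orthologs(orthologs_tsv)
rows = list(csv.DictReader(motif_tbl, delimiter="\t"))
translated = list(translate_rows(rows, orthologs))

for row in translated:
    print(row["motif_id"], row["gene_name"], sep="\t")
```

A gene with several zebrafish orthologs yields one row per ortholog, so a motif can end up annotated to more than one zebrafish TF. Merging the mappings from several orthology databases into one table before this step increases the number of motifs that survive.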
Hello, I am eager to know the current progress of the zebrafish database construction. Can it be used yet, and could you provide detailed build steps? @ghuls
@yanpinlu unfortunately I haven't heard anything back from them so far.
In case it helps, our SCENIC+ motif collection is now public: https://resources.aertslab.org/cistarget/motif_collections/
So at least you don't have to hunt for your own PWM files anymore.
@yjchen1201 Hi! Were you able to create the zebrafish dataset?
@yjchen1201 @ghuls I have a similar question to @mtrebelo's: now that the very comprehensive SCENIC+ motif collection/motif2TF is available, is it enough for now to just change the TF gene names in the .tbl file to their zebrafish equivalents? Thank you!
@JoGraesslin provides his scripts at https://github.com/JoGraesslin/Zebrafish_SCENIC.
A motif2tf table file with zebrafish names provided by him can be found at: https://drive.google.com/file/d/1__P8l-XTLA6Bup_ucKs4M1yGqE-wGbYz/view?usp=sharing. It contains the human .tbl file with ortholog names from the Ensembl, Alliance and OMA databases.