Trying to create a chicken cisTarget database
Hello, I'm trying to create a chicken cisTarget database for my single-cell analysis. I have already created GRCg6a.regions_vs_motifs.rankings.feather from the EPD BED file and the v10_clust motifs. But when I try to run pyscenic ctx with the feather file and motifs-v10-nr.chicken-m0.00001-o0.0.tbl, it reports this error:
Can you give me some advice on this error?
Never seen that error myself, but I assume the error comes from a mismatch between gene/region names in the database and the ones you are requesting in pySCENIC (which seems to be None according to the error message). For pySCENIC you normally make a gene cisTarget database and not a region cisTarget database.
So glad to receive your reply. I have succeeded in making the genes vs motifs file by adding -g '|ENSGALG[0-9]+|ENSGALT[0-9.]+$'. But I still have a question about this option. As you know, many chicken genes don't have a proper gene symbol and are only named by their Ensembl ID, e.g. ENSGALG00010029927. What should I do if I want these genes to be included in the genes vs motifs .feather file as well? The picture below shows the gene names in my TSS.fa file:
Anyway, thanks for your help, hope everything goes well with you! @ghuls
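One possible way to keep genes that lack a symbol is to fall back to the Ensembl gene ID when naming each TSS sequence, before the database is built. The sketch below is only an illustration: it assumes a hypothetical FASTA header layout of `symbol|gene_id|transcript_id` (the real layout depends on the attributes selected in BioMart), the helper name `gene_label` is made up, and apart from ENSGALG00010029927 the IDs in the example headers are placeholders.

```python
def gene_label(header: str) -> str:
    """Return the gene symbol from a FASTA header, or the Ensembl gene ID if the symbol is empty.

    Assumes a hypothetical header layout: symbol|gene_id|transcript_id
    """
    fields = header.lstrip(">").strip().split("|")
    symbol = fields[0].strip()
    gene_id = fields[1].strip() if len(fields) > 1 else ""
    # An empty symbol is falsy, so the gene ID is used instead.
    return symbol or gene_id


# Placeholder headers: one with a symbol, one without.
headers = [
    ">GENE1|ENSGALG00000000001|ENSGALT00000000001.1",
    ">|ENSGALG00010029927|ENSGALT00000000002.1",
]
labels = [gene_label(h) for h in headers]
print(labels)
```

With labels like these written back into the FASTA headers, every sequence would carry a usable gene name, and the expression matrix would need to use the same names.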
ENSEMBL switched from GRCg6a (galGal6) to GRCg7b (bGalGal1) in recent releases. ENSGALG00010029927, for example, is an ENSEMBL ID for the GRCg7b assembly. So, the coordinates would not match if you are using GRCg6a.
They still provide gene annotations for GRCg6a on their ftp server (at least for ENSEMBL 108). But many gene names are missing. We updated the protein-coding gene names for the GRCg6a version (ENSEMBL 108) for one of our recent data sets, in case that is helpful:
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE262nnn/GSE262321/suppl/GSE262321%5Fgex%5Fgenes%2Etsv%2Egz
I have made two feather files, one from 500 bp around the TSS and the other from 10 kb around the TSS, and found that neither is ideal: only 6 to 7 TFs are detected in the results. What's more, the detected TFs are totally different between the two! I feel sad about the results. Since your resources include a chicken .tbl file, I wonder whether your team has ever tried to make a chicken cisTarget database? TAT Anyway, thanks for your reply, hope you have a good day ^-^
@LJZYaaa Are you sure that you are using the correct gene annotation GRCg7b (bGalGal1) with the correct FASTA (GRCg7b (bGalGal1)) file?
@ghuls I made the FASTA file through Ensembl BioMart as below:
And this is the code I used to create the feather file:
Is there anything wrong with that? Hoping to receive your reply ^^
At first glance it looks OK.
Now that your gene names are from GRCg7b in the Feather database, make sure to convert your expression matrix gene names to GRCg7b too.
For pySCENIC it might be better to use the human motif-to-TF annotation than the chicken (GRCg6a) one: https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl. The same goes for the TF list: https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt (prune it, so you only have TFs that are actually in your database).
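The pruning step could look roughly like the sketch below. It is a minimal illustration, not part of pySCENIC: the function name `prune_tf_list` and the toy inputs are made up, and it assumes that TF names and database gene names use the same spelling and case. In practice the TF names would be read from the TF list file and the gene names from the gene columns of the genes vs motifs feather file; file reading is left out here to keep the example self-contained.

```python
from typing import Iterable, List


def prune_tf_list(tfs: Iterable[str], db_genes: Iterable[str]) -> List[str]:
    """Keep only TFs whose name also appears among the database gene names.

    The original order of the TF list is preserved.
    """
    known = set(db_genes)
    return [tf for tf in tfs if tf in known]


# Toy inputs (hypothetical): three TF names and three database gene names.
tfs = ["SOX2", "PAX6", "ZNF891"]
db_genes = ["PAX6", "SOX2", "ENSGALG00010029927"]

pruned = prune_tf_list(tfs, db_genes)
print(pruned)  # ['SOX2', 'PAX6']
```

Exact matching is the simplest choice; if the chicken annotation spells a symbol differently from the human one, that TF would be dropped and would need a manual mapping.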
As we mainly work with scATAC data instead of scRNA, I think we might not have a gene-based chicken cisTarget database internally, but only region-based ones. So there is a chance that the gene-based version does not work very well.
@ghuls OK, I will try it. It would be a great pity if your tool can't be used for non-model species. Anyway, thanks for your reply.