create_cisTarget_databases

Build the Axolotl cisTarget database issue

Open fengweimin-maker opened this issue 3 years ago • 14 comments

Hi @authors, I built the Axolotl cisTarget database by following the instructions for the Gallus gallus cisTarget database. First, I extracted a FASTA file covering 10 kb upstream and 10 kb downstream of the TSS of each Axolotl gene: image

Second, I downloaded a set of motifs with wget http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_vertebrates_non-redundant_pfms_jaspar.txt, replaced the gene names with the homologous Axolotl gene IDs, and finally converted the motifs to motifs_cb_format.

Luckily, I got the feather files as follows: image image

Third, I built the motif2TF table by loading the human motif2TF file and replacing the human gene symbols with the homologous genes from my species. I did not know which feather file to use, so I passed all the feather files as database references and then ran: image

Finally, it produced the motifs_vs_regions.adjacencies.tsv file and then failed with an error: image

I also got many WARNING messages like this: 2021-11-09 15:51:53,032 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for AMEX60DD011386 could be mapped to test.cross_species.regions_vs_motifs.rankings. Skipping this module.

I have no idea what causes this. I would appreciate a reply soon. Thank you all! Winnie

fengweimin-maker avatar Nov 09 '21 10:11 fengweimin-maker

Use test.genes_vs_motifs.rankings.feather.

Did you add this option when creating the gene database? It strips #some_number from the region names in your FASTA file, so that they match the gene IDs you provide to pySCENIC: -g "#[0-9]+$"

import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd

# Read only the "genes" column of the motifs_vs_genes database to list the gene names.
motifs_vs_genes_ctx_db = 'test.motifs_vs_genes.rankings.feather'
gene_names_df = pf.read_feather(motifs_vs_genes_ctx_db, columns=['genes'])
print(gene_names_df)

# Load the full genes_vs_motifs database to inspect its layout.
genes_vs_motifs_ctx_db = 'test.genes_vs_motifs.rankings.feather'
genes_vs_motifs_ctx_df = pf.read_feather(genes_vs_motifs_ctx_db)
print(genes_vs_motifs_ctx_df)

ghuls avatar Nov 09 '21 18:11 ghuls

It is very nice of you to reply so soon, but I have no test.genes_vs_regions.rankings.feather. I got these files:
test.cross_species.motifs_vs_regions.rankings.feather
test.cross_species.regions_vs_motifs.rankings.feather
test.motifs_vs_regions.rankings.feather
test.motifs_vs_regions.scores.feather
test.regions_vs_motifs.rankings.feather
test.regions_vs_motifs.scores.feather

Maybe I did something wrong when building the rankings database. I ran your script:

import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd

genes_vs_regions_ctx_db = 'test.motifs_vs_regions.rankings.feather'
gene_names_df = pf.read_feather(genes_vs_regions_ctx_db , columns=['genes'])
print(gene_names_df)

but it raised an error: image There is no 'genes' column in my test.motifs_vs_regions.rankings.feather file.

Then I ran:

import pyarrow as pa
import pyarrow.feather as pf
import pandas as pd

genes_vs_regions_ctx_db = 'test.motifs_vs_regions.rankings.feather'
gene_names_df = pf.read_feather(genes_vs_regions_ctx_db)
print(gene_names_df)

This time it printed: image

My file has a 'regions' column instead. I don't understand which option you mean. Could you give me some advice on building the Axolotl cisTarget database? Thank you very much!
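A quick way to tell the two kinds of motifs_vs_* files apart is to look at the column names before requesting a specific column. The sketch below is only an illustration: the helper name `row_id_column` is made up here, and it assumes the column names have already been read, for example with `pf.read_table(path).column_names` from pyarrow.

```python
def row_id_column(column_names):
    """Return which ID column a motifs_vs_* feather file carries.

    Hypothetical helper: it only checks for the two column names
    mentioned in this thread ('genes' and 'regions').
    """
    names = set(column_names)
    if "genes" in names:
        return "genes"
    if "regions" in names:
        return "regions"
    return None


# Column names as they might look for each kind of file (motif names are placeholders).
region_db_columns = ["motif_A", "motif_B", "regions"]
gene_db_columns = ["motif_A", "motif_B", "genes"]

print(row_id_column(region_db_columns))
print(row_id_column(gene_db_columns))
```

If the helper reports 'regions', the database was built without the gene option, which is consistent with the error seen when asking for columns=['genes'].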

fengweimin-maker avatar Nov 10 '21 13:11 fengweimin-maker

Start in a new dir (or move/delete the current feather files) and create a gene rankings database:

fasta_filename=
motifs_dir=
motifs_list_filename=
db_prefix=

nbr_threads=22


# Create gene rankings database.
"${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
    -f "${fasta_filename}" \
    -M "${motifs_dir}" \
    -m "${motifs_list_filename}" \
    -g "#[0-9]+$" \
    -o "${db_prefix}" \
    -t "${nbr_threads}"
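To see what the -g pattern does to region names, here is a minimal Python sketch. It assumes the pattern is applied as a plain regex substitution that removes the match; the function name `region_to_gene` and the second gene ID are made up for illustration.

```python
import re

# Same pattern as passed with -g in the command above.
GENE_SUFFIX = re.compile(r"#[0-9]+$")


def region_to_gene(region_id):
    """Strip a trailing '#<number>' so that only the gene ID remains."""
    return GENE_SUFFIX.sub("", region_id)


# Region names as they might appear in the FASTA headers (illustrative values).
region_ids = ["AMEX60DD011386#1", "AMEX60DD011386#2", "AMEX60DD000001#1"]

# Several regions can collapse onto the same gene ID.
gene_ids = sorted({region_to_gene(region_id) for region_id in region_ids})
print(gene_ids)
```

Names without a '#<number>' suffix pass through unchanged, so the pattern is harmless for FASTA headers that already contain bare gene IDs.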

ghuls avatar Nov 10 '21 20:11 ghuls

It is so nice of you to reply quickly, and your advice helps me a lot. My test database build is running now and may need some time to produce the gene rankings database. I have another question: when I run pyscenic ctx, do the row names of the count data need the gene_id#1 format, given that the gene names in the database have the gene_id#1 format? Like the following: image Thank you very much!

fengweimin-maker avatar Nov 11 '21 08:11 fengweimin-maker

As long as the "gene" names match in the rankings database and the expression matrix, it should work.
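One way to check this before running pyscenic ctx is to compute how many gene names from the expression matrix are present in the database. The sketch below uses plain Python lists with made-up gene IDs; `mapped_fraction` is a hypothetical helper, and the 0.8 threshold is simply the 80% figure quoted in the warning earlier in this thread.

```python
def mapped_fraction(query_genes, database_genes):
    """Fraction of query gene names that also occur in the database."""
    query = set(query_genes)
    if not query:
        return 0.0
    return len(query & set(database_genes)) / len(query)


# Illustrative gene names; in practice these come from the feather file
# and from the row names of the expression matrix.
database_genes = ["AMEX60DD011386", "AMEX60DD000001", "AMEX60DD000002"]
matrix_with_suffix = ["AMEX60DD011386#1", "AMEX60DD000001#1"]
matrix_plain = ["AMEX60DD011386", "AMEX60DD000001"]

fraction_with_suffix = mapped_fraction(matrix_with_suffix, database_genes)
fraction_plain = mapped_fraction(matrix_plain, database_genes)

for label, fraction in [("with #1 suffix", fraction_with_suffix),
                        ("plain gene IDs", fraction_plain)]:
    status = "ok" if fraction >= 0.8 else "too few genes mapped"
    print(f"{label}: {fraction:.0%} mapped ({status})")
```

A fraction near zero usually points to a naming mismatch, such as a leftover suffix on one side only.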

ghuls avatar Nov 11 '21 14:11 ghuls

OK, thank you for your advice. Following your guidance, after adding the option -g "#[0-9]+$" I got these feather files: image

The test.genes_vs_motifs.rankings.feather format: image

and the test.motifs_vs_genes.rankings.feather format: image

Can I use test.genes_vs_motifs.rankings.feather and test.motifs_vs_genes.rankings.feather? If so, which one should I use as the input database?

Also, I don't know why I can't get test.genes_vs_regions.rankings.feather. Is there something wrong in my script? My script: image

Thank you

fengweimin-maker avatar Nov 13 '21 07:11 fengweimin-maker

You can only use test.genes_vs_motifs.rankings.feather. All other feather files can be deleted; they are only needed to create test.genes_vs_motifs.rankings.feather.

  • test.motifs_vs_genes.scores.feather: Cluster-Buster creates scores for regions/genes in your FASTA file per motif
  • test.motifs_vs_genes.scores.feather ==> test.motifs_vs_genes.rankings.feather: Scores for regions/genes per motif are converted to a ranking
  • test.motifs_vs_genes.scores.feather ==> test.genes_vs_motifs.rankings.feather: for pySCENIC we need rankings for each motif per region/gene (transposed test.motifs_vs_genes.rankings.feather).
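The three steps above can be sketched in plain Python with a tiny made-up score table. This is only an illustration of the idea (score, rank, transpose); the real tool works on feather files, and details such as tie-breaking may differ. The motif names, gene names, scores and both helper functions are hypothetical.

```python
# Made-up Cluster-Buster-like scores: one dict of gene scores per motif.
scores = {
    "motif_A": {"gene1": 0.9, "gene2": 0.1, "gene3": 0.5},
    "motif_B": {"gene1": 0.2, "gene2": 0.8, "gene3": 0.4},
}


def scores_to_rankings(scores_per_motif):
    """For each motif, rank genes by descending score (best gene gets rank 0)."""
    rankings = {}
    for motif, gene_scores in scores_per_motif.items():
        ordered = sorted(gene_scores, key=gene_scores.get, reverse=True)
        rankings[motif] = {gene: rank for rank, gene in enumerate(ordered)}
    return rankings


def transpose(table):
    """Swap the outer and inner keys of a nested dict."""
    flipped = {}
    for outer_key, inner in table.items():
        for inner_key, value in inner.items():
            flipped.setdefault(inner_key, {})[outer_key] = value
    return flipped


motif_rankings = scores_to_rankings(scores)   # scores ==> rankings
gene_rankings = transpose(motif_rankings)     # same rankings, other orientation

print(motif_rankings["motif_A"])
print(gene_rankings["gene1"])
```

The transposed table holds exactly the same ranks; only the orientation changes, which is why the intermediate files are no longer needed once the final one exists.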

ghuls avatar Nov 13 '21 12:11 ghuls

I tested using only test.genes_vs_motifs.rankings.feather, but pyscenic ctx still gave the error "No columns to parse from file". image
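In pandas, "No columns to parse from file" is the message of EmptyDataError, which is raised when a text file has no content at all. The thread does not show which input file triggered it, so the sketch below is only a generic check of a tab-separated file before passing it on; `describe_table` is a hypothetical helper, and the column names and second gene ID in the demo file are placeholders.

```python
import csv
import os
import tempfile


def describe_table(path, delimiter="\t"):
    """Return (column_names, number_of_data_rows) for a delimited text file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            # Completely empty file: nothing to parse.
            return [], 0
        return header, sum(1 for _ in reader)


# Demo with two temporary files: one empty, one with a header and one row.
with tempfile.TemporaryDirectory() as tmp_dir:
    empty_path = os.path.join(tmp_dir, "empty.tsv")
    open(empty_path, "w").close()

    filled_path = os.path.join(tmp_dir, "filled.tsv")
    with open(filled_path, "w", newline="") as handle:
        handle.write("TF\ttarget\timportance\n")
        handle.write("AMEX60DD011386\tAMEX60DD000001\t1.5\n")

    empty_result = describe_table(empty_path)
    filled_result = describe_table(filled_path)

print(empty_result)
print(filled_result)
```

Running such a check on every text input of the failing command would at least show whether one of them is empty.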

Maybe the column names in my file are not correct? But I think the feather file, the motif2TF file and the scRNA matrix all use the same gene name format (gene ID), so why don't they match? Feather file format: image

motif2TF file format (human gene symbols replaced by the homologous Axolotl gene IDs; where no replacement was possible, the human gene symbol was retained): image
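The replacement described here can be sketched as a dictionary lookup that also reports which symbols were left untouched. Everything in the example is hypothetical: the helper name, the homolog table, and the pairing of the human symbol with the Axolotl ID are made up for illustration.

```python
def replace_with_homologs(human_symbols, homologs):
    """Map human gene symbols to Axolotl gene IDs, keeping unmapped symbols as-is.

    Returns the converted list plus the symbols that had no homolog entry.
    """
    converted = [homologs.get(symbol, symbol) for symbol in human_symbols]
    unmapped = [symbol for symbol in human_symbols if symbol not in homologs]
    return converted, unmapped


# Hypothetical homolog table; the pairing below is not real annotation.
homologs = {"SOX2": "AMEX60DD011386"}

converted, unmapped = replace_with_homologs(["SOX2", "PAX6"], homologs)
print(converted)
print(unmapped)
```

Symbols in the unmapped list stay in human nomenclature, so they cannot match an expression matrix or database keyed by Axolotl gene IDs; it may be worth checking how many TFs fall into that group.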

scRNA matrix format: image

Another question: you told me to use test.genes_vs_regions.rankings.feather and to create a gene rankings database. But after following your instructions I got test.genes_vs_motifs.rankings.feather, and now you say I should use only that file. I am confused: are test.genes_vs_regions.rankings.feather and test.genes_vs_motifs.rankings.feather the same? Do motifs also mean regions?

It is so kind of you to reply so much. Thank you!

fengweimin-maker avatar Nov 14 '21 11:11 fengweimin-maker

Instead of test.genes_vs_regions.rankings.feather, it should have been test.genes_vs_motifs.rankings.feather, my bad.

ghuls avatar Nov 14 '21 11:11 ghuls

OK, it doesn't matter. When I ran pyscenic ctx with the input data in the format described previously, it again gave an error: image

Why is the Signatures dataframe empty? Your advice would help me a lot. Thank you for your reply!

fengweimin-maker avatar Nov 16 '21 06:11 fengweimin-maker

What does your signatures file look like?

ghuls avatar Nov 16 '21 09:11 ghuls

I'm sorry, I don't know which file is the signatures file. I will try to run it again and learn from the tutorial. If the error comes up again, I will get in touch with you. Thank you for replying so quickly.

fengweimin-maker avatar Nov 17 '21 12:11 fengweimin-maker

OK, it doesn't matter. When I ran pyscenic ctx with the input data in the format described previously, it again gave an error: image

Why is the Signatures dataframe empty? Your advice would help me a lot. Thank you for your reply!

Hi, did you solve this problem? I am facing the same problem. I would really appreciate it if you have a solution. Thanks. Lee

frucelee avatar Mar 06 '22 09:03 frucelee

Sorry, I didn't solve the problem. If you have a solution, please tell me. Thanks.

fengweimin-maker avatar Mar 06 '22 14:03 fengweimin-maker