xpore
Issue running dataprep
I've been trying to use xpore to identify m6A modifications in mRNA reads of honey bees.
I've already prepared my data using minimap2, samtools and nanopolish, but when I run dataprep using this code:
xpore dataprep
--eventalign /project02/insect_multiomics/camila/xpore/Bee_Thorax/data/WT/nanopolish/eventalign.txt
--gtf_or_gff /project02/insect_multiomics/camila/xpore/Bee_Thorax/GCF_003254395.2_Amel_HAv3.1_genomic.gff
--transcript_fasta /project02/insect_multiomics/camila/xpore/Bee_Thorax/GCF_003254395.2_Amel_HAv3.1_cds_from_genomic.fna
--out_dir dataprep
--genome
I'm getting this error and I do not know how to fix it. If someone could please help me, I'd appreciate it.
/opt/anaconda3/2021.05/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/opt/anaconda3/2021.05/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
Traceback (most recent call last):
File "/opt/anaconda3/2021.05/bin/xpore", line 10, in
Hi @acarmas1,
Thank you for reporting the bug! Do you mind sharing your gff file with us, please? I think this is due to the incompatibility of the gff file with our annotation processing function.
Thank you!
Best wishes, Yuk Kei
Hi @yuukiiwa
Thanks for answering, this is my gff file: https://1drv.ms/u/s!AokqkR3muxL0g5QuiTk0rORmIBAHdg?e=WjqL1P
It looks like this:
Hi @acarmas1,
Thank you for sharing! I will look into this and get back to you hopefully by Friday.
Best wishes, Yuk Kei
Hi @acarmas1,
I have updated the readAnnotation() function in the ncbi_honeybee_gff branch, which runs with the gff file you provided. Do you mind installing xpore from the ncbi_honeybee_gff branch and testing whether it works for you?
git clone https://github.com/GoekeLab/xpore.git
cd xpore
git checkout origin/ncbi_honeybee_gff
sudo python3 setup.py install
Thank you!
Best wishes, Yuk Kei
Hi Yuk,
I tried what you suggested and now I'm getting this error:
/opt/anaconda3/2021.05/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/opt/anaconda3/2021.05/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
Traceback (most recent call last):
File "/opt/anaconda3/2021.05/bin/xpore", line 33, in
Hi @acarmas1,
Do you mind sending me your fasta file, please? Or do you have a gtf file for honey bee instead? The gff option currently "works with GENCODE or ENSEMBL FASTA files".
Thank you!
Best wishes, Yuk Kei
Hi Yuk,
This is my fasta file: https://1drv.ms/u/s!AokqkR3muxL0g5Q6tbjcR5wEDrwLrg?e=kL5f6I
And yes, I also have this gtf file: https://1drv.ms/u/s!AokqkR3muxL0g5Q7thCGIpc3mwk4Rw?e=IIgcUp
Thanks, Camila
Hi Camila (I will tag you here @acarmas1),
Sorry for the delayed reply (it was Lunar New Year out here)! I updated xpore dataprep on the ncbi_honeybee branch (https://github.com/GoekeLab/xpore/tree/ncbi_honeybee), which now works with the fasta and gtf files you provided in the previous comment.
Thanks!
Best wishes, Yuk Kei
Hi,
Thank you so much, it worked. I just have one last question. I ran xpore dataprep with a different fasta file, that looks like this:
It has all the coding regions of the honey bee genome. Does it make any difference whether I pass this cds.fasta file to --transcript_fasta, or does the file have to be the reference genome?
Hi Camila (I will tag you here @acarmas1),
I am glad that the fix worked! We suggest running xpore dataprep --genome with a cDNA.fasta. Due to the formatting of the > lines of your cds.fasta file, xpore dataprep will not work with it.
Thanks!
Best wishes, Yuk Kei
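To see how the > lines of two FASTA files differ, it can help to print the record IDs from each and compare them with the transcript IDs in the annotation. The snippet below is only a minimal sketch: the helper name fasta_ids and the demo headers are made up for illustration, and it assumes the record ID is the text between > and the first whitespace. It does not reproduce how xpore itself parses these files.

```python
import os
import tempfile


def fasta_ids(path):
    """Yield each record ID: the text after '>' up to the first whitespace."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


# Tiny demo with made-up headers (hypothetical IDs, not from the files in this thread).
demo = ">tx1 gene=A\nACGT\n>lcl|chr1_cds_P1_1 [gene=B]\nGGCC\n"
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(demo)
demo_ids = list(fasta_ids(tmp.name))
os.remove(tmp.name)
print(demo_ids)
```

Running fasta_ids on the cDNA file and on the cds file and looking at the first few IDs from each should show whether the headers follow the same naming scheme as the annotation.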
Hi Yuk, I ran dataprep with the reference genome of honey bees and it worked, but I got an empty diffmod.table, and I also realized my data.json file is empty. I don't know if that means the honey bee transcriptome does not have the m6A modification, or if something went wrong during the process.
Hi Camila,
Can you share a screenshot of the first few lines of your eventalign.txt, please? Also, were you using the same cDNA.fasta file for nanopolish eventalign?
Thanks!
Best wishes, Yuk Kei
Hi Yuk,
This is my eventalign.txt file, and yes, I used the same fasta file that I ran nanopolish eventalign with.
After running xpore diffmod I got this message:
Using the signal of unmodified RNA from /opt/anaconda3/2021.05/lib/python3.8/site-packages/xpore/diffmod/model_kmer.csv
0 ids to be testing ...
And the diffmod.table is empty.
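One thing that can be checked when data.json comes out empty is whether the names in the contig column of eventalign.txt match the IDs of the reference and annotation that dataprep is given. This is a diagnostic sketch only, not a confirmed cause of the empty output in this thread. It assumes the eventalign file starts with a tab-separated header row containing a contig column; the function names and the demo rows are hypothetical.

```python
import csv
import io
from collections import Counter


def contig_counts(handle, max_rows=100000):
    """Count eventalign rows per contig, reading at most max_rows data rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        counts[row["contig"]] += 1
    return counts


def unmatched_contigs(counts, reference_ids):
    """Return contig names seen in eventalign that are absent from reference_ids."""
    return sorted(c for c in counts if c not in reference_ids)


# Made-up rows for illustration; a real file has more columns.
demo_eventalign = io.StringIO(
    "contig\tposition\treference_kmer\n"
    "tx1\t10\tAAACT\n"
    "tx1\t11\tAACTG\n"
    "chr1\t500\tGGACA\n"
)
demo_counts = contig_counts(demo_eventalign)
demo_missing = unmatched_contigs(demo_counts, {"tx1", "tx2"})
print(demo_counts, demo_missing)
```

With a real file, open eventalign.txt and pass the handle to contig_counts, then pass the set of IDs taken from the FASTA or GTF as reference_ids. A long list of unmatched names would suggest the files use different naming schemes.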
I am running dataprep with the following command, and I am running into multiple issues with it:
xpore dataprep --eventalign eventalign.txt --out_dir 04_dataprep --n_processes 30 --readcount_min 5
For one sample it ran fine, but for others I get different issues.
For one sample I get the error below, but it still proceeds:
/hpcnfs/data/cgb/conda_envs/xpore2.0/bin/xpore:33: DtypeWarning: Columns (7) have mixed types.Specify dtype option on import or set low_memory=False.
sys.exit(load_entry_point('xpore==2.0', 'console_scripts', 'xpore')())
Traceback (most recent call last):
File "/hpcnfs/data/cgb/conda_envs/xpore2.0/bin/xpore", line 33, in
The other is:
gg/xpore/scripts/dataprep.py:72: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
Traceback (most recent call last):
File "/hpcnfs/software/anaconda/anaconda3/envs/env_p37/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'transcript_id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hpcnfs/data/cgb/conda_envs/xpore2.0/bin/xpore", line 33, in
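The DtypeWarning above is pandas reporting that one column held values of more than one type. One possible reason, which is an assumption and not something established in this thread, is that the eventalign file contains non-data lines, for example a header repeated in the middle of the file or rows cut short. The sketch below scans a file for both cases; scan_eventalign and the demo rows are hypothetical names and data used only for illustration.

```python
import io


def scan_eventalign(handle):
    """Return (repeated_header_lines, bad_width_lines) as 1-based line numbers."""
    header = handle.readline().rstrip("\n")
    n_fields = len(header.split("\t"))
    repeated, bad_width = [], []
    for line_no, line in enumerate(handle, start=2):
        line = line.rstrip("\n")
        if line == header:
            # The header text appears again after the first line.
            repeated.append(line_no)
        elif len(line.split("\t")) != n_fields:
            # The row has a different number of tab-separated fields than the header.
            bad_width.append(line_no)
    return repeated, bad_width


# Made-up rows: line 3 repeats the header, line 4 is missing a field.
demo = io.StringIO(
    "contig\tposition\tevent_stdv\n"
    "tx1\t10\t1.5\n"
    "contig\tposition\tevent_stdv\n"
    "tx1\t11\n"
)
demo_repeated, demo_bad = scan_eventalign(demo)
print(demo_repeated, demo_bad)
```

If both lists come back empty for a failing sample, the file layout is probably not the problem and the full traceback would be needed to narrow it down further.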