xpore
Error in dataprep step
Hello, thanks for your great tool!
Recently I have been trying to run xpore on my data; however, I get the following error:
`File "pandas/_libs/lib.pyx", line 2411, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "114.817,115.635,109.092,117.816,123.814,102.277" at position 0`
Could you please help me fix this problem? Thank you very much!
Hi @huawen-poppy,
Do you mind sharing the command you used and the head of all the inputs (eventalign.txt, GTF, and FASTA), please?
Thanks!
Best wishes, Yuk Kei
Thank you for your response!
I was using the command xpore dataprep --eventalign eventalign.txt --gtf_or_gff CC7.gtf --transcript_fasta aip.genome_models.no_isoforms.no_duplication.mRNA.fa --out_dir ./output --n_process 32
I figured out that the error came from the eventalign file, which had an extra column containing strings such as '114.817,115.635,109.092,117.816,123.814,102.277'. I have now deleted the extra column, but I get another error:
`/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/zhonh0b/miniconda3/envs/epigenetic/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)`
The header of the eventalign.txt file is:
The header of the GTF file is:
The header of the FASTA file is:
Could you please help me solve this problem? Thanks!
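The extra comma-separated column looks like the per-event raw signal samples that `nanopolish eventalign` writes when it is run with `--samples` (an assumption; check the header of your own file). Rerunning eventalign without that option avoids the column entirely. For an existing file, a minimal Python sketch like the one below strips it instead. `drop_column` and the default column name `samples` are hypothetical, not part of xpore, and the function streams line by line because eventalign files are usually very large.

```python
import sys


def drop_column(lines, column="samples"):
    """Yield tab-separated lines with the named column removed.

    Hypothetical helper: if the header has no such column, lines pass
    through unchanged (apart from normalising the trailing newline).
    """
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:  # empty input: nothing to do
        return
    header = header_line.rstrip("\n").split("\t")
    idx = header.index(column) if column in header else None

    def rebuild(fields):
        # Remove the unwanted field by position, then re-join with tabs.
        if idx is not None and idx < len(fields):
            del fields[idx]
        return "\t".join(fields) + "\n"

    yield rebuild(header)
    for line in it:
        yield rebuild(line.rstrip("\n").split("\t"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python drop_column.py eventalign.txt eventalign.nosamples.txt
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(drop_column(src))
```

Writing to a new file rather than editing in place keeps the original eventalign output available if the column turns out to be needed.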
Hi @huawen-poppy,
To convert transcript positions to genomic positions, you will also have to include the --genome flag:
xpore dataprep --eventalign eventalign.txt --gtf_or_gff CC7.gtf --transcript_fasta aip.genome_models.no_isoforms.no_duplication.mRNA.fa --genome --out_dir ./output --n_process 32
Do you mind sharing the full error message from xpore diffmod, please?
Thanks!
Best wishes, Yuk Kei
Hi, thanks for your reply! I am still running xpore diffmod, and so far there are no error messages. The head of the diffmod.table looks like:
Do you think I should cancel the current job and rerun xpore dataprep with the --genome flag added?
Hi @huawen-poppy,
If you don't need to convert transcript coordinates to genomic coordinates, then you don't need the --genome, --gtf_or_gff, or --transcript_fasta flags.
I have just noticed that the sequences of your fasta file are not capitalized. If you need transcript-to-genomic coordinate conversion, you can try capitalizing them.
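If you do need the conversion, only the sequence lines should be upper-cased; the `>` header lines carry the transcript IDs and should stay as they are. A minimal Python sketch, assuming a plain, uncompressed FASTA file; `uppercase_fasta` is a hypothetical helper name, not part of xpore:

```python
import sys


def uppercase_fasta(lines):
    """Yield FASTA lines with sequence lines upper-cased and headers untouched."""
    for line in lines:
        if line.startswith(">"):
            # Header line: keep the transcript ID exactly as written.
            yield line
        else:
            # Sequence line: str.upper() keeps the trailing newline.
            yield line.upper()


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python uppercase_fasta.py transcripts.fa > transcripts.upper.fa
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(uppercase_fasta(handle))
```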
Thanks!
Best wishes, Yuk Kei
Hi,
I have encountered the "PerformanceWarning" as well, but eventalign.index is still generated. Does it affect the results?
Thanks!
Andrea
Hi @AndreaYCT,
This is a warning that doesn't affect the results.
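In pandas, "indexing past lexsort depth" means a `.loc` lookup on a MultiIndex that is not sorted, so pandas falls back to a slower scan; the values returned are the same. The sketch below illustrates this on a small, made-up DataFrame shaped loosely like the one in the warning (`transcript_id`, `read_index` and the numbers are illustrative assumptions, not xpore's actual data). It sums `line_length` per key on the unsorted index and again after `sort_index()`; the unsorted lookups would be expected to emit `PerformanceWarning`, while both give identical sums.

```python
import warnings

import pandas as pd


def line_length_sums(df, keys):
    """Sum 'line_length' for each (transcript_id, read_index) key via .loc,
    mirroring the lookup pattern shown in the warning message."""
    return [int(df.loc[key]["line_length"].sum()) for key in keys]


# Toy data: deliberately unsorted and with repeated keys (several lines per read).
df = pd.DataFrame(
    {
        "transcript_id": ["t2", "t1", "t2", "t1", "t1"],
        "read_index": [0, 1, 0, 0, 1],
        "line_length": [10, 20, 30, 40, 5],
    }
).set_index(["transcript_id", "read_index"])
keys = [("t1", 0), ("t1", 1), ("t2", 0)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unsorted_sums = line_length_sums(df, keys)  # may warn: index not lexsorted
    sorted_sums = line_length_sums(df.sort_index(), keys)  # sorted index

perf_warnings = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print("unsorted:", unsorted_sums, "sorted:", sorted_sums)
print("PerformanceWarnings caught:", len(perf_warnings))
```

Sorting the index once before repeated lookups is the usual way to silence the warning and speed things up, but it is a performance change only.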
Thanks!
Best wishes, Yuk Kei