Errors while running xpore dataprep

Open JBerthelier opened this issue 3 years ago • 7 comments

Dear GoekeLab,

I am trying to run xpore on our institute's cluster. Everything goes well with the demo data, but I get this error/warning when running xpore dataprep on my own data. Do you have any idea what causes it and how to fix it?


Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)
/home/mycomputer/.local/lib/python3.7/site-packages/xpore-2.1-py3.7.egg/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/mycomputer/.local/lib/python3.7/site-packages/xpore-2.1-py3.7.egg/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead


Best regards,

Jeremy

JBerthelier avatar Nov 02 '21 07:11 JBerthelier

Hi Jeremy,

Thanks for reaching out! It would be great if you could provide the command you used to run xpore dataprep. In the meantime, you can look into the following two things:

  1. After you saw this error/warning, was xpore dataprep still writing the dataprep/data.json file? You can check whether it keeps growing in size with ls -lh dataprep/data.json. If it does, xpore dataprep is still running fine.
  2. What value did you pass to --n_processes? Is it larger than your environment variable "NUMEXPR_MAX_THREADS"? If so, you might want to either lower --n_processes or increase the value of "NUMEXPR_MAX_THREADS".
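The comparison in point 2 can be sketched in Python. This is only an illustration and not part of xpore: check_thread_budget is a hypothetical helper, and it assumes the limit is exposed as an integer in the NUMEXPR_MAX_THREADS environment variable.

```python
import os


def check_thread_budget(n_processes, env=None):
    """Compare a requested process count with NUMEXPR_MAX_THREADS.

    Hypothetical helper (not an xpore or NumExpr function).
    Returns (ok, limit); limit is None when the variable is unset,
    in which case nothing can be compared and ok is True.
    """
    env = os.environ if env is None else env
    raw = env.get("NUMEXPR_MAX_THREADS")
    if raw is None:
        return True, None
    limit = int(raw)
    return n_processes <= limit, limit


if __name__ == "__main__":
    # 8 is an arbitrary example value for --n_processes.
    ok, limit = check_thread_budget(8)
    if limit is None:
        print("NUMEXPR_MAX_THREADS is not set in this environment")
    elif ok:
        print(f"8 processes fit within NUMEXPR_MAX_THREADS={limit}")
    else:
        print(f"8 processes exceed NUMEXPR_MAX_THREADS={limit}")
```

On a cluster, run it inside the same job environment as xpore dataprep, since login nodes and compute nodes may export different variables.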

Best wishes, Yuk Kei

yuukiiwa avatar Nov 08 '21 01:11 yuukiiwa

Hi Yuk Kei,

I’m working with Jeremy on running xpore dataprep.

Here is the command I used for running xpore dataprep:

xpore dataprep \
--eventalign "eventalign_Araport11_GTF_genes_transposons-col0.txt" \
--gtf_or_gff "Araport11_GTF_genes_transposons_final_xpore.sorted.gtf" \
--transcript_fasta "Araport11_GTF_genes_transposons.fa" \
--out_dir dataprep \
--genome

After seeing the error/warning, xpore dataprep only generated the eventalign.index file. No other output files are generated when I try to run xpore dataprep.

Best, Erika

erika-fukuhara avatar Nov 10 '21 03:11 erika-fukuhara

Hi Erika,

Thank you for the information! Would you mind showing me the head of eventalign_Araport11_GTF_genes_transposons-col0.txt, Araport11_GTF_genes_transposons_final_xpore.sorted.gtf, and Araport11_GTF_genes_transposons.fa, please? I suspect this might be due to a customized GTF file.

Thanks!

Best wishes, Yuk Kei

yuukiiwa avatar Nov 11 '21 08:11 yuukiiwa

Hi Yuk Kei,

Here is the head for the eventalign.txt, GTF, and FASTA files.

eventalign_Araport11_GTF_genes_transposons-col0.txt:

contig	position	reference_kmer	read_index	strand	event_index	event_level_mean	event_stdv	event_length	model_kmer	model_mean	model_stdv	standardized_level	start_idx	end_idx
AT1G01020.2	426	TTCTG	29	t	429	78.67	1.821	0.00664	TTCTG	79.59	2.07	-0.36	29062	29082
AT1G01020.2	426	TTCTG	29	t	430	82.91	1.990	0.00332	TTCTG	79.59	2.07	1.32	29052	29062
AT1G01020.2	427	TCTGA	29	t	431	95.35	1.866	0.00232	TCTGA	91.37	2.85	1.15	29045	29052
AT1G01020.2	427	TCTGA	29	t	432	99.25	1.877	0.00631	TCTGA	91.37	2.85	2.27	29026	29045
AT1G01020.2	427	TCTGA	29	t	433	94.57	2.016	0.00266	TCTGA	91.37	2.85	0.92	29018	29026
AT1G01020.2	427	TCTGA	29	t	434	98.04	1.761	0.00797	TCTGA	91.37	2.85	1.92	28994	29018
AT1G01020.2	428	CTGAT	29	t	435	122.09	3.429	0.00730	CTGAT	111.64	4.49	1.91	28972	28994
AT1G01020.2	428	CTGAT	29	t	436	117.08	2.426	0.00299	CTGAT	111.64	4.49	0.99	28963	28972
AT1G01020.2	429	TGATT	29	t	437	136.43	6.966	0.00266	TGATT	127.73	5.10	1.40	28955	28963

Araport11_GTF_genes_transposons_final_xpore.sorted.gtf:

1	Araport11	transcript	3631	5899	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	3631	3913	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	3996	4276	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	4486	4605	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	4706	5095	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	5174	5326	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	5439	5899	.	+	.	gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1	Araport11	exon	6788	7069	.	-	.	gene_id "AT1G01020"; transcript_id "AT1G01020.2";
1	Araport11	exon	6788	7069	.	-	.	gene_id "AT1G01020"; transcript_id "AT1G01020.6";
1	Araport11	exon	6788	7069	.	-	.	gene_id "AT1G01020"; transcript_id "AT1G01020.1";

Araport11_GTF_genes_transposons.fa:

>AT1G01010.1
AAATTATTAGATATACCAAACCAGAGAAAACAAATACATAATCGGAGAAATACAGATTACAGAGAGCGAG
AGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGAAAATGGAGGATCA
AGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCTCCGTAACAAAATCGAAGGA
AACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTCAACATCTGTAGCTACGATCCTTGGAACTTGC
GCTTCCAGTCAAAGTACAAATCGAGAGATGCTATGTGGTACTTCTTCTCTCGTAGAGAAAACAACAAAGG
GAATCGACAGAGCAGGACAACGGTTTCTGGTAAATGGAAGCTTACCGGAGAATCTGTTGAGGTCAAGGAC
CAGTGGGGATTTTGTAGTGAGGGCTTTCGTGGTAAGATTGGTCATAAAAGGGTTTTGGTGTTCCTCGATG
GAAGATACCCTGACAAAACCAAATCTGATTGGGTTATCCACGAGTTCCACTACGACCTCTTACCAGAACA
TCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTGATGATGCGGACATTCTATCTGCTTATGCA

Thank you, Erika

erika-fukuhara avatar Nov 12 '21 05:11 erika-fukuhara

Hi Erika,

Thank you for sharing the eventalign.txt, GTF, and FASTA files! Those should be compatible with xpore dataprep.
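One way to double-check that compatibility on the full files is to compare the identifiers across them. The sketch below is an illustration under stated assumptions, not xpore code: the helper names are hypothetical, and it assumes that every contig name in eventalign.txt should also appear as a transcript_id in the GTF and as a record name in the FASTA.

```python
import re

# Matches the transcript_id attribute in a GTF attribute column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def eventalign_contigs(lines):
    """Contig names from the first tab-separated column, skipping the header."""
    contigs = set()
    for i, line in enumerate(lines):
        if i == 0 and line.startswith("contig"):
            continue  # header row
        if line.strip():
            contigs.add(line.split("\t", 1)[0])
    return contigs


def gtf_transcript_ids(lines):
    """transcript_id values found in GTF lines (comment lines are skipped)."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = _TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(lines):
    """Record names: the first word after '>' on each FASTA header line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def missing_ids(eventalign_lines, gtf_lines, fasta_lines):
    """Contigs in eventalign that are absent from the GTF or the FASTA.

    Assumption: the three files are expected to share transcript IDs.
    """
    contigs = eventalign_contigs(eventalign_lines)
    return {
        "not_in_gtf": sorted(contigs - gtf_transcript_ids(gtf_lines)),
        "not_in_fasta": sorted(contigs - fasta_ids(fasta_lines)),
    }
```

Each function accepts any iterable of lines, so an open file handle can be passed directly and large files are read one line at a time.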

I think you should look into the first line of the error message, Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS". Contacting the cluster maintainers at your institute should help with that.

Thanks!

Best wishes, Yuk Kei

yuukiiwa avatar Nov 15 '21 13:11 yuukiiwa

Hello Yuk Kei,

I am also trying to use xpore dataprep and have encountered the same problem: dataprep/eventalign.index is generated, but data.json, data.index, data.log and data.readcount are empty. I have no idea why; may I ask for your help? The command I used to run xpore dataprep is:

xpore dataprep \
--eventalign data/${file}/nanopolish/eventalign.txt \
--gtf_or_gff all.gtf \
--transcript_fasta ref.fa \
--out_dir data/${file}/dataprep \
--genome

I got this error:

/mycomputer/miniconda3/lib/python3.9/site-packages/xpore-2.1-py3.9.egg/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/mycomputer/miniconda3/lib/python3.9/site-packages/xpore-2.1-py3.9.egg/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
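Both messages in this log are pandas warnings (PerformanceWarning and SettingWithCopyWarning) rather than errors, and a warning on its own does not stop a script. When a log mixes the two kinds of message, a small filter can separate them. This is a rough sketch: split_log is a hypothetical helper, and the rules it uses (a "...Warning:" tag marks a warning; a line starting with "Error" or containing "Traceback" marks an error) are simplifying assumptions.

```python
import re

# A Python warning category followed by a colon, e.g. "PerformanceWarning:".
_WARNING_TAG = re.compile(r"\w*Warning:")


def split_log(lines):
    """Split log lines into (errors, warnings) using simple text rules.

    Hypothetical helper; lines matching neither rule (for example the
    source line echoed under a warning) are ignored.
    """
    errors, warnings_found = [], []
    for line in lines:
        if _WARNING_TAG.search(line):
            warnings_found.append(line)
        elif line.lstrip().lower().startswith("error") or "Traceback" in line:
            errors.append(line)
    return errors, warnings_found
```

If the errors list comes back empty for a run that still produced empty output files, the cause is probably not in the messages that were printed.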

My eventalign.txt, GTF, and FASTA files all look like @erika-fukuhara's. Did you solve this problem, or do you have any suggestions?

Thank you! Jeffer

jeffersmith avatar Feb 06 '22 14:02 jeffersmith

Hey,

I'm having the same problem. I ran xpore dataprep, but data.json, data.log and the other files are empty.

Do you know how we can fix it?
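For anyone reporting the same symptom, the sizes of the output files are a useful detail to include. A minimal sketch for collecting them follows. The file names are the ones mentioned in this thread, output_sizes is a hypothetical helper, and "dataprep" stands in for whatever was passed to --out_dir.

```python
from pathlib import Path

# Output file names as mentioned in this thread.
EXPECTED = (
    "eventalign.index",
    "data.json",
    "data.index",
    "data.log",
    "data.readcount",
)


def output_sizes(out_dir, names=EXPECTED):
    """Map each expected file name to its size in bytes, or None if missing."""
    out = Path(out_dir)
    sizes = {}
    for name in names:
        path = out / name
        sizes[name] = path.stat().st_size if path.is_file() else None
    return sizes


if __name__ == "__main__":
    # "dataprep" is a placeholder for the --out_dir value.
    for name, size in output_sizes("dataprep").items():
        print(name, "missing" if size is None else f"{size} bytes")
```

Running it twice a few minutes apart also shows whether data.json is still growing.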

acarmas1 avatar Apr 26 '22 23:04 acarmas1