ValueError: operands could not be broadcast
I try to run the PastML protocol using a tree with 562 tips, however when i start the run this notification comes to my shell
Traceback (most recent call last):
File "/opt/miniconda3/envs/pastmlenv/bin/pastml", line 8, in
I currently use a MacOS terminal in the conda environment whit the comand line pastml --tree AY3.1_test_rooted_raxml --data Database_test_AY3.1.csv --columns SequenceID.1 PHZ --html_compressed AY3.1_map.html --data_sep , -v
Im very new using this program hope you can help me
Dear @AlbertoA3 ,
Thank you for reporting this issue. Could you please tell me which PastML version you are using (you can check that with "pastml --version" command)?
Many thanks for your replay
I currently use the pastml 1.9.46 version
Dear @AlbertoA3 ,
Is there any chance you could give me an example of data for which this problem occurs (you can send it by email to [email protected] if you prefer)? It is hard for me to reproduce the problem and see its source otrherwise.
Thank you
Dear Anna,
I'm running into the same error on the same line, on a much smaller tree. Also on mac, in a conda environment, but using it in a jupyter notebook.
From pastml.__dict__: 'PASTML_VERSION': '1.9.50'
Data files (tree is *.nwk, had to change suffix for github to accept the upload):
concatenated.characters.csv introits_tree4_divtime.txt
Code is from the README "Basic usage in Python". The problem starts appearing when I run this on more than 2 columns.
from pastml.acr import pastml_pipeline
data = "concatenated.characters.csv"
tree = "introits_tree4_divtime.nwk"
# Columns present in the annotation table,
# for which we want to reconstruct ancestral states.
# Works for start_column = 0, n_columns < 3.
start_column = 0
n_columns = 2
columns = ['c{}'.format(start_column + i) for i in range(n_columns)]
pastml_pipeline(data=data, data_sep=',',
columns=columns,
tree=tree,
html_compressed="acr_introits_tree4.html",
html= "acr_introits_tree4.html",
verbose=True)
Also, I tested this with different configurations of start_column and n_columns, and different errors started appearing.
For start_column=115 and n_columns=4, I got the same broadcast error but with parameters (3,3) and 4.
For start_column=110 and n_columns=4, I got the error below.
Given how inconsistently errors appear with respect to the choice of columns, my suspicion is that maybe the data breaks some assumptions, because it's from a completely different domain (Gregorian chant melodies), we just treat them as morphological characters, but I don't have any specific reasons to suspect this.
pastml:DEBUG:10:35:08
=============INPUT DATA VALIDATION=============
pastml:DEBUG:10:35:08 Read the trees introits_tree4_divtime.nwk.
pastml:DEBUG:10:35:08 Read the annotation file concatenated.characters.csv.
pastml:DEBUG:10:35:08 Checked that (at least some of) tip names correspond to annotation file index.
pastml:DEBUG:10:35:08 Finished input validation.
pastml:DEBUG:10:35:08
=============TREE STATISTICS===================
number of tips: 23
number of zero-branch tips: 0
number of internal nodes: 22
max number of children per node: 2
max tip branch length: 577.13130
max internal branch length: 144.64440
min non-zero tip branch length: 28.93167
min non-zero internal branch length: 21.62325
avg non-zero tip branch length: 148.69650
avg non-zero internal branch length: 62.38971
avg non-zero branch length: 107.50462.
pastml:DEBUG:10:35:08
=============ACR===============================
pastml:DEBUG:10:35:08 ACR settings for c110:
Method: MPPA
Model: F81.
pastml:DEBUG:10:35:08 Observed frequencies for c110:
frequency of -: 0.913043
frequency of k: 0.086957.
pastml:DEBUG:10:35:08 ACR settings for c111:
Method: MPPA
Model: F81.
pastml:DEBUG:10:35:08 Observed frequencies for c111:
frequency of l: 1.000000.
pastml:DEBUG:10:35:08 ACR settings for c112:
Method: MPPA
Model: F81.
pastml:DEBUG:10:35:08 Observed frequencies for c112:
frequency of l: 1.000000.
pastml:DEBUG:10:35:08 ACR settings for c113:
Method: MPPA
Model: F81.
pastml:DEBUG:10:35:08 Observed frequencies for c113:
frequency of k: 0.956522
frequency of l: 0.043478.
pastml:DEBUG:10:35:08 Initial values for c111 parameter optimisation:
scaling factor: 0.009302, i.e. 1.000000 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
frequencies (optimised)
l: 1
log likelihood: 0.000000.
pastml:DEBUG:10:35:08 Initial values for c113 parameter optimisation:
scaling factor: 0.009302, i.e. 1.000000 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
frequencies (optimised)
k: 0.5
l: 0.5
log likelihood: -15.550011.
pastml:DEBUG:10:35:08 Initial values for c110 parameter optimisation:
scaling factor: 0.009302, i.e. 1.000000 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
frequencies (optimised)
-: 0.5
k: 0.5
log likelihood: -15.483167.
pastml:DEBUG:10:35:08 Initial values for c112 parameter optimisation:
scaling factor: 0.009302, i.e. 1.000000 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
frequencies (optimised)
l: 1
log likelihood: 0.000000.
pastml:DEBUG:10:35:08 Pre-optimised basic parameters for c113:
scaling factor: 0.002938, i.e. 0.315812 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
log likelihood: -4.112543.
pastml:DEBUG:10:35:08 Pre-optimised basic parameters for c110:
scaling factor: 0.001519, i.e. 0.163275 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
log likelihood: -7.193597.
pastml:DEBUG:10:35:09 Optimised parameters for c113:
scaling factor: 0.002938, i.e. 0.315819 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
frequencies (optimised)
k: 0.956522
l: 0.0434783
log likelihood: -4.112543
pastml:DEBUG:10:35:09 Optimised parameters for c110:
scaling factor: 0.001925, i.e. 0.206981 changes per avg branch (optimised)
smoothing factor: 0.000000 (fixed)
frequencies (optimised)
-: 0.905005
k: 0.0949948
log likelihood: -6.731932
pastml:DEBUG:10:35:09 Log likelihood for c113 after JOINT state selection: -4.868564
pastml:DEBUG:10:35:09 Log likelihood for c110 after JOINT state selection: -7.447170
pastml:DEBUG:10:35:09 Log likelihood for c113 after MAP state selection: -4.868564
pastml:DEBUG:10:35:09 Log likelihood for c110 after MAP state selection: -7.447170
pastml:DEBUG:10:35:09 0 nodes are unresolved (0.00%) for c113 by MPPA, i.e. 1.0000 state per node in average.
pastml:DEBUG:10:35:09 0 nodes are unresolved (0.00%) for c110 by MPPA, i.e. 1.0000 state per node in average.
pastml:DEBUG:10:35:09 Log likelihood for c113 after MPPA state selection: -4.868564
pastml:DEBUG:10:35:09 Log likelihood for c110 after MPPA state selection: -7.447170
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[32], line 21
18 # (Optional) path to the output tree visualisation
19 html = "acr_introits_tree4.html"
---> 21 pastml_pipeline(data=data, data_sep=',',
22 columns=columns,
23 tree=tree,
24 html_compressed=html_compressed,
25 html=html,
26 verbose=True)
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/acr.py:542, in pastml_pipeline(tree, data, data_sep, id_index, columns, prediction_method, model, parameters, rate_matrix, name_column, root_date, timeline_type, tip_size_threshold, colours, out_data, html_compressed, html, html_mixed, work_dir, verbose, forced_joint, upload_to_itol, itol_id, itol_project, itol_tree_name, offline, threads, reoptimise, focus, resolve_polytomies, smoothing, frequency_smoothing, pajek, pajek_timing, recursion_limit)
539 if threads < 1:
540 threads = max(os.cpu_count(), 1)
--> 542 acr_results = acr(forest=roots, columns=columns, column2states=column2states,
543 prediction_method=prediction_method, model=model, column2parameters=parameters,
544 column2rates=rates,
545 force_joint=forced_joint, threads=threads, reoptimise=reoptimise, tau=None if smoothing else 0,
546 resolve_polytomies=resolve_polytomies, frequency_smoothing=frequency_smoothing)
548 column2states = {acr_result[CHARACTER]: acr_result[STATES] for acr_result in acr_results}
550 if not out_data:
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/acr.py:229, in acr(forest, df, columns, column2states, prediction_method, model, column2parameters, column2rates, force_joint, threads, reoptimise, tau, resolve_polytomies, frequency_smoothing)
226 if threads > 1:
227 with ThreadPool(processes=threads - 1) as pool:
228 acr_results = \
--> 229 pool.map(func=_work, iterable=character2settings.keys())
230 else:
231 acr_results = [_work(character) for character in character2settings.keys()]
File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
362 def map(self, func, iterable, chunksize=None):
363 '''
364 Apply `func` to each element in `iterable`, collecting the results
365 in a list that is returned.
366 '''
--> 367 return self._map_async(func, iterable, mapstar, chunksize).get()
File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
772 return self._value
773 else:
--> 774 raise self._value
File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
123 job, i, func, args, kwds = task
124 try:
--> 125 result = (True, func(*args, **kwds))
126 except Exception as e:
127 if wrap_exception and func is not _helper_reraises_exception:
File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:48, in mapstar(args)
47 def mapstar(args):
---> 48 return list(map(*args))
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/acr.py:218, in acr.<locals>._work(character)
216 return {CHARACTER: character, STATES: model_or_states, METHOD: prediction_method}
217 if is_ml(prediction_method):
--> 218 return ml_acr(forest=forest, character=character, prediction_method=prediction_method,
219 model=model_or_states,
220 force_joint=force_joint, observed_frequencies=observed_frequencies)
221 if is_parsimonious(prediction_method):
222 return parsimonious_acr(forest=forest, character=character, prediction_method=prediction_method,
223 states=model_or_states,
224 num_nodes=forest_stats.num_nodes, num_tips=forest_stats.num_tips)
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:659, in ml_acr(forest, character, prediction_method, model, observed_frequencies, force_joint)
643 """
644 Calculates ML states on the trees and stores them in the corresponding feature.
645
(...)
655 :rtype: dict
656 """
657 logger = logging.getLogger('pastml')
658 likelihood = \
--> 659 optimise_likelihood(forest=forest, character=character, model=model,
660 observed_frequencies=observed_frequencies)
661 result = {LOG_LIKELIHOOD: likelihood, CHARACTER: character, METHOD: prediction_method, MODEL: model,
662 STATES: model.states}
664 results = []
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:895, in optimise_likelihood(forest, character, model, observed_frequencies)
893 if not model.basic_params_fixed():
894 model.fix_extra_params()
--> 895 likelihood = optimize_likelihood_params(forest=forest, character=character, model=model,
896 observed_frequencies=observed_frequencies)
897 if np.any(np.isnan(likelihood) or likelihood == -np.inf):
898 raise PastMLLikelihoodError('Failed to optimise the likelihood for your tree, '
899 'please check that you do not have contradicting {} states specified '
900 'for internal tree nodes, '
901 'and if not - submit a bug at https://github.com/evolbioinfo/pastml/issues'
902 .format(character))
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:220, in optimize_likelihood_params(forest, character, observed_frequencies, model)
218 model.frequencies = observed_frequencies
219 x0_EFT = model.get_optimised_parameters()
--> 220 log_lh_JC = -get_v(x0_JC)
221 log_lh_EFT = log_lh_JC if not optimise_frequencies else -get_v(x0_EFT)
223 best_log_lh = max(log_lh_JC, log_lh_EFT)
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:207, in optimize_likelihood_params.<locals>.get_v(ps)
205 return np.nan
206 model.set_params_from_optimised(ps)
--> 207 res = sum(get_bottom_up_loglikelihood(tree=tree, character=character, is_marginal=True, model=model, alter=True)
208 for tree in forest)
209 return np.inf if pd.isnull(res) else -res
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:207, in <genexpr>(.0)
205 return np.nan
206 model.set_params_from_optimised(ps)
--> 207 res = sum(get_bottom_up_loglikelihood(tree=tree, character=character, is_marginal=True, model=model, alter=True)
208 for tree in forest)
209 return np.inf if pd.isnull(res) else -res
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:111, in get_bottom_up_loglikelihood(tree, character, model, is_marginal, alter)
109 get_pij = model.get_Pij_t
110 for node in tree.traverse('postorder'):
--> 111 calc_node_bu_likelihood(node, allowed_state_feature, lh_feature, lh_sf_feature, lh_joint_state_feature,
112 is_marginal, get_pij)
113 root_likelihoods = getattr(tree, lh_feature) * model.frequencies
114 root_likelihoods = root_likelihoods.sum() if is_marginal else root_likelihoods.max()
File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:139, in calc_node_bu_likelihood(node, allowed_state_feature, lh_feature, lh_sf_feature, lh_joint_state_feature, is_marginal, get_pij)
137 child_likelihoods = child_likelihoods.max(axis=1)
138 child_likelihoods = np.maximum(child_likelihoods, 0)
--> 139 log_likelihood_array += np.log10(child_likelihoods)
140 if np.all(log_likelihood_array == -np.inf):
141 raise PastMLLikelihoodError("The parent node {} and its child node {} have non-intersecting states, "
142 "and are connected by a zero-length ({:g}) branch. "
143 "This creates a zero likelihood value. "
144 "To avoid this issue check the restrictions on these node states "
145 "and/or use a smoothing factor (tau)."
146 .format(node.name, child.name, child.dist))
ValueError: non-broadcastable output operand with shape (1,) doesn't match the broadcast shape (2,)
Dear hajicj,
Thank you for the detailed problem description. I will have a look at it next week and come back to you.
Any luck on this?
Dear hajicj,
I am sorry it took me so long. Thank you so much for the detailed example, it was extremely helpful in finding the issue! I found the bug and fixed it in version 1.9.51. There was a problem when several columns were analysed at once: the observed equilibrium frequencies were reused between the columns instead of being independent, if in your previous analyses you used only one column - they should be fine and unaffected by this bug.
Sorry for this issue and the time it took me to fix it.
Dear Anna,
just confirming that the updated version works. Thank you very much!
Jan