pastml icon indicating copy to clipboard operation
pastml copied to clipboard

ValueError: operands could not be broadcast

Open AlbertoA3 opened this issue 1 year ago • 3 comments

I try to run the PastML protocol using a tree with 562 tips, however when i start the run this notification comes to my shell

Traceback (most recent call last): File "/opt/miniconda3/envs/pastmlenv/bin/pastml", line 8, in sys.exit(main()) ^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/acr.py", line 1124, in main pastml_pipeline(**vars(params)) File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/acr.py", line 543, in pastml_pipeline acr_results = acr(forest=roots, columns=columns, column2states=column2states, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/acr.py", line 230, in acr pool.map(func=_work, iterable=character2settings.keys()) File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/multiprocessing/pool.py", line 367, in map return self._map_async(func, iterable, mapstar, chunksize).get() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/multiprocessing/pool.py", line 774, in get raise self._value File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/multiprocessing/pool.py", line 48, in mapstar return list(map(*args)) ^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/acr.py", line 219, in _work return ml_acr(forest=forest, character=character, prediction_method=prediction_method, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 659, in ml_acr optimise_likelihood(forest=forest, character=character, model=model, File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 895, in optimise_likelihood likelihood = optimize_likelihood_params(forest=forest, character=character, model=model, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 220, in optimize_likelihood_params log_lh_JC = -get_v(x0_JC) ^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 207, in get_v res = sum(get_bottom_up_loglikelihood(tree=tree, character=character, is_marginal=True, model=model, alter=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 207, in res = sum(get_bottom_up_loglikelihood(tree=tree, character=character, is_marginal=True, model=model, alter=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 111, in get_bottom_up_loglikelihood calc_node_bu_likelihood(node, allowed_state_feature, lh_feature, lh_sf_feature, lh_joint_state_feature, File "/opt/miniconda3/envs/pastmlenv/lib/python3.12/site-packages/pastml/ml.py", line 131, in calc_node_bu_likelihood child_likelihoods = get_pij(child.dist) * getattr(child, lh_feature) ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ValueError: operands could not be broadcast together with shapes (10,10) (562,)

I currently use a MacOS terminal in the conda environment whit the comand line pastml --tree AY3.1_test_rooted_raxml --data Database_test_AY3.1.csv --columns SequenceID.1 PHZ --html_compressed AY3.1_map.html --data_sep , -v

Im very new using this program hope you can help me

AlbertoA3 avatar Sep 03 '24 18:09 AlbertoA3

Dear @AlbertoA3 ,

Thank you for reporting this issue. Could you please tell me which PastML version you are using (you can check that with "pastml --version" command)?

annazhukova avatar Sep 04 '24 08:09 annazhukova

Many thanks for your replay

I currently use the pastml 1.9.46 version

AlbertoA3 avatar Sep 04 '24 14:09 AlbertoA3

Dear @AlbertoA3 ,

Is there any chance you could give me an example of data for which this problem occurs (you can send it by email to [email protected] if you prefer)? It is hard for me to reproduce the problem and see its source otrherwise.

Thank you

annazhukova avatar Sep 06 '24 13:09 annazhukova

Dear Anna,

I'm running into the same error on the same line, on a much smaller tree. Also on mac, in a conda environment, but using it in a jupyter notebook.

From pastml.__dict__: 'PASTML_VERSION': '1.9.50'

Data files (tree is *.nwk, had to change suffix for github to accept the upload):

concatenated.characters.csv introits_tree4_divtime.txt

Code is from the README "Basic usage in Python". The problem starts appearing when I run this on more than 2 columns.

from pastml.acr import pastml_pipeline

data = "concatenated.characters.csv"
tree = "introits_tree4_divtime.nwk"

# Columns present in the annotation table,
# for which we want to reconstruct ancestral states.
# Works for start_column = 0,  n_columns < 3.
start_column = 0
n_columns = 2
columns = ['c{}'.format(start_column + i) for i in range(n_columns)]

pastml_pipeline(data=data, data_sep=',', 
                columns=columns, 
                tree=tree,
                html_compressed="acr_introits_tree4.html",
                html= "acr_introits_tree4.html", 
                verbose=True)

Also, I tested this with different configurations of start_column and n_columns, and different errors started appearing. For start_column=115 and n_columns=4, I got the same broadcast error but with parameters (3,3) and 4. For start_column=110 and n_columns=4, I got the error below. Given how inconsistently errors appear with respect to the choice of columns, my suspicion is that maybe the data breaks some assumptions, because it's from a completely different domain (Gregorian chant melodies), we just treat them as morphological characters, but I don't have any specific reasons to suspect this.

pastml:DEBUG:10:35:08 
=============INPUT DATA VALIDATION=============
pastml:DEBUG:10:35:08 Read the trees introits_tree4_divtime.nwk.
pastml:DEBUG:10:35:08 Read the annotation file concatenated.characters.csv.
pastml:DEBUG:10:35:08 Checked that (at least some of) tip names correspond to annotation file index.
pastml:DEBUG:10:35:08 Finished input validation.
pastml:DEBUG:10:35:08 
=============TREE STATISTICS===================
	number of tips:	23
	number of zero-branch tips:	0
	number of internal nodes:	22
	max number of children per node:	2
	max tip branch length:	577.13130
	max internal branch length:	144.64440
	min non-zero tip branch length:	28.93167
	min non-zero internal branch length:	21.62325
	avg non-zero tip branch length:	148.69650
	avg non-zero internal branch length:	62.38971
	avg non-zero branch length:	107.50462.
pastml:DEBUG:10:35:08 
=============ACR===============================
pastml:DEBUG:10:35:08 ACR settings for c110:
	Method:	MPPA
	Model:	F81.
pastml:DEBUG:10:35:08 Observed frequencies for c110:
	frequency of -:	0.913043
	frequency of k:	0.086957.
pastml:DEBUG:10:35:08 ACR settings for c111:
	Method:	MPPA
	Model:	F81.
pastml:DEBUG:10:35:08 Observed frequencies for c111:
	frequency of l:	1.000000.
pastml:DEBUG:10:35:08 ACR settings for c112:
	Method:	MPPA
	Model:	F81.
pastml:DEBUG:10:35:08 Observed frequencies for c112:
	frequency of l:	1.000000.
pastml:DEBUG:10:35:08 ACR settings for c113:
	Method:	MPPA
	Model:	F81.
pastml:DEBUG:10:35:08 Observed frequencies for c113:
	frequency of k:	0.956522
	frequency of l:	0.043478.
pastml:DEBUG:10:35:08 Initial values for c111 parameter optimisation:
	scaling factor:	0.009302, i.e. 1.000000 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	frequencies	(optimised)
		l:	1
	log likelihood:	0.000000.
pastml:DEBUG:10:35:08 Initial values for c113 parameter optimisation:
	scaling factor:	0.009302, i.e. 1.000000 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	frequencies	(optimised)
		k:	0.5
		l:	0.5
	log likelihood:	-15.550011.
pastml:DEBUG:10:35:08 Initial values for c110 parameter optimisation:
	scaling factor:	0.009302, i.e. 1.000000 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	frequencies	(optimised)
		-:	0.5
		k:	0.5
	log likelihood:	-15.483167.
pastml:DEBUG:10:35:08 Initial values for c112 parameter optimisation:
	scaling factor:	0.009302, i.e. 1.000000 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	frequencies	(optimised)
		l:	1
	log likelihood:	0.000000.
pastml:DEBUG:10:35:08 Pre-optimised basic parameters for c113:
	scaling factor:	0.002938, i.e. 0.315812 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	log likelihood:	-4.112543.
pastml:DEBUG:10:35:08 Pre-optimised basic parameters for c110:
	scaling factor:	0.001519, i.e. 0.163275 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	log likelihood:	-7.193597.
pastml:DEBUG:10:35:09 Optimised parameters for c113:
	scaling factor:	0.002938, i.e. 0.315819 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	frequencies	(optimised)
		k:	0.956522
		l:	0.0434783
	log likelihood:	-4.112543
pastml:DEBUG:10:35:09 Optimised parameters for c110:
	scaling factor:	0.001925, i.e. 0.206981 changes per avg branch	(optimised)
	smoothing factor:	0.000000	(fixed)
	frequencies	(optimised)
		-:	0.905005
		k:	0.0949948
	log likelihood:	-6.731932
pastml:DEBUG:10:35:09 Log likelihood for c113 after JOINT state selection:	-4.868564
pastml:DEBUG:10:35:09 Log likelihood for c110 after JOINT state selection:	-7.447170
pastml:DEBUG:10:35:09 Log likelihood for c113 after MAP state selection:	-4.868564
pastml:DEBUG:10:35:09 Log likelihood for c110 after MAP state selection:	-7.447170
pastml:DEBUG:10:35:09 0 nodes are unresolved (0.00%) for c113 by MPPA, i.e. 1.0000 state per node in average.
pastml:DEBUG:10:35:09 0 nodes are unresolved (0.00%) for c110 by MPPA, i.e. 1.0000 state per node in average.
pastml:DEBUG:10:35:09 Log likelihood for c113 after MPPA state selection:	-4.868564
pastml:DEBUG:10:35:09 Log likelihood for c110 after MPPA state selection:	-7.447170

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[32], line 21
     18 # (Optional) path to the output tree visualisation
     19 html = "acr_introits_tree4.html"
---> 21 pastml_pipeline(data=data, data_sep=',', 
     22                 columns=columns, 
     23                 tree=tree,
     24                 html_compressed=html_compressed, 
     25                 html=html, 
     26                 verbose=True)

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/acr.py:542, in pastml_pipeline(tree, data, data_sep, id_index, columns, prediction_method, model, parameters, rate_matrix, name_column, root_date, timeline_type, tip_size_threshold, colours, out_data, html_compressed, html, html_mixed, work_dir, verbose, forced_joint, upload_to_itol, itol_id, itol_project, itol_tree_name, offline, threads, reoptimise, focus, resolve_polytomies, smoothing, frequency_smoothing, pajek, pajek_timing, recursion_limit)
    539 if threads < 1:
    540     threads = max(os.cpu_count(), 1)
--> 542 acr_results = acr(forest=roots, columns=columns, column2states=column2states,
    543                   prediction_method=prediction_method, model=model, column2parameters=parameters,
    544                   column2rates=rates,
    545                   force_joint=forced_joint, threads=threads, reoptimise=reoptimise, tau=None if smoothing else 0,
    546                   resolve_polytomies=resolve_polytomies, frequency_smoothing=frequency_smoothing)
    548 column2states = {acr_result[CHARACTER]: acr_result[STATES] for acr_result in acr_results}
    550 if not out_data:

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/acr.py:229, in acr(forest, df, columns, column2states, prediction_method, model, column2parameters, column2rates, force_joint, threads, reoptimise, tau, resolve_polytomies, frequency_smoothing)
    226 if threads > 1:
    227     with ThreadPool(processes=threads - 1) as pool:
    228         acr_results = \
--> 229             pool.map(func=_work, iterable=character2settings.keys())
    230 else:
    231     acr_results = [_work(character) for character in character2settings.keys()]

File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
    362 def map(self, func, iterable, chunksize=None):
    363     '''
    364     Apply `func` to each element in `iterable`, collecting the results
    365     in a list that is returned.
    366     '''
--> 367     return self._map_async(func, iterable, mapstar, chunksize).get()

File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
    772     return self._value
    773 else:
--> 774     raise self._value

File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~/anaconda3/envs/cestf/lib/python3.10/multiprocessing/pool.py:48, in mapstar(args)
     47 def mapstar(args):
---> 48     return list(map(*args))

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/acr.py:218, in acr.<locals>._work(character)
    216     return {CHARACTER: character, STATES: model_or_states, METHOD: prediction_method}
    217 if is_ml(prediction_method):
--> 218     return ml_acr(forest=forest, character=character, prediction_method=prediction_method,
    219                   model=model_or_states,
    220                   force_joint=force_joint, observed_frequencies=observed_frequencies)
    221 if is_parsimonious(prediction_method):
    222     return parsimonious_acr(forest=forest, character=character, prediction_method=prediction_method,
    223                             states=model_or_states,
    224                             num_nodes=forest_stats.num_nodes, num_tips=forest_stats.num_tips)

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:659, in ml_acr(forest, character, prediction_method, model, observed_frequencies, force_joint)
    643 """
    644 Calculates ML states on the trees and stores them in the corresponding feature.
    645 
   (...)
    655 :rtype: dict
    656 """
    657 logger = logging.getLogger('pastml')
    658 likelihood = \
--> 659     optimise_likelihood(forest=forest, character=character, model=model,
    660                         observed_frequencies=observed_frequencies)
    661 result = {LOG_LIKELIHOOD: likelihood, CHARACTER: character, METHOD: prediction_method, MODEL: model,
    662           STATES: model.states}
    664 results = []

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:895, in optimise_likelihood(forest, character, model, observed_frequencies)
    893 if not model.basic_params_fixed():
    894     model.fix_extra_params()
--> 895     likelihood = optimize_likelihood_params(forest=forest, character=character, model=model,
    896                                             observed_frequencies=observed_frequencies)
    897     if np.any(np.isnan(likelihood) or likelihood == -np.inf):
    898         raise PastMLLikelihoodError('Failed to optimise the likelihood for your tree, '
    899                                     'please check that you do not have contradicting {} states specified '
    900                                     'for internal tree nodes, '
    901                                     'and if not - submit a bug at https://github.com/evolbioinfo/pastml/issues'
    902                                     .format(character))

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:220, in optimize_likelihood_params(forest, character, observed_frequencies, model)
    218     model.frequencies = observed_frequencies
    219     x0_EFT = model.get_optimised_parameters()
--> 220 log_lh_JC = -get_v(x0_JC)
    221 log_lh_EFT = log_lh_JC if not optimise_frequencies else -get_v(x0_EFT)
    223 best_log_lh = max(log_lh_JC, log_lh_EFT)

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:207, in optimize_likelihood_params.<locals>.get_v(ps)
    205     return np.nan
    206 model.set_params_from_optimised(ps)
--> 207 res = sum(get_bottom_up_loglikelihood(tree=tree, character=character, is_marginal=True, model=model, alter=True)
    208           for tree in forest)
    209 return np.inf if pd.isnull(res) else -res

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:207, in <genexpr>(.0)
    205     return np.nan
    206 model.set_params_from_optimised(ps)
--> 207 res = sum(get_bottom_up_loglikelihood(tree=tree, character=character, is_marginal=True, model=model, alter=True)
    208           for tree in forest)
    209 return np.inf if pd.isnull(res) else -res

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:111, in get_bottom_up_loglikelihood(tree, character, model, is_marginal, alter)
    109 get_pij = model.get_Pij_t
    110 for node in tree.traverse('postorder'):
--> 111     calc_node_bu_likelihood(node, allowed_state_feature, lh_feature, lh_sf_feature, lh_joint_state_feature,
    112                             is_marginal, get_pij)
    113 root_likelihoods = getattr(tree, lh_feature) * model.frequencies
    114 root_likelihoods = root_likelihoods.sum() if is_marginal else root_likelihoods.max()

File ~/anaconda3/envs/cestf/lib/python3.10/site-packages/pastml/ml.py:139, in calc_node_bu_likelihood(node, allowed_state_feature, lh_feature, lh_sf_feature, lh_joint_state_feature, is_marginal, get_pij)
    137     child_likelihoods = child_likelihoods.max(axis=1)
    138 child_likelihoods = np.maximum(child_likelihoods, 0)
--> 139 log_likelihood_array += np.log10(child_likelihoods)
    140 if np.all(log_likelihood_array == -np.inf):
    141     raise PastMLLikelihoodError("The parent node {} and its child node {} have non-intersecting states, "
    142                                 "and are connected by a zero-length ({:g}) branch. "
    143                                 "This creates a zero likelihood value. "
    144                                 "To avoid this issue check the restrictions on these node states "
    145                                 "and/or use a smoothing factor (tau)."
    146                                 .format(node.name, child.name, child.dist))

ValueError: non-broadcastable output operand with shape (1,) doesn't match the broadcast shape (2,)

hajicj avatar Apr 04 '25 08:04 hajicj

Dear hajicj,

Thank you for the detailed problem description. I will have a look at it next week and come back to you.

annazhukova avatar Apr 04 '25 14:04 annazhukova

Any luck on this?

hajicj avatar Apr 22 '25 23:04 hajicj

Dear hajicj,

I am sorry it took me so long. Thank you so much for the detailed example, it was extremely helpful in finding the issue! I found the bug and fixed it in version 1.9.51. There was a problem when several columns were analysed at once: the observed equilibrium frequencies were reused between the columns instead of being independent, if in your previous analyses you used only one column - they should be fine and unaffected by this bug.

Sorry for this issue and the time it took me to fix it.

annazhukova avatar Apr 28 '25 10:04 annazhukova

Dear Anna,

just confirming that the updated version works. Thank you very much!

Jan

hajicj avatar Jul 11 '25 17:07 hajicj