BIRDMAn
More informative failure messages
Below is an example of one of these error messages:
Traceback (most recent call last):
File "/home/centos/birdman/run_single_feature.py", line 110, in <module>
model.fit_model(
File "/home/centos/miniconda3/envs/birdman/lib/python3.11/site-packages/birdman/model_base.py", line 173, in fit_model
self.fit = self.sm.sample(
^^^^^^^^^^^^^^^
File "/home/centos/miniconda3/envs/birdman/lib/python3.11/site-packages/cmdstanpy/model.py", line 1201, in sample
raise RuntimeError(msg)
RuntimeError: Error during sampling:
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x_TP; position=0; dims declared=(131); dims found=(149) (in '/home/centos/birdman/model.stan', line 10, column 2 to column 38)
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x_TP; position=0; dims declared=(131); dims found=(149) (in '/home/centos/birdman/model.stan', line 10, column 2 to column 38)
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x_TP; position=0; dims declared=(131); dims found=(149) (in '/home/centos/birdman/model.stan', line 10, column 2 to column 38)
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x_TP; position=0; dims declared=(131); dims found=(149) (in '/home/centos/birdman/model.stan', line 10, column 2 to column 38)
Command and output files:
RunSet: chains=4, chain_ids=[1, 2, 3, 4], num_processes=4
cmd (chain 1):
['/home/centos/birdman/model', 'id=1', 'random', 'seed=0', 'data', 'file=/tmp/tmp199y9rwi/z_66ve5j.json', 'output', 'file=/tmp/tmp199y9rwi/modeln3c79wls/model-20230925191211_1.csv', 'method=sample', 'num_samples=500', 'num_warmup=500', 'algorithm=hmc', 'adapt', 'engaged=1']
retcodes=[1, 1, 1, 1]
per-chain output files (showing chain 1 only):
csv_file:
/tmp/tmp199y9rwi/modeln3c79wls/model-20230925191211_1.csv
console_msgs (if any):
/tmp/tmp199y9rwi/modeln3c79wls/model-20230925191211_0-stdout.txt
The error message reports a dimension mismatch for x_TP: 131 declared versus 149 found. On closer investigation, the mismatch arises because the sample IDs in the BIOM table and the sample IDs in the metadata are not the same.
Given that this is a common use case, we could probably include a validation step in the ABC of FeatureModel that checks whether the BIOM table and the metadata are properly synced, and gives a more informative error message if they aren't. Something like the following may suffice:
# Keep only the sample IDs present in both the metadata and the BIOM table
common_ids = list(set(metadata.index) & set(table.ids()))
if len(common_ids) == 0:
    raise ValueError('BIOM table sample IDs and sample metadata IDs do not overlap')
metadata = metadata.loc[common_ids]
table.filter(metadata.index, inplace=True)
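If we would rather fail loudly than silently drop samples, a stricter check could report which IDs are missing on each side. This is only a minimal sketch: `check_sample_ids` and its `max_shown` parameter are hypothetical names, not part of BIRDMAn, and the sketch assumes the IDs are plain strings so they can be sorted for display.

```python
def check_sample_ids(table_ids, metadata_ids, max_shown=5):
    """Raise a ValueError naming the sample IDs that are not shared.

    Hypothetical helper, not part of BIRDMAn. Both arguments are plain
    iterables of sample IDs, e.g. table.ids() and metadata.index.
    """
    table_ids = set(table_ids)
    metadata_ids = set(metadata_ids)

    # IDs that appear on one side only (assumes IDs are mutually sortable)
    only_table = sorted(table_ids - metadata_ids)
    only_metadata = sorted(metadata_ids - table_ids)

    if only_table or only_metadata:
        raise ValueError(
            "Sample IDs differ between table and metadata: "
            f"{len(only_table)} only in table (e.g. {only_table[:max_shown]}), "
            f"{len(only_metadata)} only in metadata (e.g. {only_metadata[:max_shown]})"
        )
```

Calling it as `check_sample_ids(table.ids(), metadata.index)` before fitting would surface the mismatch in Python rather than during Stan's data initialization. Whether to raise or to subset to the common IDs is a design choice; raising avoids changing the user's data without them noticing.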
Yes, I think that makes sense.