One read multiple report steps
Fixes #121.
This pull request stores properties in the Experiment object, updating it each time the properties file is loaded or a fetcher is executed. This way, the properties file is read only when needed and the in-memory data is always up to date.
To avoid modifications of the properties data when applying filters, instead of modifying the original object a new dictionary is created with references to the original data. Note that this also saves a lot of time and memory compared to the use of copy.deepcopy.
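To illustrate the idea, here is a minimal sketch with hypothetical names and made-up data (it is not the code in this pull request): filtering builds a new outer dictionary whose values are the original run dictionaries, so nothing is copied.

```python
import copy


def filter_runs(props, keep):
    """Return a new dict whose values reference the original run dicts."""
    return {run_id: run for run_id, run in props.items() if keep(run)}


# Hypothetical properties data: run ID -> attributes of that run.
props = {
    "fw-gripper-p01": {"algorithm": "fw", "coverage": 1},
    "bw-gripper-p01": {"algorithm": "bw", "coverage": 0},
}

filtered = filter_runs(props, lambda run: run["algorithm"] == "fw")

# The outer dictionary is new, so the stored data still holds all runs ...
assert len(props) == 2 and len(filtered) == 1
# ... while the kept run is shared with the original rather than copied.
assert filtered["fw-gripper-p01"] is props["fw-gripper-p01"]
# copy.deepcopy would duplicate every run instead.
assert copy.deepcopy(props)["fw-gripper-p01"] is not props["fw-gripper-p01"]
```

Note that a filter that mutates a run in place would still change the shared data; a per-run shallow copy such as dict(run) would be one way to guard against that.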
This pull request saves many minutes when running multiple reports over the same properties file, but note:
- The documentation is not updated yet. I have not done it because I'm not sure which files should be modified, but I will do it if you tell me which changes are needed. Of course, I would be thankful if you did it in whatever way you think best.
- This is not a breaking change in the highest-level API, but it is in the low-level API. Concretely:
  - The Experiment object reference must be set in reports now. This is not a problem in the highest-level API because it is done in the add_report function, but when using add_step it must be done manually.
  - The Fetcher constructor now receives an Experiment object as an argument. Again, add_fetcher does it for the user.
In addition, this pull request adds a new option to run only reports and more flexibility for the steps parameter, which now accepts intervals separated by commas, such as 2-5,6,8-10.
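A minimal sketch of how such a specification could be expanded into step numbers. The function name parse_steps is hypothetical and this is not necessarily how the pull request implements it:

```python
def parse_steps(spec):
    """Expand a spec such as "2-5,6,8-10" into a sorted list of step numbers."""
    steps = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # Tolerate stray commas.
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid interval: {part!r}")
            steps.update(range(start, end + 1))  # Intervals are inclusive.
        else:
            steps.add(int(part))
    return sorted(steps)


print(parse_steps("2-5,6,8-10"))  # [2, 3, 4, 5, 6, 8, 9, 10]
```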
I hope you find this pull request useful even though #121 was closed. I think this pull request is simple enough and adds value to Downward Lab :smile: .
Thanks for the pull request! Good to see that you've found your way through the Lab source code.
Regarding the three proposed features:
- Step ranges: this can already be achieved via shell expansions, such as ./myexp.py 1 2 {3..6}, so I wouldn't want to duplicate the functionality inside Lab. We should, however, document this shell feature.
- Flag for running only reports: I think the previous feature is more versatile and general.
- Caching properties in memory between reports: the proposed change makes the code more entangled and is tricky to get right (as you can see from the failing tests). Before exploring this solution further, I'd like to understand the problem better. Could you please make a timing profile of an execution that takes too long in your opinion, so that we can see which parts of the code could benefit from optimization (for example with cProfile)? And can you maybe paste your experiment script and the invocation here or send it to me via email?
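For reference, a timing profile of a single step can be collected with the standard library along these lines. This is only a sketch: load_properties and the sample data are stand-ins for whatever part of the experiment script is slow.

```python
import cProfile
import io
import json
import pstats


def load_properties(text):
    # Stand-in for the step that should be profiled.
    return json.loads(text)


# Made-up sample data in place of a real properties file.
sample = json.dumps({f"run-{i}": {"coverage": i % 2} for i in range(1000)})

profiler = cProfile.Profile()
profiler.enable()
props = load_properties(sample)
profiler.disable()

# Collect the ten entries with the highest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

A whole script can be profiled in the same way by running it through python -m cProfile.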
OK, the additional argument options were only light helpers for users, and shell expansions seem good enough. They are in a separate commit, so leaving those changes out is easy.
Regarding the failing tests, I surely missed something, since I only focused on fetchers and reports. However, with a bit more refactoring, the Experiment class could be responsible for loading and updating the data, which would make the solution more robust.
My use case is simply running several reports over the same data (absolute report, coverage tables, scatter plots, etc.). I think this use case is not rare and other people could benefit from this performance improvement if this feature is merged.
Regarding the timing differences, they are so large that cProfile is not needed; a quick look at the log is enough. Below are the trimmed logs (uninformative intermediate lines removed) from running the following report script. It has only a few reports (scatter plots, for example, are often useful for several attributes and for comparing several algorithms) and a dataset that is not really big (30 + 10 algorithms over the Autoscale optimal benchmark with only basic attributes), so the timing differences can be much larger in other setups.
Reports script:
import platform
from downward.experiment import FastDownwardExperiment
from downward.reports.absolute import AbsoluteReport
from per_domain_comparison import PerDomainComparison
from dominance_comparison import DominanceComparison
NODE = platform.node()
ATTRIBUTES = [
    "coverage",
    "error",
    "expansions",
    "expansions_until_last_jump",
    "total_time",
    "planner_time",
    "search_time",
    "cegar_abstractions_init_time",
    "initial_h_value",
    "cost",
    "evaluations",
    "memory",
    "run_dir",
    "found_concrete_solution",
    "found_concrete_solution_sum",
]
exp_name = "single_bysplit_comparison"
exp = FastDownwardExperiment(exp_name)
exp.add_fetcher("data/single_bysplit_experiments_only_parse")
exp.add_fetcher("data/single_position_experiments_only_parse")
# Add report step (AbsoluteReport is the standard report).
exp.add_report(AbsoluteReport(attributes=ATTRIBUTES), outfile="report.html")
all_variants = [
    "sequence_10M_fw(min_cg)",
    "sequence_10M_fw(max_cg)",
    "sequence_10M_fw(max_refined)",
    "sequence_10M_fw(min_refined)",
    "sequence_10M_fw(min_hadd)",
    "sequence_10M_fw(max_hadd)",
    "sequence_10M_fw(min_cost)",
    "sequence_10M_fw(max_cost)",
    "sequence_10M_fw(max_ref_goal_dist)",
    "sequence_10M_bw(min_cg)",
    "sequence_10M_bw(max_cg)",
    "sequence_10M_bw(max_refined)",
    "sequence_10M_bw(min_refined)",
    "sequence_10M_bw(min_hadd)",
    "sequence_10M_bw(max_hadd)",
    "sequence_10M_bw(min_cost)",
    "sequence_10M_bw(max_cost)",
    "sequence_10M_bw(max_ref_goal_dist)",
    "sequence_10M_bd(min_cg)",
    "sequence_10M_bd(max_cg)",
    "sequence_10M_bd(max_refined)",
    "sequence_10M_bd(min_refined)",
    "sequence_10M_bd(min_hadd)",
    "sequence_10M_bd(max_hadd)",
    "sequence_10M_bd(min_cost)",
    "sequence_10M_bd(max_cost)",
    "sequence_10M_bd(max_ref_goal_dist)",
]
fw_variants = [
    "sequence_10M_fw(min_cg)",
    "sequence_10M_fw(max_cg)",
    "sequence_10M_fw(max_refined)",
    "sequence_10M_fw(min_refined)",
    "sequence_10M_fw(min_hadd)",
    "sequence_10M_fw(max_hadd)",
    "sequence_10M_fw(min_cost)",
    "sequence_10M_fw(max_cost)",
    "sequence_10M_fw(max_ref_goal_dist)",
]
bw_variants = [
    "sequence_10M_bw(min_cg)",
    "sequence_10M_bw(max_cg)",
    "sequence_10M_bw(max_refined)",
    "sequence_10M_bw(min_refined)",
    "sequence_10M_bw(min_hadd)",
    "sequence_10M_bw(max_hadd)",
    "sequence_10M_bw(min_cost)",
    "sequence_10M_bw(max_cost)",
    "sequence_10M_bw(max_ref_goal_dist)",
]
bd_variants = [
    "sequence_10M_bd(min_cg)",
    "sequence_10M_bd(max_cg)",
    "sequence_10M_bd(max_refined)",
    "sequence_10M_bd(min_refined)",
    "sequence_10M_bd(min_hadd)",
    "sequence_10M_bd(max_hadd)",
    "sequence_10M_bd(min_cost)",
    "sequence_10M_bd(max_cost)",
    "sequence_10M_bd(max_ref_goal_dist)",
]
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_all.txt",
        attributes=["coverage"],
        filter_algorithm=all_variants,
    ),
    name="perdomain_comparison_all",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_fw.txt",
        attributes=["coverage"],
        filter_algorithm=fw_variants,
    ),
    name="perdomain_comparison_fw",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_bw.txt",
        attributes=["coverage"],
        filter_algorithm=bw_variants,
    ),
    name="perdomain_comparison_bw",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_bd.txt",
        attributes=["coverage"],
        filter_algorithm=bd_variants,
    ),
    name="perdomain_comparison_bd",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_found_concrete_solution_all.txt",
        attributes=["found_concrete_solution_sum"],
        filter_algorithm=all_variants,
    ),
    name="perdomain_comparison_found_concrete_solution_all",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_found_concrete_solution_fw.txt",
        attributes=["found_concrete_solution_sum"],
        filter_algorithm=fw_variants,
    ),
    name="perdomain_comparison_found_concrete_solution_fw",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_found_concrete_solution_bw.txt",
        attributes=["found_concrete_solution_sum"],
        filter_algorithm=bw_variants,
    ),
    name="perdomain_comparison_found_concrete_solution_bw",
)
exp.add_report(
    PerDomainComparison(
        output_file=f"{exp_name}-eval/perdomain_comparison_found_concrete_solution_bd.txt",
        attributes=["found_concrete_solution_sum"],
        filter_algorithm=bd_variants,
    ),
    name="perdomain_comparison_found_concrete_solution_bd",
)
fw_bysplit_variants = [
    "CEGAR_10M_fw",
    "sequence_10M_fw(max_refined)",
    "sequence_10M_fw(min_refined)",
    "sequence_10M_fw(max_cost)",
    "sequence_10M_fw(min_cost)",
    "sequence_10M_fw(max_hadd)",
    "sequence_10M_fw(min_hadd)",
]
bw_bysplit_variants = [
    "CEGAR_10M_bw",
    "sequence_10M_bw(max_refined)",
    "sequence_10M_bw(min_refined)",
    "sequence_10M_bw(max_cost)",
    "sequence_10M_bw(min_cost)",
    "sequence_10M_bw(max_hadd)",
    "sequence_10M_bw(min_hadd)",
]
bd_bysplit_variants = [
    "CEGAR_10M_bd",
    "sequence_10M_bd(max_refined)",
    "sequence_10M_bd(min_refined)",
    "sequence_10M_bd(max_cost)",
    "sequence_10M_bd(min_cost)",
    "sequence_10M_bd(max_hadd)",
    "sequence_10M_bd(min_hadd)",
]
exp.add_report(
    PerDomainComparison(
        sorted_attributes=["found_concrete_solution_sum", "coverage"],
        output_file=f"{exp_name}-eval/perdomain_comparison_table_fw.txt",
        filter_algorithm=fw_bysplit_variants,
    ),
    name="perdomain_comparison_table_fw",
)
exp.add_report(
    PerDomainComparison(
        sorted_attributes=["found_concrete_solution_sum", "coverage"],
        output_file=f"{exp_name}-eval/perdomain_comparison_table_bw.txt",
        filter_algorithm=bw_bysplit_variants,
    ),
    name="perdomain_comparison_table_bw",
)
exp.add_report(
    PerDomainComparison(
        sorted_attributes=["found_concrete_solution_sum", "coverage"],
        output_file=f"{exp_name}-eval/perdomain_comparison_table_bd.txt",
        filter_algorithm=bd_bysplit_variants,
    ),
    name="perdomain_comparison_table_bd",
)
bysplit_variants = fw_bysplit_variants + bw_bysplit_variants + bd_bysplit_variants
grouped_algorithms = (fw_bysplit_variants, bw_bysplit_variants, bd_bysplit_variants)
namecolumn = [
    r"\strategydefault",
    r"\strategyrefined",
    r"\strategyminrefined",
    r"\strategycost",
    r"\strategymincost",
    r"\strategyhadd",
    r"\strategyminhadd",
]
exp.add_report(
    DominanceComparison(
        grouped_algorithms=grouped_algorithms,
        namecolumn=namecolumn,
        sorted_attributes=["found_concrete_solution_sum", "coverage"],
        output_file=f"{exp_name}-eval/dominance_comparison.txt",
        filter_algorithm=bysplit_variants,
    )
)
# Parse the commandline and show or run experiment steps.
exp.run_steps()
Execution log using Downward Lab 8.2 (total time: 23 minutes and 34 seconds):
2024-07-23 10:07:00,406 INFO Running step fetch-single_bysplit_experiments_only_parse: fetcher('data/single_bysplit_experiments_only_parse', 'reports/single_bysplit_comparison-eval', filter=None, merge=None)
2024-07-23 10:07:00,406 INFO Fetching properties from data/single_bysplit_experiments_only_parse to single_bysplit_comparison-eval
2024-07-23 10:07:00,722 INFO Collecting properties from 34020 run directories
2024-07-23 10:07:00,854 INFO Collected 100/34020 properties files
[...]
2024-07-23 10:08:20,539 INFO Collected 34000/34020 properties files
2024-07-23 10:10:55,659 WARNING Wrote properties file. It contains 1549 runs with unexplained errors.
2024-07-23 10:10:57,922 INFO Running step fetch-single_position_experiments_only_parse: fetcher('data/single_position_experiments_only_parse', 'reports/single_bysplit_comparison-eval', filter=None, merge=None)
2024-07-23 10:10:57,930 INFO Fetching properties from data/single_position_experiments_only_parse to single_bysplit_comparison-eval
single_bysplit_comparison-eval already exists. Do you want to (o)verwrite it, (m)erge the results, or (c)ancel? m
2024-07-23 10:11:53,555 INFO Collecting properties from 12600 run directories
2024-07-23 10:11:53,673 INFO Collected 100/12600 properties files
[...]
2024-07-23 10:12:22,913 INFO Collected 12600/12600 properties files
2024-07-23 10:15:59,620 WARNING Wrote properties file. It contains 2142 runs with unexplained errors.
2024-07-23 10:16:02,557 INFO Running step report.html: absolutereport('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/report.html')
2024-07-23 10:16:02,557 INFO Reading properties file
2024-07-23 10:17:01,333 INFO Reading properties file finished
2024-07-23 10:17:07,772 WARNING Report contains 2142 runs with unexplained errors.
2024-07-23 10:17:08,720 INFO Unique unexplained errors: 1987
2024-07-23 10:17:08,763 INFO Creating table(s) for cegar_abstractions_init_time
2024-07-23 10:17:09,161 INFO Creating table(s) for cost
2024-07-23 10:17:09,398 INFO Creating table(s) for coverage
2024-07-23 10:17:09,654 INFO Creating table(s) for error
2024-07-23 10:17:09,894 INFO Creating table(s) for evaluations
2024-07-23 10:17:10,155 INFO Creating table(s) for expansions
2024-07-23 10:17:10,427 INFO Creating table(s) for expansions_until_last_jump
2024-07-23 10:17:10,675 INFO Creating table(s) for found_concrete_solution
2024-07-23 10:17:10,829 INFO Creating table(s) for found_concrete_solution_sum
2024-07-23 10:17:11,087 INFO Creating table(s) for initial_h_value
2024-07-23 10:17:11,426 INFO Creating table(s) for memory
2024-07-23 10:17:11,680 INFO Creating table(s) for planner_time
2024-07-23 10:17:11,962 INFO Creating table(s) for run_dir
2024-07-23 10:17:12,106 INFO Creating table(s) for search_time
2024-07-23 10:17:12,410 INFO Creating table(s) for total_time
2024-07-23 10:17:47,733 INFO Wrote file:///reports/single_bysplit_comparison-eval/report.html
2024-07-23 10:17:50,717 INFO Running step perdomain_comparison_all: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_all.html')
2024-07-23 10:17:50,717 INFO Reading properties file
2024-07-23 10:18:50,063 INFO Reading properties file finished
2024-07-23 10:18:55,578 WARNING Report contains 1549 runs with unexplained errors.
2024-07-23 10:18:56,335 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_all.html
2024-07-23 10:18:58,508 INFO Running step perdomain_comparison_fw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_fw.html')
2024-07-23 10:18:58,508 INFO Reading properties file
2024-07-23 10:19:58,216 INFO Reading properties file finished
2024-07-23 10:20:01,344 WARNING Report contains 567 runs with unexplained errors.
2024-07-23 10:20:01,610 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_fw.html
2024-07-23 10:20:02,340 INFO Running step perdomain_comparison_bw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_bw.html')
2024-07-23 10:20:02,340 INFO Reading properties file
2024-07-23 10:21:01,757 INFO Reading properties file finished
2024-07-23 10:21:04,925 WARNING Report contains 533 runs with unexplained errors.
2024-07-23 10:21:05,191 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_bw.html
2024-07-23 10:21:05,925 INFO Running step perdomain_comparison_bd: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_bd.html')
2024-07-23 10:21:05,926 INFO Reading properties file
2024-07-23 10:22:06,108 INFO Reading properties file finished
2024-07-23 10:22:08,559 WARNING Report contains 449 runs with unexplained errors.
2024-07-23 10:22:08,813 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_bd.html
2024-07-23 10:22:09,522 INFO Running step perdomain_comparison_found_concrete_solution_all: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_all.html')
2024-07-23 10:22:09,522 INFO Reading properties file
2024-07-23 10:23:06,926 INFO Reading properties file finished
2024-07-23 10:23:12,421 WARNING Report contains 1549 runs with unexplained errors.
2024-07-23 10:23:13,158 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_all.html
2024-07-23 10:23:15,291 INFO Running step perdomain_comparison_found_concrete_solution_fw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_fw.html')
2024-07-23 10:23:15,291 INFO Reading properties file
2024-07-23 10:24:14,419 INFO Reading properties file finished
2024-07-23 10:24:17,579 WARNING Report contains 567 runs with unexplained errors.
2024-07-23 10:24:17,845 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_fw.html
2024-07-23 10:24:18,597 INFO Running step perdomain_comparison_found_concrete_solution_bw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bw.html')
2024-07-23 10:24:18,598 INFO Reading properties file
2024-07-23 10:25:18,003 INFO Reading properties file finished
2024-07-23 10:25:21,221 WARNING Report contains 533 runs with unexplained errors.
2024-07-23 10:25:21,480 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bw.html
2024-07-23 10:25:22,173 INFO Running step perdomain_comparison_found_concrete_solution_bd: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bd.html')
2024-07-23 10:25:22,173 INFO Reading properties file
2024-07-23 10:26:19,960 INFO Reading properties file finished
2024-07-23 10:26:23,156 WARNING Report contains 449 runs with unexplained errors.
2024-07-23 10:26:23,423 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bd.html
2024-07-23 10:26:24,135 INFO Running step perdomain_comparison_table_fw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_table_fw.html')
2024-07-23 10:26:24,135 INFO Reading properties file
2024-07-23 10:27:21,428 INFO Reading properties file finished
2024-07-23 10:27:24,639 WARNING Report contains 482 runs with unexplained errors.
2024-07-23 10:27:24,861 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_table_fw.html
2024-07-23 10:27:25,446 INFO Running step perdomain_comparison_table_bw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_table_bw.html')
2024-07-23 10:27:25,447 INFO Reading properties file
2024-07-23 10:28:22,375 INFO Reading properties file finished
2024-07-23 10:28:25,495 WARNING Report contains 417 runs with unexplained errors.
2024-07-23 10:28:25,714 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_table_bw.html
2024-07-23 10:28:26,249 INFO Running step perdomain_comparison_table_bd: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_table_bd.html')
2024-07-23 10:28:26,249 INFO Reading properties file
2024-07-23 10:29:25,081 INFO Reading properties file finished
2024-07-23 10:29:28,286 WARNING Report contains 401 runs with unexplained errors.
2024-07-23 10:29:29,304 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_table_bd.html
2024-07-23 10:29:29,886 INFO Running step dominancecomparison: dominancecomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/dominancecomparison.html')
2024-07-23 10:29:29,886 INFO Reading properties file
2024-07-23 10:30:27,852 INFO Reading properties file finished
2024-07-23 10:30:31,094 WARNING Report contains 1300 runs with unexplained errors.
2024-07-23 10:30:31,746 INFO Wrote file:///reports/single_bysplit_comparison-eval/dominancecomparison.html
Execution log using this pull request (total time: 11 minutes and 6 seconds):
2024-07-23 09:45:10,617 INFO Running step fetch-single_bysplit_experiments_only_parse: fetcher('data/single_bysplit_experiments_only_parse', 'reports/single_bysplit_comparison-eval', filter=None, merge=None)
2024-07-23 09:45:10,617 INFO Fetching properties from data/single_bysplit_experiments_only_parse to single_bysplit_comparison-eval
2024-07-23 09:45:10,924 INFO Collecting properties from 34020 run directories
2024-07-23 09:45:11,062 INFO Collected 100/34020 properties files
[...]
2024-07-23 09:46:29,601 INFO Collected 34000/34020 properties files
2024-07-23 09:49:10,676 WARNING Wrote properties file. It contains 1549 runs with unexplained errors.
2024-07-23 09:49:10,692 INFO Running step fetch-single_position_experiments_only_parse: fetcher('data/single_position_experiments_only_parse', 'reports/single_bysplit_comparison-eval', filter=None, merge=None)
2024-07-23 09:49:10,700 INFO Fetching properties from data/single_position_experiments_only_parse to single_bysplit_comparison-eval
single_bysplit_comparison-eval already exists. Do you want to (o)verwrite it, (m)erge the results, or (c)ancel? m
2024-07-23 09:50:46,672 INFO Collecting properties from 12600 run directories
2024-07-23 09:50:46,794 INFO Collected 100/12600 properties files
[...]
2024-07-23 09:51:15,444 INFO Collected 12600/12600 properties files
2024-07-23 09:55:19,407 WARNING Wrote properties file. It contains 2142 runs with unexplained errors.
2024-07-23 09:55:19,424 INFO Running step report.html: absolutereport('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/report.html')
2024-07-23 09:55:19,580 WARNING Report contains 2142 runs with unexplained errors.
2024-07-23 09:55:20,670 INFO Unique unexplained errors: 1987
2024-07-23 09:55:20,721 INFO Creating table(s) for cegar_abstractions_init_time
2024-07-23 09:55:21,135 INFO Creating table(s) for cost
2024-07-23 09:55:21,387 INFO Creating table(s) for coverage
2024-07-23 09:55:21,652 INFO Creating table(s) for error
2024-07-23 09:55:21,900 INFO Creating table(s) for evaluations
2024-07-23 09:55:22,177 INFO Creating table(s) for expansions
2024-07-23 09:55:22,445 INFO Creating table(s) for expansions_until_last_jump
2024-07-23 09:55:22,707 INFO Creating table(s) for found_concrete_solution
2024-07-23 09:55:22,870 INFO Creating table(s) for found_concrete_solution_sum
2024-07-23 09:55:23,143 INFO Creating table(s) for initial_h_value
2024-07-23 09:55:23,492 INFO Creating table(s) for memory
2024-07-23 09:55:23,755 INFO Creating table(s) for planner_time
2024-07-23 09:55:24,029 INFO Creating table(s) for run_dir
2024-07-23 09:55:24,171 INFO Creating table(s) for search_time
2024-07-23 09:55:24,444 INFO Creating table(s) for total_time
2024-07-23 09:55:59,853 INFO Wrote file:///reports/single_bysplit_comparison-eval/report.html
2024-07-23 09:55:59,864 INFO Running step perdomain_comparison_all: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_all.html')
2024-07-23 09:56:00,068 WARNING Report contains 1549 runs with unexplained errors.
2024-07-23 09:56:00,805 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_all.html
2024-07-23 09:56:00,815 INFO Running step perdomain_comparison_fw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_fw.html')
2024-07-23 09:56:00,927 WARNING Report contains 567 runs with unexplained errors.
2024-07-23 09:56:01,179 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_fw.html
2024-07-23 09:56:01,182 INFO Running step perdomain_comparison_bw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_bw.html')
2024-07-23 09:56:01,295 WARNING Report contains 533 runs with unexplained errors.
2024-07-23 09:56:01,545 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_bw.html
2024-07-23 09:56:01,547 INFO Running step perdomain_comparison_bd: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_bd.html')
2024-07-23 09:56:01,656 WARNING Report contains 449 runs with unexplained errors.
2024-07-23 09:56:01,896 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_bd.html
2024-07-23 09:56:01,899 INFO Running step perdomain_comparison_found_concrete_solution_all: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_all.html')
2024-07-23 09:56:02,099 WARNING Report contains 1549 runs with unexplained errors.
2024-07-23 09:56:02,835 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_all.html
2024-07-23 09:56:02,845 INFO Running step perdomain_comparison_found_concrete_solution_fw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_fw.html')
2024-07-23 09:56:02,959 WARNING Report contains 567 runs with unexplained errors.
2024-07-23 09:56:03,206 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_fw.html
2024-07-23 09:56:03,209 INFO Running step perdomain_comparison_found_concrete_solution_bw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bw.html')
2024-07-23 09:56:03,322 WARNING Report contains 533 runs with unexplained errors.
2024-07-23 09:56:03,574 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bw.html
2024-07-23 09:56:03,577 INFO Running step perdomain_comparison_found_concrete_solution_bd: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bd.html')
2024-07-23 09:56:03,692 WARNING Report contains 449 runs with unexplained errors.
2024-07-23 09:56:03,934 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_found_concrete_solution_bd.html
2024-07-23 09:56:03,937 INFO Running step perdomain_comparison_table_fw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_table_fw.html')
2024-07-23 09:56:04,038 WARNING Report contains 482 runs with unexplained errors.
2024-07-23 09:56:04,261 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_table_fw.html
2024-07-23 09:56:04,264 INFO Running step perdomain_comparison_table_bw: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_table_bw.html')
2024-07-23 09:56:04,372 WARNING Report contains 417 runs with unexplained errors.
2024-07-23 09:56:04,592 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_table_bw.html
2024-07-23 09:56:04,594 INFO Running step perdomain_comparison_table_bd: perdomaincomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/perdomain_comparison_table_bd.html')
2024-07-23 09:56:04,699 WARNING Report contains 401 runs with unexplained errors.
2024-07-23 09:56:04,930 INFO Wrote file:///reports/single_bysplit_comparison-eval/perdomain_comparison_table_bd.html
2024-07-23 09:56:04,932 INFO Running step dominancecomparison: dominancecomparison('reports/single_bysplit_comparison-eval', 'reports/single_bysplit_comparison-eval/dominancecomparison.html')
2024-07-23 09:56:05,122 WARNING Report contains 1300 runs with unexplained errors.
2024-07-23 09:56:05,780 INFO Wrote file:///reports/single_bysplit_comparison-eval/dominancecomparison.html
Note also that these executions include the fetcher steps; the timing differences would be even larger without them (1 minute against 13 minutes). Basically, the savings in this example are 1 minute per report, and other use cases could have dozens of reports, with savings of up to several hours.
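The per-report cost can be read directly off the log timestamps. As a small sketch, using the two "Reading properties file" lines of the first report in the Lab 8.2 log and the 13 add_report calls of the script (the helper name is made up):

```python
from datetime import datetime

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def timestamp(line):
    """Parse the 23-character timestamp at the start of a log line."""
    return datetime.strptime(line[:23], LOG_FORMAT)


start = "2024-07-23 10:16:02,557 INFO Reading properties file"
end = "2024-07-23 10:17:01,333 INFO Reading properties file finished"

seconds = (timestamp(end) - timestamp(start)).total_seconds()
reports = 13  # Number of add_report calls in the script.
print(f"{seconds:.3f} s per read, about {seconds * reports / 60:.1f} min for {reports} reports")
```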
Thanks!
Thanks for the logs! I agree that it's obvious that loading the JSON files is the bottleneck on your machine: it always takes 1 minute to load ~46K runs. I just measured the same thing for a similar experiment on my machine and loading ~17K runs (90 MB) takes 0.5 seconds. So either your machine is much slower, or the simplejson package is missing in your venv, or your runs just have many more attributes or some attributes that have lots of data. My guess is that it's the last of these. Probably you've parsed an output that's written for every evaluated state or similar. Could that be the case? How big is the resulting properties file?
If the file simply has lots of data, I recommend preprocessing it so that it only contains the data you're interested in.
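Such a preprocessing step could look roughly like the sketch below. It assumes that the properties file is uncompressed JSON mapping run IDs to attribute dictionaries; the function name, file names, and sample attributes are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def slim_properties(src, dst, keep):
    """Write a copy of the properties file that only has the attributes in keep."""
    with open(src) as f:
        props = json.load(f)
    slim = {
        run_id: {key: value for key, value in run.items() if key in keep}
        for run_id, run in props.items()
    }
    with open(dst, "w") as f:
        json.dump(slim, f)
    return slim


# Demonstration on a throwaway file with one heavy, unused attribute.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "properties"
    dst = Path(tmp) / "properties-slim"
    src.write_text(json.dumps({
        "run-1": {"algorithm": "fw", "coverage": 1, "heavy_trace": [0] * 1000},
    }))
    slim = slim_properties(src, dst, keep={"algorithm", "coverage"})
    sizes = (src.stat().st_size, dst.stat().st_size)

print(sizes)  # The slimmed file is much smaller than the original.
```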
Hi. I have simplejson installed and your guess is correct. I'm not currently using those attributes, but the properties file is 5 GB because it contains some heavy attributes that I parsed in case I need them in other reports :sweat: .
Anyway, I think reading the properties file for every report is not very efficient, and the approach of loading and updating the data in the Experiment object could be a valuable improvement for the next major release of Downward Lab :smile: . The only caveat is that filters must not modify the Experiment's data; instead, they have to produce another filtered dictionary with references to the original objects, as I'm doing here.
Another complementary improvement would be using https://github.com/ijl/orjson instead of simplejson (around 20x improvement).
So this pull request can be closed, but I hope the maintainers give the idea a chance.
I had a look at orjson as well and it would be nice to see some time measurements for reading and writing Downward Lab JSON files with orjson vs. simplejson. If it's really that much faster for our use case, we should switch :-)
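A rough way to get such measurements is sketched below. It times dumps() and loads() on synthetic data for whichever of the three modules is installed; the data shape is made up, so real Lab properties files may behave differently and should be measured directly.

```python
import importlib
import json
import timeit


def available_codecs():
    """Return the JSON modules that can be imported in this environment."""
    codecs = {"json": json}
    for name in ("simplejson", "orjson"):
        try:
            codecs[name] = importlib.import_module(name)
        except ImportError:
            pass  # Skip optional third-party packages that are not installed.
    return codecs


def benchmark(data, repeat=3):
    """Time dumps() and loads() per module, keeping the best of `repeat` runs."""
    results = {}
    for name, module in available_codecs().items():
        encoded = module.dumps(data)  # str for json/simplejson, bytes for orjson.
        write = min(timeit.repeat(lambda: module.dumps(data), number=1, repeat=repeat))
        read = min(timeit.repeat(lambda: module.loads(encoded), number=1, repeat=repeat))
        results[name] = (write, read)
    return results


# Synthetic stand-in for a properties file.
data = {
    f"run-{i}": {"coverage": i % 2, "expansions": i, "error": "success"}
    for i in range(5000)
}
results = benchmark(data)
for name, (write, read) in results.items():
    print(f"{name:>10}: write {write:.4f} s, read {read:.4f} s")
```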
OK, I have fixed the issue. The problem was that I had assumed a single properties file for all experiments, but several evaluation directories, each with its own properties file, may exist. It is solved by making the Experiment's props attribute a dictionary with eval dirs as keys.
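The idea, as a standalone sketch (the class and method names are hypothetical and differ from the actual changes in the Experiment class):

```python
import json
import tempfile
from pathlib import Path


class PropertiesCache:
    """Keep one loaded properties dict per evaluation directory."""

    def __init__(self):
        self._props = {}
        self.loads = 0  # Counts file reads, for illustration only.

    def get(self, eval_dir):
        key = str(Path(eval_dir).resolve())
        if key not in self._props:
            # First request for this directory: read the file once.
            with open(Path(eval_dir) / "properties") as f:
                self._props[key] = json.load(f)
            self.loads += 1
        return self._props[key]

    def update(self, eval_dir, props):
        # Meant to be called after a fetcher has rewritten this directory's file.
        self._props[str(Path(eval_dir).resolve())] = props


with tempfile.TemporaryDirectory() as tmp:
    cache = PropertiesCache()
    for name in ("exp-a-eval", "exp-b-eval"):
        eval_dir = Path(tmp) / name
        eval_dir.mkdir()
        (eval_dir / "properties").write_text(json.dumps({name: {"coverage": 1}}))
    first = cache.get(Path(tmp) / "exp-a-eval")
    again = cache.get(Path(tmp) / "exp-a-eval")  # Served from memory.
    other = cache.get(Path(tmp) / "exp-b-eval")  # Separate entry, separate read.
```

With this layout, reports on the same eval dir share one read, while reports on different eval dirs never see each other's data.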