Issues with IndustryOR Benchmark

Open AuroraLHL opened this issue 1 year ago • 1 comments

Hello! I've noticed several issues with the IndustryOR benchmark you released. For example, in some instances, the parameters are not provided. How are the optimal values determined without actual parameters?

Additionally, there are instances with incorrect solutions and unclear problem statements.

Could you please clarify how you collected this benchmark? Was there no manual verification before using it as a benchmark?

AuroraLHL, Oct 11 '24 01:10

Hi there! Thank you so much for pointing out these issues. We truly appreciate your feedback! We've recognized some problems in this benchmark and are currently working on a thorough review. A new version will be released soon.

The benchmark draws primarily on three sources: textbook exercises, well-known mathematical modeling competitions, and real-world operations research challenges faced by Cardinal Operations. We modified these problems to protect client privacy and to ensure they fit within the context window limits of large language models. Additionally, many of the original problems and datasets were in Chinese, and we used AI translation to make them accessible to a broader scholarly audience. This translation step may have contributed to the issues as well.

Once again, thank you for your valuable attention and feedback! We're committed to refining the English version to improve its accuracy and will release the updated version soon. Stay tuned!

CyrilHuangZ, Oct 15 '24 07:10

Thank you for reaching out and for your patience.

We appreciate your feedback regarding the IndustryOR benchmark dataset. We have recently updated the dataset and corrected the issues you pointed out. The labels are now computed with improved accuracy and consistency. You can find the updated version here: https://huggingface.co/datasets/CardinalOperations/IndustryOR

If you have any further concerns or find any additional discrepancies, please do not hesitate to let us know. Your input is invaluable in ensuring the quality of our dataset.

CyrilHuangZ, Apr 03 '25 03:04