
[Bug] DeepSeek R1 32B model scores low when evaluated on the AIME2024 dataset

Open carllisicau opened this issue 9 months ago • 19 comments

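Prerequisites

  • [x] I have searched the issues and discussions but did not get the expected help.
  • [x] The bug has not been fixed in the latest version.

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment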

{'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)', 'GPU 0,1,2,3': 'Tesla V100S-PCIE-32GB', 'MMEngine': '0.10.6', 'MUSA available': False, 'NVCC': 'Cuda compilation tools, release 11.7, V11.7.64', 'OpenCV': '4.11.0', 'PyTorch': '2.5.1+cu124', 'PyTorch compiling details': 'PyTorch built with:\n' ' - GCC 9.3\n' ' - C++ Version: 201703\n' ' - Intel(R) oneAPI Math Kernel Library Version ' '2024.2-Product Build 20240605 for Intel(R) 64 ' 'architecture applications\n' ' - Intel(R) MKL-DNN v3.5.3 (Git Hash ' '66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n' ' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n' ' - LAPACK is enabled (usually provided by ' 'MKL)\n' ' - NNPACK is enabled\n' ' - CPU capability usage: AVX512\n' ' - CUDA Runtime 12.4\n' ' - NVCC architecture flags: ' '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n' ' - CuDNN 90.1\n' ' - Magma 2.6.1\n' ' - Build settings: BLAS_INFO=mkl, ' 'BUILD_TYPE=Release, CUDA_VERSION=12.4, ' 'CUDNN_VERSION=9.1.0, ' 'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, ' 'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 ' '-fabi-version=11 -fvisibility-inlines-hidden ' '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO ' '-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON ' '-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK ' '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE ' '-O2 -fPIC -Wall -Wextra -Werror=return-type ' '-Werror=non-virtual-dtor -Werror=bool-operation ' '-Wnarrowing -Wno-missing-field-initializers ' '-Wno-type-limits -Wno-array-bounds ' '-Wno-unknown-pragmas -Wno-unused-parameter ' '-Wno-strict-overflow -Wno-strict-aliasing ' '-Wno-stringop-overflow -Wsuggest-override ' '-Wno-psabi -Wno-error=old-style-cast ' '-Wno-missing-braces -fdiagnostics-color=always ' '-faligned-new -Wno-unused-but-set-variable 
' '-Wno-maybe-uninitialized -fno-math-errno ' '-fno-trapping-math -Werror=format ' '-Wno-stringop-overflow, LAPACK_INFO=mkl, ' 'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, ' 'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, ' 'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, ' 'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, ' 'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, ' 'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, ' 'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n', 'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]', 'TorchVision': '0.20.1+cu124', 'lmdeploy': "not installed:No module named 'lmdeploy'", 'numpy_random_seed': 2147483648, 'opencompass': '0.4.0+862bf78', 'sys.platform': 'linux', 'transformers': '4.48.1'}

Reproduces the problem - code sample

I started the evaluation with this command:

python run.py --datasets aime2024_0shot_nocot_gen_2b9dc2 --hf-type chat --hf-path /root/ai/deepseek32b/DeepSeek-R1-Distill-Qwen-32B --debug --max-out-len 8096 --generation-kwargs do_sample=True top_k=50

The score was only 3.33:

{ "accuracy": 3.3333333333333335 }

Looking in the output folder, the configs file contains the following:

datasets=[
    dict(abbr='aime2024',
        eval_cfg=dict(
            evaluator=dict(
                type='opencompass.datasets.MATHEvaluator',
                version='v2'),
            pred_postprocessor=dict(
                type='opencompass.datasets.math_postprocess_v2')),
        infer_cfg=dict(
            inferencer=dict(
                max_out_len=2048,
                type='opencompass.openicl.icl_inferencer.GenInferencer'),
            prompt_template=dict(
                template=dict(
                    round=[
                        dict(prompt='{question}\nRemember to put your final answer within \boxed{}.',
                            role='HUMAN'),
                    ]),
                type='opencompass.openicl.icl_prompt_template.PromptTemplate'),
            retriever=dict(
                type='opencompass.openicl.icl_retriever.ZeroRetriever')),
        path='opencompass/aime2024',
        reader_cfg=dict(
            input_columns=[
                'question',
            ],
            output_column='answer'),
        type='opencompass.datasets.Aime2024Dataset'),
]
models=[
    dict(abbr='DeepSeek-R1-Distill-Qwen-32B_hf',
        batch_size=8,
        generation_kwargs=dict(
            do_sample=True,
            top_k=50,
            top_p=0.95),
        max_out_len=8096,
        max_seq_len=None,
        model_kwargs=dict(
            ),
        pad_token_id=None,
        path='/root/ai/deepseek32b/DeepSeek-R1-Distill-Qwen-32B',
        peft_kwargs=dict(
            ),
        peft_path=None,
        run_cfg=dict(
            num_gpus=1),
        stop_words=[
            ],
        tokenizer_kwargs=dict(
            ),
        tokenizer_path=None,
        type='opencompass.models.huggingface_above_v4_33.HuggingFacewithChatTemplate'),
]

The predictions file contains the following:

"28": {
    "origin_prompt": [
        {
            "role": "HUMAN",
            "prompt": "\nLet $b \\geq 2$ be an integer. Call a positive integer $n$ $b\\textit{-eautiful}$ if it has exactly two digits when expressed in base $b$, and these two digits sum to $\\sqrt{n}$. For example, $81$ is $13$-eautiful because $81=\\underline{6}$$\\underline{3}_{13}$ and $6+3=\\sqrt{81}$. Find the least integer $b\\geq 2$ for which there are more than ten $b$-eautiful integers.\n\nRemember to put your final answer within \\boxed{}.\n"
        }
    ],
    "prediction": "<think>\nAlright, so I have this problem about b-eautiful numbers. Let me try to understand it step by step.\n\nFirst, the problem says that a positive integer n is b-eautiful if two conditions are met:\n\n1. When n is expressed in base b, it has exactly two digits.\n2. The sum of these two digits equals the square root of n.\n\nAn example is given: 81 is 13-eautiful because when written in base 13, it's 63 (which are the digits 6 and 3), and 6 + 3 = 9, which is the square root of 81.\n\nThe task is to find the smallest integer b (where b is at least 2) such that there are more than ten b-eautiful numbers. So, I need to find the least b where the count of such numbers exceeds ten.\n\nLet me break down the problem.\n\nFirst, let's recall that any two-digit number in base b can be written as n = a*b + c, where a and c are digits in base b. Since it's a two-digit number, a can't be zero. So, a is between 1 and b-1, and c is between 0 and b-1.\n\nGiven that n is b-eautiful, the sum of the digits a + c must equal sqrt(n). So, we have:\n\na + c = sqrt(n)\n\nBut n is also equal to a*b + c. So, substituting n, we get:\n\na + c = sqrt(a*b + c)\n\nLet me write that equation:\n\na + c = sqrt(a*b + c)\n\nHmm, maybe I can square both sides to eliminate the square root:\n\n(a + c)^2 = a*b + c\n\nExpanding the left side:\n\na^2 + 2ac + c^2 = a*b + c\n\nLet me rearrange this equation to group like terms:\n\na^2 + 2ac + c^2 - a*b - c = 0\n\nSimplify terms:\n\na^2 + (2c - b)a + c^2 - c = 0\n\nThis is a quadratic equation in terms of a. Alternatively, we can think of it as a quadratic in c, but maybe it's better to see if we can find a relationship between a and c.\n\nAlternatively, maybe we can express this as:\n\na^2 + (2c - b)a + (c^2 - c) = 0\n\nBut this seems a bit messy. 
Maybe another approach.\n\nLet me consider that n = a*b + c and sqrt(n) = a + c.\n\nSo, n = (a + c)^2.\n\nTherefore, a*b + c = (a + c)^2.\n\nSo, the equation is:\n\na*b + c = a^2 + 2ac + c^2\n\nLet me rearrange:\n\na*b = a^2 + 2ac + c^2 - c\n\nSo,\n\na*b = a^2 + 2ac + c^2 - c\n\nHmm, maybe factor the right-hand side:\n\na*b = a^2 + c(2a + c - 1)\n\nNot sure if that helps.\n\nAlternatively, maybe express b in terms of a and c:\n\nFrom a*b + c = (a + c)^2,\n\nWe can write:\n\nb = [(a + c)^2 - c]/a\n\nSimplify numerator:\n\n(a + c)^2 - c = a^2 + 2ac + c^2 - c\n\nTherefore,\n\nb = (a^2 + 2ac + c^2 - c)/a = a + 2c + (c^2 - c)/a\n\nSince b must be an integer, (c^2 - c) must be divisible by a. So, a divides (c^2 - c). So, a | c(c - 1).\n\nSince a and c are digits in base b, c is between 0 and b - 1, and a is between 1 and b - 1.\n\nSo, for given a and c, b is determined as:\n\nb = a + 2c + (c^2 - c)/a\n\nGiven that b must be an integer, (c^2 - c) must be divisible by a.\n\nAlternatively, since a divides c(c - 1), and a is at least 1, which it is, but since a is a digit in base b, a is at most b - 1.\n\nBut perhaps for each base b, we can find the number of pairs (a, c) such that a + c = sqrt(a*b + c), with a and c satisfying 1 ≤ a ≤ b - 1 and 0 ≤ c ≤ b - 1.\n\nBut maybe instead of thinking about it per base, it's better to find, for each a and c, the corresponding b, and then count how many b's have more than ten such numbers.\n\nWait, but the problem is to find the least b such that there are more than ten b-eautiful numbers.\n\nSo, for each b, starting from 2 upwards, count the number of n that are b-eautiful, and find the smallest b where this count exceeds 10.\n\nSo, perhaps it's better to approach it by, for each b, find the number of pairs (a, c) such that a + c = sqrt(a*b + c).\n\nBut how?\n\nAlternatively, since n = a*b + c, and sqrt(n) = a + c, then n must be a perfect square. 
So, n is a perfect square.\n\nTherefore, n = k^2, where k is a positive integer.\n\nGiven that n is a two-digit number in base b, so n is between b (which is 10 in base b, i.e., 1*b + 0 = b) and b^2 - 1 (which is (b-1)*b + (b-1) = b^2 - 1).\n\nTherefore, k^2 is between b and b^2 - 1.\n\nSo, k must satisfy sqrt(b) ≤ k ≤ sqrt(b^2 - 1) ≈ b - 1/b, so k is between 2 and b - 1, since b ≥ 2.\n\nBut more precisely, k must satisfy b ≤ k^2 ≤ b^2 - 1.\n\nSo, sqrt(b) ≤ k ≤ sqrt(b^2 - 1).\n\nSince sqrt(b^2 - 1) is slightly less than b, so k must be less than or equal to b - 1.\n\nBut also, k must be at least 2, since b is at least 2, so sqrt(b) is at least sqrt(2) ≈ 1.414, so k must be at least 2.\n\nWait, for b=2, sqrt(2) is about 1.414, so k must be at least 2, but 2^2 = 4. But in base 2, the two-digit numbers go from 2 (10_2) to 3 (11_2). So, n can be 2 or 3. So, 4 is not a two-digit number in base 2. So, for b=2, the maximum n is 3, which is less than 4, so k=2 is too big. So, maybe for b=2, there are no b-eautiful numbers.\n\nBut let's check:\n\nFor b=2, n must be two digits in base 2, so n can be 2 or 3.\n\nFor n=2: digits are 10_2, so a=1, c=0. Sum is 1 + 0 = 1, which is not sqrt(2).\n\nFor n=3: digits are 11_2, sum is 1 + 1 = 2, which is sqrt(3)? No, sqrt(3) ≈ 1.732, so not equal. So, indeed, b=2 has no b-eautiful numbers.\n\nSimilarly, for b=3:\n\nTwo-digit numbers in base 3 are from 3 (10_3) to 8 (22_3).\n\nSo, n can be 3,4,5,6,7,8.\n\nCompute sqrt(n):\n\nn=3: sqrt≈1.732, sum digits: a + c.\n\nn=3: 10_3, a=1, c=0, sum=1 ≠ 1.732.\n\nn=4: 11_3, sum=2. sqrt(4)=2. So, 2=2, so 4 is 3-eautiful.\n\nn=5: 12_3, sum=1+2=3. sqrt(5)≈2.236. Not equal.\n\nn=6: 20_3, sum=2+0=2. sqrt(6)≈2.449. Not equal.\n\nn=7: 21_3, sum=2+1=3. sqrt(7)≈2.645. Not equal.\n\nn=8: 22_3, sum=4. sqrt(8)≈2.828. 
Not equal.\n\nSo, only n=4 is 3-eautiful.\n\nThus, for b=3, there is 1 b-eautiful number.\n\nSimilarly, for b=4:\n\nTwo-digit numbers in base 4: 4 (10_4) to 15 (33_4).\n\nCompute for each n from 4 to 15:\n\nn=4: 10_4, sum=1. sqrt(",
    "gold": "211"
},
"29": {
    "origin_prompt": [
        {
            "role": "HUMAN",
            "prompt": "\nFind the number of rectangles that can be formed inside a fixed regular dodecagon ($12$-gon) where each side of the rectangle lies on either a side or a diagonal of the dodecagon. The diagram below shows three of those rectangles. [asy] unitsize(0.6 inch); for(int i=0; i<360; i+=30) { dot(dir(i), 4+black); draw(dir(i)--dir(i+30)); } draw(dir(120)--dir(330)); filldraw(dir(210)--dir(240)--dir(30)--dir(60)--cycle, mediumgray, linewidth(1.5)); draw((0,0.366)--(0.366,0), linewidth(1.5)); [/asy]\n\nRemember to put your final answer within \\boxed{}.\n"
        }
    ],
    "prediction": "<think>\nOkay, so I have this problem where I need to find the number of rectangles that can be formed inside a regular dodecagon, which is a 12-sided polygon. Each side of the rectangle has to lie on either a side or a diagonal of the dodecagon. The Asymptote diagram shows three examples of such rectangles, so I know they can be of different sizes and orientations.\n\nFirst, I should recall that a regular dodecagon has all sides equal and all internal angles equal. Each internal angle in a regular dodecagon is 150 degrees because the formula for internal angles is \\((n-2) \\times 180^\\circ / n\\), which for \\(n=12\\) gives \\((12-2) \\times 180 / 12 = 150^\\circ\\).\n\nNow, the problem is about finding rectangles. So, I need to figure out how rectangles can be formed by connecting sides or diagonals of the dodecagon. Since a rectangle has four sides with opposite sides equal and all angles 90 degrees, the key is to find sets of four points (vertices of the dodecagon) that form such a shape.\n\nLet me think about how the sides and diagonals of the dodecagon can form rectangles. In a regular polygon, the sides are the edges connecting consecutive vertices, and diagonals are the lines connecting non-consecutive vertices. In a dodecagon, each vertex is connected to 11 others, so there are a lot of diagonals.\n\nBut for a rectangle, we need four vertices such that the sides are either sides or diagonals of the dodecagon. So, maybe the sides of the rectangle correspond to sides or diagonals of the dodecagon.\n\nI remember that in regular polygons, certain diagonals can form rectangles or other shapes. For example, in a regular octagon, you can form rectangles by connecting every other vertex.\n\nSimilarly, in a regular dodecagon, perhaps connecting every third vertex or some other interval can form rectangles. 
Let me think about how the sides and diagonals can form right angles.\n\nWait, in a regular polygon, the central angles between consecutive vertices are equal. For a dodecagon, each central angle is \\(360^\\circ / 12 = 30^\\circ\\). So, the angle between two adjacent vertices from the center is 30 degrees.\n\nTo form a rectangle, the sides must meet at 90-degree angles. So, the vectors corresponding to the sides of the rectangle must be perpendicular. Since the polygon is regular, the vectors can be represented in terms of the central angles.\n\nSo, if I can find two vectors that are perpendicular and lie on the sides or diagonals of the dodecagon, then I can form a rectangle by combining them.\n\nLet me denote the vertices of the dodecagon as \\(V_0, V_1, V_2, \\ldots, V_{11}\\) going around the polygon. Each vertex \\(V_k\\) can be represented in the complex plane as \\(e^{i \\theta_k}\\) where \\(\\theta_k = 30^\\circ \\times k\\).\n\nIf I can find four points \\(V_a, V_b, V_c, V_d\\) such that the vectors \\(V_b - V_a\\) and \\(V_d - V_a\\) are perpendicular, and similarly for other sides, then they form a rectangle.\n\nBut maybe there's a simpler way. Since the dodecagon is regular and symmetric, maybe I can count the number of rectangles based on the number of pairs of parallel sides.\n\nWait, rectangles have opposite sides equal and parallel. So, in the dodecagon, if I can find two pairs of parallel chords (sides or diagonals) that are perpendicular to each other, they can form a rectangle.\n\nSo, perhaps I should figure out how many pairs of parallel chords exist in the dodecagon and then see how many of these pairs are perpendicular.\n\nBut first, how many pairs of parallel sides or diagonals are there in a regular dodecagon?\n\nIn a regular polygon with \\(n\\) sides, the number of pairs of parallel sides is \\(n/2\\) if \\(n\\) is even. 
Wait, for a dodecagon, which has 12 sides, the number of pairs of parallel sides is 6, since each side has one parallel side opposite to it.\n\nBut in addition to sides, there are diagonals that can also be parallel. So, the number of pairs of parallel diagonals is more complicated.\n\nWait, in a regular polygon, the number of directions for parallel chords depends on the number of sides. For a dodecagon, each chord can be defined by the number of vertices it skips. For example, a side skips 0 vertices, a diagonal that skips 1 vertex is another type, skips 2, skips 3, etc., up to skipping 5 vertices (since beyond that, it's the same as skipping fewer in the other direction).\n\nSo, for a regular dodecagon, chords can skip \\(k = 0, 1, 2, 3, 4, 5\\) vertices. Each \\(k\\) gives a set of parallel chords.\n\nTherefore, each direction corresponds to a step size \\(k\\), and each step size \\(k\\) from 1 to 5 (since \\(k=0\\) is the sides themselves) gives a set of parallel diagonals. So, for each \\(k = 1\\) to \\(5\\), there are 12 diagonals each, but they are grouped into parallel sets.\n\nWait, actually, for each \\(k\\), the number of distinct directions is 6 because of symmetry. Hmm, maybe I need to think differently.\n\nWait, perhaps each step size \\(k\\) and \\(12 - k\\) gives the same direction but in opposite orientation. So, for \\(k = 1\\) and \\(k = 11\\), they are the same direction but opposite; similarly for \\(k=2\\) and \\(k=10\\), etc. So, for step sizes, we can consider \\(k = 1\\) to \\(6\\), but beyond \\(k=6\\), it's the same as smaller \\(k\\) in the opposite direction.\n\nBut in our case, since we have 12 sides, each step size \\(k\\) from 1 to 5 gives a unique direction, and \\(k=6\\) is the diameter, which is its own opposite.\n\nWait, actually, in a regular dodecagon, the diameters (which connect opposite vertices) are the only chords that are their own opposites. 
So, for each \\(k\\) from 1 to 5, there are two directions (clockwise and counterclockwise), but in terms of parallelism, they are the same direction.\n\nWait, maybe it's better to think in terms of slopes. Each chord with step size \\(k\\) will have a certain slope, and chords with step size \\(k\\) and \\(12 - k\\) will have slopes that are negatives of each other, hence not parallel. So, actually, for each \\(k\\) from 1 to 5, there is a unique set of parallel chords.\n\nTherefore, in total, there are 6 different directions for chords: step sizes \\(k = 0\\) (sides), \\(1\\), \\(2\\), \\(3\\), \\(4\\), \\(5\\), and \\(6\\) (diameters). But step size \\(6\\) is just the diameter, which is a single direction.\n\nWait, no, step size \\(k\\) and \\(12 - k\\) are different directions because they go in opposite directions around the polygon. So, for each \\(k = 1\\) to \\(5\\), we have two directions, but they are not parallel. So, actually, each step size \\(k\\) corresponds to a unique direction. So, for step sizes \\(k = 0\\) (sides), \\(1\\), \\(2\\), \\(3\\), \\(4\\), \\(5\\), and \\(6\\), each gives a unique direction.\n\nWait, but for step size \\(k=6\\), it's the diameter, so it's only one direction because going \\(6\\) steps in either direction from a vertex gets you to the same opposite vertex.\n\nSo, in total, there are 7 different directions for chords: sides (k=0), diameters (k=6), and for \\(k=1\\) to \\(5\\), each gives two directions, but they are not parallel. Wait, no, actually, for each \\(k\\), the chords are parallel if they have the same step size. So, chords with the same \\(k\\) are parallel, regardless of starting point.\n\nTherefore, for each \\(k\\) from 0 to 6, the chords with step size \\(k\\) are all parallel to each other. So, in a dodecagon, how many unique directions do we have? For each \\(k = 0, 1, 2, 3, 4, 5, 6\\), we have a unique direction.\n\nBut wait, for \\(k = 1\\) and \\(k = 11\\), are they the same? 
Because stepping 1 forward or 11 backward is the same direction.\n\nWait, actually, in a regular polygon, stepping \\(k\\) forward is equivalent to stepping \\(n - k\\) backward. So, for direction purposes, stepping \\(k\\) or \\(n - k\\) gives the same direction. So, in a 12-gon, stepping 1 and stepping 11 give the same direction but in opposite orientations.\n\nBut when considering parallelism,",
    "gold": "315"
}

The reasoning never finishes in these predictions, which is probably the main cause of the low score.
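The check described above (predictions that stop before the reasoning is complete) can be automated. The following is a minimal sketch, assuming the predictions file has the shape shown in the excerpt: a JSON object keyed by sample id, each entry carrying a "prediction" string. The function name find_truncated and the inline sample (including the hypothetical entry "x") are illustrative only and are not part of OpenCompass.

```python
import json


def find_truncated(predictions: dict) -> list:
    """Return the ids of predictions that look cut off.

    Heuristic (an assumption, not an OpenCompass rule): a finished
    R1-style answer closes its <think> block and contains \\boxed.
    """
    truncated = []
    for idx, item in predictions.items():
        text = item.get("prediction", "")
        opened = "<think>" in text
        closed = "</think>" in text
        boxed = "\\boxed" in text
        if (opened and not closed) or not boxed:
            truncated.append(idx)
    return truncated


# Inline sample in the same shape as the predictions file.
# Entry "28" is abbreviated from the issue; entry "x" is hypothetical.
sample = json.loads(r'''
{
  "28": {"prediction": "<think>\nAlright, so I have this problem ... sqrt(", "gold": "211"},
  "x": {"prediction": "<think>\nshort reasoning\n</think>\nThe answer is \\boxed{211}.", "gold": "211"}
}
''')

print(find_truncated(sample))
```

On a real run, the dict would come from json.load on the predictions file in the outputs folder. If most ids are reported, the output length limit is the first thing to check.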

Reproduces the problem - command or script

See above.

Reproduces the problem - error message

See above.

Other information

No response

carllisicau avatar Feb 18 '25 10:02 carllisicau

The DeepSeek R1 paper sets max_out_len=32768; 2048 is not enough.

Sibyl233 avatar Feb 22 '25 09:02 Sibyl233

The DeepSeek R1 paper sets max_out_len=32768; 2048 is not enough.

After changing it to 32768, the accuracy is still only 3.33. I'm using deepseek-distill-Qwen2-7B (smoothquantpre).

[maoshizhuo@ISPC-GPU2-CS opencompass]$ CUDA_VISIBLE_DEVICES=1 python run.py --datasets aime2024_0shot_nocot_gen_2b9dc2 --hf-type chat --hf-path /home/maoshizhuo/2025/deepseek-Qwen-7B --debug --max-out-len 32768 --generation-kwargs do_sample=True top_k=50
02/23 21:24:40 - OpenCompass - INFO - Loading aime2024_0shot_nocot_gen_2b9dc2: /home/maoshizhuo/2025/GPassK/opencompass/opencompass/configs/./datasets/aime2024/aime2024_0shot_nocot_gen_2b9dc2.py
02/23 21:24:40 - OpenCompass - INFO - Loading example: /home/maoshizhuo/2025/GPassK/opencompass/opencompass/configs/./summarizers/example.py
02/23 21:24:40 - OpenCompass - INFO - Current exp folder: outputs/default/20250223_212440
02/23 21:24:40 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/23 21:24:40 - OpenCompass - INFO - Partitioned into 1 tasks.
02/23 21:24:41 - OpenCompass - INFO - Task [deepseek-Qwen-7B_hf/aime2024]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.06s/it]
02/23 21:24:56 - OpenCompass - INFO - using stop words: ['<|end▁of▁sentence|>']
02/23 21:24:56 - OpenCompass - INFO - Try to load the data from /home/maoshizhuo/.cache/opencompass/./data/aime.jsonl
02/23 21:24:56 - OpenCompass - INFO - Start inferencing [deepseek-Qwen-7B_hf/aime2024]
[2025-02-23 21:24:56,836] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-02-23 21:24:56,837] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/4 [00:00<?, ?it/s]
02/23 21:24:56 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:24:56 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria.<locals>.MultiTokenEOSCriteria object at 0x7f7184512980>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
25%|█████████████████████████████████ | 1/4 [01:44<05:14, 104.73s/it]
02/23 21:26:41 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:26:41 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria.<locals>.MultiTokenEOSCriteria object at 0x7f71845a8490>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
50%|██████████████████████████████████████████████████████████████████ | 2/4 [03:24<03:23, 101.67s/it]
02/23 21:28:21 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:28:21 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria.<locals>.MultiTokenEOSCriteria object at 0x7f71845a9840>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
75%|███████████████████████████████████████████████████████████████████████████████████████████████████ | 3/4 [05:04<01:40, 100.80s/it]
02/23 21:30:00 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:30:00 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria.<locals>.MultiTokenEOSCriteria object at 0x7f71845a9a20>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [06:35<00:00, 98.96s/it]
02/23 21:31:32 - OpenCompass - INFO - Partitioned into 1 tasks.
02/23 21:31:33 - OpenCompass - INFO - Try to load the data from /home/maoshizhuo/.cache/opencompass/./data/aime.jsonl
02/23 21:31:33 - OpenCompass - INFO - Task [deepseek-Qwen-7B_hf/aime2024]: {'accuracy': 3.3333333333333335}

dataset   version  metric    mode  deepseek-Qwen-7B_hf
aime2024  2b9dc2   accuracy  gen   3.33

msz12345 avatar Feb 23 '25 13:02 msz12345
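The log above was produced with --max-out-len 32768, yet every "Generation Args of Huggingface" entry still reports 'max_new_tokens': 2048. A quick way to confirm which limit actually reached generation is to scan the log for that key. This is a minimal sketch that assumes only the log format visible in the comment above; effective_max_new_tokens is a hypothetical helper name, and the excerpt is trimmed from that log.

```python
import re


def effective_max_new_tokens(log_text: str) -> list:
    """Collect the distinct max_new_tokens values printed in a run log."""
    values = re.findall(r"'max_new_tokens':\s*(\d+)", log_text)
    return sorted({int(v) for v in values})


# Trimmed from the "Generation Args of Huggingface" log entries.
log_excerpt = (
    "02/23 21:24:56 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, "
    "'max_new_tokens': 2048, 'pad_token_id': 151643}\n"
    "02/23 21:26:41 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, "
    "'max_new_tokens': 2048, 'pad_token_id': 151643}\n"
)

requested = 32768  # value passed on the command line via --max-out-len
found = effective_max_new_tokens(log_excerpt)
for value in found:
    if value < requested:
        print(f"generation is capped at {value}, not the requested {requested}")
```

If the value printed in the log is smaller than the one requested, the limit is being set somewhere else, which matches the dataset-level max_out_len=2048 visible in the dumped config in the issue body.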

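I ran into this too. Changing only the model's max_out_len is not enough, because the dataset config caps it at 2048; you have to change both. The outputs folder contains a .py file with the parameters, where you can see the dataset's max_out_len. I changed it in the dataset-related source code, and that fixed it.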

nku-ligl avatar Feb 25 '25 02:02 nku-ligl

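I ran into this too. Changing only the model's max_out_len is not enough, because the dataset config caps it at 2048; you have to change both. The outputs folder contains a .py file with the parameters, where you can see the dataset's max_out_len. I changed it in the dataset-related source code, and that fixed it.

Which source file do you mean by the dataset-related source code? I looked at the dataset's aime2024.py and found nothing that limits the output length. Thanks!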

msz12345 avatar Feb 25 '25 02:02 msz12345

It's under configs/datasets/aime2024/. Edit the file for whichever version of the dataset you are using, and change max_out_len there.

nku-ligl avatar Feb 25 '25 03:02 nku-ligl
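The change described in the comment above amounts to raising max_out_len inside the dataset's infer_cfg.inferencer, which is where the dumped config in the issue body shows the value 2048. The following is a minimal sketch of that override using plain dicts that mirror the dumped config. raise_max_out_len is a hypothetical helper and not an OpenCompass API; whether the same loop can be applied as-is in a real OpenCompass config file is an assumption that is not verified here.

```python
import copy


def raise_max_out_len(datasets: list, max_out_len: int = 32768) -> list:
    """Return a copy of the dataset configs with the inferencer limit raised."""
    patched = copy.deepcopy(datasets)
    for ds in patched:
        inferencer = ds.get("infer_cfg", {}).get("inferencer")
        if inferencer is not None:
            inferencer["max_out_len"] = max_out_len
    return patched


# Plain-dict stand-in for the relevant part of the dumped dataset config.
datasets = [
    dict(
        abbr="aime2024",
        infer_cfg=dict(
            inferencer=dict(
                max_out_len=2048,
                type="opencompass.openicl.icl_inferencer.GenInferencer",
            )
        ),
    )
]

patched = raise_max_out_len(datasets)
print(patched[0]["infer_cfg"]["inferencer"]["max_out_len"])
```

Working on a copy keeps the original config untouched, so the two limits can be compared side by side before a long run is started.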

Did changing max_out_len fix it?

It's under configs/datasets/aime2024/. Edit the file for whichever version of the dataset you are using, and change max_out_len there.

wccccp avatar Feb 27 '25 13:02 wccccp

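Did changing max_out_len fix it?

It's under configs/datasets/aime2024/. Edit the file for whichever version of the dataset you are using, and change max_out_len there.

Solved. The results are very accurate; the run took a little over 9 hours. Many thanks to @nku-ligl!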

msz12345 avatar Feb 27 '25 13:02 msz12345

What score did you get? Does it match the official numbers?

hh0o0hh avatar Feb 28 '25 06:02 hh0o0hh

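What score did you get? Does it match the official numbers?

About the same as the official number. I think it was 30.3%; the official one is slightly lower.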

msz12345 avatar Mar 01 '25 03:03 msz12345

We have provided an example on how to re-implement the AIME for DeepSeek-R1-32B. Please check: https://github.com/open-compass/opencompass/blob/main/docs/en/user_guides/deepseek_r1.md

tonysy avatar Mar 04 '25 09:03 tonysy

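What is the difference between the dataset you used and aime2024_gen_6e39a4, which the unversioned aime2024_gen points to by default? Which one should be used to evaluate a chat model?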

AllenShow avatar Mar 11 '25 08:03 AllenShow

aime2024_gen_6e39a4

Do you know how to point to a local path if I have downloaded the dataset locally?

lebronjamesking avatar Mar 15 '25 12:03 lebronjamesking

In the model config, specify pred_postprocessor=dict(type=extract_non_reasoning_content).

cdpath avatar Mar 17 '25 02:03 cdpath
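The suggestion above is to post-process predictions so that only the text outside the reasoning block is scored. The sketch below illustrates that idea only; it is not the implementation of OpenCompass's extract_non_reasoning_content, and strip_reasoning is a hypothetical name. It assumes the reasoning is wrapped in <think>...</think>, as in the predictions quoted in the issue.

```python
import re


def strip_reasoning(text: str) -> str:
    """Drop <think>...</think> reasoning and keep the final answer text."""
    # Remove every complete reasoning block.
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Some chat templates emit the opening tag in the prompt, so the
    # prediction may contain only the closing tag; keep what follows it.
    if "</think>" in cleaned:
        cleaned = cleaned.rsplit("</think>", 1)[1]
    return cleaned.strip()


full = "<think>\n6 + 3 = 9, so maybe \\boxed{9}?\n</think>\nThe answer is \\boxed{211}."
print(strip_reasoning(full))
```

An unclosed <think> block, as in the truncated predictions from the issue, is left untouched by this sketch, since in that case there is no final answer to separate from the reasoning.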

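What score did you get? Does it match the official numbers?

About the same as the official number. I think it was 30.3%; the official one is slightly lower.

Which model scored 30.3%? On the official site the 7B distill seems to reach 55.5%.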

wjw136 avatar Mar 18 '25 13:03 wjw136

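What score did you get? Does it match the official numbers?

About the same as the official number. I think it was 30.3%; the official one is slightly lower.

Which model scored 30.3%? On the official site the 7B distill seems to reach 55.5%.

I used the 1.5B distill.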

msz12345 avatar Mar 18 '25 13:03 msz12345

Why am I getting a path-not-found error?

(opencompass) (base) ubuntu@ubuntu-SYS-4028GR-TR:/apps/llms/opencompass$ python run.py --datasets aime2024_0shot_nocot_gen_2b9dc2 --hf-type chat --hf-path /apps/llama_factory/LLaMA-Factory/saves/DeepSeek-R1-Distill-Qwen-7B/full/sft_25_4_3_19:02/ --debug --max-out-len 32768 --generation-kwargs do_sample=True top_k=50
04/08 10:50:14 - OpenCompass - INFO - Loading aime2024_0shot_nocot_gen_2b9dc2: /apps/llms/opencompass/opencompass/configs/./datasets/aime2024/aime2024_0shot_nocot_gen_2b9dc2.py
04/08 10:50:14 - OpenCompass - INFO - Loading example: /apps/llms/opencompass/opencompass/configs/./summarizers/example.py
04/08 10:50:14 - OpenCompass - INFO - Current exp folder: outputs/default/20250408_105014
04/08 10:50:14 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
04/08 10:50:14 - OpenCompass - INFO - Partitioned into 1 tasks.
04/08 10:50:15 - OpenCompass - WARNING - Only use 1 GPUs for total 8 available GPUs in debug mode.
04/08 10:50:15 - OpenCompass - INFO - Task [_hf/aime2024]
Sliding Window Attention is enabled but not implemented for sdpa; unexpected results may be encountered.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.10it/s]
Some parameters are on the meta device because they were offloaded to the cpu.
04/08 10:50:21 - OpenCompass - INFO - using stop words: ['<|end▁of▁sentence|>']
Traceback (most recent call last):
  File "/apps/llms/opencompass/run.py", line 4, in <module>
    main()
  File "/apps/llms/opencompass/opencompass/cli/main.py", line 337, in main
    runner(tasks)
  File "/apps/llms/opencompass/opencompass/runners/base.py", line 38, in __call__
    status = self.launch(tasks)
  File "/apps/llms/opencompass/opencompass/runners/local.py", line 128, in launch
    task.run(cur_model=getattr(self, 'cur_model',
  File "/apps/llms/opencompass/opencompass/tasks/openicl_infer.py", line 79, in run
    self.dataset = build_dataset_from_cfg(self.dataset_cfg)
  File "/apps/llms/opencompass/opencompass/utils/build.py", line 12, in build_dataset_from_cfg
    return LOAD_DATASET.build(dataset_cfg)
  File "/home/ubuntu/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/ubuntu/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args) # type: ignore
  File "/apps/llms/opencompass/opencompass/datasets/base.py", line 17, in __init__
    dataset = self.load(**kwargs)
  File "/apps/llms/opencompass/opencompass/datasets/aime2024.py", line 18, in load
    with open(path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: ''
(opencompass) (base) ubuntu@ubuntu-SYS-4028GR-TR:/apps/llms/opencompa

Mrguanglei avatar Apr 08 '25 02:04 Mrguanglei
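The traceback above ends in open(path, 'r') with an empty path, so the dataset location was never resolved to a file. A small pre-flight check can separate "path is empty" from "file is missing". This is only a diagnostic sketch: check_dataset_file is a hypothetical helper, and the candidate location ~/.cache/opencompass/data/aime.jsonl is taken from another user's log earlier in this thread, so it may not apply to every installation.

```python
from pathlib import Path


def check_dataset_file(path) -> Path:
    """Fail early with a clearer message than open('') would give."""
    if not path:
        raise ValueError("dataset path is empty; it was never resolved to a file")
    resolved = Path(path).expanduser()
    if not resolved.is_file():
        raise FileNotFoundError(f"dataset file not found: {resolved}")
    return resolved


# First candidate reproduces the empty path from the traceback; the second
# is the cache location printed in an earlier log in this thread (assumption).
for candidate in ["", "~/.cache/opencompass/data/aime.jsonl"]:
    try:
        print("ok:", check_dataset_file(candidate))
    except (ValueError, FileNotFoundError) as err:
        print("problem:", err)
```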

Why am I getting a path-not-found error?

You may need to git clone the latest version of the repository; I don't have this problem.

msz12345 avatar Apr 12 '25 03:04 msz12345

Why am I getting a path-not-found error?

I ran into the same problem. Have you solved it? Could you share how?

MingZwhy avatar Apr 21 '25 09:04 MingZwhy

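It's under configs/datasets/aime2024/. Edit the file for whichever version of the dataset you are using, and change max_out_len there.

Does this mean changing the code inside site-packages?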

Haruka1307 avatar Oct 16 '25 05:10 Haruka1307