Question on how to get the run score for the test dataset
❓ Questions and Help
Hello team, thanks for creating this wonderful toolset for data science automation. I am currently creating a custom dataset and using it to train a binary classification model.
- The command I am using: `rdagent data_science --competition <task_name>`
- I am using a data structure similar to arf-12-hours-prediction-task from the tutorial, and I have also provided the test dataset, in the following file hierarchy (a sketch of what a grade.py in this layout might contain follows the env file below):
```
git_ignore_folder/ds_data
├── eval
│   └── <task_name>
│       ├── grade.py
│       ├── submission_test.csv
│       └── valid.py
└── <task_name>
    ├── description.md
    ├── sample.py
    ├── sample_submission.csv
    ├── test
    │   ├── info.csv
    │   └── X.npy
    └── train
        ├── info.csv
        └── X.npy
```
- I can successfully trigger the task, but when I look at the tracker UI, it seems the run score (test) step is never executed. Please see the screenshot.
- What have I missed? The env file is as follows:
```
# ==========================================
# Task Configuration
# ==========================================
DS_LOCAL_DATA_PATH="git_ignore_folder/ds_data"
DS_CODER_ON_WHOLE_PIPELINE=True
DS_CODER_COSTEER_ENV_TYPE=docker
DS_IF_USING_MLE_DATA=False
DS_SAMPLE_DATA_BY_LLM=False
DS_SCEN=rdagent.scenarios.data_science.scen.DataScienceScen
```
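For context, here is a minimal sketch of the kind of grade.py this layout expects. The `id`/`prediction` column names, the AUROC metric, and the command-line interface are illustrative assumptions, not the tutorial's actual script:

```python
"""Minimal sketch of a grade.py for a binary-classification task.

Assumptions (illustrative only): the submission and submission_test.csv
share an `id` column and a `prediction` column, and the metric is AUROC.
"""
import sys

import pandas as pd
from sklearn.metrics import roc_auc_score


def grade(submission_path: str, answers_path: str) -> float:
    submission = pd.read_csv(submission_path)
    answers = pd.read_csv(answers_path)

    # Align rows by id so a shuffled submission still grades correctly.
    merged = answers.merge(submission, on="id", suffixes=("_true", "_pred"))
    if len(merged) != len(answers):
        raise ValueError("Submission is missing ids present in the test set.")

    return roc_auc_score(merged["prediction_true"], merged["prediction_pred"])


if __name__ == "__main__":
    # Hypothetical CLI: the first argument is the submission to score.
    print(f"AUROC: {grade(sys.argv[1], 'submission_test.csv'):.4f}")
```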
Hi @alexyzhou,
Thanks for the detailed information! From your description and the UI screenshot, it looks like the test scoring step wasn't triggered. In the Data Science scenario, evaluation on the test set is not executed automatically by the main pipeline; it requires running the "Scoring the test results" step described in the "Run the Application" section of the Data Science documentation.
Please double-check that step and run the scoring command accordingly. After that, the test score should appear properly in the tracker UI.
If it still doesn’t show up, feel free to share your command and logs — we’re happy to help!
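In the meantime, you can sanity-check the grading script itself outside the pipeline by invoking it directly. A minimal sketch, assuming grade.py takes the submission path as its first argument; the paths below follow your posted hierarchy, and the submission path is a placeholder:

```python
# Sanity-check the grading script outside the pipeline.
# <task_name> and the submission path are placeholders.
import subprocess

result = subprocess.run(
    ["python", "grade.py", "/path/to/submission.csv"],
    cwd="git_ignore_folder/ds_data/eval/<task_name>",
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # e.g. "AUROC: 0.8123" with the sketch above
```

If this prints a score but the tracker UI still shows nothing, the problem is in the pipeline wiring rather than in your eval scripts.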
Hi @SunsetWolf,
Thank you for your reply! I will try the command and see whether it runs. Is there any way to have the test score computed automatically, the way validation scores are? I think it would be a great feature if we could enable that. Thanks!