How to specify my own dataset and custom factors using rdagent fin_quant
If I want to use a dataset that has already been processed by Qlib, how do I specify that dataset directory when launching with rdagent fin_quant? And how do I specify custom factors?
At the moment it seems the only option is to modify the YAML files under the rdagent/scenarios/qlib/experiment directory. Is there a way to specify this configuration?
Hi, @Sesame2
Currently there is no interface to specify the dataset in RD-Agent's fintech scenario.
But I think replacing the dataset under the default directory (~/.qlib/qlib_data/cn_data) may be simpler than modifying the YAML files.
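If you do replace the data in place, a quick sanity check along the lines of the sketch below (the dates, instrument code, and fields are only placeholders) can confirm that Qlib reads the replaced data before RD-Agent is pointed at it:

```python
# Sanity-check sketch: verify Qlib can read whatever now sits under the default
# directory. Dates, instrument code, and fields below are only examples.
import qlib
from qlib.data import D

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")

# Trading calendar covered by the data
print(D.calendar(start_time="2020-01-01", end_time="2020-12-31")[:5])

# A few price/volume fields for one instrument
print(
    D.features(
        ["SH600000"],
        ["$close", "$volume"],
        start_time="2020-01-01",
        end_time="2020-03-31",
    ).head()
)
```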
I think this approach might not work, because my custom dataset has different time ranges, stock universe, market region, and benchmark compared to the default dataset. Simply placing it under the default dataset path doesn’t seem to take effect.
Hey @SunsetWolf, thanks for your reply!
I’ve seen quite a few discussions in the RD-Agent community groups (on WeChat and QQ), and many users feel that the documentation is not very complete — it’s often unclear how to modify certain configurations or perform specific operations.
Recently, I’ve been reading through the RD-Agent source code and have written some notes and documentation about how to use RD-Agent and adjust its configurations.
If possible, I’d like to contribute by opening a PR to share these documents, so that other users with similar questions can benefit. Could you please let me know if that would be acceptable, and give me some guidance on how to proceed?
If your custom dataset differs from the default one in aspects such as time range, stock universe, market region, or benchmark, then the only effective way to handle this is to update the corresponding YAML configuration file, just as you mentioned earlier.
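As a rough illustration (this is not an official RD-Agent interface; the file name and key names below are assumptions based on Qlib-style workflow configs), the differences you listed usually map to only a few entries in the template YAML, which you can inspect or override like this:

```python
# Sketch: override the dataset-specific entries of a template YAML.
# The path and key names are assumptions; Qlib workflow-style configs usually
# expose the market/universe, the benchmark, and the handler/segment time ranges.
import yaml

path = "rdagent/scenarios/qlib/experiment/factor_template/conf.yaml"  # hypothetical file name
with open(path) as f:
    conf = yaml.safe_load(f)

conf["market"] = "csi500"       # custom stock universe
conf["benchmark"] = "SH000905"  # custom benchmark index
# The handler's start_time/end_time and the train/valid/test segments (usually
# under the dataset/handler section) must match the calendar of your data.

# Note: yaml.safe_dump drops anchors and comments, so hand-editing the file is
# just as valid; this only shows which entries matter.
with open(path, "w") as f:
    yaml.safe_dump(conf, f, sort_keys=False)
```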
I’m still a bit unsure about which part of the YAML should actually be modified. There are several YAML files under rdagent/scenarios/qlib/experiment/factor_template and rdagent/scenarios/qlib/experiment/model_template, and each of them seems to include some custom data loading or processing logic.
I’m not quite sure how these files guide RD-Agent’s experiment process or how they control the factor mining workflow. So I’m a bit worried that directly modifying the default YAML files might break the experiment flow of the Agent.
Maybe you could provide some more detailed guidance on which YAML files should be changed and how they affect the experiment?
We’d be very happy to welcome your documentation contribution! 🎉
You can open a Pull Request (PR) to share your documentation or code. Once submitted, the maintainers will review it, and if everything looks good, it will be merged into the main branch. During the review process, there might be some feedback or suggestions for improvement, and the PR may go through a few iterations before being finalized.
Thank you for your willingness to contribute to the RD-Agent community!
Although we don’t currently provide a very user-friendly interface for customizing these YAML files, you can take a look at the function generate_data_folder_from_qlib(). This function shows where the YAML files fit into RD-Agent’s experiment pipeline: it launches a Qlib environment and runs generate.py, which in turn loads the YAML configurations to produce the factor data.
So, if you plan to modify or add custom data, it’s important to make sure your changes stay consistent with the indexing and data structure expected by generate.py.
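For orientation, a minimal sketch of what a generate.py-style script does with those YAML configurations could look like the following (the config name, key layout, and output file name are assumptions, not the actual RD-Agent code):

```python
# Minimal sketch of a generate.py-style flow: initialize Qlib, build the dataset
# described by the YAML, and dump the factor data. The config name, key layout,
# and output file name are assumptions rather than RD-Agent's implementation.
import qlib
import yaml
from qlib.utils import init_instance_by_config

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")

with open("conf.yaml") as f:  # hypothetical config name
    conf = yaml.safe_load(f)

# The dataset/handler section decides instruments, time range, and features,
# so custom data has to stay consistent with what this step produces.
dataset = init_instance_by_config(conf["task"]["dataset"])
df = dataset.prepare("train")

# Qlib handlers return a (datetime, instrument) MultiIndex; downstream code
# relies on that indexing and column structure, so keep it intact.
df.to_hdf("daily_pv.h5", key="data")
```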
Hi @SunsetWolf,
Just wanted to let you know that I’ve opened a PR (#1288) related to this issue.
It adds some documentation about execution environment configuration, since this is a common point of confusion for many users.
I may have a few more documentation updates coming soon — especially around custom datasets and factor configurations — so I’ll likely open additional PRs to help make these parts easier to understand for new users.
Thanks again for your help and guidance!