refactor: Refactor data models
Description
The is a PR on issue #1316. I refactored sharegpt and tool calling related data models, and put them into camel/data_schemas/data_schemas.py Here are the major changes:
- Refactored
ShareGPTMessage,ShareGPTData,ToolCall, andToolResponseintodata_schema/data_schema.py. - Removed
ConversationItemandShareGPTDatafromcamel/data_collector/sharegpt_collector.py. - Removed the entire
camel/messages/conversion/conversation_models.pyfile. - Modified multiple corresponding
__init__.pyfiles. - Modified the corresnponding examples for sharegpt_collector and conversion between Camel Message and ShareGPT
Note: There is a cookbook "Real Function Calls and Hermes Format Data Generation" affected by this refactor. I hope to also update this cookbook if this PR is merged.
Checklist
Go over all the following points, and put an x in all the boxes that apply.
- [x] I have read the CONTRIBUTION guide (required)
- [x] I have linked this PR to an issue using the Development section on the right sidebar or by adding
Fixes #issue-numberin the PR description (required) - [ ] I have checked if any dependencies need to be added or updated in
pyproject.tomlandpoetry.lock - [ ] I have updated the tests accordingly (required for a bug fix or a new feature)
- [ ] I have updated the documentation if needed:
- [ ] I have added examples if this is a new feature
If you are unsure about any of these, don't hesitate to ask. We are here to help!
Maybe we can also move the messages/conversion/alpaca.py to the new data_schema folder as well?
Thanks @boerz-coding for the nice contribution, overall looks good to me! I left some small comments, waiting for my colleagues to confirm their thoughts.
Maybe we can also move the messages/conversion/alpaca.py to the new data_schema folder as well?
Yes I should have done this. I just realized I misunderstood the scope of issue #1316 at the very beginning. The alpaca data structures should definitely be unified. I will continue to work on this comment and let the reviewers know when it is done. Thank you for this comment!
Edit on this one: I read the issue #1316 again, and find that we did not mention this messages/conversion/alpaca.py in the original issue. I still agree with Xiaotian @MuggleJinx and think I should move messages/conversion/alpaca.py to the new data_schema folder (I actually plan to directly move everything in this file into the data_schemas.py file), but it will be helpful if @zjrwtx @Wendong-Fan or other reviewers would like to double check on this.
Thanks @boerz-coding for the nice contribution, overall looks good to me! I left some small comments, waiting for my colleagues to confirm their thoughts.
Thank you again for the helpful comments!
@MuggleJinx @zjrwtx I have modified the code according to Xiaotian's suggestions, and now it is ready for review again!
@zjrwtx @Wendong-Fan Could you please review my PR when you have time? I have modified the code according to Xiaotian's suggestions, and now it is ready for review again! There is no rush. Just wanted to make sure you have seen my message.
@zjrwtx Hi Yifeng, could you please review my PR when you have time? There is no rush but I want to make sure you have seen this.
@zjrwtx Hi Yifeng, could you please review my PR when you have time? There is no rush but I want to make sure you have seen this.
@zjrwtx Hi Yifeng, could you please review my PR when you have time? There is no rush but I want to make sure you have seen this.
hey @boerz-coding ,sorry for my late review, thanks for your great contribution. can we resolve the conflicts first?
Hi Yifeng@zjrwtx, no worries at all—I'm grateful to have your feedback!
Apologies for missing the conflict earlier. The main repo has changed a bit since my initial PR in February, but I've now resolved the conflicts. The PR should be ready for your review—thanks again!
thanks @boerz-coding , i will review soon
overall, i think camel/data_collector and camel/messages/conversion can be put in data_schemas folder to make it unify control,what do you think? @boerz-coding
@zjrwtx Thank you for the thoughtful suggestion! I think it’s an interesting direction to consider. That said, I believe the original goal of issue #1316 is to centralize the data model definitions (e.g., dataclasses and schemas) into a dedicated data_schemas module, rather than to restructure existing modules like camel/data_collector or camel/messages/conversion.
So I wonder if this kind of structural move might go beyond the initial scope of the issue. Perhaps we can hear @Wendong-Fan’s thoughts, since he originally proposed this issue? I’m happy to adjust the structure if both of you feel it's a better long-term solution.
overall, i think camel/data_collector and camel/messages/conversion can be put in data_schemas folder to make it unify control,what do you think? @boerz-coding
@zjrwtx Thank you for the thoughtful suggestion! I think it’s an interesting direction to consider. That said, I believe the original goal of issue #1316 is to centralize the data model definitions (e.g., dataclasses and schemas) into a dedicated data_schemas module, rather than to restructure existing modules like camel/data_collector or camel/messages/conversion.
So I wonder if this kind of structural move might go beyond the initial scope of the issue. Perhaps we can hear @Wendong-Fan’s thoughts, since he originally proposed this issue? I’m happy to adjust the structure if both of you feel it's a better long-term solution.
Hi @Wendong-Fan could you please let us know what do you think when you have time?
@Wendong-Fan Hi Wendong, this is a gentle reminder on this issue. This PR is ready to be merged. I have already completed the original goal for this refactor, and have communicated with Yifeng on the goal of this PR. I just formally invited you as a reviewer to help you better track it. Again there is no rush, and I am happy to further edit the code if necessary.