camel icon indicating copy to clipboard operation
camel copied to clipboard

refactor: Refactor data models

Open boerz-coding opened this issue 10 months ago • 12 comments

Description

The is a PR on issue #1316. I refactored sharegpt and tool calling related data models, and put them into camel/data_schemas/data_schemas.py Here are the major changes:

  • Refactored ShareGPTMessage, ShareGPTData, ToolCall, and ToolResponse into data_schema/data_schema.py.
  • Removed ConversationItem and ShareGPTData from camel/data_collector/sharegpt_collector.py.
  • Removed the entire camel/messages/conversion/conversation_models.py file.
  • Modified multiple corresponding __init__.py files.
  • Modified the corresnponding examples for sharegpt_collector and conversion between Camel Message and ShareGPT

Note: There is a cookbook "Real Function Calls and Hermes Format Data Generation" affected by this refactor. I hope to also update this cookbook if this PR is merged.

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • [x] I have read the CONTRIBUTION guide (required)
  • [x] I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • [ ] I have checked if any dependencies need to be added or updated in pyproject.toml and poetry.lock
  • [ ] I have updated the tests accordingly (required for a bug fix or a new feature)
  • [ ] I have updated the documentation if needed:
  • [ ] I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

boerz-coding avatar Feb 24 '25 05:02 boerz-coding

Maybe we can also move the messages/conversion/alpaca.py to the new data_schema folder as well?

MuggleJinx avatar Feb 25 '25 14:02 MuggleJinx

Thanks @boerz-coding for the nice contribution, overall looks good to me! I left some small comments, waiting for my colleagues to confirm their thoughts.

MuggleJinx avatar Feb 25 '25 14:02 MuggleJinx

Maybe we can also move the messages/conversion/alpaca.py to the new data_schema folder as well?

Yes I should have done this. I just realized I misunderstood the scope of issue #1316 at the very beginning. The alpaca data structures should definitely be unified. I will continue to work on this comment and let the reviewers know when it is done. Thank you for this comment!

Edit on this one: I read the issue #1316 again, and find that we did not mention this messages/conversion/alpaca.py in the original issue. I still agree with Xiaotian @MuggleJinx and think I should move messages/conversion/alpaca.py to the new data_schema folder (I actually plan to directly move everything in this file into the data_schemas.py file), but it will be helpful if @zjrwtx @Wendong-Fan or other reviewers would like to double check on this.

Thanks @boerz-coding for the nice contribution, overall looks good to me! I left some small comments, waiting for my colleagues to confirm their thoughts.

Thank you again for the helpful comments!

boerz-coding avatar Feb 25 '25 18:02 boerz-coding

@MuggleJinx @zjrwtx I have modified the code according to Xiaotian's suggestions, and now it is ready for review again!

boerz-coding avatar Apr 01 '25 16:04 boerz-coding

@zjrwtx @Wendong-Fan Could you please review my PR when you have time? I have modified the code according to Xiaotian's suggestions, and now it is ready for review again! There is no rush. Just wanted to make sure you have seen my message.

boerz-coding avatar Apr 08 '25 18:04 boerz-coding

@zjrwtx Hi Yifeng, could you please review my PR when you have time? There is no rush but I want to make sure you have seen this.

boerz-coding avatar Apr 11 '25 02:04 boerz-coding

@zjrwtx Hi Yifeng, could you please review my PR when you have time? There is no rush but I want to make sure you have seen this.

boerz-coding avatar Apr 12 '25 16:04 boerz-coding

@zjrwtx Hi Yifeng, could you please review my PR when you have time? There is no rush but I want to make sure you have seen this.

hey @boerz-coding ,sorry for my late review, thanks for your great contribution. can we resolve the conflicts first?

zjrwtx avatar Apr 13 '25 01:04 zjrwtx

Hi Yifeng@zjrwtx, no worries at all—I'm grateful to have your feedback!

Apologies for missing the conflict earlier. The main repo has changed a bit since my initial PR in February, but I've now resolved the conflicts. The PR should be ready for your review—thanks again!

boerz-coding avatar Apr 13 '25 02:04 boerz-coding

thanks @boerz-coding , i will review soon

zjrwtx avatar Apr 13 '25 03:04 zjrwtx

overall, i think camel/data_collector and camel/messages/conversion can be put in data_schemas folder to make it unify control,what do you think? @boerz-coding

@zjrwtx Thank you for the thoughtful suggestion! I think it’s an interesting direction to consider. That said, I believe the original goal of issue #1316 is to centralize the data model definitions (e.g., dataclasses and schemas) into a dedicated data_schemas module, rather than to restructure existing modules like camel/data_collector or camel/messages/conversion.

So I wonder if this kind of structural move might go beyond the initial scope of the issue. Perhaps we can hear @Wendong-Fan’s thoughts, since he originally proposed this issue? I’m happy to adjust the structure if both of you feel it's a better long-term solution.

boerz-coding avatar Apr 13 '25 05:04 boerz-coding

overall, i think camel/data_collector and camel/messages/conversion can be put in data_schemas folder to make it unify control,what do you think? @boerz-coding

@zjrwtx Thank you for the thoughtful suggestion! I think it’s an interesting direction to consider. That said, I believe the original goal of issue #1316 is to centralize the data model definitions (e.g., dataclasses and schemas) into a dedicated data_schemas module, rather than to restructure existing modules like camel/data_collector or camel/messages/conversion.

So I wonder if this kind of structural move might go beyond the initial scope of the issue. Perhaps we can hear @Wendong-Fan’s thoughts, since he originally proposed this issue? I’m happy to adjust the structure if both of you feel it's a better long-term solution.

Hi @Wendong-Fan could you please let us know what do you think when you have time?

boerz-coding avatar Apr 14 '25 17:04 boerz-coding

@Wendong-Fan Hi Wendong, this is a gentle reminder on this issue. This PR is ready to be merged. I have already completed the original goal for this refactor, and have communicated with Yifeng on the goal of this PR. I just formally invited you as a reviewer to help you better track it. Again there is no rush, and I am happy to further edit the code if necessary.

boerz-coding avatar May 11 '25 18:05 boerz-coding