
[fix][1760] Added fix for the missing `context` key issue in dolly!

Open • pytholic opened this pull request 4 months ago • 1 comment

What is the type of this PR?

  • [ ] Refactor (refactored code that neither fixes a bug nor adds a feature)
  • [ ] Feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Test (including new or correcting previous tests)
  • [ ] Style (changes to code formatting and styling)
  • [ ] Optimization (performance improvements)
  • [ ] Documentation Update (updates to the documentation, e.g., README, comments, docstrings)
  • [ ] Revert (reverts a previous commit)

Motivation and Context

Fixes the issue where a missing `context` key raised an error in the Dolly data loader.

Modifications

The following changes were made in this PR.

  • Updated `_transform` in `dolly.py`
  • Added a test for it
  • Slightly reworked the printed output
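The actual diff is not reproduced in this description. As a hedged sketch only, assuming each Dolly record is a dict with `instruction`, `context`, and `response` keys and that the transform maps them onto `instruction`/`input`/`output`, a transform that tolerates a missing `context` key could default it to an empty string. The names `transform_dolly_record` and `test_transform_missing_context` are hypothetical and are not the code in this PR.

```python
def transform_dolly_record(item: dict) -> dict:
    """Map a Dolly-style record onto instruction/input/output keys.

    Hypothetical sketch (assumed key names): a missing or None `context`
    becomes an empty input instead of raising KeyError.
    """
    record = dict(item)  # shallow copy so the caller's dict is not mutated
    context = record.pop("context", None)
    record["input"] = context if context is not None else ""
    # `response` stays required: a record without it is still an error.
    record["output"] = record.pop("response")
    return record


def test_transform_missing_context() -> None:
    # Simulates the original issue: the record has no `context` key at all.
    item = {"instruction": "Name a primary color.", "response": "Red"}
    out = transform_dolly_record(item)
    assert out == {"instruction": "Name a primary color.", "input": "", "output": "Red"}
    assert "input" not in item  # original record left untouched
```

Only `context` is treated as optional here; keeping `response` strict means genuinely malformed records still fail loudly instead of producing empty training targets.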

How Has This Been Tested?

I added the following test to reproduce the original issue. The code has been tested on macOS.

  • tests/data/test_dolly.py::test_dolly_missing_keys

Previous behavior: [Screenshot 2024-10-03 at 2:01:29 AM]

After the fix: [image]

Checklist:

  • [ ] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my code
  • [x] I have commented on my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] Any dependent changes have been merged and published in downstream modules

Notes

  1. Some tokenizer-related tests are failing. These failures are unrelated to my changes (CC: @rasbt). [Screenshot 2024-10-03 at 1:27:58 AM]

  2. I slightly reworked the printed logs: the instruction was being printed twice, so I removed the duplicate, and I added some newline characters for readability. This change is applied only to `lora.py`; if desired, it can be applied to the other scripts as well.

Before: [image]

After: [image: format_new]
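The exact log format in `lora.py` is only shown in screenshots. A minimal sketch, assuming the duplication came from printing the prompt and then a decoded generation that still echoes that prompt at its start, is to strip the echoed prefix and separate the sections with blank lines. `format_generation_log` is a hypothetical helper, not a litgpt function.

```python
def format_generation_log(prompt: str, decoded: str) -> str:
    """Build a log block in which the instruction appears only once.

    Assumption (hypothetical): `decoded` is the full generated text and
    begins with an echo of `prompt`.
    """
    # Drop the echoed prompt so the instruction is not printed twice.
    response = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Blank lines between sections make the output easier to scan.
    return f"{prompt.strip()}\n\nResponse:\n{response.strip()}\n"


print(format_generation_log("Instruction: Say hi.", "Instruction: Say hi. Hello!"))
```

If the decoded text does not start with the prompt, it is printed unchanged, so the helper never silently drops generated content.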

pytholic • Oct 02 '24 18:10