feat(dataloader): refine implementation of mocked and megatron dataloader

Open · zigzagcai opened this issue 1 year ago · 0 comments

Motivation

  1. Fix CI timeout for https://github.com/InternLM/InternEvo/issues/342 (Completed)
  2. Refine the implementation of the Megatron and mocked dataloaders (Completed)

Modification

  • internlm/train/pipeline.py
  • internlm/data/*

BC-breaking (Optional)

None

Use cases (Optional)

None

Checklist

Before PR:

  • [ ] Pre-commit or other linting tools are used to fix the potential lint issues.
  • [ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • [ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • [ ] The documentation has been modified accordingly, such as docstrings or example tutorials.

After PR:

  • [ ] If the modification could affect downstream or other related projects, this PR should be tested with those projects.
  • [ ] CLA has been signed and all committers have signed the CLA in this PR.

zigzagcai · Sep 24 '24 07:09