llm-foundry

Data validation notebook

Open XiaohanZhangCMU opened this issue 1 year ago • 0 comments

  • Add notebook/data_validation_notebook, which runs data preparation and token counting from the byod/data_validation branch. It is merged to main so the underlying functions stay up to date.
  • Add the utility functions used by notebook/data_validation_notebook.
  • Move functions from convert_text_to_mds into the data prep utils, with minor modifications.
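
For readers unfamiliar with the token counting step, the following is a minimal sketch of the idea, not the notebook's actual code. The names `whitespace_tokenize` and `count_tokens` are hypothetical, and the whitespace split is only a stand-in for whatever tokenizer the notebook really uses.

```python
from typing import Callable, Dict, Iterable, List


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): split on whitespace only."""
    return text.split()


def count_tokens(
    samples: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> Dict[str, int]:
    """Collect simple token statistics over an iterable of text samples."""
    num_samples = 0
    total_tokens = 0
    empty_samples = 0
    for text in samples:
        n = len(tokenize(text))
        num_samples += 1
        total_tokens += n
        if n == 0:
            # Empty samples are worth flagging during data validation.
            empty_samples += 1
    return {
        "num_samples": num_samples,
        "total_tokens": total_tokens,
        "empty_samples": empty_samples,
    }


stats = count_tokens(["hello world", "", "a b c"])
print(stats)
```

Passing a different `tokenize` callable would let the same loop be reused with a real tokenizer; the counts then depend entirely on that tokenizer's vocabulary.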

XiaohanZhangCMU, Mar 14 '24 05:03