FlagEmbedding

Dataset used in LongLLM_QLORA

Open • Mingyi-Hong opened this issue 1 year ago • 1 comment

Hi:

I wonder what base/source datasets were used to create the following datasets:

  • bio_book
  • one_details_book
  • multi_details_book
  • multi_details_paper_long
  • one_detail_paper_long

Thanks!

Best,

Mingyi

Mingyi-Hong, Sep 17 '24 20:09

Hi,

  • All books are from the Books3 subset of the Pile.
  • All papers are from the ArXiv subset of the Pile.

namespace-Pt, Sep 25 '24 07:09