Split conversations with ijson.

Open itsPat opened this issue 2 years ago • 0 comments

More of a suggestion than an issue, but here's a gist describing what I did with the split_conversations.py script.

https://gist.github.com/itsPat/a1bb06f1dfbd9d4b07288df3bb18b802

I improved it by using ijson, so that the input JSON is parsed incrementally and each object is written to the output file as soon as it is processed, instead of all the objects being stored in memory.

This helps anyone with limited memory work on large datasets, because the memory required no longer scales with the size of the input dataset.
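For anyone who wants the idea without opening the gist, here is a minimal sketch of the streaming approach. It is not the gist's code. It assumes the input is a top-level JSON array of ShareGPT-style objects with `id` and `conversations` keys, and it splits by a fixed number of messages (`max_turns`) as a stand-in for the real length-based splitting. The function names `split_conversation`, `write_json_array` and `stream_split` are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def split_conversation(sample: Dict[str, Any], max_turns: int) -> Iterator[Dict[str, Any]]:
    """Yield pieces of one conversation, each holding at most max_turns messages.

    Assumption: splitting by message count is a placeholder for whatever
    length rule the real script applies.
    """
    turns = sample.get("conversations", [])
    for part, start in enumerate(range(0, len(turns), max_turns)):
        yield {
            "id": f"{sample.get('id', 'unknown')}_{part}",
            "conversations": turns[start:start + max_turns],
        }


def write_json_array(records: Iterable[Dict[str, Any]], out_file) -> int:
    """Write records as a JSON array one at a time and return the count written."""
    count = 0
    out_file.write("[")
    for record in records:
        # Separator goes before every record except the first.
        out_file.write(",\n" if count else "\n")
        out_file.write(json.dumps(record, ensure_ascii=False))
        count += 1
    out_file.write("\n]\n")
    return count


def stream_split(in_path: str, out_path: str, max_turns: int = 4) -> int:
    """Stream in_path through ijson and write the split pieces to out_path."""
    import ijson  # third-party dependency: pip install ijson

    with open(in_path, "rb") as src, open(out_path, "w", encoding="utf-8") as dst:
        # "item" selects each element of the top-level array in turn;
        # use_float=True avoids Decimal values that json.dumps cannot serialize.
        samples = ijson.items(src, "item", use_float=True)
        pieces = (
            piece
            for sample in samples
            for piece in split_conversation(sample, max_turns)
        )
        return write_json_array(pieces, dst)
```

Because both the ijson iterator and the generator expression are lazy, only one input object and its pieces are held at a time, so peak memory depends on the largest single conversation rather than on the file size.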

itsPat • Jun 08 '23 22:06