FastChat
Split conversations with ijson
More of a suggestion than an issue, but here's a gist describing what I did with the split_conversations.py script.
https://gist.github.com/itsPat/a1bb06f1dfbd9d4b07288df3bb18b802
I changed it to use ijson, so the input is parsed incrementally, one object at a time, and each result is written to the output file immediately instead of holding every object in memory.
This should help people with limited memory work on large datasets, because the memory required no longer scales with the size of the input dataset.
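
For anyone who wants the idea without opening the gist, here is a minimal sketch of the streaming pattern. It is not the code from the gist or from FastChat. It assumes the input is a single top-level JSON array of objects with `id` and `conversations` fields, and `split_conversation`, `write_split`, `iter_conversations`, and `max_turns` are hypothetical names. The turn-count split is only a stand-in for whatever the real script does.

```python
import json
from typing import Any, Dict, Iterable, Iterator, TextIO


def iter_conversations(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one conversation at a time from a top-level JSON array."""
    import ijson  # third-party: pip install ijson

    with open(path, "rb") as f:
        # "item" is ijson's prefix for each element of a top-level array.
        # use_float=True (ijson >= 3.1) yields floats instead of Decimal,
        # which json.dump cannot serialize.
        yield from ijson.items(f, "item", use_float=True)


def split_conversation(
    sample: Dict[str, Any], max_turns: int
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for the script's real splitting logic:
    cut one conversation into pieces of at most max_turns turns."""
    turns = sample["conversations"]
    for part, start in enumerate(range(0, len(turns), max_turns)):
        yield {
            "id": f"{sample['id']}_{part}",
            "conversations": turns[start:start + max_turns],
        }


def write_split(
    samples: Iterable[Dict[str, Any]], out: TextIO, max_turns: int
) -> int:
    """Write the split pieces as a JSON array, one piece at a time.

    Only the current sample and its pieces are held in memory.
    Returns the number of pieces written.
    """
    out.write("[")
    count = 0
    for sample in samples:
        for piece in split_conversation(sample, max_turns):
            # Separate elements by hand, since the array is never built in memory.
            out.write(",\n" if count else "\n")
            json.dump(piece, out)
            count += 1
    out.write("\n]\n")
    return count
```

To use it, open the output file for writing and call `write_split(iter_conversations("input.json"), out, max_turns=2)`, where the file name and turn limit are placeholders. Because `iter_conversations` is a generator, nothing is read until `write_split` asks for the next sample.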