
How do I prepare the data for visualisation?

Open dittops opened this issue 2 years ago • 2 comments

I'm trying to use the Meerkat data visualization. The visualization in viz/main.py uses a sample of the GitHub data. Is there a script I can use to extend it to other datasets?

dittops avatar Jun 28 '23 18:06 dittops

I think you can follow the steps described in the viz README (https://github.com/togethercomputer/RedPajama-Data/tree/main/viz#reproducing-the-data-processing-pipeline) to expand to the full GitHub slice or to other slices of the dataset.

mauriceweber avatar Jul 04 '23 07:07 mauriceweber

I have tried following the steps. They create the pca32.faisis file. But main.py loads its data from "https://huggingface.co/datasets/meerkat-ml/lemma/resolve/main/filtered_08cdfa755e6d4d89b673d5bd1acee5f6.mk.tar.gz". How do I create that data so that it can be passed in?

dittops avatar Jul 14 '23 14:07 dittops
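
One possible starting point, offered as a sketch rather than the repository's actual method: the `.mk.tar.gz` suffix suggests a gzipped tarball of a directory, so if the processed data has already been written to a local directory, it could be packaged and unpacked with the Python standard library alone. The assumption that the hosted file is a plain tarball of such a directory is unverified, and the function names and paths below are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> str:
    """Pack src_dir into a gzipped tarball at archive_path.

    Assumption: src_dir is a directory that already holds the processed
    data (for example, a saved dataframe directory). Nothing here is
    specific to Meerkat; it only handles the .tar.gz packaging.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name, not its full path.
        tar.add(src, arcname=src.name)
    return archive_path


def unpack_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a gzipped tarball into dest_dir and return its member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names
```

For example, `pack_directory("my_slice.mk", "my_slice.mk.tar.gz")` would produce an archive with the same kind of suffix as the one referenced in main.py. Whether main.py accepts a local path in place of the Hugging Face URL would need to be checked against the code.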