
Doesn't work for large files

saschalalala opened this issue 10 years ago · 4 comments

Hey, I tried processing a fairly large file (476 MB) and got an allocation size overflow.

saschalalala · Nov 01 '15

Having the same issue in Firefox 52.0.1 with a 314.8 MB .json file.

evitolins · Mar 26 '17

A 500 MB file is dying here as well.

faddat · Apr 29 '17

Maybe someone could submit a PR implementing something like this? I'm not sure if it's quite the same, but I really only wanted a single conversation, so I wrote a Ruby script to split my JSON file into many files by conversation ID. It messily produced ~50 files, and you'll need to search the original file to figure out which ID you need. You can also add them to Hangouts Reader one by one, since it doesn't discard previous parses.

require 'json'

def read_conversations(file)
  raise "Cannot read #{file}" unless File.readable?(file)
  JSON.parse(File.read(file))["conversation_state"]
end

data = read_conversations("Hangouts.json")
puts "Parsed file"

# Group conversation entries by their conversation ID
restructured = {}
data.each do |entry|
  id = entry["conversation_id"]["id"]
  puts "Found conversation #{id}"
  restructured[id] ||= { "conversation_state" => [] }
  restructured[id]["conversation_state"] << entry
end
puts "Finished sorting"

# Write each conversation to its own <id>.json file
restructured.each do |id, value|
  puts "Generating #{id}"
  File.write("#{id}.json", JSON.generate(value))
end
puts "Done"

I imported my 10 MB file just fine, but results may vary if you have a really long chat you'd like to import.
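To figure out which of the ~50 IDs is the conversation you want (the step above says you'd otherwise have to search the original file by hand), a small helper like this may save some grepping. It's only a sketch: it assumes the same older Takeout layout the script above uses, with the top-level "conversation_state" array, the ID under conversation_id.id, and participant names ("fallback_name") under conversation_state.conversation.participant_data — adjust the key paths if your export differs.

```ruby
require 'json'

# Hypothetical helper: map each conversation ID to its participant names so
# you can tell which ID you need. Key paths assume the older Takeout layout
# used by the splitting script above; adjust them if your export differs.
def participants_by_id(entries)
  entries.each_with_object({}) do |entry, out|
    id = entry["conversation_id"]["id"]
    names = entry.dig("conversation_state", "conversation", "participant_data")
                 .to_a.map { |p| p["fallback_name"] }.compact
    out[id] = names
  end
end

# Usage:
#   data = JSON.parse(File.read("Hangouts.json"))["conversation_state"]
#   participants_by_id(data).each { |id, names| puts "#{id}: #{names.join(', ')}" }
```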

danielhickman · Jun 30 '17

I couldn't quite figure out the Ruby script, so I wrote my own with Node and a ton of unneeded dependencies that I pulled in out of laziness:

const JSONStream = require("jsonstream"),
  fs = require("fs"),
  util = require("util"),
  writeFile = util.promisify(fs.writeFile),
  rxjs = require("rxjs"),
  { debounceTime } = require("rxjs/operators");

// Conversation entries grouped by conversation ID
const conversationsById = {};

// Debounced progress logging so the console isn't flooded
const progress = new rxjs.Subject();
progress.pipe(debounceTime(1000)).subscribe(data => {
  console.log(`${data.new ? "new" : "old"}Id, ${data.id}`);
});

// Stream-parse the export instead of loading it into memory all at once
fs.createReadStream("./Hangouts.json")
  .pipe(JSONStream.parse("conversations.*"))
  .on("data", data => {
    const id =
      data["conversation"] &&
      data["conversation"]["conversation_id"] &&
      data["conversation"]["conversation_id"]["id"];

    if (!id) return;
    if (conversationsById[id]) {
      progress.next({ new: false, id });
      conversationsById[id].conversations.push(data);
    } else {
      progress.next({ new: true, id });
      conversationsById[id] = { conversations: [data] };
    }
  })
  .on("end", () => {
    // Write the files one at a time, freeing memory as we go
    Object.keys(conversationsById).reduce(async (prev, key) => {
      await prev;
      console.log("writing", key);
      await writeFile(`${key}.json`, JSON.stringify(conversationsById[key]));
      delete conversationsById[key];
    }, Promise.resolve());
  });

Tested with 700 MB files.

crutchcorn · Nov 26 '18