hangouts-reader
Doesn't work for large files
Hey, I tried processing quite a large file (476 MB) and got an allocation size overflow.
Having the same issue in FF 52.0.1 with a 314.8 MB .json file
500 MB file dying here as well.
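For what it's worth, the overflow most likely happens because the whole export has to be read into a single JavaScript string before it can be parsed, and browsers cap how large one allocation can be; a multi-hundred-megabyte file blows past that cap. Splitting the export into one file per conversation, as in the scripts below, sidesteps the limit.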
Maybe someone could submit a PR with an implementation of something like this? I'm not sure if it's the same problem, but I really only wanted a single conversation, so I wrote a Ruby script to split my JSON file into many smaller files by conversation ID. It messily made ~50 files, and you'll need to search the original file to figure out which ID you need (there's a helper sketch after the script). You can also add them to Hangouts Reader one by one, since it doesn't destroy previous parses.
require 'json'

# Read the Takeout export and grab the top-level conversation list.
def getJSON(file)
  if File.readable?(file)
    $data = JSON.parse(IO.read(file))["conversation_state"]
  end
end

getJSON("Hangouts.json")
puts "Parsed File"

# Group the conversation entries by their conversation ID.
$restructured = {}
$data.each do |i|
  id = i["conversation_id"]["id"]
  puts "Found conversation in #{id}"
  if !$restructured[id]
    $restructured[id] = {"conversation_state" => []}
  end
  $restructured[id]["conversation_state"] << i
end
puts "Finished sorting"

# Write each conversation out to its own <id>.json file
# (File.write also closes the handle, which File.new left open).
$restructured.each do |key, value|
  puts "Generating #{key}"
  File.write("#{key}.json", JSON.generate(value))
end
puts "Done"
I imported the 10 MB file just fine, but results may vary if you have a really long chat you'd like to import.
I couldn't quite figure out the Ruby program, so I wrote my own with Node and a ton of unneeded dependencies that I used out of laziness:
const jsonn = require("jsonstream"),
  fs = require("fs"),
  util = require("util"),
  fs_writeFile = util.promisify(fs.writeFile),
  rxjs = require("rxjs"),
  { debounceTime } = require("rxjs/operators");

const dataa = {};

// Debounce the progress logging so the console isn't flooded.
const sub = new rxjs.Subject();
sub.pipe(debounceTime(1000)).subscribe(data => {
  console.log(`${data.new ? "new" : "old"}Id, ${data.id}`);
});

// Stream-parse the export one conversation at a time instead of reading
// the whole file into one giant string (which is what overflows).
fs.createReadStream("./Hangouts.json")
  .pipe(jsonn.parse("conversations.*"))
  .on("data", data => {
    const id =
      data["conversation"] &&
      data["conversation"]["conversation_id"] &&
      data["conversation"]["conversation_id"]["id"];
    if (id) {
      if (dataa[id]) {
        sub.next({ new: false, id });
        dataa[id].conversations.push(data);
      } else {
        sub.next({ new: true, id });
        dataa[id] = {
          conversations: [data]
        };
      }
    }
  })
  .on("end", () => {
    // Write each conversation sequentially and free it as we go.
    Object.keys(dataa).reduce(async (prev, key) => {
      await prev;
      console.log('writing', key);
      await fs_writeFile(`${key}.json`, JSON.stringify(dataa[key]));
      delete dataa[key];
    }, Promise.resolve());
  });
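To try this, save it next to Hangouts.json (I'm assuming a file name like split.js), npm install jsonstream rxjs, then node split.js. One caveat: the "conversations.*" selector matches newer Takeout exports; older exports use a top-level "conversation_state" array like the Ruby script above, so adjust the selector if nothing comes out.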
Tested with 700 MB files.