cmc-csci143

Incorrect file sizes

Open nati-azmera opened this issue 1 year ago • 1 comment

Hello,

After running nohup sh load_tweets_parallel.sh for about 3 hours, I get weird file sizes. I have also fixed the schema issues.

docker-compose exec pg_denormalized sh -c 'du -hd0 $PGDATA'
49G

docker-compose exec pg_normalized_batch sh -c 'du -hd0 $PGDATA'
49G

Does anyone know why or how I could fix it?

nati-azmera avatar Apr 26 '24 04:04 nati-azmera

It looks like you have probably inserted data twice into the pg_denormalized database, and your insert into pg_normalized_batch was interrupted for some reason.

The most correct thing to do is to restart from scratch: delete your existing database and reinsert the data. Because this will take a long time, however, I will waive the requirement for you that the pg_normalized_batch test cases pass, so you can begin working on the CREATE INDEX commands. I can't do that for pg_denormalized, because you'll need the test cases to know whether the SQL SELECT statements you've written are correct. For that database, you will have to delete everything and start over.
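For anyone who hits the same problem, here is a rough sketch of what "delete everything and start over" could look like. It assumes the database containers keep $PGDATA in named Docker volumes, so that `docker-compose down --volumes` actually removes the data; if the compose file uses bind mounts instead, the host directories have to be deleted by hand. The helper names `run` and `reset_and_reload` and the `DRY_RUN` variable are made up for illustration and are not part of the assignment.

```shell
#!/bin/sh
# Sketch only: wipe the Postgres data and reload the tweets from scratch.
# ASSUMPTION: $PGDATA lives in named Docker volumes declared in the
# compose file. With bind mounts, `down --volumes` will NOT delete the data.

# run CMD...: print the command, and execute it unless DRY_RUN=1.
# (Hypothetical helper, only here so the steps can be previewed safely.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Hypothetical wrapper around the three steps; stops at the first failure.
reset_and_reload() {
    # 1. Stop the containers and delete their named volumes (all stored rows).
    run docker-compose down --volumes &&
    # 2. Recreate the containers, which now start with empty data directories.
    run docker-compose up -d --build &&
    # 3. Load the data exactly once, using the same command as in the issue.
    run nohup sh load_tweets_parallel.sh
}
```

Calling `DRY_RUN=1 reset_and_reload` only prints the three commands, which is a cheap way to confirm what will be deleted before running `reset_and_reload` for real. Running the load step only once per reset matters here, since a second run is the likely cause of the doubled pg_denormalized data.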

mikeizbicki avatar Apr 26 '24 15:04 mikeizbicki