cmc-csci143

Strange File Missing Error & Reset Commands Not Working

Open RowanGray472 opened this issue 7 months ago • 2 comments

I'm having two issues and I think they could be related.

First, when I'm trying to load the data I'm getting this error from the load_denormalized part.

$ sh load_tweets_parallel.sh 
================================================================================
load pg_denormalized
================================================================================
psql: error: connection to server at "localhost" (127.0.0.1), port 4720 failed: FATAL:  "base/13468" is not a valid data directory

Claude says the possible causes of this error are that

  • The data directory is corrupted or incomplete
  • You're pointing to the wrong location for your data directory
  • Permissions issues preventing access to the data files
  • Incomplete database initialization

None of these seem like things I could have caused. I haven't run any non-kosher commands, like rm $HOME/bigdata, and I'm using files that all worked on the last assignment. The only thing I've changed in the docker-compose file is the port numbers.

Second, when I run the provided command to remove the data while the containers are up, I get this error.

$ docker compose exec pg_normalized_batch bash -c 'rm -rf $PGDATA'
rm: cannot remove '/var/lib/postgresql/data': Device or resource busy

When I bring the containers down before running the command, I get this error instead.

$ docker compose exec pg_normalized_batch bash -c 'rm -rf $PGDATA'
service "pg_normalized_batch" is not running

My only guess is that I've somehow accidentally deleted the symlink, but honestly I'm not confident in that hypothesis. Any ideas for what's gone wrong here?

RowanGray472, Apr 26 '25 00:04

Your docker volume has gotten into a corrupted state that is preventing you from even running the reset command. (This has happened to a few other students as well.)

The solution is to run the following:

$ docker-compose run pg_normalized_batch bash -c 'rm -rf $PGDATA/*'

(And the equivalent for the denormalized db.)
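As a minimal sketch of what "the equivalent" looks like, the snippet below prints the reset command for each database service. The service name pg_denormalized is an assumption based on the loader output above, and reset_cmd is a hypothetical helper, so substitute the service names from your own docker-compose.yml.

```shell
# Hypothetical helper: print the reset command for one service.
# The single quotes in the printed command keep $PGDATA unexpanded on the
# host, so it is resolved inside the container.
reset_cmd() {
    printf '%s\n' "docker-compose run $1 bash -c 'rm -rf \$PGDATA/*'"
}

# Assumed service names; check them against your docker-compose.yml.
for service in pg_normalized_batch pg_denormalized; do
    reset_cmd "$service"
done
```

This only prints the commands. Once the output looks right, each line can be pasted into the terminal and run as-is.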

The command above has two differences from the command that was in the README:

  1. I have changed $PGDATA to $PGDATA/*. This means that instead of deleting the directory, the command will only delete the contents of the directory. Several students have had problems where the OS didn't allow the deletion of the directory and this led to a corrupted db. (I'm guessing that the first time you ran the deletion command, you received an error about not being able to delete because of an OS-level lock on a file.)
  2. I have replaced exec with run. (The shorthand for this is s/exec/run since this is the vim find/replace syntax, and all the cool kids normally just write the shortened version without explanation.) Recall that exec runs the command inside an already-running container, whereas run starts a new one-off container for the service and runs the given command in it, so it works even if the service is not up. This change allows the rm command to run even if the container cannot be brought up normally because the db has been corrupted.

I have updated the README to contain this new command.

mikeizbicki, Apr 27 '25 05:04

If the new command does not work for some reason, an alternative solution is to simply change the path of the bind mount in docker-compose.yml from $HOME/bigdata/pg_normalized_batch to $HOME/bigdata/pg_normalized_batch2.

Since this is a different folder, it will be empty when postgres first starts up, and it will create a new db from scratch even if the old one is corrupted.
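Editing the file by hand is the simplest route. If you would rather script the change, here is a rough sketch. It assumes the host path appears in docker-compose.yml literally as $HOME/bigdata/<name> followed by a colon (the short bind-mount syntax); switch_bind_mount is a hypothetical helper, not something from the course repo.

```shell
# Hypothetical helper: point a bind mount at a fresh folder by appending "2"
# to the host path. Assumes the compose file contains the literal text
# $HOME/bigdata/<name>: (short bind-mount syntax).
switch_bind_mount() {
    name="$1"                         # e.g. pg_normalized_batch
    file="${2:-docker-compose.yml}"   # compose file to edit
    cp "$file" "$file.bak"            # keep a backup of the original
    # Match the path only when followed by ':' so that other services whose
    # folder names share a prefix are left alone.
    sed "s|\\\$HOME/bigdata/${name}:|\$HOME/bigdata/${name}2:|" "$file.bak" > "$file"
}

# Usage (run from the directory containing docker-compose.yml):
#   switch_bind_mount pg_normalized_batch
```

Compare docker-compose.yml against docker-compose.yml.bak afterwards to confirm that only the intended line changed before bringing the containers back up.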

mikeizbicki, Apr 27 '25 05:04