cmc-csci143
Question about Part 4 in lab-posix-mapreduce
I noticed that in Part 4 of the lab-posix-mapreduce lab, the code we use to find all of the geolocated tweets sent on January 1st 2020 is:
unzip -p /data/Twitter\ dataset/geoTwitter20-01-01.zip
In the pipe lab, the code we use to run similar queries is:
unzip -p '/data/Twitter dataset/geoTwitter20-01-01.zip'
Why do we use single quotation marks around the /data/Twitter dataset/geoTwitter20-01-01.zip path in the pipe lab, but not in the mapreduce lab? I tried to generate the top10.png plot with quotes around the path, but got the following error message:
"boxplot.gp", line 5: warning: Skipping data file with no valid points
plot '/dev/stdin' using 1:xtic(2) notitle
"boxplot.gp", line 5: x range is invalid
Not sure if this is a related error, or not. Would appreciate some clarification on the role of the single quotes here. Thanks!
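Fantastic question. As far as the shell is concerned, there is no difference between quoting the path and escaping the space with a backslash. Either way, unzip receives the path as a single argument with a space in it, so both of the following are interpreted exactly the same.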
/data/Twitter\ dataset/geoTwitter20-01-01.zip
and
'/data/Twitter dataset/geoTwitter20-01-01.zip'
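If you want to convince yourself, here is a small sketch you can paste into a shell. It only compares and prints the strings, so the zip file does not need to exist. The variable names a and b are just for illustration.

```shell
# Both spellings produce one argument that contains a space.
a=/data/Twitter\ dataset/geoTwitter20-01-01.zip
b='/data/Twitter dataset/geoTwitter20-01-01.zip'
[ "$a" = "$b" ] && echo "identical"

# printf repeats its format for every argument it receives,
# so each argument shows up in its own pair of brackets.
printf '[%s]\n' /data/Twitter\ dataset/geoTwitter20-01-01.zip
printf '[%s]\n' '/data/Twitter dataset/geoTwitter20-01-01.zip'

# With neither quotes nor a backslash, the shell splits on the space
# and the program receives two separate arguments.
printf '[%s]\n' /data/Twitter dataset/geoTwitter20-01-01.zip
```

The first two printf calls print one bracketed line each, while the last one prints two, which is the failure that the quotes or the backslash prevent.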
The error message you are getting is unrelated. In general, any time you get an error inside a pipeline, you can debug it by removing the last command from the pipeline and checking what its input is. You didn't say which command you ran, but I'm guessing it was something like
$ cat reduce | gnuplot -c ...
If that's the case, remove the | gnuplot ... and run only the cat reduce. You will probably see that the reduce file is empty, which would explain why gnuplot complains about having no valid points. This is most likely because you ran the reduce step before your map step had finished. Recall that whenever you put & at the end of a command, the shell runs it asynchronously in the background and gives you the prompt back right away. You can use the ps command to check whether the program has completed; once it has, you can run the reduce step.
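Here is a rough sketch of that workflow. The background job is only a stand-in for a slow map step (it sleeps and then writes one made-up line), and the temporary file plays the role of the reduce file, so none of the names or data come from the lab itself.

```shell
# Stand-in for a slow map step; the trailing & sends it to the background.
tmp=$(mktemp)
( sleep 1; printf '5 #example\n' > "$tmp" ) &

# The job is most likely still running here, so the file is still empty.
jobs    # lists background jobs started from this shell
ps      # the job also appears in the process list while it runs

# Block until every background job started from this shell has finished.
wait

# Check the would-be pipeline input before sending it to the last stage.
if [ -s "$tmp" ]; then
    cat "$tmp"
else
    echo "input is empty; the last stage would have nothing to plot"
fi
```

The -s test is true only when the file exists and is not empty, so it is a quick way to catch this kind of problem before calling gnuplot. The wait builtin is an alternative to checking ps by hand, but it only knows about jobs started from the same shell session.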