replacing-bash-scripting-with-python

file iteration

Open kylerlaird opened this issue 5 years ago • 4 comments
I'm enjoying your document! I've been moving away from Bash to Python scripting for years. I'm always looking for better ways to do it.

I'm curious about the section on iterating through lines of a file. You provide some relatively complicated ways to do it using "with", readline(), etc. Why not simply "for line in open(file):"?

kylerlaird avatar Jun 26 '20 15:06 kylerlaird

Nice work moving away from Bash, @kylerlaird. I am quoting from this Python doc; please read the details there.

"It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks..."
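To make the quoted comparison concrete, here is a minimal sketch of the two forms side by side. The helper names `read_with` and `read_try_finally` and the throwaway demo file are assumptions for illustration, not part of the tutorial.

```python
import os
import tempfile


def read_with(path):
    """Read stripped lines using a with-block; the file is closed on exit."""
    with open(path) as f:
        lines = [line.rstrip() for line in f]
    return lines, f.closed


def read_try_finally(path):
    """Roughly the same behavior, written out with try-finally."""
    f = open(path)
    try:
        lines = [line.rstrip() for line in f]
    finally:
        f.close()  # runs even if reading raised an exception
    return lines, f.closed


# Demo on a temporary file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")
demo_path = tmp.name

with_result = read_with(demo_path)
manual_result = read_try_finally(demo_path)
os.remove(demo_path)
```

Both helpers return the same lines and report the file as closed; the with-block just gets there in fewer lines.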

ghost avatar Jun 26 '20 19:06 ghost

The use of with is superfluous in this case. The use of "my_file" is pointless.

These should be equivalent.

with open('my_file.txt') as my_file:
    for line in my_file:
        do_stuff_with(line.rstrip())

for line in open('my_file.txt'):
    do_stuff_with(line.rstrip())

kylerlaird avatar Jun 26 '20 21:06 kylerlaird

I agree that with is a good practice and a good tool to have (when needed for an object which is used multiple times). The pattern of "loop through all of the lines in a file" is so common, though, that it's a shame to add extra crud to what should be simple in Python.

kylerlaird avatar Jun 26 '20 21:06 kylerlaird

It depends on what you're doing. If you're simply looping over a single file, that's fine. However, if you're looping over multiple files, then depending on when the garbage collector runs, you could run out of file descriptors from the OS. This would rarely happen in practice in CPython because it uses reference counting, but it's difficult to guarantee that it wouldn't happen in PyPy, which uses a tracing GC.
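A minimal sketch of the multiple-file case, assuming a hypothetical helper `count_lines` and a few temporary demo files: wrapping each `open()` in `with` means at most one descriptor is open at a time, regardless of how the interpreter collects garbage.

```python
import os
import tempfile


def count_lines(paths):
    """Count lines across many files, closing each before opening the next."""
    total = 0
    seen = []  # kept only so the demo can inspect each file object afterwards
    for path in paths:
        with open(path) as f:
            seen.append(f)
            total += sum(1 for _line in f)
        # f is closed here, whatever the garbage collector does
    return total, all(f.closed for f in seen)


# Demo: three small files in a temporary directory (hypothetical content).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_paths = []
    for i in range(3):
        path = os.path.join(tmpdir, f"file{i}.txt")
        with open(path, "w") as out:
            out.write("a\nb\n")
        demo_paths.append(path)
    total, all_closed = count_lines(demo_paths)
```

With a bare `for line in open(path):` inside the loop, the point at which each file gets closed would instead be left to the interpreter.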

Additionally, while the current implementation of io objects does clean up file descriptors when the object is destructed, it's not part of the documented semantics and shouldn't be relied upon.

You're right that with is frequently unnecessary, and I don't always use it in my personal scripts. However, the whole point of this tutorial is to help people write scripts that are safe and maintainable by replacing Bash with Python. Running out of file descriptors in a corner case is exactly the kind of nonsense I'm trying to help people avoid. I'm not going to skip a best practice to avoid one extra line of code, especially if I'm teaching others.

ninjaaron avatar Jun 27 '20 09:06 ninjaaron