
Chapter 11 Regex: typo in the code example. Top-level domain validation expression

Open nikolayrantsev opened this issue 4 years ago • 0 comments

Hello, and thanks so much to everyone for this course!

  1. I found a small typo in chapter 11, section 'Extracting data using regular expressions'. The text reads: ... Here is our new regular expression: [a-zA-Z0-9]\S*@\S*[a-zA-Z] ... The code block that follows uses a different expression and is introduced with: If we use this expression in our program, our data is much cleaner:
# Search for lines that have an at sign between characters
# The characters must be a letter or number
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re07.py

Please replace each "+" with "*" in the line x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line) so that the code matches the expression given in the text.
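To make the difference concrete, here is a small sketch comparing the two patterns. The sample string is made up for illustration and is not taken from mbox-short.txt. The "*" form accepts zero characters on either side of the at sign beyond the required first and last characters, while the "+" form needs at least one:

```python
import re

# Pattern as written in the chapter text (zero or more non-whitespace chars)
STAR_PATTERN = r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]'
# Pattern as written in the code block (one or more non-whitespace chars)
PLUS_PATTERN = r'[a-zA-Z0-9]\S+@\S+[a-zA-Z]'

# Hypothetical input line with a very short address-like token
sample = 'From a@b Sat Jan  5'

star_matches = re.findall(STAR_PATTERN, sample)
plus_matches = re.findall(PLUS_PATTERN, sample)

print(star_matches)  # the "*" form matches 'a@b'
print(plus_matches)  # the "+" form finds nothing in this line
```

On ordinary addresses with several characters on each side of the at sign, both patterns return the same matches, so the mismatch is easy to miss.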

  2. Interestingly, running the code with the expression from the text, [a-zA-Z0-9]\S*@\S*[a-zA-Z], still returns matches that are not email addresses, such as: ['dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x']

I would appreciate an explanation of how to improve the expression so that it filters out matches like this one, which have no top-level domain and so do not meet the criteria for an email address.
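One possible direction, offered as a minimal sketch and not as the course's answer: restrict the host part to letters, digits, and hyphens, and require it to end with a dot followed by at least two letters. The pattern and the names EMAIL_WITH_TLD and find_emails are my own assumptions, and the pattern is deliberately simple; it does not cover the full email address grammar.

```python
import re

# Hypothetical pattern: local part, '@', one or more host labels,
# then a dot and a top-level domain of at least two letters.
EMAIL_WITH_TLD = re.compile(
    r'[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}'
)

def find_emails(line):
    """Return all substrings of line that look like addresses with a TLD."""
    return EMAIL_WITH_TLD.findall(line)

# The match reported above has no dot-separated domain, so it is rejected
no_tld = find_emails('dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x')
# A sample "From" line in the style of the mbox file
with_tld = find_emails('From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008')

print(no_tld)
print(with_tld)
```

Because the host part no longer uses \S, characters such as ":" and "/" end the match instead of being swallowed into it.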

Thank you!

nikolayrantsev · Dec 28 '20 19:12