Chapter 11 Regex: typo in the code example. Top-level domain validation expression
Hello, and thanks so much to everyone for this course!
- I found a small typo in Chapter 11, section 'Extracting data using regular expressions':
...
Here is our new regular expression:
[a-zA-Z0-9]\S*@\S*[a-zA-Z]
... followed by the code block that uses this expression: "If we use this expression in our program, our data is much cleaner:"
# Search for lines that have an at sign between characters
# The characters must be a letter or number
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line)
    if len(x) > 0:
        print(x)
# Code: http://www.py4e.com/code3/re07.py
Please replace both "+" signs in the line x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line) with "*", so that the code matches the expression given in the text.
- Interestingly, running the code with the corrected expression
[a-zA-Z0-9]\S*@\S*[a-zA-Z]
still returns matches that are not email addresses, for example: ['dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x']
I would appreciate an explanation of how to improve the expression so that it filters out matches that lack a top-level domain and are therefore not email addresses.
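One possible direction, offered only as a rough sketch and not as the book's intended answer: restrict the part after the "@" to hostname characters and require it to end in a dot followed by at least two letters. The names `EMAIL_WITH_TLD` and `find_emails` are hypothetical, and the `someone@example.com` sample line is made up for illustration; only the `dhorwitz@...` string comes from the output above. This pattern is deliberately simple and does not attempt full email validation.

```python
import re

# Hypothetical pattern (an assumption, not taken from the book):
#   [a-zA-Z0-9][\w.+-]*        local part: starts with a letter or digit
#   @
#   [a-zA-Z0-9-]+              first hostname label
#   (?:\.[a-zA-Z0-9-]+)*       optional further labels, each preceded by a dot
#   \.[a-zA-Z]{2,}             top-level domain: a dot plus two or more letters
EMAIL_WITH_TLD = re.compile(
    r'[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}'
)


def find_emails(line):
    """Return all substrings of line that look like an address with a TLD."""
    return EMAIL_WITH_TLD.findall(line)


if __name__ == '__main__':
    samples = [
        # String reported in the output above; it has no top-level domain
        'dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x',
        # Made-up line for illustration
        'From: someone@example.com',
    ]
    for line in samples:
        print(find_emails(line))
```

Because the hostname part no longer uses \S, the ":" in the first sample stops the match before any dot and letters can be found, so that line produces no result, while the second sample still yields its address.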
Thank you!