data-science-from-scratch
Crash Course in Python - defaultdict
Hi,
I have a problem in Chapter 2 (German edition) with the examples for "defaultdict" and "Counter".
What seems to be left out here is how the variable "document" is defined.
Code:
```
from collections import defaultdict

word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
```
I always get an error message like this:
"name 'document' is not defined"
Because of this error, the following code examples do not work either:
- Counter, because word_counts cannot work without the code before it
- Sets, for the same reason
In the Boolean subsection, the code does not work either:
s = some_function_that_returns_a_string()
if s:
    first_char = s[0]
else:
    first_char = ""
The following error appears:
NameError: name 'some_function_that_returns_a_string' is not defined
The same happens for:
first_char = s and s[0]
Any help or hints would be great.
@felix4webscience document is actually a user-defined variable.
This might help you understand it better:
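import re
import requests
from bs4 import BeautifulSoup

def fix_unicode(text):
    # assumed minimal version of the helper this snippet relies on:
    # replace the Unicode right single quote with a plain apostrophe
    return text.replace("\u2019", "'")

def get_document():
    url = "http://radar.oreilly.com/2010/06/what-is-data-science.html"
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html5lib')
    content = soup.find("div", "article-body")  # find article-body div
    regex = r"[\w']+|[\.]"  # matches a word or a period
    document = []
    for paragraph in content("p"):
        words = re.findall(regex, fix_unicode(paragraph.text))
        document.extend(words)
    return document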
Use it like this:
document = get_document()
Then run your code on the document variable.
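If you would rather not fetch a web page, any list of words will do as `document`. Here is a minimal sketch with no network access; the sample sentence and the names `dd_counts` and `counter_counts` are made up for illustration and do not come from the book:

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for `document`: any list of words works.
document = "data science is fun and data is everywhere".split()

# Plain dict version, as in the question
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

# defaultdict version: a missing key starts at int(), which is 0
dd_counts = defaultdict(int)
for word in document:
    dd_counts[word] += 1

# Counter version: counts every item of the list in one call
counter_counts = Counter(document)

print(word_counts)
print(counter_counts.most_common(2))
```

All three approaches should give the same counts, so once `document` is defined the Counter and set examples that build on it can run as well.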
Hope this helps.
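The Boolean example has the same cause: `some_function_that_returns_a_string` is only a placeholder name, so you have to define something yourself before the snippet can run. A minimal sketch, assuming a dummy function that returns "hello" (the return value and the names `first_char_short` and `first_char_empty` are made up here):

```python
def some_function_that_returns_a_string():
    # Hypothetical placeholder body; any function returning a string works.
    return "hello"

s = some_function_that_returns_a_string()

# Long form: guard against an empty string before indexing
if s:
    first_char = s[0]
else:
    first_char = ""

# Short form: `and` returns s itself when s is falsy, otherwise s[0]
first_char_short = s and s[0]

# With an empty string the short form returns "" instead of raising IndexError
empty = ""
first_char_empty = empty and empty[0]

print(first_char, first_char_short, repr(first_char_empty))
```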