data-science-from-scratch

Crashcourse to Python - defaultdict

Open felix4webscience opened this issue 5 years ago • 1 comments

Hi,

I have a problem in Chapter 2 (German version) with the examples for "defaultdict" and "Counter".

What seems to be left out is how the variable "document" is defined.

Code:

```python
from collections import defaultdict

word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
```


I always get an error message like this: 
"name 'document' is not defined"

Because of this error, the following code examples do not work either:
- Counter, because word_counts cannot be built without the code above
- Sets, for the same reason

For the subsection on Booleans, the code does not work either:

s = some_function_that_returns_a_string()
if s:
    first_char = s[0]
else:
    first_char = ""

the following error appears:
NameError: name 'some_function_that_returns_a_string' is not defined

and the same holds for:
first_char = s and s[0]

Any help or hint would be great.

felix4webscience, Oct 13 '18 12:10

@felix4webscience document is a user-defined variable, so you have to create it yourself before running the snippet.

This function, which builds document from a web page, might help to understand it better:
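For a quick check without any downloads, any list of words can stand in for document. Here is a minimal sketch, assuming a made-up sentence as the input (the sentence and the names default_counts and counter_counts are only for illustration, they are not from the book):

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for `document`: any list of words works.
document = "the quick brown fox jumps over the lazy dog the end".split()

# Plain dict: check for the key before incrementing.
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

# defaultdict(int) supplies 0 for missing keys, so no membership test is needed.
default_counts = defaultdict(int)
for word in document:
    default_counts[word] += 1

# Counter does the same counting in a single call.
counter_counts = Counter(document)

print(counter_counts.most_common(1))  # [('the', 3)]
```

All three approaches should give the same counts once document exists.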

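import re

import requests
from bs4 import BeautifulSoup  # the 'html5lib' parser must be installed too


def fix_unicode(text):
    # Assumed helper (not shown in this thread): replace curly apostrophes with plain ones
    return text.replace("\u2019", "'")


def get_document():
    url = "http://radar.oreilly.com/2010/06/what-is-data-science.html"
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html5lib')

    content = soup.find("div", "article-body")  # find article-body div
    regex = r"[\w']+|[\.]"  # matches a word or a period

    document = []

    for paragraph in content("p"):
        words = re.findall(regex, fix_unicode(paragraph.text))
        document.extend(words)

    return document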

Use it like this: document = get_document()

Then run your code on the document variable.
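The NameError for some_function_that_returns_a_string very likely has the same cause: the name looks like a placeholder that you are expected to define yourself. Here is a minimal sketch, assuming a dummy function that returns an empty string (the function body and the name first_char_short are made up for illustration):

```python
def some_function_that_returns_a_string():
    """Hypothetical placeholder; swap in any function that returns a str."""
    return ""


s = some_function_that_returns_a_string()

# An empty string is falsy, so the else branch runs here.
if s:
    first_char = s[0]
else:
    first_char = ""

# Short form: `and` returns s itself when s is falsy, otherwise s[0].
first_char_short = s and s[0]

# With a non-empty string, both forms give the first character.
t = "data"
first_char_of_t = t and t[0]
```

With the empty string, both first_char and first_char_short end up as "", and no IndexError is raised.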

Hope this helps.

devAmoghS, Nov 29 '18 04:11