spacy-course
Advanced spaCy Course: Bowie is a word
How to reproduce the behaviour
It can be reproduced on any platform. The problem is that "Bowie" is a valid English word, and a hash also can't be used in a different nlp object with a different vocabulary.
```pycon
>>> import spacy
>>>
>>> # Create an English and German nlp object
>>> nlp = spacy.blank("en")
>>> nlp_de = spacy.blank("de")
>>>
>>> # Get the ID for the string 'Bowie'
>>> bowie_id = nlp.vocab.strings["Bowie"]
>>> print(bowie_id)
2644858412616767388
>>>
>>> # Look up the ID for "Bowie" in the vocab
>>> print(nlp_de.vocab.strings[bowie_id])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spacy/strings.pyx", line 132, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '2644858412616767388'. This usually refers to an issue with the `Vocab` or `StringStore`."
```
Sorry, I'm not sure exactly what you're asking? Looking at this example (in chapter 2, section 3, under "Vocab, hashes and lexemes"), the code is supposed to throw an error.
Yes, but the documentation indicates the error should be on this line: `bowie_id = nlp.vocab.strings["Bowie"]`
Quote from answer #2: "Bowie" is not a regular word in the English or German dictionary, so it can’t be hashed.
But "Bowie" is a regular word in English and can be hashed.
Answer #2 is listed as the correct answer, yet answer #3 is the correct one: nlp_de is not a valid name, and the vocab can only be shared if the nlp objects have the same name. The hash for "Bowie" cannot be used in the German vocabulary object.
On Thu, Mar 31, 2022 at 12:29 PM Adriane Boyd wrote:
> Sorry, I'm not sure exactly what you're asking? Looking at this example (in chapter 2, section 3 under "Vocab, hashes and lexemes") this example code is supposed to throw an error.
-- David Johnson www.breadstand.com
Maybe we're not looking at the same version of the course? Here's the current version:
https://course.spacy.io/en/chapter2
The correct answer is: "The string "Bowie" isn’t in the German vocab, so the hash can’t be resolved in the string store."
Where does the documentation indicate that this particular line should throw an error?
You can always hash any string with `nlp.vocab.strings["Bowie"]`, and you'll get the same hash for "Bowie" from every spaCy pipeline, so you'll also get the same value from `nlp_de.vocab.strings["Bowie"]`.
This question is about the other direction: you can only convert hash -> string for strings that have already been added to that pipeline's string store. Usually this happens automatically when you process texts, but you can also add any string explicitly with `nlp.vocab.strings.add("Bowie")`.
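The following toy model, written in plain Python so that it does not require spaCy, illustrates the two directions. `ToyStringStore` is hypothetical and is not spaCy's `StringStore`. It uses a 64-bit hash derived from SHA-256 as a stand-in for the MurmurHash that spaCy actually uses, so its IDs will not match the `2644858412616767388` shown in the report. It mirrors only the behaviour described in this thread. The string -> ID direction always works and gives the same value in every store. The ID -> string direction works only after the string has been added to that particular store.

```python
import hashlib


class ToyStringStore:
    """Hypothetical, simplified stand-in for spaCy's StringStore (not the real class)."""

    def __init__(self):
        # Reverse table: only strings explicitly added to THIS store can be resolved.
        self._id_to_str = {}

    @staticmethod
    def hash_string(text):
        # Assumption: a 64-bit hash taken from SHA-256. spaCy uses MurmurHash,
        # so these IDs differ from spaCy's real values.
        digest = hashlib.sha256(text.encode("utf8")).digest()
        return int.from_bytes(digest[:8], "little")

    def add(self, text):
        # Store the string so its hash can later be resolved back to it.
        key = self.hash_string(text)
        self._id_to_str[key] = text
        return key

    def __getitem__(self, string_or_id):
        if isinstance(string_or_id, str):
            # string -> ID: a pure function of the string. In this toy model it
            # does not add the string to the reverse table.
            return self.hash_string(string_or_id)
        # ID -> string: fails unless this store has had the string added.
        try:
            return self._id_to_str[string_or_id]
        except KeyError:
            raise KeyError(f"Can't retrieve string for hash '{string_or_id}'") from None


strings_en = ToyStringStore()
strings_de = ToyStringStore()

bowie_id = strings_en["Bowie"]
assert bowie_id == strings_de["Bowie"]  # same hash from either store

try:
    strings_de[bowie_id]  # the German store has never seen "Bowie"
except KeyError as err:
    print("lookup failed:", err)

strings_de.add("Bowie")
print(strings_de[bowie_id])  # now resolves to "Bowie"
```

In a real pipeline, the corresponding fix is the call already mentioned, `nlp_de.vocab.strings.add("Bowie")`. After that call, `nlp_de.vocab.strings[bowie_id]` should return "Bowie".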