Basic example in README fails (Word2Vec download 404s)
Running the example in the README:
using Embeddings
const embtable = load_embeddings(Word2Vec) # or load_embeddings(FastText_Text) or ...
fails with:
ERROR: HTTP.ExceptionRequest.StatusError(404, "GET", "/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz", HTTP.Messages.Response:
"""
HTTP/1.1 404 Not Found
x-amz-request-id: 7CJ4RS3EZ3VHMSR4
x-amz-id-2: JQ2JTqHhFeLJ7JtP5pJM+AzcR3Kq8kKB4Hy5Tars31NaRlk3Xo++mRiLVYHArclGUSZQm5Ztv/o=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Thu, 27 Oct 2022 15:01:28 GMT
Server: AmazonS3
""")
Are the word2vec embeddings available elsewhere? Otherwise, this should probably be addressed in the README.
Good question. I suspect they must be available somewhere else; they are very widely used, even though they are old now.
I just added the weights to Hugging Face: https://huggingface.co/LoganKilpatrick/GoogleNews-vectors-negative300/blob/main/GoogleNews-vectors-negative300.bin.gz
Thanks for opening this issue, and for the replies so far. I copied the new URL and inserted it at this line: https://github.com/JuliaText/Embeddings.jl/blob/306c04bead62b32873dedbc2609c74c4ca34306b/src/word2vec.jl#L17
Unfortunately, when I run load_embeddings(Word2Vec), I get the following error message:
7-Zip (a) [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (A0652),ASM,AES-NI)
Scanning the drive for archives:
1 file, 36239 bytes (36 KiB)
Extracting archive: /home/nikos/.julia/datadeps/word2vec 300d/GoogleNews-vectors-negative300.bin.gz
ERROR: /home/nikos/.julia/datadeps/word2vec 300d/GoogleNews-vectors-negative300.bin.gz
/home/nikos/.julia/datadeps/word2vec 300d/GoogleNews-vectors-negative300.bin.gz
Open ERROR: Can not open the file as [gzip] archive
ERRORS:
Is not archive
Can't open as archive: 1
Files: 0
Size: 0
Compressed: 0
Downloading the file manually from the new URL does work, though. Once it was downloaded, I opened the file with Archive Manager in Ubuntu, and that worked too.
Hmm, that's weird; 7-Zip is normally very reliable.
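One detail in the 7-Zip output above may matter more than 7-Zip itself: the archive it opened is only 36239 bytes (36 KiB), which is far too small to be the full pretrained vectors file. One possible explanation (an assumption, not confirmed in this thread) is that the copied link is the Hugging Face "blob" URL, which serves the web page that displays the file, while a "resolve" URL serves the raw file. The sketch below checks whether a file really starts with the gzip magic bytes (0x1f 0x8b, per the gzip format) and rewrites a blob URL into a resolve URL. Both is_gzip and raw_hf_url are hypothetical helpers written for illustration. They are not part of Embeddings.jl or DataDeps.jl.

```julia
# Hypothetical helper: true if the file starts with the gzip magic bytes 0x1f 0x8b.
# An HTML page saved under a .gz name (e.g. a web viewer page) fails this check.
function is_gzip(path::AbstractString)
    open(path, "r") do io
        header = read(io, 2)  # reads up to the first two bytes
        return length(header) == 2 && header[1] == 0x1f && header[2] == 0x8b
    end
end

# Hypothetical helper (assumption about Hugging Face URL layout): turn a "blob"
# link, which points at the web viewer, into a "resolve" link for the raw file.
raw_hf_url(url::AbstractString) = replace(url, "/blob/" => "/resolve/"; count = 1)
```

For example, is_gzip(joinpath(homedir(), ".julia", "datadeps", "word2vec 300d", "GoogleNews-vectors-negative300.bin.gz")) would return false if the file that DataDeps saved is an HTML page rather than the archive. That path assumes the default depot location shown in the error above.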