python3.11 -m unidic download fails with a 403 error from GitHub
$ python3.11 -m unidic download
✘ Server error (403)
Couldn't fetch dictionary info. If this error persists please open an issue.
Running curl against https://raw.githubusercontent.com/polm/unidic-py/master/dicts.json returns the response correctly.
After I changed the request headers it worked, so GitHub may be blocking requests based on their headers:
import requests

# Browser-like User-Agent; with this header the request succeeded.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}
requests.get(
    "https://raw.githubusercontent.com/polm/unidic-py/master/dicts.json",
    headers=headers
)
We may need to update the request headers to avoid being blocked by GitHub. Thanks.
same issue
I can't reproduce this issue, things work as usual for me. Is this error persistent for you, or is it intermittent?
If there has been a change in Github policy that caused this, I need to find alternatives for hosting the file - spoofing the user agent for Github is not a solution.
Hi, Polum. Thanks for looking into this. On the AWS EC2 instance, the error is persistent. On my local machine, it works fine so far.
So I suspect GitHub may have a new policy to detect bots/crawlers coming from AWS EC2 instances. Yeah, it would be great to have an alternative for hosting the file. Thanks.
Thank you for the extra information that this is on an EC2 instance, that makes sense. I can definitely add a parameter to specify a local file or separate URL.
It might take me a little while to implement this, but I would also be happy to accept a PR.
As a short-term workaround, besides changing your headers, you can change the download URL in the source in your local installation, or rewrite the function where it's used.
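The "local file or separate URL" idea could look roughly like the sketch below. This is not the library's implementation: `load_dict_info` and `download_from` are hypothetical names, and the layout of dicts.json (a mapping from a version key such as "latest" to an entry with "version" and "url" fields) is an assumption. Only `unidic.download.download_and_clean(version, url)` is taken from the existing code.

```python
import json
import urllib.request
from pathlib import Path


def load_dict_info(source, version="latest"):
    """Return the entry for `version` from a dicts.json-style mapping.

    Hypothetical helper: `source` is either a local file path or an
    http(s) URL pointing at a copy of dicts.json.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            info = json.load(resp)
    else:
        info = json.loads(Path(source).read_text(encoding="utf-8"))
    # Assumed layout: {"latest": {"version": "...", "url": "..."}, ...}
    return info[version]


def download_from(source, version="latest"):
    """Download the dictionary described by a user-supplied dicts.json."""
    # Imported lazily so load_dict_info works without unidic installed.
    from unidic.download import download_and_clean

    entry = load_dict_info(source, version)
    download_and_clean(entry["version"], entry["url"])
```

With a copy of dicts.json saved on the affected machine, `download_from("dicts.json")` would skip the request to raw.githubusercontent.com entirely, so no header changes are needed.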
I decided to take a detour and download the dictionary directly.
The version and URL come from https://raw.githubusercontent.com/polm/unidic-py/master/dicts.json, which is referenced in the code, as polm mentioned.
python -c "import urllib.request; import unidic; import os; from unidic.download import download_and_clean; opener = urllib.request.build_opener(); opener.addheaders = [('User-Agent', 'Mozilla/5.0')]; urllib.request.install_opener(opener); download_and_clean('3.1.0+2021-08-31', 'https://cotonoha-dic.s3-ap-northeast-1.amazonaws.com/unidic-3.1.0.zip')"
This is impossible to download, at least from my location (Turkey): the speed is below 5 KB/s, yes, kilobytes. I found a copy in a Hugging Face repo and tried installing it, but in the end it didn't work; the file sizes seemed to be different. I am trying to use it for xtts, whose app downloads this file as the first step at startup.
@patientx The issue here is about a 403 from Github, not about the download speed of the files from AWS. It sounds like you have a separate problem.
Additionally, it looks like xtts is by Coqui. They have a habit of specifying unidic as required even if you aren't using Japanese, which is wrong. Are you sure you even need to install this?
I had to use an American VPS and download from the Japanese AWS bucket from inside it, and even then the speed was 50-100 KB/s at most. Yes, they require unidic, and a specific version of it, it seems. Anyway, I now have the file and was able to install and use it. But later I found another fork which circumvents the use of unidic altogether :)
Glad to hear you were able to work around it.
If you didn't need Unidic but were required to download it, I would encourage you to open an issue at the parent repository explaining that requiring a library you don't need is inappropriate.