guesslang
How to add new language
Hello, if I want to do transfer learning based on your model, can I use the trained model? I tried to load the trained model, but it had no effect. I hope to get your reply.
When I download the dataset with guesslangtools, many repositories do not exist, and the GitHub server rejects my requests.
Hello @yjmm10
I tried to load the trained model, but it had no effect. I hope to get your reply.
I think that the current model doesn't suit transfer learning very well. The list of supported languages is embedded in the model graph itself. This means that you would have to hack the graph somehow to add the new language information. That might change in future versions, but currently there are a few blockers (I can go into more detail if required).
Today the only recommended way to add new languages is to build a dataset including the new languages with guesslangtools.
When I download the dataset with guesslangtools, many repositories do not exist,
Yes, that's expected. The GitHub public repository list that I use was last updated in January 2020: https://zenodo.org/record/3626071/ You can safely ignore this warning.
the GitHub server rejects my requests
Strange... The main Guesslangtools workflow relies only on git commands because, as far as I know, they are not (yet) restricted by GitHub servers. The GitHub website and API are heavily restricted, though.
Can you share the errors that you're getting?
Thank you for your reply. This is the error message. It often happens when downloading the zip file.
Traceback (most recent call last):
File "I:/Private/guesslangtools/guesslangtools/__main__.py", line 104, in <module>
main()
File "I:/Private/guesslangtools/guesslangtools/__main__.py", line 89, in main
run_workflow()
File "I:\Private\guesslangtools\guesslangtools\app.py", line 14, in run_workflow
compressed_repositories.download()
File "I:\Private\guesslangtools\guesslangtools\common.py", line 112, in wrapped
result = func(*args, **kw)
File "I:\Private\guesslangtools\guesslangtools\workflow\compressed_repositories.py", line 100, in download
for step, row in enumerate(pool_imap(_download_repository, rows), 1):
File "I:\Private\guesslangtools\guesslangtools\common.py", line 213, in pool_imap
for result in pool.imap(_apply, iterable):
File "D:\.conda\envs\base\lib\multiprocessing\pool.py", line 868, in next
raise value
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None))
Process finished with exit code 1
or
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "I:/Private/guesslangtools/guesslangtools/__main__.py", line 104, in <module>
main()
File "I:/Private/guesslangtools/guesslangtools/__main__.py", line 89, in main
run_workflow()
File "I:\Private\guesslangtools\guesslangtools\app.py", line 14, in run_workflow
compressed_repositories.download()
File "I:\Private\guesslangtools\guesslangtools\common.py", line 112, in wrapped
result = func(*args, **kw)
File "I:\Private\guesslangtools\guesslangtools\workflow\compressed_repositories.py", line 97, in download
for step, row in enumerate(pool_imap(_download_repository, rows), 1):
File "I:\Private\guesslangtools\guesslangtools\common.py", line 213, in pool_imap
for result in pool.imap(_apply, iterable):
File "D:\.conda\envs\base\lib\multiprocessing\pool.py", line 868, in next
raise value
ValueError: check_hostname requires server_hostname
Process finished with exit code 1
Okay @yjmm10, it looks like you are using an older version of guesslangtools (version < 1.0). Older versions of guesslangtools downloaded the repositories directly from GitHub's HTTP servers. Because of GitHub's HTTP server restrictions (like the ones you are experiencing), I switched to using Git commands instead.
You can install the latest version of guesslangtools with the following commands:
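To illustrate the difference between downloading zip archives over HTTP and fetching repositories with Git, here is a minimal Python sketch of the Git-based approach. It is not the actual guesslangtools code: the function names `clone_command` and `clone_repository`, the shallow `--depth 1` clone, and the timeout value are assumptions made for illustration.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def clone_command(repo_url: str, destination: Path) -> List[str]:
    """Build a `git clone` command for one repository.

    The shallow clone (--depth 1) is an assumption made to keep downloads
    small, it is not taken from guesslangtools.
    """
    return ["git", "clone", "--depth", "1", "--quiet", repo_url, str(destination)]


def clone_repository(repo_url: str, destination: Path, run=subprocess.run) -> bool:
    """Clone a repository with the git command, return True on success.

    A missing repository or a network failure gives False instead of
    stopping the whole download loop.
    """
    # Stop git from waiting for credentials when a repository no longer exists.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = run(
            clone_command(repo_url, destination),
            env=env,
            check=False,
            timeout=600,  # hypothetical limit, in seconds
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A failed clone is reported as `False` here, so repositories that have been deleted since the list was published can simply be skipped.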
# Clone the latest version of the code
git clone https://github.com/yoeo/guesslangtools.git
cd guesslangtools
# Edit the language description file to add the new languages information
vi data/languages.yaml
# Install the new Guesslangtools on your system
pip install -Ue .
After installing guesslangtools you can run it to generate the dataset:
# You can change the --nb-xxx parameters to have more or less examples in your dataset
gltool /path/to/new/dataset
It will take hours, and when it is done, you can train Guesslang:
# Clone Guesslang
git clone https://github.com/yoeo/guesslang.git
cd guesslang
# Install Guesslang in "developer mode"
pip install -Ue .
# Copy the language mapping generated in the dataset (`languages.json`) into Guesslang repository
cp /path/to/new/dataset/languages.json ./data/languages.json
# Run the training
guesslang --train /path/to/new/dataset/files --steps 10000 --model /path/to/new/model
I'm using Linux command line syntax here, and I hope it won't be hard to convert these into Windows shell commands.
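For Windows users, the `cp` step and the training call above can also be written in a platform-independent way. The following is only a sketch: the helper names `copy_language_mapping` and `training_command` are hypothetical, and the paths are the same placeholders used in the commands above. Only the `--train`, `--steps` and `--model` options already shown in this thread are used.

```python
import shutil
from pathlib import Path
from typing import List


def copy_language_mapping(dataset_dir: Path, guesslang_repo: Path) -> Path:
    """Copy languages.json from the generated dataset into the Guesslang repository."""
    source = dataset_dir / "languages.json"
    target = guesslang_repo / "data" / "languages.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


def training_command(dataset_dir: Path, model_dir: Path, steps: int = 10000) -> List[str]:
    """Build the same training command as above, as an argument list."""
    return [
        "guesslang",
        "--train", str(dataset_dir / "files"),
        "--steps", str(steps),
        "--model", str(model_dir),
    ]


# Example usage, run from the root of the cloned guesslang repository
# (placeholder paths, replace them with your own):
#
#   import subprocess
#   dataset = Path("/path/to/new/dataset")
#   copy_language_mapping(dataset, Path("."))
#   subprocess.run(training_command(dataset, Path("/path/to/new/model")), check=True)
```

Passing the command as a list avoids shell quoting differences between Linux and Windows.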