xfspell

Training own model

Open • jbingel opened this issue 4 years ago • 5 comments

Hi, super interesting work, and thanks for sharing!

I'm wondering if you could include a training script in the repository that would allow one to train one's own model. :) Additionally, could you say something about how long training took (and on what hardware)?

jbingel • Jun 18 '20 11:06

+following

sainimohit23 • Jun 20 '20 10:06

+1

dipansh-girdhar • Jun 22 '20 12:06

Maybe you can train your own model by following this blog post: http://www.realworldnlpbook.com/blog/unreasonable-effectiveness-of-transformer-spell-checker.html

XiaoxueGu • Jun 29 '20 06:06

@mhagiwara How do I create my own token file (.tok) from my dataset? I have a dataset of 20 lakh (2 million) food item names, and I want to train a model to correct misspelled food item names. Your blog post describes the training process, but I am confused about how to create the .tok file.

murtuzamdahod • Oct 11 '20 08:10
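On the .tok question, here is a minimal sketch of one way to produce such files, assuming (this thread does not confirm it) a character-level setup where each example is one line of space-separated characters and real spaces are mapped to a placeholder symbol. `SPACE_SYMBOL`, `char_tokenize`, `char_detokenize`, `write_parallel_tok`, and the file names are all hypothetical; the (misspelled, correct) pairs have to come from your own data or from a noise-generation step that is not shown here.

```python
import tempfile
from pathlib import Path

# Hypothetical placeholder for a literal space; whatever symbol xfspell
# actually uses (if any) is an assumption, not taken from the repository.
SPACE_SYMBOL = "▁"


def char_tokenize(text: str) -> str:
    """Turn 'paneer tikka' into 'p a n e e r ▁ t i k k a'."""
    return " ".join(SPACE_SYMBOL if ch == " " else ch for ch in text.strip())


def char_detokenize(tokens: str) -> str:
    """Undo char_tokenize: drop the separators and restore real spaces."""
    return "".join(" " if tok == SPACE_SYMBOL else tok for tok in tokens.split(" "))


def write_parallel_tok(pairs, src_path, tgt_path) -> None:
    """Write (misspelled, correct) pairs to two line-aligned .tok files."""
    with open(src_path, "w", encoding="utf-8") as src:
        with open(tgt_path, "w", encoding="utf-8") as tgt:
            for noisy, clean in pairs:
                src.write(char_tokenize(noisy) + "\n")
                tgt.write(char_tokenize(clean) + "\n")


if __name__ == "__main__":
    # Toy (misspelled, correct) pairs; real pairs come from your own data.
    pairs = [("chiken biryani", "chicken biryani"), ("panner tikka", "paneer tikka")]
    print(char_tokenize("paneer tikka"))  # p a n e e r ▁ t i k k a
    with tempfile.TemporaryDirectory() as d:
        src_file, tgt_file = Path(d) / "train.tok.src", Path(d) / "train.tok.tgt"
        write_parallel_tok(pairs, src_file, tgt_file)
        print(src_file.read_text(encoding="utf-8"), end="")
```

fairseq-preprocess reads parallel files named `{prefix}.{lang}`, so files like these would be passed as `--trainpref train.tok --source-lang src --target-lang tgt`; the `src`/`tgt` suffixes are arbitrary labels rather than real language codes. With fairseq's default whitespace splitting, each character should then become one vocabulary entry.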

@mhagiwara I was able to train a model on my own dataset using the xfspell architecture, but when I try to run inference, fairseq fails with the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
    from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-interactive", line 8, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/interactive.py", line 190, in cli_main
    main(args)
  File "/usr/local/lib/python3.6/dist-packages/fairseq_cli/interactive.py", line 82, in main
    task=task,
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 179, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
    path, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 577, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 241, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:144] . PytorchStreamReader failed reading zip archive: failed finding central directory

Please help me; I have not been able to resolve this.

murtuzamdahod • Oct 14 '20 13:10
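In the traceback above, the `ModuleNotFoundError` for `fairseq.fb_pathmgr` appears to be caught by fairseq's fallback code; the failure that actually stops inference is the `RuntimeError` raised inside `torch.load`. "Failed finding central directory" usually indicates that the checkpoint begins like a zip archive (the format `torch.save` writes by default since PyTorch 1.6) but is incomplete, for example because training was interrupted while saving or the file was only partially copied. The sketch below uses only the standard library and a hypothetical `diagnose_checkpoint` helper to check whether a `.pt` file is an intact zip archive; it is a diagnostic under that assumption, not a fix.

```python
import sys
import zipfile
from pathlib import Path

# Local file header signature that every zip archive starts with, including
# checkpoints written by torch.save's default (zip-based) format.
ZIP_MAGIC = b"PK\x03\x04"


def diagnose_checkpoint(path) -> str:
    """Return a short verdict on whether `path` looks like an intact zip archive."""
    p = Path(path)
    if not p.is_file():
        return "missing: file does not exist"
    size = p.stat().st_size
    with p.open("rb") as f:
        head = f.read(4)
    if head != ZIP_MAGIC:
        return f"not zip-format ({size} bytes): legacy torch.save file or not a checkpoint"
    # is_zipfile looks for the end-of-central-directory record at the end of the file.
    if not zipfile.is_zipfile(str(p)):
        return f"truncated or corrupt: zip header present but no central directory ({size} bytes)"
    # testzip re-reads every member and returns the first name with a bad CRC, if any.
    with zipfile.ZipFile(str(p)) as zf:
        bad = zf.testzip()
    if bad is not None:
        return f"corrupt member: {bad}"
    return f"ok: intact zip archive ({size} bytes)"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(f"{arg}: {diagnose_checkpoint(arg)}")
```

For example, `python check_ckpt.py checkpoints/checkpoint_best.pt checkpoints/checkpoint_last.pt` (script name and paths are hypothetical). If a file is reported as truncated, copying it again or falling back to another saved checkpoint is the natural next step.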