I am getting an error while running vocab builder.

Open • mshivasharan opened this issue 3 years ago • 2 comments

I am getting an error while running vocab builder.

Code and files used for the vocab builder:

!git clone https://github.com/kwonmha/bert-vocab-builder.git
!wget https://github.com/LydiaXiaohongLi/Albert_Finetune_with_Pretrain_on_Custom_Corpus/raw/master/data_toy/restaurant_review_nopunct.txt
!python ./bert-vocab-builder/subword_builder.py --corpus_filepattern "restaurant_review_nopunct.txt" --output_filename "vocab.txt" --min_count 1

Issue 1 (version issue; fixed by replacing 'tf.flags' with 'tf.compat.v1.flags'):

Traceback (most recent call last):
  File "./bert-vocab-builder/subword_builder.py", line 37, in <module>
    tf.flags.DEFINE_string('output_filename', '/tmp/my.subword_text_encoder',
AttributeError: module 'tensorflow' has no attribute 'flags'
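For anyone hitting the same AttributeError, the replacement described for Issue 1 can be sketched as a small lookup that prefers the TF 1.x name and falls back to the compat namespace. The helper name `resolve_v1_api` and the stand-in objects are hypothetical, not part of bert-vocab-builder; the stand-ins are only there so the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace


def resolve_v1_api(tf_module, name):
    """Return tf.<name> if it exists (TF 1.x), else tf.compat.v1.<name>."""
    if hasattr(tf_module, name):
        return getattr(tf_module, name)
    return getattr(tf_module.compat.v1, name)


# Hypothetical stand-ins that mimic the two module layouts.
fake_tf1 = SimpleNamespace(flags="tf1-flags")
fake_tf2 = SimpleNamespace(
    compat=SimpleNamespace(v1=SimpleNamespace(flags="compat-v1-flags"))
)

print(resolve_v1_api(fake_tf1, "flags"))
print(resolve_v1_api(fake_tf2, "flags"))
```

With a real install this would be used as `flags = resolve_v1_api(tf, "flags")` after `import tensorflow as tf`. Editing the script to use `tf.compat.v1.flags` directly, as described above, is the simpler route if only TF 2.x needs to work.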

Issue 2:

The number of files to read : 1
Traceback (most recent call last):
  File "./bert-vocab-builder/subword_builder.py", line 86, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./bert-vocab-builder/subword_builder.py", line 67, in main
    split_on_newlines=FLAGS.split_on_newlines, additional_chars=FLAGS.additional_chars)
  File "/content/bert-vocab-builder/tokenizer.py", line 191, in corpus_token_counts
    split_on_newlines=split_on_newlines):
  File "/content/bert-vocab-builder/tokenizer.py", line 139, in _read_filepattern
    tf.logging.INFO("Start reading ", filename)
TypeError: 'int' object is not callable

Could anyone please help me out with this issue? Thanks in advance.

mshivasharan • Dec 01 '20 16:12

Fixed issue 2. tf.logging.INFO() should be tf.logging.info().
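The reason for the TypeError can be shown with a small sketch. It uses Python's standard `logging` module as a stand-in, on the assumption that `tf.logging` mirrors it here: `INFO` is an integer log-level constant, while `info()` is the function that writes a message. The file name and the `%s` placeholder are illustrative additions, not the actual change made in tokenizer.py.

```python
import logging

# INFO is a log-level constant (an int), not a function.
print(type(logging.INFO), logging.INFO)

# Calling the constant reproduces the error from the traceback.
try:
    logging.INFO("Start reading %s", "corpus.txt")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# The lowercase function is the one that logs a message.
logging.basicConfig(level=logging.INFO)
logging.info("Start reading %s", "corpus.txt")
```

The `%s` in the message is there because the standard logger formats extra positional arguments into the message string; without a placeholder the file name would not be interpolated.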

TY.

kwonmha • Dec 04 '20 01:12

Hi Shiva, have you figured out how to solve issue 1?

Jennyyin20 • Mar 25 '22 03:03