label-studio-ml-backend
BERT ML backend example breaks during training process
I initialized an ML backend based on label_studio_ml/examples/bert/bert_classifier.py, but when I use it to train, the process always ends unexpectedly.
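For reference, the backend follows the example's structure, where the fit() hook is expected to return a training-result dict that the backend stores per training job. A minimal sketch of that pattern (class and argument names here are illustrative, not copied verbatim from the example):

```python
from label_studio_ml.model import LabelStudioMLBase


class BertClassifierSketch(LabelStudioMLBase):
    """Minimal sketch of the example backend; names and details are illustrative."""

    def fit(self, completions, workdir=None, **kwargs):
        # ... fine-tune bert-base-multilingual-cased on the annotated tasks here ...

        # The training wrapper saves whatever fit() returns as the job result, so it
        # should be a JSON-serializable dict. If fit() raises or returns None, the job
        # directory ends up without a result file, which is what the
        # "dir doesn't contain result file" warnings in the log below refer to.
        return {
            'model_path': workdir,  # directory where the fine-tuned weights were saved
            'labels': ['Positive', 'Negative', 'Neutral'],
        }
```

The full backend log from the training run follows: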
[2023-03-15 16:58:05,044] [WARNING] [label_studio_ml.api::_train::81] => Warning: API /train is deprecated since Label Studio 1.4.1. ML backend used API /train for training previously, but since 1.4.1 Label Studio backend and ML backend use /webhook for the training run.
[2023-03-15 16:58:05,044] [DEBUG] [label_studio_ml.model::get_result::61] Get result from last valid job
[2023-03-15 16:58:05,044] [DEBUG] [label_studio_ml.model::iter_finished_jobs::205] Try fetching last valid job id from directory D:\Code\biyeshiji\label-studio-ml-backend\bert_backend_2
[2023-03-15 16:58:05,045] [DEBUG] [label_studio_ml.model::get_result_from_last_job::127] Try job_id=1678846786
[2023-03-15 16:58:05,045] [WARNING] [label_studio_ml.model::_get_result_from_job_id::196] => Warning: 1678846786 dir doesn't contain result file. It seems that previous training session ended with error.
[2023-03-15 16:58:05,045] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1678846786 job returns exception:
Traceback (most recent call last):
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 129, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 111, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2023-03-15 16:58:05,046] [DEBUG] [label_studio_ml.model::get_result_from_last_job::127] Try job_id=1678846774
[2023-03-15 16:58:05,046] [WARNING] [label_studio_ml.model::_get_result_from_job_id::196] => Warning: 1678846774 dir doesn't contain result file. It seems that previous training session ended with error.
[2023-03-15 16:58:05,046] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1678846774 job returns exception:
Traceback (most recent call last):
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 129, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 111, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2023-03-15 16:58:05,046] [DEBUG] [label_studio_ml.model::get_result_from_last_job::127] Try job_id=1678846669
[2023-03-15 16:58:05,047] [WARNING] [label_studio_ml.model::_get_result_from_job_id::196] => Warning: 1678846669 dir doesn't contain result file. It seems that previous training session ended with error.
[2023-03-15 16:58:05,047] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1678846669 job returns exception:
Traceback (most recent call last):
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 129, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 111, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2023-03-15 16:58:05,047] [DEBUG] [label_studio_ml.model::get_result_from_last_job::127] Try job_id=1678846666
[2023-03-15 16:58:05,048] [WARNING] [label_studio_ml.model::_get_result_from_job_id::196] => Warning: 1678846666 dir doesn't contain result file. It seems that previous training session ended with error.
[2023-03-15 16:58:05,048] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1678846666 job returns exception:
Traceback (most recent call last):
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 129, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 111, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2023-03-15 16:58:05,048] [DEBUG] [label_studio_ml.model::get_result_from_last_job::127] Try job_id=1678846596
[2023-03-15 16:58:05,048] [WARNING] [label_studio_ml.model::_get_result_from_job_id::196] => Warning: 1678846596 dir doesn't contain result file. It seems that previous training session ended with error.
[2023-03-15 16:58:05,048] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1678846596 job returns exception:
Traceback (most recent call last):
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 129, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 111, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2023-03-15 16:58:05,049] [DEBUG] [label_studio_ml.model::get_result_from_last_job::127] Try job_id=1678846594
[2023-03-15 16:58:05,049] [WARNING] [label_studio_ml.model::_get_result_from_job_id::196] => Warning: 1678846594 dir doesn't contain result file. It seems that previous training session ended with error.
[2023-03-15 16:58:05,049] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1678846594 job returns exception:
Traceback (most recent call last):
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 129, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "D:\LenovoSoftstore\Anaconda3\envs\yolov5\lib\site-packages\label_studio_ml\model.py", line 111, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2023-03-15 16:58:05,050] [DEBUG] [label_studio_ml.model::fetch::536] Job result not found: create initial model
[2023-03-15 16:58:05,052] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:06,162] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/config.json HTTP/1.1" 200 0
[2023-03-15 16:58:06,167] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:07,357] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/pytorch_model.bin HTTP/1.1" 302 0
Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Initialized with from_name=sentiment, to_name=text, labels=['Positive', 'Negative', 'Neutral']
[2023-03-15 16:58:09,391] [DEBUG] [label_studio_ml.model::train_script_wrapper::649] Running in model dir: D:\Code\biyeshiji\label-studio-ml-backend\bert_backend_2
[2023-03-15 16:58:09,398] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:10,503] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/vocab.txt HTTP/1.1" 200 0
[2023-03-15 16:58:10,507] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:11,597] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/added_tokens.json HTTP/1.1" 404 0
[2023-03-15 16:58:11,600] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:12,653] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/special_tokens_map.json HTTP/1.1" 404 0
[2023-03-15 16:58:12,657] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:13,804] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
[2023-03-15 16:58:13,808] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:14,847] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/tokenizer.json HTTP/1.1" 200 0
Token indices sequence length is longer than the specified maximum sequence length for this model (782 > 512). Running this sequence through the model will result in indexing errors
[2023-03-15 16:58:14,999] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:16,156] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/config.json HTTP/1.1" 200 0
[2023-03-15 16:58:16,161] [DEBUG] [urllib3.connectionpool::_new_conn::1003] Starting new HTTPS connection (1): huggingface.co:443
[2023-03-15 16:58:17,325] [DEBUG] [urllib3.connectionpool::_make_request::456] https://huggingface.co:443 "HEAD /bert-base-multilingual-cased/resolve/main/pytorch_model.bin HTTP/1.1" 302 0
Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Iteration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.26it/s]
Iteration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10.75it/s]
Iteration: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 8.06it/s]
Epoch: 54%|███████████████████████████████████████████████████████████████████████████████████████▍ | 54/100 [00:06<00:05, 7.81it/s]
[2023-03-15 16:58:26,916] [DEBUG] [label_studio_ml.model::get_or_create::493] Reload model for project=3.1678846111 with version=None
[2023-03-15 16:58:26,916] [DEBUG] [label_studio_ml.model::create::469] Create project ('3.1678846111', 23144)
Loaded from train output with from_name=sentiment, to_name=text, labels=['Positive', 'Negative', 'Neutral']
[2023-03-15 16:58:28,364] [DEBUG] [label_studio_ml.api::log_response_info::163] Response status: 201 CREATED
[2023-03-15 16:58:28,365] [DEBUG] [label_studio_ml.api::log_response_info::164] Response headers: Content-Type: application/json
Content-Length: 3
[2023-03-15 16:58:28,365] [DEBUG] [label_studio_ml.api::log_response_info::165] Response body: b'{}\n'
[2023-03-15 16:58:28,365] [INFO] [werkzeug::_log::225] 172.27.152.135 - - [15/Mar/2023 16:58:28] "POST /train HTTP/1.1" 201 -
What's wrong?
Many thanks!
@worldlinking Do you run it in Docker?
We are also facing a similar issue while running it directly in Docker (not with docker-compose), and we are not using Redis. We get an AssertionError in the simple_text_classifier model:
[2022-12-08 15:59:13,623] [ERROR] [label_studio_ml.model::get_result_from_last_job::131] 1670513565 job returns exception:
Traceback (most recent call last):
  File "/home/testuser/share/label-studio-ml-backend/label_studio_ml/model.py", line 129, in get_result_from_last_job
    result = self.get_result_from_job_id(job_id)
  File "/home/testuser/share/label-studio-ml-backend/label_studio_ml/model.py", line 111, in get_result_from_job_id
    assert isinstance(result, dict)
AssertionError
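For context (not the exact library code), the assertion in both tracebacks comes from the path that loads a previous training result: model.py reads a JSON result file from each job directory and requires it to be a dict. A rough sketch of that check, with the file name taken as an assumption based on the log messages:

```python
import json
import os


def load_job_result(job_dir):
    # Each training run gets its own job directory; the training wrapper is expected
    # to write the dict returned by fit() into a JSON result file there.
    result_path = os.path.join(job_dir, 'job_result.json')  # file name is an assumption
    if not os.path.exists(result_path):
        # Corresponds to the "dir doesn't contain result file" warnings in the logs.
        return None
    with open(result_path) as f:
        result = json.load(f)
    # Corresponds to `assert isinstance(result, dict)` in model.py: if fit() returned
    # None or something non-dict, loading fails with the AssertionError shown above.
    assert isinstance(result, dict)
    return result
```

So these errors usually mean the earlier training runs either raised before fit() returned or returned something other than a dict; checking the backend's stdout/stderr for those runs (or the job directories themselves) is a reasonable next step.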