h2o-llmstudio

[BUG] Exception due to no escapechar set when BLEU evaluation is being stored to CSV

Open DavidFarago opened this issue 2 years ago • 2 comments

🐛 Bug

My experiment fails early: right after the first validation reports its BLEU score, saving the predictions to CSV raises the following stack trace:

2023-05-18 17:58:52,287 - INFO: Validation BLEU: 0.32177
2023-05-18 17:58:52,333 - ERROR: Exception occurred during H2O LLM Studio run:
Traceback (most recent call last):
  File "/root/h2o-llmstudio/train_wave.py", line 106, in
    run(cfg=cfg)
  File "/root/h2o-llmstudio/train.py", line 570, in run
    val_data, val_loss, val_metric, last_batch = run_train(
  File "/root/h2o-llmstudio/train.py", line 205, in run_train
    val_data, val_loss, val_metric = run_eval(
  File "/root/h2o-llmstudio/train.py", line 136, in run_eval
    save_predictions(cfg, val_data, val_dataloader, val_df, mode)
  File "/root/h2o-llmstudio/llm_studio/src/utils/modeling_utils.py", line 442, in save_predictions
    val_df.to_csv(csv_preds_name, index=False)
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/core/generic.py", line 3551, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1180, in to_csv
    csv_formatter.save()
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 261, in save
    self._save()
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 266, in _save
    self._save_body()
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 304, in _save_body
    self._save_chunk(start_i, end_i)
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
    libwriters.write_csv_rows(
  File "pandas/_libs/writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
_csv.Error: need to escape, but no escapechar set

To Reproduce

I cannot disclose my training data, but it contains a lot of special characters like "ü", ",", "|", "-".
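
Since the data cannot be shared, here is a minimal sketch of how this error message arises in Python's csv module, which pandas' to_csv relies on (the exception type in the trace is _csv.Error). It is not a reproduction of the actual failure: it forces the condition with quoting=csv.QUOTE_NONE, which is an assumption made for illustration and not necessarily what LLM Studio does. The helper name try_write and the sample strings are made up.

```python
import csv
import io


def try_write(row, **writer_kwargs):
    """Write one row to an in-memory CSV; return the text, or the csv.Error message."""
    buf = io.StringIO()
    try:
        csv.writer(buf, **writer_kwargs).writerow(row)
    except csv.Error as exc:
        return f"csv.Error: {exc}"
    return buf.getvalue()


row = ["ü, | -", "Bei der IKK classic"]  # made-up sample, not the real data

# Default settings (QUOTE_MINIMAL): the field containing a comma is simply quoted.
default_out = try_write(row)

# QUOTE_NONE forbids quoting, so the comma has to be escaped instead;
# with no escapechar configured, the writer gives up with a csv.Error.
unescaped_out = try_write(row, quoting=csv.QUOTE_NONE)

# Supplying an escapechar gives the writer a way to escape the comma.
escaped_out = try_write(row, quoting=csv.QUOTE_NONE, escapechar="\\")

print(repr(default_out))
print(unescaped_out)
print(repr(escaped_out))
```

With default quoting, the characters listed above are written without trouble, so the failing value is probably something else in the data.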

Details:

The full log is:

2023-05-18 17:48:26,845 - INFO: Global random seed: 612235
2023-05-18 17:48:26,846 - INFO: Preparing the data...
2023-05-18 17:48:26,846 - INFO: Setting up automatic validation split...
2023-05-18 17:48:27,033 - INFO: Preparing train and validation data
2023-05-18 17:48:27,033 - INFO: Loading train dataset...
2023-05-18 17:51:44,329 - INFO: Stop token ids: []
2023-05-18 17:51:44,771 - INFO: Sample prompt: ASSISTANT: pre(ask_birthday)
2023-05-18 17:51:44,771 - INFO: Loading validation dataset...
2023-05-18 17:55:01,590 - INFO: Stop token ids: []
2023-05-18 17:55:01,598 - INFO: Sample prompt: USER: Bei der IKK classic
2023-05-18 17:55:01,598 - INFO: Number of observations in train dataset: 59796
2023-05-18 17:55:01,599 - INFO: Number of observations in validation dataset: 605
2023-05-18 17:55:01,670 - INFO: dtype: torch.float16
2023-05-18 17:55:08,710 - INFO: Starting validation inference
2023-05-18 17:55:08,710 - INFO: validation progress: 0%| | 0/19 [00:00
2023-05-18 17:55:23,267 - INFO: validation progress: 5%|5 | 1/19 [00:14<04:22, 14.56s/it]
2023-05-18 17:55:34,945 - INFO: validation progress: 11%|# | 2/19 [00:26<03:38, 12.86s/it]
2023-05-18 17:55:46,675 - INFO: validation progress: 16%|#5 | 3/19 [00:37<03:17, 12.35s/it]
2023-05-18 17:55:58,387 - INFO: validation progress: 21%|##1 | 4/19 [00:49<03:01, 12.10s/it]
2023-05-18 17:56:10,103 - INFO: validation progress: 26%|##6 | 5/19 [01:01<02:47, 11.96s/it]
2023-05-18 17:56:21,834 - INFO: validation progress: 32%|###1 | 6/19 [01:13<02:34, 11.88s/it]
2023-05-18 17:56:33,060 - INFO: validation progress: 37%|###6 | 7/19 [01:24<02:20, 11.67s/it]
2023-05-18 17:56:44,724 - INFO: validation progress: 42%|####2 | 8/19 [01:36<02:08, 11.67s/it]
2023-05-18 17:56:56,384 - INFO: validation progress: 47%|####7 | 9/19 [01:47<01:56, 11.66s/it]
2023-05-18 17:57:06,834 - INFO: validation progress: 53%|#####2 | 10/19 [01:58<01:41, 11.29s/it]
2023-05-18 17:57:18,425 - INFO: validation progress: 58%|#####7 | 11/19 [02:09<01:31, 11.38s/it]
2023-05-18 17:57:30,061 - INFO: validation progress: 63%|######3 | 12/19 [02:21<01:20, 11.46s/it]
2023-05-18 17:57:41,740 - INFO: validation progress: 68%|######8 | 13/19 [02:33<01:09, 11.53s/it]
2023-05-18 17:57:52,859 - INFO: validation progress: 74%|#######3 | 14/19 [02:44<00:57, 11.40s/it]
2023-05-18 17:58:04,517 - INFO: validation progress: 79%|#######8 | 15/19 [02:55<00:45, 11.48s/it]
2023-05-18 17:58:16,537 - INFO: validation progress: 84%|########4 | 16/19 [03:07<00:34, 11.64s/it]
2023-05-18 17:58:28,222 - INFO: validation progress: 89%|########9 | 17/19 [03:19<00:23, 11.66s/it]
2023-05-18 17:58:39,877 - INFO: validation progress: 95%|#########4| 18/19 [03:31<00:11, 11.66s/it]
2023-05-18 17:58:51,656 - INFO: validation progress: 100%|##########| 19/19 [03:42<00:00, 11.69s/it]
2023-05-18 17:58:51,659 - INFO: validation progress: 100%|##########| 19/19 [03:42<00:00, 11.73s/it]
2023-05-18 17:58:52,073 - INFO: Mean validation loss: 10.37500
2023-05-18 17:58:52,287 - INFO: Validation BLEU: 0.32177
2023-05-18 17:58:52,333 - ERROR: Exception occurred during H2O LLM Studio run:
Traceback (most recent call last):
  File "/root/h2o-llmstudio/train_wave.py", line 106, in
    run(cfg=cfg)
  File "/root/h2o-llmstudio/train.py", line 570, in run
    val_data, val_loss, val_metric, last_batch = run_train(
  File "/root/h2o-llmstudio/train.py", line 205, in run_train
    val_data, val_loss, val_metric = run_eval(
  File "/root/h2o-llmstudio/train.py", line 136, in run_eval
    save_predictions(cfg, val_data, val_dataloader, val_df, mode)
  File "/root/h2o-llmstudio/llm_studio/src/utils/modeling_utils.py", line 442, in save_predictions
    val_df.to_csv(csv_preds_name, index=False)
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/core/generic.py", line 3551, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1180, in to_csv
    csv_formatter.save()
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 261, in save
    self._save()
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 266, in _save
    self._save_body()
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 304, in _save_body
    self._save_chunk(start_i, end_i)
  File "/root/anaconda3/envs/llmstudio310/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
    libwriters.write_csv_rows(
  File "pandas/_libs/writers.pyx", line 72, in pandas._libs.writers.write_csv_rows
_csv.Error: need to escape, but no escapechar set

DavidFarago • May 18 '23 18:05

As a workaround, I can avoid this bug by skipping validation (that is, by not using the "save best checkpoint" option).

DavidFarago • May 18 '23 20:05

Thank you for reporting. This is hard to debug if you can't provide the exact character where it fails. As far as I can see, there are no issues with the characters that you mentioned:

image
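
One way to narrow this down without sharing the data might be to test each row on its own and then look for unusual characters in the rows that fail. The sketch below uses only Python's csv module; the helper names find_unwritable_rows and suspicious_chars and the sample rows are hypothetical. The NUL row reflects an unverified assumption: on Python 3.10 and earlier (the version shown in the traceback), an embedded NUL character may produce this exact message under default settings, while newer versions may accept it.

```python
import csv
import io


def find_unwritable_rows(rows, **writer_kwargs):
    """Return (row_index, row, error_message) for every row csv.writer rejects."""
    failures = []
    for i, row in enumerate(rows):
        # A fresh in-memory writer per row, so one failure cannot affect the next row.
        writer = csv.writer(io.StringIO(), **writer_kwargs)
        try:
            writer.writerow(row)
        except csv.Error as exc:
            failures.append((i, list(row), str(exc)))
    return failures


def suspicious_chars(text):
    """List control characters (other than tab, newline, CR) in text, as repr strings."""
    return sorted({repr(ch) for ch in str(text) if ord(ch) < 32 and ch not in "\t\n\r"})


# Made-up rows standing in for the real validation data.
rows = [
    ["USER: Bei der IKK classic", "plain text"],
    ["ü , | -", "special characters from the report"],
    ["bad\x00value", "embedded NUL character (assumed culprit, unverified)"],
]

# Whether anything is reported here depends on the Python version in use.
for index, row, message in find_unwritable_rows(rows):
    print(index, message, [suspicious_chars(field) for field in row])

# Independent of the writer, scan every field for control characters.
for index, row in enumerate(rows):
    found = [c for field in row for c in suspicious_chars(field)]
    if found:
        print(f"row {index} contains control characters: {found}")
```

With pandas, the same helper could be fed val_df.itertuples(index=False), which would report the index of any row that cannot be written and the control characters found in it, without revealing the rest of the data.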

pascal-pfeiffer • May 19 '23 06:05

Closed for now, @DavidFarago please re-open if needed.

psinger • Jul 10 '23 08:07