rat-sql
Preprocessing issue
Previous related issue https://github.com/microsoft/rat-sql/issues/21
My command line output:
DB connections: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 297.03it/s]
train section: 1%|█▏ | 99/8659 [00:02<03:46, 37.76it/s]100 sample done at c= 100
train section: 1%|█▏ | 99/8659 [00:02<04:07, 34.59it/s]
DB connections: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 166/166 [00:00<00:00, 281.66it/s]
val section: 9%|█████████▉ | 97/1034 [00:02<00:28, 32.90it/s]100 sample done at c= 200
val section: 9%|█████████▉ | 97/1034 [00:02<00:22, 41.70it/s]
87 words in vocab
Exception ignored in: <function CoreNLP.__del__ at 0x7efde4998560>
Traceback (most recent call last):
File "/app/ratsql/resources/corenlp.py", line 24, in __del__
File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 83, in stop
File "/opt/conda/lib/python3.7/subprocess.py", line 1790, in kill
AttributeError: 'NoneType' object has no attribute 'SIGKILL'
I have also tried increasing Docker memory to 8 GB. Any suggestions?
Yeah, I have the same problem 👍 I tried increasing Docker memory to 32 GB, but got this:
train section: 100%|████████████████████████████████████████| 8659/8659 [1:04:40<00:00, 2.23it/s]
DB connections: 100%|████████████████████████████████████████| 166/166 [00:00<00:00, 267.34it/s]
val section: 100%|████████████████████████████████████████| 1034/1034 [10:14<00:00, 1.68it/s]
1580 words in vocab
Exception ignored in: <function CoreNLP.__del__ at 0x7fc9f7051290>
Traceback (most recent call last):
File "/app/ratsql/resources/corenlp.py", line 24, in __del__
File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 83, in stop
File "/opt/conda/lib/python3.7/subprocess.py", line 1790, in kill
AttributeError: 'NoneType' object has no attribute 'SIGKILL'
So I don't think it is a memory problem. Have you solved it? Any suggestions would be appreciated, thanks.
Hi @zsLin177 I have not solved this.
I'm facing the same issue too.
I also have this issue on an instance with 45 GB of memory, so I don't think it is a memory issue. I am using the Dockerfile provided by the repo, so there should be no dependency problems.
I am pretty convinced that this error comes from the way the CoreNLP server is supposed to shut down. I was also able to "force" this error by interrupting the program with Ctrl+C early, while the "train" section was being loaded into the registry.
train section: 6%|██████████▋
[pretrained_embeddings.py] tokenize() method called ...
[pretrained_embeddings.py] tokenize() method called ...
[pretrained_embeddings.py] tokenize() method called ...
^CTraceback (most recent call last):
File "run.py", line 109, in <module>
main()
File "run.py", line 73, in main
preprocess.main(preprocess_config)
File "/app/ratsql/commands/preprocess.py", line 56, in main
preprocessor.preprocess()
File "/app/ratsql/commands/preprocess.py", line 35, in preprocess
self.model_preproc.add_item(item, section, validation_info)
File "/app/ratsql/models/enc_dec.py", line 44, in add_item
self.enc_preproc.add_item(item, section, enc_info)
File "/app/ratsql/models/spider/spider_enc.py", line 168, in add_item
preprocessed = self.preprocess_item(item, validation_info)
File "/app/ratsql/models/spider/spider_enc.py", line 203, in preprocess_item
cv_link = compute_cell_value_linking(question, item.schema)
File "/app/ratsql/models/spider/spider_match_utils.py", line 123, in compute_cell_value_linking
ret = db_word_match(word, column.orig_name, column.table.orig_name, schema.connection)
File "/app/ratsql/models/spider/spider_match_utils.py", line 91, in db_word_match
cursor.execute(p_str)
KeyboardInterrupt
train section: 6%|██████████▋ | 533/8659 [01:31<23:21, 5.80it/s]
Exception ignored in: <function CoreNLP.__del__ at 0x7f62693dbef0>
Traceback (most recent call last):
File "/app/ratsql/resources/corenlp.py", line 24, in __del__
File "/root/.local/lib/python3.7/site-packages/corenlp/client.py", line 83, in stop
File "/opt/conda/lib/python3.7/subprocess.py", line 1790, in kill
AttributeError: 'NoneType' object has no attribute 'SIGKILL'
I still don't know the fix, but I'm experimenting with a few things. If I find the answer I'll post it here; in the meantime, this information might be useful to you all :)
Edit: just following up. I did a full run of the preprocessing with line 24 of corenlp.py commented out (replaced with a print statement for debugging). The code runs to the end with no error, and you get the preprocessing files you need (check your data/...). The error is just the <class 'corenlp.client.CoreNLPClient'> object terminating incorrectly :) Hope that helps!
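Since the failure only happens while the CoreNLP client is being torn down at interpreter exit, one option is to stop the server from an `atexit` handler instead of relying on `__del__`. `atexit` handlers run during normal interpreter exit (including after an unhandled KeyboardInterrupt), before module globals such as `signal` inside `subprocess` are cleared. This is a minimal sketch, not the repo's actual `ratsql/resources/corenlp.py`: `CoreNLPWrapper` and `FakeClient` are hypothetical names, and the only assumption about `corenlp.client.CoreNLPClient` is that it has the `stop()` method seen in the tracebacks.

```python
import atexit


class FakeClient:
    """Hypothetical stand-in for corenlp.client.CoreNLPClient; only stop() is assumed."""

    def __init__(self):
        self.stop_calls = 0

    def stop(self):
        self.stop_calls += 1


class CoreNLPWrapper:
    """Hypothetical wrapper that stops the server at exit instead of in __del__."""

    def __init__(self, client):
        self.client = client
        self._closed = False
        # atexit handlers run before module globals are torn down, so
        # stop() can still see the modules it depends on.
        atexit.register(self.close)

    def close(self):
        # Idempotent: safe to call from user code and again from atexit.
        if not self._closed:
            self._closed = True
            self.client.stop()


wrapper = CoreNLPWrapper(FakeClient())
wrapper.close()
wrapper.close()  # second call is a no-op
print(wrapper.client.stop_calls)  # 1
```

Note that `atexit` handlers do not run if the process is killed by a signal or exits through `os._exit()`.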
Thank you for your comment @hclent. I changed lines 23-25 to:
def __del__(self):
    # self.client.stop()
    pass
and it works too.
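A less drastic variant of the `pass` patch above keeps the `stop()` call but catches the exception, so the server is still stopped whenever that is possible and the shutdown error is silenced otherwise. This is a hypothetical sketch: `CoreNLP` here is only a stand-in for the class in `ratsql/resources/corenlp.py`, and `BrokenClient` simulates the `AttributeError` from the tracebacks in this thread. As with `pass`, if `stop()` fails the server process may be left running.

```python
class BrokenClient:
    """Hypothetical client whose stop() fails like the tracebacks in this thread."""

    def stop(self):
        raise AttributeError("'NoneType' object has no attribute 'SIGKILL'")


class CoreNLP:
    """Hypothetical stand-in for the class in ratsql/resources/corenlp.py."""

    def __init__(self, client):
        self.client = client

    def __del__(self):
        try:
            self.client.stop()
        except Exception:
            # At interpreter shutdown, module globals used by stop() may
            # already be None; the process is exiting anyway, so ignore it.
            pass


nlp = CoreNLP(BrokenClient())
nlp.__del__()  # no exception escapes
print("ok")
```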
Thank you @hclent for your comment