gap-text2sql
How to execute own queries?
Hello, I would like to run my own questions and databases, but when I try to change the Spider JSON files I get this error:
RuntimeError: Error(s) in loading state_dict for EncDecModel:
size mismatch for decoder.rule_logits.2.weight: copying a param with shape torch.Size([97, 128]) from checkpoint, the shape in current model is torch.Size([76, 128]).
size mismatch for decoder.rule_logits.2.bias: copying a param with shape torch.Size([97]) from checkpoint, the shape in current model is torch.Size([76]).
size mismatch for decoder.rule_embedding.weight: copying a param with shape torch.Size([97, 128]) from checkpoint, the shape in current model is torch.Size([76, 128]).
size mismatch for decoder.node_type_embedding.weight: copying a param with shape torch.Size([55, 64]) from checkpoint, the shape in current model is torch.Size([49, 64]).
Is there an elegant way to test my own data? Thanks in advance!
Hey, thanks for your interest in our work. You can check out the pull request https://github.com/awslabs/gap-text2sql/pull/6 once it is merged. I think you can run your own database and queries based on the notebook I provided there.
Let me know if it works for you, and feel free to ask if you have any further questions.
Peng
Hello Peng,
Thank you very much for your quick response! I tried the notebook and it worked :+1: I will let you know if I have any questions. Have a nice weekend.
Kevin
Hello Peng,
I ran further tests and noticed that the response sometimes contains the word 'terminal', for example:
Query: department with budget greater then 10 billion
Answer: SELECT department.Department_ID FROM department WHERE department.Budget_in_Billions > 'terminal'
I guess 'terminal' should be replaced by values taken from the query. How can this replacement be achieved?
Sincerely Kevin
Hey Kevin,
Thanks for your question. The terminal will usually be a cell value: it could be a float/integer or a string. Filling it in usually involves a value-copy mechanism, but the model currently doesn't support that.
However, there is a simple workaround: if the value is a number, you can detect it in the utterance and fill it directly into the generated SQL. For string values, you can match n-grams from the utterance against the cell values in the database: if an n-gram matches, it is a string value for the corresponding column.
I have a script that does this, but it will take some time to clean it up and make it public. You can try this method yourself in the meantime, since it is pretty simple, and I will try to release the script as soon as possible if you haven't implemented it by then.
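A rough sketch of that post-processing could look like the following. This is illustrative only, not the script from this repository; the function name and the SQLite-based cell lookup are just assumptions about how you might wire it up:

```python
import re
import sqlite3


def fill_terminals(sql, question, db_path):
    """Replace 'terminal' placeholders in a generated SQL string with
    values guessed from the question (sketch only)."""
    # 1. Numbers mentioned in the question, in order of appearance.
    numbers = re.findall(r"\d+(?:\.\d+)?", question)

    # 2. String cell values from the database, used for n-gram matching.
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    tables = [row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    cell_values = set()
    for table in tables:
        for row in cur.execute('SELECT * FROM "%s" LIMIT 1000' % table):
            cell_values.update(v.lower() for v in row if isinstance(v, str))
    conn.close()

    # 3. Question n-grams (longest first) that match a cell value.
    tokens = question.lower().split()
    string_matches = []
    for n in range(len(tokens), 0, -1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in cell_values and ngram not in string_matches:
                string_matches.append(ngram)

    # 4. Fill placeholders left to right: use a detected number if one is
    #    available, otherwise the next matched string value.
    def _fill(match):
        if numbers:
            return numbers.pop(0)
        if string_matches:
            return "'%s'" % string_matches.pop(0)
        return match.group(0)  # leave unresolved placeholders untouched

    return re.sub(r"'terminal'", _fill, sql)


# Usage with the example from this thread (db_path points to the
# corresponding Spider SQLite file):
# fill_terminals(
#     "SELECT department.Department_ID FROM department "
#     "WHERE department.Budget_in_Billions > 'terminal'",
#     "department with budget greater then 10 billion",
#     "path/to/department_management.sqlite")
# -> "... WHERE department.Budget_in_Billions > 10"
```

A real implementation would also need to decide, per placeholder, whether the compared column is numeric or textual; the sketch above simply prefers numbers when any were found in the question.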
Peng
Hey Peng,
Thank you very much for your explanation. I will try my best :)
Sincerely Kevin
Hi @Impavidity @kev2513 , I get the following error on trying the notebook -
WARNING <class 'seq2struct.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-21-d986dbd802ee> in <module>()
----> 1 inferer = Inferer(infer_config)
4 frames
/content/gap-text2sql/rat-sql-gap/seq2struct/commands/infer.py in __init__(self, config)
34 registry.lookup('model', config['model']).Preproc,
35 config['model'])
---> 36 self.model_preproc.load()
37
38 def load_model(self, logdir, step):
/content/gap-text2sql/rat-sql-gap/seq2struct/models/enc_dec.py in load(self)
54
55 def load(self):
---> 56 self.enc_preproc.load()
57 self.dec_preproc.load()
58
/content/gap-text2sql/rat-sql-gap/seq2struct/models/spider/spider_enc.py in load(self)
1272
1273 def load(self):
-> 1274 self.tokenizer = BartTokenizer.from_pretrained(self.data_dir)
1275
1276
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, *inputs, **kwargs)
1138
1139 """
-> 1140 return cls._from_pretrained(*inputs, **kwargs)
1141
1142 @classmethod
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
1244 ", ".join(s3_models),
1245 pretrained_model_name_or_path,
-> 1246 list(cls.vocab_files_names.values()),
1247 )
1248 )
OSError: Model name 'data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc' was not found in tokenizers model name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). We assumed 'data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
Can you please help me with what step am I missing?
Hello @thecodemakr,
I got the same issue when executing the inference step. Running the preprocessing command solved the problem for me:
python run.py preprocess experiments/spider-configs/gap-run.jsonnet
(also execute the "Preprocess dataset" step beforehand)
Can you please tell me how long the command "python run.py preprocess experiments/spider-configs/gap-run.jsonnet" is expected to run? I have been running it for about an hour.
@thecodemakr I am also facing this issue while running the notebook. How did you resolve it?