Longxu Dou comments

Results: 32 comments of Longxu Dou

MultiSpider dataset availability

@margauxbo @Carbonhell @exitxingling @lwmlyy Thanks for your interest in MultiSpider! Please refer to the [code](https://github.com/longxudou/multispider) and [data](https://huggingface.co/datasets/dreamerdeo/multispider) for further discussion.

custom dataset creation for unisar

Hi @srewai, many thanks for your interest in unisar. To make unisar work on your own dataset, the only thing you need to do is call step2 to generate the...

custom dataset creation for unisar

@srewai Actually, mBART shows promising (though not as good as English) performance in German (i.e., the question and DB are in German, and the output SQL also involves German headers). In...

custom dataset creation for unisar

@srewai You can wait for our multilingual-Spider to train your own parser. It supports Chinese, Japanese, German, French, Spanish, and Vietnamese, with about 10K examples and 120 tables. Q1: I will need to create sqlite db out...

custom dataset creation for unisar

Hi @srewai, this is the paper about multilingualSpider: https://arxiv.org/pdf/2212.13492.pdf. Feel free to read it :) The codebase and dataset will be released in about one month. (I need to prepare more...

custom dataset creation for unisar

@epejhan90 Thanks for your interest! Q1: What are the preprocesses to add my table to db and be able to run three steps? The codebase doesn't support this function. But...

custom dataset creation for unisar

@epejhan90 This is because the released checkpoint is the no-value version, which is trained on "query_toks_no_value" rather than "query_toks" in the Spider dataset. If you want to fix this (make the SQL contain...

custom dataset creation for unisar

@epejhan90 I think you could first fine-tune our non-value checkpoint to be "valuable" by constructing the training corpus. Otherwise, it's time-consuming to use one GPU to fine-tune the BART-large model...

MultiSpider repo

@bekhzod-olimov Hi, thanks for your interest! Please refer to the [code](https://github.com/longxudou/multispider) and [data](https://huggingface.co/datasets/dreamerdeo/multispider) for further discussion.

RAM crash when use collect method

Thanks for the excellent codebase and instructions! Eagerly awaiting a fix for the OOM issue.