fast-bert
Vocab size while fine tuning language model
Hi, I used around 8,000,000 text sentences while fine-tuning the language model, but the newly added vocabulary size is only 50,000. My data has at least 1,000,000-2,000,000 tokens that should be added. Can I explicitly change the vocab size while fine-tuning? Thanks
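For context: in subword tokenizers the vocabulary size is an explicit training parameter, not something that grows automatically with the corpus, and LM fine-tuning on a pretrained checkpoint normally reuses that checkpoint's tokenizer, so the vocabulary stays at the checkpoint's fixed size. If you train your own tokenizer you set the size directly. A minimal sketch of the idea, using a plain whitespace vocabulary capped at an explicit size (illustrative only; `build_vocab` is not a fast-bert API):

```python
from collections import Counter

def build_vocab(sentences, vocab_size):
    """Build a frequency-ranked vocabulary capped at an explicit size."""
    counts = Counter(tok for s in sentences for tok in s.split())
    # keep only the most frequent tokens, up to the requested vocab_size
    return [tok for tok, _ in counts.most_common(vocab_size)]

addresses = ["i 32 mangol puri delhi", "b-8/205 rohini delhi"]
vocab = build_vocab(addresses, vocab_size=4)
print(len(vocab))  # 4 -- the cap, not the number of distinct tokens
```

Real subword trainers (WordPiece, BPE, unigram) work the same way at the interface level: you pass a target `vocab_size` and the trainer decides which pieces make the cut by frequency and merge statistics.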
@Sagar1094 can you please share the code that you are using for LM fine-tuning? Thanks
Hi, I have followed the tutorial for the same. Regards, Sagar
Hi, my data is a little different: I have Indian addresses, for example "i 32 mangol puri delhi", "b-8/205 rohini delhi", "kormangalam bengaluru". I want to build an address classifier. These addresses have labels associated with them as well, like "26-0" and "23-2".
Using pre-trained BERT, I think it is impossible to train on this kind of data, as most of the words would be out of vocab. Can you please help me and suggest an alternative approach? I have tried training BERT, ELECTRA, and RoBERTa models from scratch with a huge vocab size (2,800,000 words), but it fails. So I tried fine-tuning with fast-bert, which also doesn't work.
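One note on the out-of-vocab concern: BERT's WordPiece tokenizer does not simply fail on unseen words; it greedily splits them into known subword pieces, falling back to [UNK] only when no piece matches at all, so unusual address tokens usually still get a lossless subword encoding. A minimal greedy longest-match-first sketch (toy vocabulary, not the real tokenizer):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # non-initial pieces carry the ## prefix
            if sub in vocab:
                piece = sub  # longest matching piece from this position
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens

toy_vocab = {"man", "##gol", "##puri", "delhi", "roh", "##ini"}
print(wordpiece_tokenize("mangolpuri", toy_vocab))  # ['man', '##gol', '##puri']
```

So the practical failure mode is usually not hard OOV but poor subword splits; training a modest-sized tokenizer (tens of thousands of pieces) on the address corpus itself, rather than a multi-million-word vocabulary, is the more common remedy.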
Please help 🙏😊 Regards, Sagar Gupta