
[WIP] Albert

Open • tejasvaidhyadev opened this issue 4 years ago • 10 comments

Hi everyone, I am adding ALBERT [WIP]. Currently only raw code is given in the PR. Dependencies: Transformers.jl, WordTokenizers.jl

I am not exporting any functions yet; I am still deciding on the best way to use them. But I am adding some of the important code used for converting the pretrained checkpoints, and a demo file below.

Roadmap

  • [x] SentencePiece - contains the WordPiece as well as the unigram model (a Python wrapper for now, with a Julia implementation under development)
  • [x] tfckpt2bsonforalbert.jl - for converting TensorFlow checkpoints to BSON weights
  • [x] ALBERT transformer - not yet complete; based on the Transformers.jl transformer
  • [x] model file - kept inside the ALBERT folder for now, but it is just the general wrapper structure for loading ALBERT pretrained weights
  • [x] APIs - alberttokenizer, albertmasklm, albertforsequenceclassification, etc.
  • [x] our own hosted pretrained models, managed by DataDeps.jl (see the sketch after this list)
  • [x] Documentation, tests and tutorials
  • [x] code and APIs for fine-tuning and data loading; apart from the above, refactoring and cleaning of the code remain
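A minimal sketch of the DataDeps.jl registration/fetch pattern intended for the hosted pretrained models above; the dependency name, description, and URL are placeholders, not the actual hosted files:

```julia
# Minimal sketch of the DataDeps.jl pattern for hosted pretrained weights.
# The name, description, and URL are placeholders and do not point to the
# actual released artifact.
using DataDeps

register(DataDep(
    "ALBERT-base-v1",                               # hypothetical dependency name
    "ALBERT base (v1) weights converted to BSON",   # message shown before first download
    "https://example.com/albert_base_v1.bson",      # placeholder URL
))

# `datadep"..."` downloads the artifact on first use and returns its local directory.
weights_dir  = datadep"ALBERT-base-v1"
weights_path = joinpath(weights_dir, "albert_base_v1.bson")
```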

Important links

Pretrained weights: link.

  • The pretrained weights are converted from the TensorFlow checkpoints released by google-research.
  • The conversion code is given in tfckpt2bsonforalbert.jl.
  • Currently, pretrained weights for version 1 are provided; I will release version 2 soon.

For details, refer to this link.

Demo - link
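As a rough illustration of how the converted weights can be consumed (the file name below is a placeholder, not the released artifact), BSON.jl reads them back as a plain dictionary:

```julia
# Hedged sketch: read a converted checkpoint back with BSON.jl.
# "albert_base_v1.bson" is a placeholder file name, not the released artifact.
using BSON

weights = BSON.load("albert_base_v1.bson")   # Dict of the objects the conversion script saved

# Inspect what tfckpt2bsonforalbert.jl stored (the exact keys depend on that script).
for k in keys(weights)
    println(k)
end
```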

P.S. All suggestions are welcome.

tejasvaidhyadev avatar Mar 31 '20 09:03 tejasvaidhyadev

Sorry for closing the PR earlier. The git commit history is now updated.

News

Updated Demo

  • Contains a demo of embeddings from WordPiece and SentencePiece

  • Demo of converting a TensorFlow checkpoint to a BSON file (as required by Julia's Flux) - link

tejasvaidhyadev avatar Apr 02 '20 13:04 tejasvaidhyadev

Pretrained weights

Version 2 of the converted ALBERT BSON is released. It does not contain the 30k-clean.model file (from SentencePiece).

tejasvaidhyadev avatar Apr 18 '20 13:04 tejasvaidhyadev

@aviks any suggestions on the roadmap mentioned above? I am also thinking of adding a Tutorial folder (containing .ipynb tutorials).

tejasvaidhyadev avatar Apr 23 '20 10:04 tejasvaidhyadev

Added SentencePiece unigram support.

tejasvaidhyadev avatar Jun 27 '20 04:06 tejasvaidhyadev

Completed the trainable ALBERT structure.

tejasvaidhyadev avatar Jul 03 '20 20:07 tejasvaidhyadev

Fine-tuning training tutorial (GPU is not supported so far) - here
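As a rough sketch of what a CPU-only fine-tuning loop looks like in Flux (this is generic, not the tutorial's code; `albert`, `classifier_head`, and `train_batches` are hypothetical placeholders):

```julia
# Generic CPU fine-tuning loop in Flux. `albert`, `classifier_head`, and
# `train_batches` are hypothetical placeholders, not names exported by this PR.
using Flux

ps  = Flux.params(albert, classifier_head)   # parameters to update
opt = ADAM(1e-5)                             # small learning rate for fine-tuning

loss(x, y) = Flux.logitcrossentropy(classifier_head(albert(x)), y)

for epoch in 1:3
    Flux.train!(loss, ps, train_batches, opt)   # one pass over the data per epoch
end
```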

tejasvaidhyadev avatar Jul 17 '20 19:07 tejasvaidhyadev

The above code is pretty messy and not yet refactored (it was for the experiment). We can drop SentencePiece as soon as the ALBERT PR is merged. Apart from that, pretrain.jl is ready, and we can drop tfckpt2bsonforalbert.jl in the next push. I will refactor the code within the next week.

tejasvaidhyadev avatar Jul 18 '20 18:07 tejasvaidhyadev

Hi @tejasvaidhyadev can you move this PR to TextModels now please?

aviks avatar Nov 01 '20 21:11 aviks

Hi @tejasvaidhyadev can you move this PR to TextModels now please?

Hi @aviks, is it okay if I do it the coming weekend? I have exams this week.

tejasvaidhyadev avatar Nov 02 '20 05:11 tejasvaidhyadev

I will do it the coming weekend?

Yes, of course, whenever you have time.

aviks avatar Nov 02 '20 15:11 aviks