
add NLP usecase

Open martinjaggi opened this issue 3 years ago • 2 comments

Add a new task as an NLP example, such as a text classifier. We can probably use the .csv data loaders for it.

  • Plan A: MobileBERT. A preprocessing pipeline and a pretrained model for TF.js are already available (https://github.com/tensorflow/tfjs-models/tree/master/qna). The question is whether the model only exists in compiled graph format (inference only) or could also be made available as a layers model for training. See also Hugging Face for more info on the pretrained model, and potentially for converting it to a Keras-style layers model (https://huggingface.co/docs/transformers/model_doc/mobilebert), as well as the Google blog post on the TF.js version.

  • Plan B: LSTM. If fine-tuning MobileBERT as above doesn't work in JS yet, we can try simpler LSTM models, for example as here: https://blog.ldtalentwork.com/2020/01/10/tensorflowjs-how-to-create-a-language-translator/

In either case, what's nice is that existing tokenizers and preprocessing code are already available, so we don't need to start from scratch.
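To make the preprocessing step concrete, here is a minimal sketch of how rows from a .csv file could be turned into fixed-length integer sequences for a text classifier. It is not the existing tokenizer code mentioned above: the helper names (`tokenize`, `buildVocab`, `encode`), the reserved indices, and the toy rows are all assumptions for illustration, and a real setup would more likely reuse the tokenizer that ships with the chosen model.

```typescript
// Hypothetical helpers for illustration; not part of the project or of TF.js.
type Vocab = Map<string, number>;

const PAD = 0; // index reserved for padding (assumption)
const UNK = 1; // index reserved for out-of-vocabulary words (assumption)

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Assign each distinct token an index, starting after the reserved ones.
function buildVocab(texts: string[]): Vocab {
  const vocab: Vocab = new Map();
  for (const text of texts) {
    for (const token of tokenize(text)) {
      if (!vocab.has(token)) vocab.set(token, vocab.size + 2);
    }
  }
  return vocab;
}

// Map a text to exactly maxLen indices: truncate long inputs, pad short ones.
function encode(text: string, vocab: Vocab, maxLen: number): number[] {
  const ids = tokenize(text)
    .slice(0, maxLen)
    .map((token) => vocab.get(token) ?? UNK);
  while (ids.length < maxLen) ids.push(PAD);
  return ids;
}

// Toy rows standing in for parsed .csv lines of the form (text, label).
const rows: Array<[string, number]> = [
  ["great movie, loved it", 1],
  ["terrible plot", 0],
];
const vocab = buildVocab(rows.map(([text]) => text));
const inputs = rows.map(([text]) => encode(text, vocab, 5));
const labels = rows.map(([, label]) => label);
```

The resulting `inputs` and `labels` arrays are the kind of data an embedding layer followed by an LSTM (Plan B) would consume once converted to tensors.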

What do others think?

martinjaggi · Oct 19 '22 19:10

Looks good @martinjaggi, but from my side the main interest is in hooking up the federated flow; any model will do.

I would also like to be able to show local (client) model performance and server model performance after the various federated averaging steps :)

Would love to get involved!

ydennisy · Oct 21 '22 17:10

OK, sure. I suggest we create a new PR on top of #483.

And then let's try the LSTM first, for simplicity, I'd say.

About the federated flow: this should already work fine and be independent of the task. BTW, after every communication round, the server model and the local models are identical again. Model performance is therefore usually indistinguishable from the (hypothetical) centralized scenario.
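As a rough illustration of why the server and local models coincide after a communication round, here is a minimal sketch of one round of federated averaging on flat weight vectors. It is not the project's implementation: the function names are hypothetical, and it assumes plain unweighted averaging, whereas real systems often weight clients by their number of samples.

```typescript
// Hypothetical sketch; models are represented as flat arrays of weights.
type Weights = number[];

// Unweighted federated averaging: element-wise mean over all client weights.
function federatedAverage(clientWeights: Weights[]): Weights {
  if (clientWeights.length === 0) throw new Error("need at least one client");
  const size = clientWeights[0].length;
  const sum: number[] = new Array<number>(size).fill(0);
  for (const weights of clientWeights) {
    if (weights.length !== size) throw new Error("weight shapes differ");
    weights.forEach((value, i) => {
      sum[i] += value;
    });
  }
  return sum.map((total) => total / clientWeights.length);
}

// One communication round: the server averages the local models, then every
// client replaces its local model with a copy of the server model.
function communicationRound(clientWeights: Weights[]): {
  server: Weights;
  clients: Weights[];
} {
  const server = federatedAverage(clientWeights);
  const clients = clientWeights.map(() => [...server]);
  return { server, clients };
}

// Three clients whose local models diverged during local training.
const localModels: Weights[] = [
  [1, 2],
  [3, 4],
  [5, 6],
];
const round = communicationRound(localModels);
console.log(round.server); // the averaged model
console.log(round.clients); // every client now holds the same weights
```

Local and server performance can therefore only differ between rounds, that is, after local training and before the next aggregation step.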

martinjaggi · Oct 21 '22 20:10