BERT example integration test

Open mattdangerw opened this issue 3 years ago • 3 comments

I think we are at the point where we need some automated testing for this. This runs preprocessing, a few pretraining train steps, and a few finetuning train steps by invoking our scripts on real data.

mattdangerw, May 25 '22 18:05

@chenmoneygithub @fchollet let me know what you think of this.

We definitely need some sort of automated testing here. I think this could be a good template for integration tests for our examples--literally run all the scripts on some real data we host and download in the test.

Just running 5 pretraining steps and 3 finetuning steps on a tiny version of the architecture takes ~7 minutes on the stock GitHub testing machines (CPU only). Once we figure out a way to test on accelerators, we could definitely do a bit more training here.

mattdangerw, May 25 '22 18:05

I think I also like this as a forcing function for simple "out of the box" use. Writing a single, smallish test that runs your whole training pipeline is tricky, but it forces people to avoid sneaking in manual steps to get things working.

mattdangerw, May 25 '22 20:05

Talked with @fchollet on this. We should do a few things:

  1. Move as much logic as possible out of the runnable script files into bert_model.py (and potentially add a bert_data.py, and others as needed). Runnable scripts should be very basic and should only transform flags into function/class arguments.
  2. Change this integration test to run the end-to-end flow by invoking functionality from bert_model.py. If we've done things right, this should still be a small and readable test.
  3. Leave the runnable scripts untested on CI for now. Never test them through pytest. Maybe someday test them by invoking them directly for limited training runs.
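
To make the split concrete, here is a rough sketch of the shape this could take. Everything in it is hypothetical: `PretrainConfig`, `run_pretraining`, the flag names, and the placeholder "training" loop are stand-ins for illustration, not the actual contents of bert_model.py or the example scripts.

```python
import argparse
from dataclasses import dataclass


# --- Would live in library code such as bert_model.py (hypothetical) ---
@dataclass
class PretrainConfig:
    num_layers: int = 2
    hidden_size: int = 8
    num_train_steps: int = 5


def run_pretraining(config):
    """Placeholder for the real training logic; returns one loss per step."""
    losses = []
    for step in range(config.num_train_steps):
        # A real implementation would build the model and run a train step.
        losses.append(1.0 / (step + 1))
    return losses


# --- Would live in the runnable script: flags in, function call out ---
def parse_flags(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_layers", type=int, default=2)
    parser.add_argument("--hidden_size", type=int, default=8)
    parser.add_argument("--num_train_steps", type=int, default=5)
    args = parser.parse_args(argv)
    return PretrainConfig(args.num_layers, args.hidden_size, args.num_train_steps)


def main(argv=None):
    return run_pretraining(parse_flags(argv))


# --- Integration test: calls the library function, never the script ---
def test_pretraining_end_to_end():
    losses = run_pretraining(PretrainConfig(num_train_steps=5))
    assert len(losses) == 5
```

With this layout the test exercises the same code path the script does, minus flag parsing, so the script itself can stay out of pytest.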

Will try to find some time to redo this with those changes in mind.

mattdangerw, Jun 02 '22 00:06