PyHealth
feat(models): Add BERT and BioBERT models for healthcare text classif…
Summary
This PR introduces BERT-based models for text classification tasks in healthcare applications, enabling fine-tuning of pre-trained language models on clinical and biomedical text data.
PyHealth currently lacks native support for initializing and fine-tuning BERT models for text classification. This contribution provides a production-ready BERT implementation with fine-tuning controls and BioBERT support.
Supported Models
| Alias | HuggingFace Model |
|---|---|
| `bert-base-uncased` | `bert-base-uncased` |
| `bert-base-cased` | `bert-base-cased` |
| `biobert` | `dmis-lab/biobert-v1.1` |
Any other HuggingFace BERT-compatible model can also be used by passing the full model name.
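As a rough sketch of how this alias lookup could behave (the table name `MODEL_ALIASES` and helper `resolve_model_name` are illustrative, not necessarily the names used in the PR), a known alias resolves to its HuggingFace identifier, while any other string passes through unchanged:

```python
# Hypothetical alias table; entries mirror the table above.
MODEL_ALIASES = {
    "bert-base-uncased": "bert-base-uncased",
    "bert-base-cased": "bert-base-cased",
    "biobert": "dmis-lab/biobert-v1.1",
}

def resolve_model_name(name: str) -> str:
    """Map a known alias to its HuggingFace model id; otherwise
    return the name unchanged so any BERT-compatible model works."""
    return MODEL_ALIASES.get(name, name)
```

With this behavior, `resolve_model_name("biobert")` yields `dmis-lab/biobert-v1.1`, while a full model name such as `emilyalsentzer/Bio_ClinicalBERT` is passed through as-is.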
Features
- Pooling strategies: `cls`, `mean`, `max`
- Fine-tuning control: freeze the entire encoder or the bottom N layers
- Differential learning rates: separate learning rates for encoder vs. classifier
- Task modes: binary, multiclass, and multilabel classification
This PR is submitted by the following group of UIUC students:
Hung Ngo (hungngo2) and Hao Doan (haodoan2)