
feat(models): Add BERT and BioBERT models for healthcare text classif…

Open · hungngo2 opened this issue 1 month ago · 0 comments

Summary

This PR introduces BERT-based models for text classification tasks in healthcare applications, enabling fine-tuning of pre-trained language models on clinical and biomedical text data.

PyHealth currently lacks native support for initializing and fine-tuning BERT models for text classification. This contribution adds a production-ready BERT implementation with fine-tuning controls and BioBERT support.

Supported Models

| Alias | HuggingFace Model |
| --- | --- |
| bert-base-uncased | bert-base-uncased |
| bert-base-cased | bert-base-cased |
| biobert | dmis-lab/biobert-v1.1 |

Any other HuggingFace BERT-compatible model can also be used by passing the full model name.
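A minimal sketch of how such alias resolution could work (the `MODEL_ALIASES` dict and `resolve_model_name` helper are illustrative names, not necessarily the PR's actual API; the mappings themselves come from the table above):

```python
# Hypothetical alias table mapping short names to HuggingFace model IDs.
MODEL_ALIASES = {
    "bert-base-uncased": "bert-base-uncased",
    "bert-base-cased": "bert-base-cased",
    "biobert": "dmis-lab/biobert-v1.1",
}


def resolve_model_name(name: str) -> str:
    """Resolve an alias to a full HuggingFace model name.

    Unknown names pass through unchanged, so any BERT-compatible
    HuggingFace model can be used by giving its full name.
    """
    return MODEL_ALIASES.get(name, name)
```

For example, `resolve_model_name("biobert")` would return `dmis-lab/biobert-v1.1`, while a full model name such as `emilyalsentzer/Bio_ClinicalBERT` would be passed through unchanged.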

Features

  • Pooling strategies: cls, mean, max
  • Fine-tuning control: Freeze entire encoder or N bottom layers
  • Differential learning rates: Separate LR for encoder vs classifier
  • Task modes: Binary, multiclass, multilabel classification
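The three pooling strategies can be sketched as follows. This is a standalone numpy illustration, assuming the usual setup where the encoder produces per-token hidden states and an attention mask marks real tokens versus padding; the function name and signature are illustrative, not the PR's API:

```python
import numpy as np


def pool_embeddings(hidden_states, attention_mask, strategy="cls"):
    """Reduce per-token embeddings to a single sequence embedding.

    hidden_states: (seq_len, hidden_dim) token embeddings from the encoder.
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.astype(bool)
    if strategy == "cls":
        # Use the first ([CLS]) token's embedding.
        return hidden_states[0]
    if strategy == "mean":
        # Average over non-padding tokens only.
        return hidden_states[mask].mean(axis=0)
    if strategy == "max":
        # Element-wise max over non-padding tokens.
        return hidden_states[mask].max(axis=0)
    raise ValueError(f"unknown pooling strategy: {strategy}")
```

Masking out padding tokens matters for `mean` and `max`: without it, pad embeddings would dilute the average and could dominate the max.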

This PR is submitted by the following group of UIUC students:

Hung Ngo (hungngo2), Hao Doan (haodoan2)

hungngo2 · Dec 06 '25