
feat(models): Add BERT and BioBERT models for healthcare text classif…

Open · hungngo2 opened this issue 1 month ago · 0 comments

Summary

This PR introduces BERT-based models for text classification tasks in healthcare applications, enabling fine-tuning of pre-trained language models on clinical and biomedical text data.

PyHealth currently lacks native support for initializing and fine-tuning BERT models for text classification. This contribution adds a production-ready BERT implementation with fine-tuning controls and BioBERT support.

Supported Models

| Alias | HuggingFace Model |
| --- | --- |
| bert-base-uncased | bert-base-uncased |
| bert-base-cased | bert-base-cased |
| biobert | dmis-lab/biobert-v1.1 |

Any other HuggingFace BERT-compatible model can also be used by passing the full model name.
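A minimal sketch of how such alias resolution could work (the `MODEL_ALIASES` dict and `resolve_model_name` helper are illustrative names, not necessarily the PR's actual API; the mappings themselves come from the table above):

```python
# Hypothetical alias table mapping short names to HuggingFace model IDs.
MODEL_ALIASES = {
    "bert-base-uncased": "bert-base-uncased",
    "bert-base-cased": "bert-base-cased",
    "biobert": "dmis-lab/biobert-v1.1",
}


def resolve_model_name(name: str) -> str:
    """Resolve an alias to a full HuggingFace model name.

    Unknown names pass through unchanged, so any BERT-compatible
    HuggingFace model can be used by giving its full name.
    """
    return MODEL_ALIASES.get(name, name)
```

For example, `resolve_model_name("biobert")` would return `dmis-lab/biobert-v1.1`, while a full model name such as `emilyalsentzer/Bio_ClinicalBERT` would be passed through unchanged.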

Features

  • Pooling strategies: cls, mean, max
  • Fine-tuning control: Freeze entire encoder or N bottom layers
  • Differential learning rates: Separate LR for encoder vs classifier
  • Task modes: Binary, multiclass, multilabel classification
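The three pooling strategies can be sketched as follows. This is a standalone numpy illustration, assuming the usual setup where the encoder produces per-token hidden states and an attention mask marks real tokens versus padding; the function name and signature are illustrative, not the PR's API:

```python
import numpy as np


def pool_embeddings(hidden_states, attention_mask, strategy="cls"):
    """Reduce per-token embeddings to a single sequence embedding.

    hidden_states: (seq_len, hidden_dim) token embeddings from the encoder.
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.astype(bool)
    if strategy == "cls":
        # Use the first ([CLS]) token's embedding.
        return hidden_states[0]
    if strategy == "mean":
        # Average over non-padding tokens only.
        return hidden_states[mask].mean(axis=0)
    if strategy == "max":
        # Element-wise max over non-padding tokens.
        return hidden_states[mask].max(axis=0)
    raise ValueError(f"unknown pooling strategy: {strategy}")
```

Masking out padding tokens matters for `mean` and `max`: without it, pad embeddings would dilute the average and could dominate the max.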

This PR is submitted by the following group of UIUC students:

Hung Ngo (hungngo2), Hao Doan (haodoan2)

hungngo2 · Dec 06 '25