
Add biomedical multimodal dataset preparation tools for Gemma fine-tuning

Open SH20RAJ opened this issue 7 months ago • 1 comment

This commit addresses issue #210 by providing tools for preparing biomedical multimodal datasets with text, images, tables, and formulas for Gemma fine-tuning.

  • Add preprocess_pdfs.py for extracting content from PDFs
  • Add create_dataset.py for structuring the dataset
  • Add finetune_gemma.py for fine-tuning Gemma models
  • Add comprehensive documentation and requirements

The solution enables users to convert biomedical PDFs to a format suitable for Gemma fine-tuning while preserving the semantic relationships between text and non-text elements.
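As an illustration of what "preserving the semantic relationships between text and non-text elements" could look like in practice, here is a minimal sketch of one possible record layout. It is not taken from the scripts in this pull request: the `Element` and `Record` classes, the field names, the `to_jsonl_line` helper, and the example path are all hypothetical, and the actual output of create_dataset.py may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical set of non-text element kinds, mirroring the PR description
# (images, tables, formulas).
ALLOWED_KINDS = {"image", "table", "formula"}


@dataclass
class Element:
    kind: str          # one of ALLOWED_KINDS
    content: str       # file path for images, serialized text otherwise
    caption: str = ""  # optional caption that ties the element to the text


@dataclass
class Record:
    doc_id: str        # identifier of the source PDF (assumed naming)
    page: int          # page the passage was extracted from
    text: str          # the passage itself
    elements: List[Element] = field(default_factory=list)


def to_jsonl_line(record: Record) -> str:
    """Serialize one record as a single JSON line, rejecting unknown kinds."""
    for element in record.elements:
        if element.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported element kind: {element.kind}")
    return json.dumps(asdict(record), ensure_ascii=False)


example = Record(
    doc_id="paper-001",
    page=3,
    text="Figure 2 shows the dose-response curve.",
    elements=[Element("image", "images/paper-001_p3_fig2.png", "Figure 2")],
)
line = to_jsonl_line(example)
print(line)
```

Keeping each passage and the elements it refers to in the same record is one simple way to retain the link between them. A real pipeline would also need to decide how images are passed to the model during fine-tuning.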

SH20RAJ, Apr 05 '25 06:04

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

google-cla[bot], Apr 05 '25 06:04