Add biomedical multimodal dataset preparation tools for Gemma fine-tuning
This commit addresses issue #210 by providing tools for preparing biomedical multimodal datasets with text, images, tables, and formulas for Gemma fine-tuning.
- Add preprocess_pdfs.py for extracting content from PDFs
- Add create_dataset.py for structuring the dataset
- Add finetune_gemma.py for fine-tuning Gemma models
- Add comprehensive documentation and requirements
The solution enables users to convert biomedical PDFs to a format suitable for Gemma fine-tuning while preserving the semantic relationships between text and non-text elements.
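To illustrate what "preserving the semantic relationships between text and non-text elements" could look like in practice, here is a minimal sketch of one possible record layout. It is not taken from the scripts in this pull request: the `Element` class, the `build_record` and `to_jsonl` helpers, the placeholder syntax (`<image_0>`, `<table_0>`, ...), and the field names are all hypothetical, chosen only to show the idea of keeping non-text items anchored at their position in the reading order.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    """One extracted piece of a PDF page (hypothetical schema, not from the PR)."""
    kind: str      # "text", "image", "table", or "formula"
    content: str   # raw text, an image path, table markup, or LaTeX
    page: int


def build_record(doc_id: str, elements: list[Element]) -> dict:
    """Merge elements in reading order.

    Text is kept inline. Every non-text element is replaced by a placeholder
    such as <image_0>, and its payload is stored in "attachments" under the
    same placeholder, so the link between prose and figure/table/formula
    survives serialization.
    """
    parts = []
    attachments = []
    counts = {}  # per-kind running index, e.g. {"image": 2, "table": 1}
    for el in elements:
        if el.kind == "text":
            parts.append(el.content)
            continue
        index = counts.get(el.kind, 0)
        counts[el.kind] = index + 1
        ref = f"<{el.kind}_{index}>"
        parts.append(ref)
        attachments.append(
            {"ref": ref, "kind": el.kind, "content": el.content, "page": el.page}
        )
    return {"id": doc_id, "text": " ".join(parts), "attachments": attachments}


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    sample = [
        Element("text", "Figure 1 shows the signalling pathway.", page=1),
        Element("image", "figures/page1_0.png", page=1),
        Element("formula", r"v = \frac{V_{max}[S]}{K_m + [S]}", page=2),
    ]
    print(to_jsonl([build_record("paper-001", sample)]))
```

A layout like this keeps each figure, table, or formula next to the sentence that discusses it, which is one way a fine-tuning script could later interleave image inputs with the surrounding text. The actual format produced by `create_dataset.py` may differ.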
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.