Emergency department classification eval
Thank you for contributing an eval! ♥️
🚨 Please make sure your PR follows these guidelines. Failure to follow the guidelines below will result in the PR being closed automatically. Note that even if the criteria are met, that does not guarantee that the PR will be merged or that GPT-4 access will be granted. 🚨
PLEASE READ THIS:
In order for a PR to be merged, it must fail on GPT-4. We are aware that right now, users do not have access, so you will not be able to tell if the eval fails or not. Please run your eval with GPT-3.5-Turbo, but keep in mind as we run the eval, if GPT-4 gets higher than 90% on the eval, we will likely reject since GPT-4 is already capable of completing the task.
We plan to roll out a way for users submitting evals to see the eval performance on GPT-4 soon. Stay tuned! Until then, you will not be able to see the eval performance on GPT-4. We encourage partial PRs with ~5-10 examples that we can then run the evals on and share the results with you, so you know how your eval does with GPT-4 before writing all 100 examples.
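As a rough illustration of the 90% threshold described above, the sketch below scores yes/no completions against the `ideal` labels by exact match. The function names and the case/whitespace normalization are assumptions made for this example; this is not the scoring code used by the Evals framework.

```python
def accuracy(predictions, ideals):
    """Fraction of predictions that match the ideal label.

    Matching ignores case and surrounding whitespace; that normalization
    is an assumption of this sketch, not a documented Evals behavior.
    """
    if len(predictions) != len(ideals):
        raise ValueError("predictions and ideals must have the same length")
    if not ideals:
        raise ValueError("at least one sample is required")
    correct = sum(
        p.strip().lower() == i.strip().lower()
        for p, i in zip(predictions, ideals)
    )
    return correct / len(ideals)


def likely_rejected(acc, threshold=0.90):
    """True when accuracy is strictly above the 90% threshold from the guidelines."""
    return acc > threshold
```

For example, three correct answers out of four give an accuracy of 0.75, which is below the threshold.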
Eval details 📑
Eval name
Emergency department notes classification
Eval description
Identify whether a patient discharge summary has been processed through the emergency department (ED).
What makes this a useful eval?
In smaller hospitals and urgent care settings, emergency department summaries get mixed with summaries from other settings, which affects data management and insurance billing pipelines. Identifying at the source whether a note has been through the ED can expedite payments for doctors and providers. We have noticed that ChatGPT with GPT-4 is much more capable than GPT-3.5 of understanding clinical information for the different downstream tasks that arise out of ED classification.
Criteria for a good eval ✅
Below are some of the criteria we look for in a good eval. In general, we are seeking cases where the model does not do a good job despite being capable of generating a good response (note that there are some things large language models cannot do, so those would not make good evals).
Your eval should be:
- [x] Thematically consistent: The eval should be thematically consistent. We'd like to see a number of prompts all demonstrating some particular failure mode. For example, we can create an eval on cases where the model fails to reason about the physical world.
- [x] Contains failures where a human can do the task, but either GPT-4 or GPT-3.5-Turbo could not.
- [x] Includes good signal around what is the right behavior. This means either a correct answer for `Basic` evals or the `Fact` Model-graded eval, or an exhaustive rubric for evaluating answers for the `Criteria` Model-graded eval.
- [x] Include at least 100 high quality examples (it is okay to only contribute 5-10 meaningful examples and have us test them with GPT-4 before adding all 100)
If there is anything else that makes your eval worth including, please document it below.
Unique eval value
ED classification is often done by medical professionals, and we want raw notes to be classified directly. Most algorithms require processed notes to classify well.
Eval structure 🏗️
Your eval should
- [x] Check that your data is in `evals/registry/data/{name}`
- [x] Check that your yaml is registered at `evals/registry/evals/{name}.yaml`
- [x] Ensure you have the right to use the data you submit via this eval
(For now, we will only be approving evals that use one of the existing eval classes. You may still write custom eval classes for your own cases, and we may consider merging them in the future.)
Final checklist 👀
Submission agreement
By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).
- [x] I agree that my submission will be made available under an MIT license and complies with OpenAI's usage policies.
Email address validation
If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the merged pull request.
- [x] I acknowledge that GPT-4 access will only be granted, if applicable, to the email address used for my merged pull request.
Limited availability acknowledgement
We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.
- [x] I understand that opening a PR, even if it meets the requirements above, does not guarantee the PR will be merged nor GPT-4 access granted.
Submit eval
- [x] I have filled out all required fields in the evals PR form
- [ ] (Ignore if not submitting code) I have run `pip install pre-commit; pre-commit install` and have verified that `black`, `isort`, and `autoflake` are running when I commit and push
Failure to fill out all required fields will result in the PR being closed.
Eval JSON data
Since we are using Git LFS, we are asking eval submitters to add in as many Eval Samples (at least 5) from their contribution here:
View evals in JSON
Eval
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: Chest pain\nHistory of Present Illness: Ms. is a y\/o female with history of , s\/p Mohs, hypothyroidism, and CAD s\/p MI and DESx3 to LAD () who presents with atypical chest pain, with normal EKG and troponin. Morning of admission, patient reports that she felt a twinge of pain in the left chest along with abdominal discomfort. She then felt some nervousness\/anxiety, felt \"weird\" lasting 20 minutes, and then sought emergency help. She also explains that she recently moved to in and since then has felt very stressed and depressed. Of note, she had an MI in . At the time, her only symptoms were feeling \"hot\" and \"parched\" worse with exertion. Her EKGs were reportedly normal, but she was brought to the hospital where enzymes were positive. She received DESx3 to LAD at , and since then has been able to walk up to 1.5 miles daily without any symptoms. \nDischarge Diagnosis: Primary: Atypical chest pain Secondary: CAD "}],"ideal":"yes"}
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: Difficulty with walking\nHistory of Present Illness: Ms. is a year old right-handed woman with most pertinent past medical history of COPD secondary to long history of tobacco use who presents to ED because of inability to walk with concern for stroke. Ms. is recovering from a COPD exacerbation. She presented to urgent care for management of this . Ms. continues to endorse fevers, chills, headache, difficulty with breathing, and generalized weakness. Ms. woke up on and felt that she could not walk. Ms. is a poor historian and has a hard time expressing why she cannot walk, but reports she feels as if she is walking on a boat. Ms. denies vertigo and does not feel that she is veering to one side versus another. Ms. feels her cognition is normal. Ms. feels that her voice sounds normal and that she is not having difficulty with swallowing. She denies loss of vision and double vision. She denies focal weakness or change in sensation. Ms. denies that she is having difficulty reaching for things with her hands. Ms. reports that she has had gradual hearing loss in the right ear over the last years. Ms. has not been evaluated for this. Ms. reports has been having ringing in the ear and sensation of hearing white-noise from this ear for the last three or so years. Ms. reports that she had similar symptoms years ago and was taken to . Ms. spent fifteen days on the neurology service and left hospital with no diagnosis. She reports constellation of symptoms, including aphasia, left sided weakness and incoordination. She reports that she gradually got better one her own. \nDischarge Diagnosis: Post infectious cerebilitis "}],"ideal":"yes"}
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: fevers, week 1 treatment\nHistory of Present Illness: Mr. is a man with follicular lymphoma transformed to DLBCL most recently s\/p CAR T cell therapy () with relapse, who had a port placed last week with plan for initiation of immunotherapy who was unable to start due to low counts and was discharged with a planned readmission next week. He left on on he woke up ok but on arrival at work in the morning felt like he had been \"hit by a bus\" with muscle and bone aches. He stuck around work until acetaminophen kicked in and he felt he could drive himself home. At home, he had chills and rigors, and a brief episode of dyspnea while having shaking chills in the bathroom. He has a runny nose but no sore throat or other URI symptoms; no dysphagia. He has no cough, dyspnea other than described above, dysphagia, dysuria, increased urinary frequency, diarrhea, rashes. After having rigors, his wife took his temperature and noted he had a temperature to 99.9. He called and the on-call oncology fellow called him back and advised presentation to the ED given that he was neutropenic during his recent admission. \nDischarge Diagnosis: PRIMARY DIAGNOSIS ===================== RELAPSED DLBCL FEVER ACUTE KIDNEY INJURY TRANSAMINITIS SECONDARY DIAGNOSIS ===================== PE\/DVT HTN OSA "}],"ideal":"yes"}
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: Altered mental status\nHistory of Present Illness: Mr. is a year old gentleman with previous history of stage IIIC melanoma s\/p wide local excision and axillary dissection without adjuvant chemotherapy, newly noted lung and brain metastases, presenting with altered mental status. \nREVIEW OF SYSTEMS: Denies any diarrhea, constipation, fever\/chills, bleeding, cough, SOB, chest pain. Remainder 10 pt ROS negative other than HPI.\nDischarge Diagnosis: Acute Toxic Metabolic Encephalopathy Metastatic Melanoma with Brain involvement Hypertension, uncontrolled "}],"ideal":"yes"}
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: slurred speech\nHistory of Present Illness: Mrs. (i.e. ) is an year-old right-handed woman with PMH of CAD post stent placement (), right-sided carotid artery disease, possible atrial fibrillation, reported subdural hematoma one year ago because of fall and head strike, hypertension, hyperlipidemia, emphysema, and spinal stenosis with low back pain who neurology has been consulted for two brief episodes of possible aphasia with concern for TIA\/stroke. Mrs. initially presented to two days ago because of hypertension. Mrs. reported that her BP is always elevated but her measurement was higher than normal and this prompted her to go to the hospital. At , she was told that she needed to go to her PCP (Dr. at ) in the next two days, prompting her to come to schedule an appointment with her. During the appointment with Dr. 2 Mrs. had two periods lasting minutes where she \"could not get her words out\" describing her speech as non-sensical. She was aware of these episodes and knew what she wanted to say but was unable to get the words out. She denies loss of consciousness, dizziness or any other symptoms during these episodes. Dr. witnessed the episodes, told her to go to the ER and ordered an ambulance for her. Notably, her husband endorsed that she previously had a UTI several years ago where she had similar symptoms but does not remember the duration of the episodes at that time. I talked to Mrs. daughter, who is a , who reports that her mother is currently at her cognitive baseline. Mrs. tells me she feels normal and wants to go home. ROS: On neuro ROS, the pt denies headache, loss of vision, blurred vision, diplopia, dysarthria, dysphagia, lightheadedness, vertigo, tinnitus or hearing difficulty. Denies difficulties producing or comprehending speech. Denies focal weakness, numbness, parasthesiae. No bowel or bladder incontinence or retention. Denies difficulty with gait.\nDischarge Diagnosis: Transient Ischemic Attack (TIA) "}],"ideal":"yes"}
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: Altered mental status\nHistory of Present Illness: PMH of Seizures, GBM s\/p left frontal craniotomy for resection (now awaiting initiation of radiation + TMZ), presented from home with encephalopathy. Unfortunately, at time of admission family was not present and my attempts at calling wife at number listed in HCP tab in OMR unsuccessful as it went to voicemail. Patient himself would awaken to voice, nod his head to what I was saying, though was unclear if he was truly understanding then would nod off again. Accordingly, all history comes from notes reviewed prior to patient being transferred to medical ward. Per review of notes, he last saw Dr at the end of , was written for a steroid taper which he completed yesterday. Per wife's discussion with he had worsening confusion and generalized weakness over the past few days as a result. He had not yet started TMZ\/radiation which was planned for . He initially presented to where was c\/f increased vasogenic edema so was transferred to . "}],"ideal":"no"}
{"input":[{"role":"system","content":"is note from Emergency Department?"},{"role":"user","content":"Chief Complaint: throat swelling\nHistory of Present Illness: Ms. is a y\/o with recent prolonged hospitalization for angioedema requiring cricothyrotomy and open tracheostomy, who presented with reccurrence of angioedema requiring repeat tracheostomy on . Patient had a prolonged hospital course from for angioedema with multiple complications and difficulty with trach decannulation. The etiology of her angioedema was unclear, and per report she followed-up with an allergist following discharge, however these notes were unfortunately unavailable. Patient again presented on with reoccurrence of her angioedema, concerning for impending respiratory failure. She was given 80 mg of methylprednisolone IV, 50 mg of diphenhydramine IV, 40 mg of famotidine IV, and 0.3 mg of epinephrine via EpiPen, and then was taken to the OR for an open tracheostomy without complication. Though she was initially placed on the ventilator, she was quickly weaned off to trach mask. Upon interview, she is endorsing productive cough & pain associated with trach site, similar to days prior. Denying any SOB, wheezing. Does have some abd discomfort associated with constipation (4d since last BM). Denies any associated nausea & dyspepsia. No CP, f\/c. \nDischarge Diagnosis: Primary Diagnoses ================= Hereditary angioedema, likely type 2, flare Subglottic stenosis Tracheobronchitis Dysphagia Hemorrhoids Secondary Diagnoses =================== GERD Normocytic Anemia "}],"ideal":"no"}
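Each sample above is one JSON object per line, with an `input` list of chat messages and an `ideal` label of `yes` or `no`. A minimal sketch of a pre-submission check for that shape follows. The helper name `validate_sample` and the allowed-role list are assumptions for illustration and are not part of the Evals codebase.

```python
import json

# Assumed for this sketch: roles seen in chat-style samples and the two labels used above.
ALLOWED_ROLES = ("system", "user", "assistant")
ALLOWED_IDEALS = ("yes", "no")


def validate_sample(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        sample = json.loads(line)
    except json.JSONDecodeError as exc:
        # A record split across two lines typically fails here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(sample, dict):
        return ["sample is not a JSON object"]

    problems = []
    messages = sample.get("input")
    if not isinstance(messages, list) or not messages:
        problems.append("'input' must be a non-empty list of messages")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"message {i} has a missing or unknown role")
            elif not isinstance(msg.get("content"), str):
                problems.append(f"message {i} has no string content")
    if sample.get("ideal") not in ALLOWED_IDEALS:
        problems.append("'ideal' must be 'yes' or 'no'")
    return problems
```

Running it over every line of the samples file and printing any non-empty result is enough to catch records that were truncated or broken across lines before the file is committed.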
Hi @andrew-openai,
Thanks for requesting the change. I have changed the prompt to identify whether the important keywords, i.e. the identified diagnosis, belong to the emergency department. This is close to what healthcare providers do as well: they take the patient's complaint and diagnosis and use them to determine whether the patient has been to the ED.
Hope this helps!