
⚠️ Resume Parsing Fails When Projects Section Is Missing – Should Handle Missing or Null Sections Gracefully

Open rahimnadan opened this issue 9 months ago • 8 comments

🧩 Summary

While testing the resume parsing flow, I encountered a validation error when uploading a resume that does not include a Projects section. The backend fails with a Pydantic validation error due to NoneType values for projectName and description.


🪵 Logs


[dev:backend] [2025-07-24T16:13:00+0500 - app.services.resume_service - INFO] Validation error: 2 validation errors for StructuredResumeModel
[dev:backend] Projects.0.projectName
[dev:backend]   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
[dev:backend] Projects.0.description
[dev:backend]   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
[dev:backend] [2025-07-24T16:13:00+0500 - app.api.router.v1.resume - WARNING] Resume validation failed: Resume structure validation failed: Resume validation failed. Projects -> 0 -> projectName: Input should be a valid string; Projects -> 0 -> description: Input should be a valid string


📌 Steps to Reproduce

  1. Clone and set up the project locally.
  2. Upload a resume that does not contain a Projects section.
  3. Observe the resume parsing failure in the backend terminal.

🤔 Root Cause (Likely)

It appears that the resume parser expects all sections (like Projects) to always be present. When this section is omitted or contains entries with missing fields (None), Pydantic validation fails due to required string constraints.


✅ Suggested Fix

  • Update Pydantic model to handle optional fields:

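    from typing import Optional

    from pydantic import BaseModel

    class ProjectModel(BaseModel):
        # Optional so a missing value from the parser no longer fails validation
        projectName: Optional[str] = None
        description: Optional[str] = None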
    
  • Use default empty lists for optional sections:

    projects: List[ProjectModel] = []
    
  • Add preprocessing to discard null project entries before validation.

  • Generalize the parser to gracefully skip empty/missing sections, such as Projects, Certifications, etc.
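
The preprocessing step suggested above could look roughly like the sketch below. It is only an illustration: `drop_empty_entries`, `is_empty_entry`, and `OPTIONAL_SECTIONS` are hypothetical names, not part of the project, and the section names are taken from the validation logs in this thread. The idea is to run it on the parsed dictionary before handing it to the Pydantic model.

```python
from typing import Any

# Hypothetical list of sections a resume may legitimately omit
# (names taken from the validation logs in this thread).
OPTIONAL_SECTIONS = ("Projects", "Experiences")


def is_empty_entry(entry: Any) -> bool:
    """Return True for None, or for a dict whose values are all None or blank."""
    if entry is None:
        return True
    if isinstance(entry, dict):
        return all(
            value is None or (isinstance(value, str) and not value.strip())
            for value in entry.values()
        )
    return False


def drop_empty_entries(payload: dict[str, Any], sections=OPTIONAL_SECTIONS) -> dict[str, Any]:
    """Return a copy of payload with missing sections set to [] and empty entries removed."""
    cleaned = dict(payload)
    for name in sections:
        entries = cleaned.get(name) or []  # covers both a missing key and an explicit None
        cleaned[name] = [entry for entry in entries if not is_empty_entry(entry)]
    return cleaned


# The payload from the logs above: one project entry with only null fields.
raw = {"Projects": [{"projectName": None, "description": None}]}
print(drop_empty_entries(raw))  # {'Projects': [], 'Experiences': []}
```

Entries that are only partially filled are kept, so the model would still need `Optional` fields to accept them.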


🔁 Why This Matters

Not all resumes follow the same structure. Many senior professionals skip sections like Projects, Internships, etc. A robust system should be resilient enough to parse valid resumes even if some sections are missing, and should not crash or halt due to missing values.


🙏 Request

Please enhance the parser and schema validation to handle incomplete or sectionless resumes gracefully.

Happy to contribute a PR for this if needed. Thank you for maintaining this great project! 🚀

rahimnadan avatar Jul 24 '25 11:07 rahimnadan

I encountered a similar issue with the job description.

The job description did not specify the work arrangement (remote, hybrid, or on-site). As a result, an error appeared in the logs because a required value was missing.

To fix this, I had to update the job description to indicate that the position is on-site only.

yannsadowski avatar Jul 24 '25 13:07 yannsadowski

You are right, but I think the system should be more robust and able to parse any kind of CV.

The other issue is that LLMs are limited in their ability to emit valid JSON; it is an inherent problem with them. To solve this, I think the best approach is to parse the CV into Markdown instead of JSON, since Markdown is much more LLM-friendly.

rahimnadan avatar Jul 24 '25 16:07 rahimnadan

> You are right, but I think the system should be more robust and able to parse any kind of CV.
>
> The other issue is that LLMs are limited in their ability to emit valid JSON; it is an inherent problem with them. To solve this, I think the best approach is to parse the CV into Markdown instead of JSON, since Markdown is much more LLM-friendly.

Yep, the validations should be more relaxed and just handle the missing fields without crashing the app.

LuisaG avatar Jul 24 '25 22:07 LuisaG

I got a similar error. My resume was built using Open Resume. The parser for this project seems to be too strict.

[2025-07-31T09:12:56-0700 - app.services.resume_service - INFO] Validation error: 3 validation errors for StructuredResumeModel
[start:backend] Experiences.0.location
[start:backend]   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
[start:backend]     For further information visit https://errors.pydantic.dev/2.11/v/string_type
[start:backend] Experiences.1.location
[start:backend]   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
[start:backend]     For further information visit https://errors.pydantic.dev/2.11/v/string_type
[start:backend] Experiences.2.location
[start:backend]   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
[start:backend]     For further information visit https://errors.pydantic.dev/2.11/v/string_type
[start:backend] [2025-07-31T09:12:56-0700 - app.api.router.v1.resume - WARNING] Resume validation failed: Resume structure validation failed: Resume validation failed. Experiences -> 0 -> location: Input should be a valid string; Experiences -> 1 -> location: Input should be a valid string; Experiences -> 2 -> location: Input should be a valid string

brian316 avatar Jul 31 '25 16:07 brian316

I fixed the issue locally by adding this line to the LLM prompt in the file apps/backend/app/prompt/structured_job.py:

- Use "N/A" for any fields that do not apply or are not present.

So the full file looks like this:

PROMPT = """
You are a JSON extraction engine. Convert the following resume text into precisely the JSON schema specified below.
- Do not compose any extra fields or commentary.
- Do not make up values for any fields.
- Use "Present" if an end date is ongoing.
- Make sure dates are in YYYY-MM-DD.
- Use "N/A" for any fields that do not apply or are not present.
- Do not format the response in Markdown or any other format. Just output raw JSON.

Schema:
```json
{0}
```

Resume:
```text
{1}
```

NOTE: Please output only a valid JSON matching the EXACT schema.
"""

Once this passes, I run into more issues, such as:

[start:backend] [2025-07-31T09:42:46-0700 - app.api.router.v1.resume - ERROR] Error: 'NoneType' object has no attribute 'get' - traceback: Traceback (most recent call last):
[start:backend]   File "/Users/brian/local_projects/Resume-Matcher/apps/backend/app/api/router/v1/resume.py", line 147, in score_and_improve
[start:backend]     improvements = await score_improvement_service.run(
[start:backend]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[start:backend]     ...<2 lines>...
[start:backend]     )
[start:backend]     ^
[start:backend]   File "/Users/brian/local_projects/Resume-Matcher/apps/backend/app/services/score_improvement_service.py", line 205, in run
[start:backend]     resume, processed_resume = await self._get_resume(resume_id)
[start:backend]                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[start:backend]   File "/Users/brian/local_projects/Resume-Matcher/apps/backend/app/services/score_improvement_service.py", line 99, in _get_resume
[start:backend]     self._validate_resume_keywords(processed_resume, resume_id)
[start:backend]     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[start:backend]   File "/Users/brian/local_projects/Resume-Matcher/apps/backend/app/services/score_improvement_service.py", line 57, in _validate_resume_keywords
[start:backend]     keywords = keywords_data.get("extracted_keywords", [])
[start:backend]                ^^^^^^^^^^^^^^^^^
[start:backend] AttributeError: 'NoneType' object has no attribute 'get'

I only changed the env to use this model:

LL_MODEL="gemma3:12b"
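
The traceback above shows `keywords_data` being `None` when `.get` is called on it. A defensive lookup might look like the sketch below. `extract_keywords` is a hypothetical helper, not the project's actual code, and it assumes that a missing keywords record should be treated as "no keywords" rather than as an error.

```python
from typing import Any, Optional


def extract_keywords(keywords_data: Optional[dict[str, Any]]) -> list[str]:
    """Return the extracted keywords, or an empty list when the record is missing."""
    if not keywords_data:
        # Covers None (the case in the traceback) and an empty dict.
        return []
    # "or []" also guards against an explicit null stored under the key.
    return list(keywords_data.get("extracted_keywords") or [])


print(extract_keywords(None))                                      # []
print(extract_keywords({"extracted_keywords": ["python", "sql"]}))  # ['python', 'sql']
```

The caller can then decide whether an empty list should raise a clear validation error instead of an `AttributeError`.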

Regardless, this relies too heavily on LLM parsing, and I don't think the structure of the Pydantic model is being fed to the LLM properly.

For example, there is a hard-coded schema in apps/backend/app/schemas/json/structured_job.py that uses

"employmentType": "string",

but the actual Pydantic model uses

class StructuredJobModel(BaseModel):
    ...
    employment_type: EmploymentTypeEnum = Field(..., alias="employmentType")

where the allowed values of EmploymentTypeEnum are never shown to the LLM.

A better solution would be to feed the real schema into the LLM, generated with StructuredJobModel.model_json_schema(), instead of a hard-coded one.
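
As a rough illustration of that idea, the sketch below generates the schema from a trimmed-down model with Pydantic v2. The enum values are hypothetical placeholders (the project's real `EmploymentTypeEnum` may differ), and the model is cut down to the single field quoted above.

```python
import json
from enum import Enum

from pydantic import BaseModel, Field


class EmploymentTypeEnum(str, Enum):
    # Hypothetical values, for illustration only.
    FULL_TIME = "Full-time"
    PART_TIME = "Part-time"
    CONTRACT = "Contract"


class StructuredJobModel(BaseModel):
    # Trimmed to the one field discussed in this comment.
    employment_type: EmploymentTypeEnum = Field(..., alias="employmentType")


# model_json_schema() uses aliases by default, so the property is named
# "employmentType" and the enum's allowed values are listed under "$defs".
schema = StructuredJobModel.model_json_schema()
print(json.dumps(schema, indent=2))
```

The resulting JSON string could then be substituted for the `{0}` placeholder in the prompt, so the LLM sees the same constraints that validation enforces.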

All around, I ran into so many errors and odd engineering choices that it makes me wonder how much was written by AI.

brian316 avatar Jul 31 '25 17:07 brian316

Yep, the validations are much too strict. For example, CVs missing a phone number, city, or country are not accepted as valid, and the flow breaks.

Mchristos avatar Aug 01 '25 13:08 Mchristos

> Yep, the validations are much too strict. For example, CVs missing a phone number, city, or country are not accepted as valid, and the flow breaks.

@srbhr you could add documentation clarifying what standard format is required for it to work.

SalahAdDin avatar Aug 12 '25 03:08 SalahAdDin

Yes, @SalahAdDin. However, once again, I'm working on updating the project with some major overhauls. New updates will take time.

srbhr avatar Aug 12 '25 03:08 srbhr