py-avro-schema icon indicating copy to clipboard operation
py-avro-schema copied to clipboard

Use Pydantic's `model_config` to allow record name override

Open cbini opened this issue 10 months ago • 4 comments

I'm migrating over to using Pydantic for defining and this library to generate Avro schema. It's awesome, but one challenge I'm running into is being able to override record names to maintain backwards compatibility.

For example, if I have one data asset partitioned across multiple Avro files in my data lake, BigQuery cannot successfully piece them together unless each file has consistent record names.

The workaround I've found is to subclass my Pydantic model with the legacy naming convention, but I feel like it would be much better if this library considered Pydantic's model_config. That way, I could keep everything in one, nicely named class, like so:

class StudentModel(BaseModel):
    model_config = ConfigDict(title="student_record")

    StudentID: str | None = None
    StudentSchoolID: str | None = None
    SecondaryStudentID: str | None = None
    StudentFirstName: str | None = None
    StudentMiddleName: str | None = None
    StudentLastName: str | None = None

Pydantic's BaseModel.model_json_schema() method behaves this way, but the way pas.generate() is set up, it just uses the literal class name.

cbini avatar Apr 11 '24 21:04 cbini

Hi @cbini

That seems a useful enhancement indeed.

Would this work for everybody always? I agree that Pydantic's notation of "title" maps nicely onto JSON Schema "title".

Without knowing JSON Schema though, I might interpret Pydantic's "title" concept as a human-friendly name in addition to the system-friendly class name. I guess Pydantic allows any string as title? Say including spaces? That would not necessarily lead to a valid Avro record schema...

faph avatar Apr 12 '24 12:04 faph

Hi @cbini, do you still require this?

faph avatar May 01 '24 19:05 faph

This would be a nice-to-have in a future version. The workaround I've implemented now is good enough though--haven't had much bandwidth to revisit it.

cbini avatar May 04 '24 19:05 cbini

Understood. Let’s keep this open.

I’m thinking this should be controlled through an additional option flag similar to what we use for Pydantic field aliases.

faph avatar May 07 '24 07:05 faph

This PR ^ add the following option USE_CLASS_ALIAS, similar to USE_FIELD_ALIAS. On top of that, it implements Avro name validation across the board.

faph avatar May 23 '24 09:05 faph