troposphere

marshmallow or pydantic models from json-schema

Open · dazza-codes opened this issue 4 years ago · 4 comments

Apologies if this has already been discussed somewhere; I am new to this project. To get started, I used the cfn2py script and tested some round-trip serializations back to JSON and YAML, using deepdiff and cfn-lint checks.

One concern is that boolean values are serialized as strings rather than JSON booleans. Why do the t.to_dict() and t.to_json() outputs contain strings instead of JSON booleans? It seems like encode_to_dict(obj) could be replaced with a simple json.loads(json.dumps(obj)), letting the json library take care of all the necessary Python/JSON compatibility and encoding.
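A minimal sketch of the `json.loads(json.dumps(obj))` idea, assuming a hypothetical `Resource` class with a `to_dict()` method (a stand-in, not troposphere's own class). The json module maps Python `True`/`False` to JSON `true`/`false`, but it cannot encode arbitrary objects by itself, so the sketch passes a `default` hook. A value that is already the string `"true"` stays a string.

```python
import json


class Resource:
    """Hypothetical stand-in for an object that can export itself as a dict."""

    def __init__(self, props):
        self.props = props

    def to_dict(self):
        return {"Properties": self.props}


def round_trip(obj):
    # json.dumps calls `default` for any object it cannot encode natively;
    # here that means asking the object for its dict form.
    return json.loads(json.dumps(obj, default=lambda o: o.to_dict()))


data = round_trip(Resource({"BucketName": "my-bucket", "ObjectLockEnabled": True}))
print(json.dumps(data))
# {"Properties": {"BucketName": "my-bucket", "ObjectLockEnabled": true}}

# A boolean that was already stringified is not converted back:
print(json.dumps({"ObjectLockEnabled": "true"}))
# {"ObjectLockEnabled": "true"}
```

So the round trip only yields JSON booleans if the objects hold real `bool` values to begin with.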

Alternatively, marshmallow or pydantic models in general could take care of all the schema mappings and serializations. It might also be easier to auto-generate JSON schemas and models from botocore service descriptions or other AWS JSON payloads. These are not quite the same thing as CFN templates, but botocore ships service API descriptions, e.g. in lib/python3.7/site-packages/botocore/data/cloudformation/2010-05-15/service-2.json; see also:

  • https://pydantic-docs.helpmanual.io/datamodel_code_generator/
  • https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-resource-specification.html
  • https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resource-type-schemas.html
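To make the botocore idea concrete, here is a minimal sketch of translating botocore-style shapes into JSON schema. It assumes the layout of botocore's service-2.json files (a top-level `shapes` mapping whose structures list `members` by shape reference); the `SHAPES` fragment is hand-written for illustration rather than copied from the real CloudFormation description, and `to_json_schema` is a hypothetical helper.

```python
import json

# Hand-written fragment in the layout of a botocore service-2.json file.
SHAPES = {
    "Tag": {
        "type": "structure",
        "required": ["Key", "Value"],
        "members": {"Key": {"shape": "TagKey"}, "Value": {"shape": "TagValue"}},
    },
    "TagKey": {"type": "string"},
    "TagValue": {"type": "string"},
    "Tags": {"type": "list", "member": {"shape": "Tag"}},
    "TagMap": {"type": "map", "key": {"shape": "TagKey"}, "value": {"shape": "TagValue"}},
    "Flag": {"type": "boolean"},
}

# botocore scalar types -> JSON-schema types (timestamps and blobs kept as strings).
SCALARS = {
    "string": "string", "timestamp": "string", "blob": "string",
    "integer": "integer", "long": "integer",
    "float": "number", "double": "number",
    "boolean": "boolean",
}


def to_json_schema(name, shapes):
    """Translate one botocore shape into a JSON-schema dict.

    Recursive shapes (a structure that refers to itself) are not handled.
    """
    shape = shapes[name]
    kind = shape["type"]
    if kind == "structure":
        return {
            "type": "object",
            "properties": {
                member: to_json_schema(ref["shape"], shapes)
                for member, ref in shape.get("members", {}).items()
            },
            "required": shape.get("required", []),
        }
    if kind == "list":
        return {"type": "array", "items": to_json_schema(shape["member"]["shape"], shapes)}
    if kind == "map":
        return {
            "type": "object",
            "additionalProperties": to_json_schema(shape["value"]["shape"], shapes),
        }
    return {"type": SCALARS.get(kind, "string")}


print(json.dumps(to_json_schema("Tags", SHAPES), indent=2))
```

A schema produced this way could then be fed to a code generator such as datamodel-codegen, though botocore describes API payloads, not template properties.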

The resource-type schemas might be amenable to auto-generating service models with marshmallow or pydantic schema parsers and code generators. If this worked well, it might eliminate most, if not all, of the issues about supporting new features in CFN services.
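Resource-type schemas carry more than property types. The resource provider schema format lists output-only attributes in `readOnlyProperties` as JSON pointers, and the generated `Model` class later in this issue mixes such attributes (`Arn`, `DomainName`) with writable properties, so a generator would likely need to tell them apart. A minimal sketch, assuming a trimmed, hand-written schema fragment (`SCHEMA` and `split_properties` are hypothetical names):

```python
# Trimmed, hand-written fragment shaped like a CloudFormation resource-type
# schema; illustrative only, not the full AWS::S3::Bucket schema.
SCHEMA = {
    "typeName": "AWS::S3::Bucket",
    "properties": {
        "BucketName": {"type": "string"},
        "ObjectLockEnabled": {"type": "boolean"},
        "Arn": {"type": "string"},
        "DomainName": {"type": "string"},
    },
    "readOnlyProperties": ["/properties/Arn", "/properties/DomainName"],
}


def split_properties(schema):
    """Return (writable, read_only) top-level property names.

    readOnlyProperties holds JSON pointers such as "/properties/Arn";
    only pointers to top-level properties are considered here.
    """
    prefix = "/properties/"
    read_only = {
        pointer[len(prefix):]
        for pointer in schema.get("readOnlyProperties", [])
        if pointer.startswith(prefix) and "/" not in pointer[len(prefix):]
    }
    names = list(schema.get("properties", {}))
    writable = [n for n in names if n not in read_only]
    return writable, [n for n in names if n in read_only]


writable, read_only = split_properties(SCHEMA)
print(writable)   # ['BucketName', 'ObjectLockEnabled']
print(read_only)  # ['Arn', 'DomainName']
```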

In this example, cfn_schemas is a directory containing the unzipped contents of a regional schema download (a .zip file).


# quote the extra so shells such as zsh don't try to glob the brackets
pip install 'datamodel-code-generator[http]'

wget https://schema.cloudformation.us-east-1.amazonaws.com/CloudformationSchema.zip
mkdir cfn_schemas
mv CloudformationSchema.zip cfn_schemas/
cd cfn_schemas/
unzip CloudformationSchema.zip
cd ..
datamodel-codegen --input cfn_schemas/aws-s3-bucket.json --input-file-type jsonschema --output aws_s3_bucket.py
cat aws_s3_bucket.py
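With the archive unpacked, it can help to look schemas up by resource type rather than by file name. A minimal sketch, assuming each extracted file is a JSON object with a `typeName` key (as resource-type schemas are) and that the files sit directly in `cfn_schemas/`; `index_schemas` is a hypothetical helper:

```python
import json
from pathlib import Path


def index_schemas(schema_dir):
    """Map each resource typeName (e.g. AWS::S3::Bucket) to its schema file."""
    index = {}
    root = Path(schema_dir)
    if not root.is_dir():
        return index
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            schema = json.load(fh)
        # Skip anything that is not a resource-type schema object.
        type_name = schema.get("typeName") if isinstance(schema, dict) else None
        if type_name:
            index[type_name] = path
    return index
```

For example, `index_schemas("cfn_schemas").get("AWS::S3::Bucket")` should return the path of the `aws-s3-bucket.json` file passed to datamodel-codegen above.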

The pydantic models have built-in serialization.

The resulting aws_s3_bucket.py contains:

# generated by datamodel-codegen:
#   filename:  aws-s3-bucket.json
#   timestamp: 2021-03-11T02:07:54+00:00

from __future__ import annotations

from typing import List, Optional

from pydantic import BaseModel


class DefaultRetention(BaseModel):
    Years: Optional[int] = None
    Days: Optional[int] = None
    Mode: Optional[str] = None


class ReplicationTimeValue(BaseModel):
    Minutes: int


class FilterRule(BaseModel):
    Value: str
    Name: str


class AccelerateConfiguration(BaseModel):
    AccelerationStatus: str


class Metrics(BaseModel):
    Status: str
    EventThreshold: Optional[ReplicationTimeValue] = None


class RoutingRuleCondition(BaseModel):
    KeyPrefixEquals: Optional[str] = None
    HttpErrorCodeReturnedEquals: Optional[str] = None


class DeleteMarkerReplication(BaseModel):
    Status: Optional[str] = None


class OwnershipControlsRule(BaseModel):
    ObjectOwnership: Optional[str] = None


class CorsRule(BaseModel):
    ExposedHeaders: Optional[List[str]] = None
    AllowedMethods: List[str]
    AllowedOrigins: List[str]
    AllowedHeaders: Optional[List[str]] = None
    MaxAge: Optional[int] = None
    Id: Optional[str] = None


class AccessControlTranslation(BaseModel):
    Owner: str


class ObjectLockRule(BaseModel):
    DefaultRetention: Optional[DefaultRetention] = None


class S3KeyFilter(BaseModel):
    Rules: List[FilterRule]


class Destination(BaseModel):
    BucketArn: str
    BucketAccountId: Optional[str] = None
    Format: str
    Prefix: Optional[str] = None


class RedirectAllRequestsTo(BaseModel):
    Protocol: Optional[str] = None
    HostName: str


class TagFilter(BaseModel):
    Value: str
    Key: str


class PublicAccessBlockConfiguration(BaseModel):
    RestrictPublicBuckets: Optional[bool] = None
    IgnorePublicAcls: Optional[bool] = None
    BlockPublicPolicy: Optional[bool] = None
    BlockPublicAcls: Optional[bool] = None


class NoncurrentVersionTransition(BaseModel):
    StorageClass: str
    TransitionInDays: int


class ServerSideEncryptionByDefault(BaseModel):
    SSEAlgorithm: str
    KMSMasterKeyID: Optional[str] = None


class MetricsConfiguration(BaseModel):
    TagFilters: Optional[List[TagFilter]] = None
    Id: str
    Prefix: Optional[str] = None


class ObjectLockConfiguration(BaseModel):
    ObjectLockEnabled: Optional[str] = None
    Rule: Optional[ObjectLockRule] = None


class LoggingConfiguration(BaseModel):
    DestinationBucketName: Optional[str] = None
    LogFilePrefix: Optional[str] = None


class Tiering(BaseModel):
    AccessTier: str
    Days: int


class DataExport(BaseModel):
    Destination: Destination
    OutputSchemaVersion: str


class ReplicationTime(BaseModel):
    Status: str
    Time: ReplicationTimeValue


class RedirectRule(BaseModel):
    ReplaceKeyWith: Optional[str] = None
    HttpRedirectCode: Optional[str] = None
    Protocol: Optional[str] = None
    HostName: Optional[str] = None
    ReplaceKeyPrefixWith: Optional[str] = None


class EncryptionConfiguration(BaseModel):
    ReplicaKmsKeyID: str


class InventoryConfiguration(BaseModel):
    Destination: Destination
    OptionalFields: Optional[List[str]] = None
    IncludedObjectVersions: str
    Enabled: bool
    Id: str
    Prefix: Optional[str] = None
    ScheduleFrequency: str


class ReplicationRuleAndOperator(BaseModel):
    TagFilters: Optional[List[TagFilter]] = None
    Prefix: Optional[str] = None


class VersioningConfiguration(BaseModel):
    Status: str


class CorsConfiguration(BaseModel):
    CorsRules: List[CorsRule]


class ReplicaModifications(BaseModel):
    Status: str


class Transition(BaseModel):
    TransitionDate: Optional[str] = None
    TransitionInDays: Optional[int] = None
    StorageClass: str


class SseKmsEncryptedObjects(BaseModel):
    Status: str


class Tag(BaseModel):
    Value: str
    Key: str


class AbortIncompleteMultipartUpload(BaseModel):
    DaysAfterInitiation: int


class SourceSelectionCriteria(BaseModel):
    ReplicaModifications: Optional[ReplicaModifications] = None
    SseKmsEncryptedObjects: Optional[SseKmsEncryptedObjects] = None


class OwnershipControls(BaseModel):
    Rules: List[OwnershipControlsRule]


class RoutingRule(BaseModel):
    RedirectRule: RedirectRule
    RoutingRuleCondition: Optional[RoutingRuleCondition] = None


class NotificationFilter(BaseModel):
    S3Key: S3KeyFilter


class ServerSideEncryptionRule(BaseModel):
    BucketKeyEnabled: Optional[bool] = None
    ServerSideEncryptionByDefault: Optional[ServerSideEncryptionByDefault] = None


class ReplicationDestination(BaseModel):
    AccessControlTranslation: Optional[AccessControlTranslation] = None
    Account: Optional[str] = None
    Metrics: Optional[Metrics] = None
    Bucket: str
    EncryptionConfiguration: Optional[EncryptionConfiguration] = None
    StorageClass: Optional[str] = None
    ReplicationTime: Optional[ReplicationTime] = None


class Rule(BaseModel):
    Status: str
    NoncurrentVersionExpirationInDays: Optional[int] = None
    Transitions: Optional[List[Transition]] = None
    TagFilters: Optional[List[TagFilter]] = None
    NoncurrentVersionTransitions: Optional[List[NoncurrentVersionTransition]] = None
    Prefix: Optional[str] = None
    NoncurrentVersionTransition: Optional[NoncurrentVersionTransition] = None
    ExpirationDate: Optional[str] = None
    ExpirationInDays: Optional[int] = None
    Transition: Optional[Transition] = None
    Id: Optional[str] = None
    AbortIncompleteMultipartUpload: Optional[AbortIncompleteMultipartUpload] = None


class WebsiteConfiguration(BaseModel):
    RoutingRules: Optional[List[RoutingRule]] = None
    IndexDocument: Optional[str] = None
    RedirectAllRequestsTo: Optional[RedirectAllRequestsTo] = None
    ErrorDocument: Optional[str] = None


class TopicConfiguration(BaseModel):
    Event: str
    Topic: str
    Filter: Optional[NotificationFilter] = None


class IntelligentTieringConfiguration(BaseModel):
    Status: str
    TagFilters: Optional[List[TagFilter]] = None
    Tierings: List[Tiering]
    Id: str
    Prefix: Optional[str] = None


class StorageClassAnalysis(BaseModel):
    DataExport: Optional[DataExport] = None


class LambdaConfiguration(BaseModel):
    Function: str
    Event: str
    Filter: Optional[NotificationFilter] = None


class ReplicationRuleFilter(BaseModel):
    Prefix: Optional[str] = None
    And: Optional[ReplicationRuleAndOperator] = None
    TagFilter: Optional[TagFilter] = None


class BucketEncryption(BaseModel):
    ServerSideEncryptionConfiguration: List[ServerSideEncryptionRule]


class LifecycleConfiguration(BaseModel):
    Rules: List[Rule]


class QueueConfiguration(BaseModel):
    Event: str
    Filter: Optional[NotificationFilter] = None
    Queue: str


class ReplicationRule(BaseModel):
    Status: str
    Destination: ReplicationDestination
    Filter: Optional[ReplicationRuleFilter] = None
    Priority: Optional[int] = None
    SourceSelectionCriteria: Optional[SourceSelectionCriteria] = None
    Id: Optional[str] = None
    Prefix: Optional[str] = None
    DeleteMarkerReplication: Optional[DeleteMarkerReplication] = None


class ReplicationConfiguration(BaseModel):
    Role: str
    Rules: List[ReplicationRule]


class AnalyticsConfiguration(BaseModel):
    TagFilters: Optional[List[TagFilter]] = None
    StorageClassAnalysis: StorageClassAnalysis
    Id: str
    Prefix: Optional[str] = None


class NotificationConfiguration(BaseModel):
    QueueConfigurations: Optional[List[QueueConfiguration]] = None
    LambdaConfigurations: Optional[List[LambdaConfiguration]] = None
    TopicConfigurations: Optional[List[TopicConfiguration]] = None


class Model(BaseModel):
    InventoryConfigurations: Optional[List[InventoryConfiguration]] = None
    WebsiteConfiguration: Optional[WebsiteConfiguration] = None
    DualStackDomainName: Optional[str] = None
    AccessControl: Optional[str] = None
    AnalyticsConfigurations: Optional[List[AnalyticsConfiguration]] = None
    AccelerateConfiguration: Optional[AccelerateConfiguration] = None
    PublicAccessBlockConfiguration: Optional[PublicAccessBlockConfiguration] = None
    BucketName: Optional[str] = None
    RegionalDomainName: Optional[str] = None
    OwnershipControls: Optional[OwnershipControls] = None
    ObjectLockConfiguration: Optional[ObjectLockConfiguration] = None
    ObjectLockEnabled: Optional[bool] = None
    LoggingConfiguration: Optional[LoggingConfiguration] = None
    ReplicationConfiguration: Optional[ReplicationConfiguration] = None
    Tags: Optional[List[Tag]] = None
    DomainName: Optional[str] = None
    BucketEncryption: Optional[BucketEncryption] = None
    WebsiteURL: Optional[str] = None
    NotificationConfiguration: Optional[NotificationConfiguration] = None
    LifecycleConfiguration: Optional[LifecycleConfiguration] = None
    VersioningConfiguration: Optional[VersioningConfiguration] = None
    MetricsConfigurations: Optional[List[MetricsConfiguration]] = None
    IntelligentTieringConfigurations: Optional[
        List[IntelligentTieringConfiguration]
    ] = None
    CorsConfiguration: Optional[CorsConfiguration] = None
    Id: Optional[str] = None
    Arn: Optional[str] = None
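As a sketch of what that built-in serialization buys here (assuming pydantic is installed; the `dump` helper is hypothetical), the snippet below reuses two of the generated classes. Fields typed `Optional[bool]` come out as real JSON booleans, a string such as `"true"` is coerced to `True` during validation, and a missing required field raises a `ValidationError`.

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


# Two classes copied from the generated aws_s3_bucket.py above.
class PublicAccessBlockConfiguration(BaseModel):
    RestrictPublicBuckets: Optional[bool] = None
    IgnorePublicAcls: Optional[bool] = None
    BlockPublicPolicy: Optional[bool] = None
    BlockPublicAcls: Optional[bool] = None


class Tag(BaseModel):
    Value: str
    Key: str


def dump(model):
    """Export a model to plain Python data, dropping unset (None) fields."""
    # pydantic v2 renamed .dict() to .model_dump(); support both major versions.
    if hasattr(model, "model_dump"):
        return model.model_dump(exclude_none=True)
    return model.dict(exclude_none=True)


cfg = PublicAccessBlockConfiguration(BlockPublicPolicy=True, BlockPublicAcls="true")
print(json.dumps(dump(cfg)))
# {"BlockPublicPolicy": true, "BlockPublicAcls": true}

try:
    Tag(Key="env")  # Value is required by the schema
except ValidationError as err:
    print(err)
```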

dazza-codes · Mar 11 '21 01:03

I'll have to come back to read your additional comments. But when running your tests, did you set the TROPO_REAL_BOOL environment variable? The mapping is done here. This was added for backwards compatibility and will be the default in the next major revision.
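To illustrate the kind of mapping such a flag controls, here is a hypothetical boolean validator gated by an environment variable. It is a sketch under assumptions (the accepted input values, and the rule that any non-empty value turns the flag on, are illustrative choices), not troposphere's actual implementation.

```python
import os

TRUE_VALUES = (True, 1, "1", "true", "True")
FALSE_VALUES = (False, 0, "0", "false", "False")


def boolean(value):
    """Hypothetical validator: string booleans unless TROPO_REAL_BOOL is set."""
    real_bool = bool(os.getenv("TROPO_REAL_BOOL"))  # any non-empty value enables it
    if value in TRUE_VALUES:
        return True if real_bool else "true"
    if value in FALSE_VALUES:
        return False if real_bool else "false"
    raise ValueError(f"{value!r} is not a valid boolean")


os.environ.pop("TROPO_REAL_BOOL", None)
print(repr(boolean(True)))    # 'true'  (a string)
os.environ["TROPO_REAL_BOOL"] = "1"
print(repr(boolean("true")))  # True    (a real bool)
```

Because the check happens at call time, the variable has to be set before the template is serialized.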

markpeek · Mar 14 '21 19:03

TROPO_REAL_BOOL was not set.

dazza-codes · Mar 15 '21 19:03

I'd like to second this: using Pydantic is really sweet. Type hints, serialization, Literals, etc.; it has been very agile to use. But I'm not sure how big an overhaul it would be for this repo.

Looking at this PR, for example: https://github.com/cloudtools/troposphere/pull/1858/files, it seems like all of the definitions could be Pydantic BaseModels. But there is likely a lot of machinery that relies on the current form 🤷
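One way to gauge the size of that overhaul is to derive pydantic models mechanically from the existing definitions. A minimal sketch, assuming troposphere-style `props` tables that map each property name to an `(expected type, required)` pair, and assuming pydantic is installed. `BUCKET_PROPS`, `FILTER_RULE_PROPS`, and `model_from_props` are hypothetical, and props whose "type" is a validator function rather than a class would need extra handling.

```python
from typing import Optional

from pydantic import ValidationError, create_model

# Hypothetical troposphere-style props tables:
# property name -> (expected type, required?)
BUCKET_PROPS = {
    "BucketName": (str, False),
    "AccessControl": (str, False),
    "ObjectLockEnabled": (bool, False),
}
FILTER_RULE_PROPS = {
    "Name": (str, True),
    "Value": (str, True),
}


def model_from_props(name, props):
    """Build a pydantic model class from a {prop: (type, required)} table."""
    fields = {}
    for prop, (typ, required) in props.items():
        if required:
            fields[prop] = (typ, ...)  # Ellipsis marks the field as required
        else:
            fields[prop] = (Optional[typ], None)
    return create_model(name, **fields)


Bucket = model_from_props("Bucket", BUCKET_PROPS)
FilterRule = model_from_props("FilterRule", FILTER_RULE_PROPS)

bucket = Bucket(BucketName="my-bucket", ObjectLockEnabled="true")
print(repr(bucket.ObjectLockEnabled))  # True, coerced from the string "true"

try:
    FilterRule(Name="prefix")  # Value is required
except ValidationError as err:
    print(err)
```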

lautjy · Apr 06 '21 07:04

@dazza-codes @lautjy I found a Python library, https://github.com/MacHu-GWU/cottonformation-project#welcome-to-cottonformation-documentation, which seems to do exactly what you described: type hints, parameter suggestions, and validation.

It seems the author uses the CloudFormation schema JSON files from AWS and jinja2 to generate all of that code automatically. I think we could borrow this approach here.

  • generate code from schema json file: https://github.com/MacHu-GWU/cottonformation-project/blob/main/cottonformation/code/spec.py#L686
  • the generated code: https://github.com/MacHu-GWU/cottonformation-project/tree/main/cottonformation/res
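The generate-from-schema approach can be sketched without a template engine. This sketch swaps jinja2 for plain f-strings to stay dependency-free, handles only flat properties with scalar types (no `$ref`, arrays, or nested objects), and uses hypothetical names (`render_model`, `TAG_SCHEMA`); it is not cottonformation's generator. The emitted text assumes the target module imports `BaseModel`, `Optional`, and `Any`.

```python
# JSON-schema scalar types -> Python annotations; anything else falls back to Any.
TYPE_MAP = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def render_model(class_name, schema):
    """Render pydantic class source for one object schema (flat properties only)."""
    required = set(schema.get("required", []))
    lines = [f"class {class_name}(BaseModel):"]
    for name, spec in schema.get("properties", {}).items():
        annotation = TYPE_MAP.get(spec.get("type"), "Any")
        if name in required:
            lines.append(f"    {name}: {annotation}")
        else:
            lines.append(f"    {name}: Optional[{annotation}] = None")
    if len(lines) == 1:
        lines.append("    pass")  # keep the class body syntactically valid
    return "\n".join(lines) + "\n"


TAG_SCHEMA = {
    "type": "object",
    "properties": {
        "Key": {"type": "string"},
        "Value": {"type": "string"},
        "Propagate": {"type": "boolean"},
    },
    "required": ["Key", "Value"],
}

print(render_model("Tag", TAG_SCHEMA))
```

A real generator would walk `definitions` and `$ref` recursively and write one module per resource type, which is roughly what datamodel-codegen did for aws-s3-bucket.json earlier in this issue.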

angoraking · Jun 27 '21 03:06