piccolo icon indicating copy to clipboard operation
piccolo copied to clipboard

Refactor `ModelBuilder` and `RandomBuilder`

Open jrycw opened this issue 2 years ago • 5 comments

I encountered ModelBuilder._randomize_attribute and initially found the numerous if-elif-else checks puzzling. However, further investigation revealed that certain factories necessitate additional information from the column. I attempted to consolidate these logics into RandomBuilder._build, aiming for improved clarity. Nevertheless, the final code may not be as pristine as desired. I'm looking forward to hearing the community's thoughts on whether this refactoring improves the codebase.

jrycw avatar Mar 22 '24 04:03 jrycw

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.82%. Comparing base (91d6238) to head (ad763c3). Report is 4 commits behind head on master.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #971      +/-   ##
==========================================
+ Coverage   92.78%   92.82%   +0.03%     
==========================================
  Files         108      108              
  Lines        8182     8222      +40     
==========================================
+ Hits         7592     7632      +40     
  Misses        590      590              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 22 '24 04:03 codecov-commenter

This is great - thanks 👍

ModelBuilder is something which started fairly simple, but as more and more edge cases were discovered it has become quite complex.

I agree that your solution is cleaner and easier to follow.

I left a few comments.

@dantownsend I completely agree with you; the task turned out to be more complex than I initially anticipated. Building the mapper required careful handling of many cases, such as using partial, dealing with the return from get, and modifying the current next-ish method with default values to becoming a callable. While this PR is functional at the moment, I acknowledge that it may become challenging to maintain in the future. I am currently exploring alternative ideas and hope to provide another version soon.

jrycw avatar Mar 22 '24 17:03 jrycw

@dantownsend I completely agree with you; the task turned out to be more complex than I initially anticipated. Building the mapper required careful handling of many cases, such as using partial, dealing with the return from get, and modifying the current next-ish method with default values to becoming a callable. While this PR is functional at the moment, I acknowledge that it may become challenging to maintain in the future. I am currently exploring alternative ideas and hope to provide another version soon.

I think there's a lot of potential in this approach. Feel free to play around with other ideas, but this is a nice start.

dantownsend avatar Mar 22 '24 19:03 dantownsend

@dantownsend I completely agree with you; the task turned out to be more complex than I initially anticipated. Building the mapper required careful handling of many cases, such as using partial, dealing with the return from get, and modifying the current next-ish method with default values to becoming a callable. While this PR is functional at the moment, I acknowledge that it may become challenging to maintain in the future. I am currently exploring alternative ideas and hope to provide another version soon.

I think there's a lot of potential in this approach. Feel free to play around with other ideas, but this is a nice start.

I've just pushed another version. This version introduces a hook for users to register their own random type. I'll review your comments tomorrow as it's already midnight here in Asia.

jrycw avatar Mar 22 '24 19:03 jrycw

The third version may be a bit easier to understand:

  • RandomBuilder now offers a public API called get_mapper, enabling users to access the default random mapper.
  • Forming the type-callable pairs into a single dictionary involves several steps, which are encapsulated in the public API ModelBuilder.get_registry:
    • Initially, we import information from RandomBuilder.get_mapper, a one-time operation.
    • Next, we address situations where callables may require information from the column in ModelBuilder._get_local_mapper.
    • We then incorporate user-registered types into ModelBuilder._get_other_mapper.
    • Finally, we integrate the logic for list into reg = { **default_mapper, **cls._get_local_mapper(column), **cls._get_other_mapper(column)}, returning reg as a MappingProxyType.
  • I made an effort to comprehend the relationship between list and Array, but I'm afraid I may have gotten a bit lost in the codebase. I'm uncertain about the extent to which isinstance(column, Array) can handle the edge case effectively.
  • Handling enum.Enum remains a challenge.
  • I made a minor adjustment in ModelBuilder._build, where defaults = defaults or { } seems clearer to me.
  • The naming of variables and functions could be improved; I welcome any advice you may have on this matter.
  • Providing a method for users to unregister their custom types might be handy:
    @classmethod
    def unregister_types(cls) -> None:
        cls.__OTHER_MAPPER.clear()

jrycw avatar Mar 23 '24 06:03 jrycw