whylogs icon indicating copy to clipboard operation
whylogs copied to clipboard

WhyLabsWriter refactor

Open richard-rogers opened this issue 11 months ago • 3 comments

Description

Writable represents output to be serialized by whylogs. It will write itself to 1 or more temporary files. A Writer takes the temporary file(s) and sends them to their intended destination. The interface allows any Writer to handle any Writable, but in practice some Writers are only interested in certain types of Writables.

WhyLabsWriter acts as the main entry point for sending data to WhyLabs. It makes use of a new WhyLabsClient interface for interacting with WhyLabs REST APIs. WhyLabsWriter has some deprecated methods that duplicate WhyLabsClient functionality the sake of backwards compatibility. WhyLabsWriter delegates to the WhyLabsTransactionWriter, WhyLabsBatchWriter, and WhyLabsReferenceWriter classes according to the WhyLabsWriter::write() use case. The 3 classes correspond to the 3 REST API endpoints for uploading profiles (transaction, batch/async, and reference).

WhyLabsWriterBase implements a few utility methods shared by the various WhyLabs*Writer classes. In particular, _prepare_view_for_upload() handles processing required before uploading a profile (uncompounding, custom performance metric tagging). _send_writable_to_whylabs() serializes a profile in either V0 or V1 format and uploads it to WhyLabs. _upload_view() is a convenience method that just calls _prepare_view_for_upload() then _send_writable_to_whylabs(). WhyLabsWriter::write() accepts a variety of data structures representing a profile: ViewResultSet, ProfileResultSet, SegmentedResultSet, DatasetProfile, DatasetProfileView, and SegmentedDatasetProfileView. WhyLabsWriterBase::_get_view_of_writable() converts all of those except SegmentedResultSet to either DatasetProfileView or SegmentedDatasetProfileView, which represent a single profile/segment to be uploaded to WhyLabs. The WhyLabs*Writer classes generally iterate over SegmentedResultSet uploading each segment.

  • Transactions do not support zipped profiles or reference profiles
  • Segmented batch or reference profiles can be zipped by adding zip=True argument to write()

Changes

TODO: describe API changes

Related

zipped batch profiles

  • [ ] I have reviewed the Guidelines for Contributing and the Code of Conduct.

richard-rogers avatar Mar 15 '24 16:03 richard-rogers

I found two examples that breaks with the changes in this PR:

  • performance_estimation.ipynb error: AttributeError: 'WhyLabsEstimationResultWriter' object has no attribute '_org_id'

This one's fixed. I also found a bug where it possible to try to write an EstimationResult with a None accuracy. Is there a use car for None accuracies?

  • Writing_Reference_Profiles_to_WhyLabs.ipynb

error: `❌ Failed to upload reference profile: cannot unpack non-iterable bool object ...

Also fixed.

richard-rogers avatar Apr 17 '24 14:04 richard-rogers

Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.

The main breaking change is in the Writable interface, which hopefully no-one is using:

    def write(self, path: Optional[str] = None, **kwargs: Any) -> Tuple[bool, str]:

becomes

    def write(
        self, path: Optional[str] = None, filename: Optional[str] = None, **kwargs: Any
    ) -> Tuple[bool, Union[str, List[str]]]:  # TODO: Union[str, List[Tupble[bool, str]]] ?

Writables were inconsistent about whether the original path was a directory to write to or the full filename, so now they're explicitly separate. A Writable can now write multiple files (e.g., SegmentedResultSet), so it can return a list of files written.

The writer() convenience method is also moved into the Writable class, and returns a generic wrapper that can handle any combination of Writable and Writer rather than having a separate wrapper class for each Writable.

Writer::write() can also optionally return a list of results when it writes multiple things:

    def write(
        self,
        file: Writable,
        dest: Optional[str] = None,
        **kwargs: Any,
-  ) -> Tuple[bool, str]:
+  ) -> Tuple[bool, Union[str, List[Tuple[bool, str]]]]:

richard-rogers avatar Apr 25 '24 22:04 richard-rogers

Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.

Ok its not breaking examples or public API, but adding functionality and some behavior cleanup, I think we can release in 1.4.0rc0

jamie256 avatar Apr 26 '24 18:04 jamie256