whylogs
whylogs copied to clipboard
WhyLabsWriter refactor
Description
Writable
represents output to be serialized by whylogs. It will write itself to 1 or more temporary files. A Writer
takes the temporary file(s) and sends them to their intended destination. The interface allows any Writer
to handle any Writable
, but in practice some Writer
s are only interested in certain types of Writable
s.
WhyLabsWriter
acts as the main entry point for sending data to WhyLabs. It makes use of a new WhyLabsClient
interface for interacting with WhyLabs REST APIs. WhyLabsWriter
has some deprecated methods that duplicate WhyLabsClient
functionality the sake of backwards compatibility. WhyLabsWriter
delegates to the WhyLabsTransactionWriter
, WhyLabsBatchWriter
, and WhyLabsReferenceWriter
classes according to the WhyLabsWriter::write()
use case. The 3 classes correspond to the 3 REST API endpoints for uploading profiles (transaction, batch/async, and reference).
WhyLabsWriterBase
implements a few utility methods shared by the various WhyLabs*Writer
classes. In particular, _prepare_view_for_upload()
handles processing required before uploading a profile (uncompounding, custom performance metric tagging). _send_writable_to_whylabs()
serializes a profile in either V0 or V1 format and uploads it to WhyLabs. _upload_view()
is a convenience method that just calls _prepare_view_for_upload()
then _send_writable_to_whylabs()
. WhyLabsWriter::write()
accepts a variety of data structures representing a profile: ViewResultSet
, ProfileResultSet
, SegmentedResultSet
, DatasetProfile
, DatasetProfileView
, and SegmentedDatasetProfileView
. WhyLabsWriterBase::_get_view_of_writable()
converts all of those except SegmentedResultSet
to either DatasetProfileView
or SegmentedDatasetProfileView
, which represent a single profile/segment to be uploaded to WhyLabs. The WhyLabs*Writer
classes generally iterate over SegmentedResultSet
uploading each segment.
- Transactions do not support zipped profiles or reference profiles
- Segmented batch or reference profiles can be zipped by adding
zip=True
argument towrite()
Changes
TODO: describe API changes
Related
- [ ] I have reviewed the Guidelines for Contributing and the Code of Conduct.
I found two examples that breaks with the changes in this PR:
- performance_estimation.ipynb error:
AttributeError: 'WhyLabsEstimationResultWriter' object has no attribute '_org_id'
This one's fixed. I also found a bug where it possible to try to write an EstimationResult
with a None
accuracy. Is there a use car for None
accuracies?
- Writing_Reference_Profiles_to_WhyLabs.ipynb
error: `❌ Failed to upload reference profile: cannot unpack non-iterable bool object ...
Also fixed.
Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.
The main breaking change is in the Writable
interface, which hopefully no-one is using:
def write(self, path: Optional[str] = None, **kwargs: Any) -> Tuple[bool, str]:
becomes
def write(
self, path: Optional[str] = None, filename: Optional[str] = None, **kwargs: Any
) -> Tuple[bool, Union[str, List[str]]]: # TODO: Union[str, List[Tupble[bool, str]]] ?
Writable
s were inconsistent about whether the original path
was a directory to write to or the full filename, so now they're explicitly separate. A Writable
can now write multiple files (e.g., SegmentedResultSet
), so it can return a list of files written.
The writer()
convenience method is also moved into the Writable
class, and returns a generic wrapper that can handle any combination of Writable
and Writer
rather than having a separate wrapper class for each Writable
.
Writer::write()
can also optionally return a list of results when it writes multiple things:
def write(
self,
file: Writable,
dest: Optional[str] = None,
**kwargs: Any,
- ) -> Tuple[bool, str]:
+ ) -> Tuple[bool, Union[str, List[Tuple[bool, str]]]]:
Please summarize what parts of the public API are being removed and deprecated with this change. We have to work through how much this affects the public surface area of our API and decide on major/minor version bump before we can merge these changes.
Ok its not breaking examples or public API, but adding functionality and some behavior cleanup, I think we can release in 1.4.0rc0