Using Evidently UI with S3

Open sean-hickey-wf opened this issue 1 year ago • 8 comments

Hi,

I've recently seen your new Evidently UI and it looks fantastic! I am looking to implement it, and I'm hoping to use an external storage system to store the rich JSON files rather than storing them locally. Is there a way I can achieve this and, if so, do you have a minimal example of how to do it? I am struggling to implement this using the example provided in the docs.

Thanks!

sean-hickey-wf avatar Jul 31 '23 16:07 sean-hickey-wf

Hi @sean-hickey-wf, there is also a remote workspace example here

https://github.com/evidentlyai/evidently/tree/main/examples/service

elenasamuylova avatar Jul 31 '23 16:07 elenasamuylova

Hey @elenasamuylova thanks for getting back to me so quickly!

I think I just misunderstood the flow of what needs to be done. Based on what I can see (and please correct me if I'm wrong!) the flow should be for example:

  1. Test model predictions
  2. Push results to s3
  3. Create a script that pulls down results and then builds the monitoring dashboard on top

sean-hickey-wf avatar Jul 31 '23 17:07 sean-hickey-wf

Hi @sean-hickey-wf,

Right now the JSON snapshots must be stored on the same machine where the Evidently Monitoring service runs. They must be stored in a directory accessible to the UI service.

In a simple scenario, you can compute snapshots locally, and run the UI locally.

In a more complex scenario, you can compute snapshots locally (or anywhere you can run the Evidently library to generate them), and then send the snapshots to a remote server where you run the Monitoring UI. You can use the remote workspace API to create and manage projects.

We do not have a more detailed example yet, as this is a very fresh release, but we might add one in the future.

Where would you want to run the Evidently Monitoring UI service in your scenario?

CC @mike0sv if there is anything to add!

elenasamuylova avatar Jul 31 '23 17:07 elenasamuylova

Hey @elenasamuylova,

This is very much a side project for me that I am trying to learn so apologies if I am going about this the incorrect way or if what I am saying doesn't make complete sense.

In my current setup I have a Grafana dashboard that pulls metrics from a hosted Postgres DB. This is really handy: a scoring job calculates metrics on new data and pushes them to Postgres, and Grafana refreshes periodically to pick up the new data.

I want to try and move away from that and test the new Evidently UI to see if I can get the same (ish) flow and I am struggling to connect the pieces which is probably inexperience on my part.

What I would like to do is:

  • Set up an Evidently Monitoring UI service (I assume Docker and EC2 would be the best place for this). I can get it to run locally pretty easily thanks to the example but I am struggling with the remote side (again, probably inexperience).
  • Push new reports to the service ( This part I am struggling with)
  • Refresh the Evidently UI to incorporate the new reports

Happy to wait for a more detailed example! I do think this could be fantastic for our monitoring needs though which is why I tried to set it up early!

sean-hickey-wf avatar Aug 01 '23 08:08 sean-hickey-wf

Going to answer my own questions here and ask some extras if you don't mind!

  • I now have a Remote Evidently UI service set up that can accept snapshots and I can visualise the results (Woohoo!)
  • Questions I have (I have not gotten as far as testing, but I think you might already have the answers):
    1. Does the Evidently UI service update automagically or does that need to be controlled by the user? For example if I start adding new reports will the dashboard update to include these new reports or do I need to sort that myself?
    2. Has there been any consideration into how the snapshots should be stored on the remote service? For example, if the service experiences an outage or has to be pulled down it seems likely that all previous snapshots will need to be regenerated. Would it also make sense to back the snapshots up to S3 or similar and then sync to there in the first instance?

Thanks for all your help so far! Really excited to try and get this into production hence all the questions

sean-hickey-wf avatar Aug 01 '23 12:08 sean-hickey-wf

Hi @sean-hickey-wf

Thanks a lot for sharing - this is very helpful for us to understand what to document better and which examples to add in the future.

> Push new reports to the service ( This part I am struggling with)

In the current scenario, this can be any script that would move the JSON snapshots from where you have them and push them to the workspace directory on the machine where the UI service runs.
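As a rough illustration of such a script, here is a minimal Python sketch that copies any JSON snapshots not yet delivered from a local folder into a target folder. It assumes the target folder is already reachable as a normal path on the machine where the UI service runs (for example through a mounted volume). The function name `push_snapshots` and both directory arguments are hypothetical, and the exact folder layout that the UI expects inside the workspace should be checked against your Evidently version.

```python
import shutil
from pathlib import Path
from typing import List


def push_snapshots(source_dir: Path, workspace_snapshots_dir: Path) -> List[str]:
    """Copy new *.json snapshot files into the workspace folder used by the UI.

    Both paths are assumptions for illustration. Files that already exist in
    the target folder are skipped, so the script can be re-run safely.
    Returns the names of the files copied during this run.
    """
    workspace_snapshots_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for snapshot in sorted(source_dir.glob("*.json")):
        target = workspace_snapshots_dir / snapshot.name
        if target.exists():
            continue  # delivered on an earlier run
        shutil.copy2(snapshot, target)
        copied.append(snapshot.name)
    return copied
```

Running this on a schedule after each scoring job would deliver new snapshots incrementally. If the service machine is only reachable over the network, the copy step would need to be swapped for a transfer tool of your choice.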

> Does the Evidently UI service update automagically or does that need to be controlled by the user? For example if I start adding new reports will the dashboard update to include these new reports or do I need to sort that myself?

Yes, if you add new snapshots to the workspace directory, they will be visible in the UI after you refresh the page. Please let us know if you observe a different behavior - we’ll take a look.

> Has there been any consideration into how the snapshots should be stored on the remote service? For example, if the service experiences an outage or has to be pulled down it seems likely that all previous snapshots will need to be regenerated. Would it also make sense to back the snapshots up to S3 or similar and then sync to there in the first instance?

It would indeed make sense to back up the snapshots in production scenarios, as you suggest. (Note that as long as you persist the original prediction logs, you can regenerate the snapshots for past periods at any point and add them back to the workspace. Evidently allows assigning arbitrary timestamps, so it's possible to populate the logs for past periods or deliver them asynchronously.)

CC @mike0sv in case there is any suggested best practice!

Are there any blockers you face now to proceed with the implementation?

elenasamuylova avatar Aug 01 '23 13:08 elenasamuylova

In the next release we will improve the report update logic: you will also be able to remove or update reports/projects without a service restart, not only add new ones.

As for remote storage, we don't support it yet, but we might in the future. If you run the service inside a Docker container, you can expose the workspace dir via a volume and then use something like aws s3 sync with cron to back up your snapshots.
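For anyone who wants to wire that backup up from Python, here is a minimal sketch that wraps the `aws s3 sync` command. It assumes the AWS CLI is installed and has credentials on the host, and that the workspace directory is the host side of the Docker volume. The function names, the bucket, and the prefix are hypothetical placeholders.

```python
import subprocess
from typing import List


def build_sync_command(workspace_dir: str, bucket: str, prefix: str = "") -> List[str]:
    """Build the `aws s3 sync` command that mirrors the workspace to S3.

    `bucket` and `prefix` are placeholders; surrounding slashes on the
    prefix are trimmed so the destination URI stays well formed.
    """
    destination = f"s3://{bucket}/{prefix.strip('/')}".rstrip("/")
    return ["aws", "s3", "sync", workspace_dir, destination]


def backup_workspace(workspace_dir: str, bucket: str, prefix: str = "") -> None:
    """Run the sync and raise CalledProcessError if the AWS CLI reports failure."""
    subprocess.run(build_sync_command(workspace_dir, bucket, prefix), check=True)
```

A cron entry could then call a small script that invokes `backup_workspace` with your own path and bucket on whatever schedule suits you. Swapping the source and destination arguments of `aws s3 sync` would restore the snapshots onto a fresh machine after an outage.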

mike0sv avatar Aug 08 '23 08:08 mike0sv

@mike0sv @elenasamuylova thanks so much for all the help! The team and I will be setting this up for our models hopefully very soon and I think I now have the full picture of how to set it up and ensure all snapshots persist via the s3 sync idea.

Feel free to close this issue or you can leave it open if you think it will help any other budding data scientists!

sean-hickey-wf avatar Aug 13 '23 18:08 sean-hickey-wf