
amend history only when resource has changed

Open MM-Lehmann opened this issue 1 year ago • 3 comments

We're using blaze in a daily ETL full-load setup, i.e. every day we upload all of our ~50k data samples and let blaze do the history management (in case resource details have changed). However, even when nothing has changed (99%+ of the data each day), a new history entry is created, which fills the storage quite quickly. It feels like a bug or shortcoming that blaze does not recognize that nothing has changed. I would expect that no history entry is added in this case.

MM-Lehmann, Jul 26 '22 06:07

Hi Martin, I had this discussion in the FHIR Chat in 2020: https://chat.fhir.org/#narrow/stream/179166-implementers/topic/History.20of.20Resource.20Update.20with.20Identical.20Content

In the discussion, it was agreed that the server can choose whether or not to introduce new versions. In general, new versions were preferred for clinical use cases, but it was also agreed that ETL processes can be a problem.

I tested it: HAPI doesn't introduce new versions if the content doesn't change.

In the end, I don't see any support in the FHIR specification for deduplicating versions created by non-incremental ETL processes. Such no-op updates are especially bad for Blaze, because it's designed around keeping track of every change. Blaze even has an event-driven architecture where every update results in a storage increase. Blaze may support cutting the history at some point, but today every update or delete should be considered as costly in storage as the creation of a new resource. So only business-relevant updates and deletes should be done.

I will leave this issue open in order to discuss how the deduplication could be done in your ETL process, or even how to build an A/B Blaze deployment where you import into one side every day and switch the sides for queries.
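A minimal sketch of deduplication on the ETL side, assuming the ETL keeps a small manifest (hypothetical, e.g. a JSON file it writes itself) that maps each `Type/id` to a hash of the content it sent in the previous run. Nothing has to be downloaded from Blaze: only resources whose hash changed are sent as `PUT`, keys that disappeared from the source are sent as `DELETE`, and both are collected into one standard FHIR `transaction` Bundle. The names `content_hash`, `plan_upload` and `to_transaction` are illustrative and not part of Blaze; `meta` is ignored because the server rewrites it on every write.

```python
import hashlib
import json


def content_hash(resource: dict) -> str:
    """Hash a resource's content, ignoring the server-managed `meta` element."""
    payload = {k: v for k, v in resource.items() if k != "meta"}
    # sort_keys + compact separators give the same string regardless of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_upload(
    resources: list[dict], previous: dict[str, str]
) -> tuple[list[dict], list[str], dict[str, str]]:
    """Split today's full load into changed resources, deletions and the new manifest.

    `previous` is the hypothetical manifest from the last run: "Type/id" -> hash.
    """
    current: dict[str, str] = {}
    changed: list[dict] = []
    for res in resources:
        key = f"{res['resourceType']}/{res['id']}"
        digest = content_hash(res)
        current[key] = digest
        if previous.get(key) != digest:
            changed.append(res)  # new or modified since the last run
    # Present last time but missing from today's load: removed in the source.
    deleted = sorted(set(previous) - set(current))
    return changed, deleted, current


def to_transaction(changed: list[dict], deleted: list[str]) -> dict:
    """Build a FHIR transaction Bundle: PUT for new/changed, DELETE for removed."""
    entries = [
        {"resource": r, "request": {"method": "PUT", "url": f"{r['resourceType']}/{r['id']}"}}
        for r in changed
    ]
    entries += [{"request": {"method": "DELETE", "url": key}} for key in deleted]
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```

Saving the returned manifest only after the transaction has succeeded keeps the next run's comparison in sync with what Blaze actually holds.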

alexanderkiel, Aug 02 '22 15:08

Thanks for the summary. In our setup, it's really hard to find out which resources have changed or were deleted. The only approach I can think of is to download everything from blaze, compare each resource, and upload only the differences. I hope upload and download don't use different structures, but I guess I will have to try this out at some point. Right now, we're still resetting the volume when it gets too big (see #399). Any chance of a proper warning from blaze instead of a silent failure in this case?
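The download-and-compare approach could look roughly like the sketch below. It assumes Blaze is reachable without authentication and answers searches with standard FHIR `searchset` Bundles that page via `next` links; a URL such as `http://localhost:8080/fhir/Patient?_count=500` is only a placeholder. Comparing parsed JSON as dicts makes key order irrelevant, and `meta` is dropped because the server sets `versionId` and `lastUpdated`. If the server returned any other element differently from how it was sent, it would show up as a spurious change and would have to be excluded too; that has not been verified here. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Optional


def fetch_json(url: str) -> dict:
    """GET a FHIR JSON document (assumes the server needs no authentication)."""
    req = urllib.request.Request(url, headers={"Accept": "application/fhir+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_search(first_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every matching resource of a FHIR search, following the Bundle "next" links."""
    url: Optional[str] = first_url
    while url:
        bundle = fetch(url)
        for entry in bundle.get("entry", []):
            # Keep search matches only; skip _include results and OperationOutcome entries.
            if entry.get("search", {}).get("mode", "match") == "match":
                yield entry["resource"]
        # Stop when the page has no "next" link.
        url = next(
            (link["url"] for link in bundle.get("link", []) if link.get("relation") == "next"),
            None,
        )


def resource_key(resource: dict) -> str:
    return f"{resource['resourceType']}/{resource['id']}"


def without_meta(resource: dict) -> dict:
    """Drop server-managed metadata (versionId, lastUpdated) before comparing."""
    return {k: v for k, v in resource.items() if k != "meta"}


def changed_only(incoming: Iterable[dict], server: Iterable[dict]) -> list[dict]:
    """Return incoming resources that are new or differ from the server's current version."""
    current = {resource_key(r): without_meta(r) for r in server}
    return [r for r in incoming if current.get(resource_key(r)) != without_meta(r)]
```

Only the result of `changed_only` would then be uploaded; server resources whose keys do not appear in the incoming data are the candidates for deletion.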

MM-Lehmann, Aug 16 '22 12:08

> Any chance for a proper warning from blaze instead of silent failure in this case?

I would recommend monitoring your server with something like Prometheus and Node Exporter.
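As a minimal sketch of such monitoring, the snippet below queries the Prometheus HTTP API (`/api/v1/query`) for the free-space ratio computed from Node Exporter's `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` metrics and flags filesystems below a threshold. The mountpoint `/blaze-data`, the Prometheus URL you pass in, and the 10% threshold are assumptions. In a real setup the same PromQL expression would more typically live in a Prometheus alerting rule routed through Alertmanager.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical mountpoint of the Blaze data volume; adjust to your deployment.
FREE_RATIO_QUERY = (
    'node_filesystem_avail_bytes{mountpoint="/blaze-data"}'
    ' / node_filesystem_size_bytes{mountpoint="/blaze-data"}'
)


def query_prometheus(base_url: str, promql: str) -> dict:
    """Run an instant query against the Prometheus HTTP API and return the parsed JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def low_space_series(response: dict, threshold: float = 0.1) -> list[tuple[dict, float]]:
    """Return (labels, free ratio) for every series whose free-space ratio is below threshold."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response.get('error')}")
    low = []
    for sample in response["data"]["result"]:
        # Instant-vector samples are [unix_timestamp, "value as string"].
        ratio = float(sample["value"][1])
        if ratio < threshold:
            low.append((sample["metric"], ratio))
    return low
```

A cron job could then call `low_space_series(query_prometheus("http://prometheus:9090", FREE_RATIO_QUERY))` (placeholder URL) and send a warning before the volume fills up.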

alexanderkiel, Aug 16 '22 15:08