lbrytools

Clean up orphaned blobs from claims

Open belikor opened this issue 3 years ago • 1 comments

In LBRY, downloading a claim downloads blob files that are stored in the blobfiles directory. On Linux this is normally:

/home/user/.local/share/lbrynet/blobfiles

However, if the claim is re-uploaded, for example because the file was re-encoded, its blobs will be different. A new set of blobs has to be downloaded, while the old blobs remain on the system, taking up hard drive space.

A function needs to be created to examine the blobfiles directory so that only blobs belonging to currently managed claims are kept. All other blobs, which are not tied to any specific claim, should be deleted so that they don't take up unnecessary space on the system.


Each claim with a URI or 'claim_id' will have a "manifest" blob file. This blob file is named after the 'sd_hash' of the claim. This information is found under a specific key in the dictionary representing the claim, item["value"]["source"]["sd_hash"].
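
As an illustration, the lookup described above could be sketched as follows. This assumes each claim is a plain nested dictionary with the keys named above; the helper name `get_sd_hash` and the example claim are hypothetical, not part of lbrytools.

```python
def get_sd_hash(item):
    """Return the 'sd_hash' of a claim dictionary, or None if it is absent.

    `item` is assumed to be one claim represented as a nested dict.
    Claims without a stream source will not have this key.
    """
    try:
        return item["value"]["source"]["sd_hash"]
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: an intermediate value is not a dict
        return None


# Hypothetical claim with made-up, shortened identifiers
claim = {"claim_id": "abc123", "value": {"source": {"sd_hash": "0a1b2c"}}}
print(get_sd_hash(claim))
```

Returning `None` instead of raising lets a caller skip claims that have no manifest without aborting the whole scan.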

Inside this manifest blob file there is JSON data listing all the blobs that make up the claim. Therefore, by examining this manifest blob file, we can know whether all of its blobs are present in the blobfiles directory.
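
A minimal sketch of that check follows. It assumes the manifest JSON holds a "blobs" list whose entries carry a "blob_hash" key, and that entries without that key can be skipped; these key names are assumptions to be verified against the LBRY specification. The function name `check_manifest` is hypothetical.

```python
import json
import os


def check_manifest(blobfiles_dir, sd_hash):
    """Report which blobs listed in a manifest are present or missing.

    Assumes the manifest is a JSON file named after `sd_hash` containing
    a "blobs" list, where each entry may have a "blob_hash" key.
    Returns None if the manifest file itself is not in the directory.
    """
    manifest_path = os.path.join(blobfiles_dir, sd_hash)
    if not os.path.isfile(manifest_path):
        return None

    with open(manifest_path, "r", encoding="utf-8") as fd:
        manifest = json.load(fd)

    present, missing = [], []
    for blob in manifest.get("blobs", []):
        blob_hash = blob.get("blob_hash")
        if not blob_hash:
            # Entries without a hash carry no file on disk, so skip them
            continue
        if os.path.isfile(os.path.join(blobfiles_dir, blob_hash)):
            present.append(blob_hash)
        else:
            missing.append(blob_hash)

    return {"present": present, "missing": missing}
```

A claim would count as fully downloaded when the returned "missing" list is empty.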

We can get all claims with search.sort_files (lbrynet file list), and examine the 'sd_hash' of each of them to find all of their blobs in blobfiles.

All additional blobs that don't seem to belong to any claim, that is, blobs that are not listed in any manifest blob file, should be considered orphaned, and thus can be deleted from the system.
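
The overall idea can be sketched as a set difference between the files on disk and the blobs referenced by the known manifests. This is only a sketch under stated assumptions: the caller supplies the list of 'sd_hash' values for the managed claims, the manifest layout ("blobs" entries with a "blob_hash" key) is assumed, and the function name `find_orphaned_blobs` is hypothetical. It only reports candidates and deletes nothing.

```python
import json
import os


def find_orphaned_blobs(blobfiles_dir, sd_hashes):
    """Return sorted names of files in `blobfiles_dir` not tied to any claim.

    A file is considered referenced if it is a manifest named in
    `sd_hashes`, or if it is listed inside one of those manifests.
    """
    referenced = set()
    for sd_hash in sd_hashes:
        manifest_path = os.path.join(blobfiles_dir, sd_hash)
        if not os.path.isfile(manifest_path):
            continue  # manifest not downloaded, nothing to protect
        referenced.add(sd_hash)
        try:
            with open(manifest_path, "r", encoding="utf-8") as fd:
                manifest = json.load(fd)
        except (ValueError, OSError):
            continue  # unreadable or non-JSON manifest
        if not isinstance(manifest, dict):
            continue
        for blob in manifest.get("blobs", []):
            blob_hash = blob.get("blob_hash")
            if blob_hash:
                referenced.add(blob_hash)

    on_disk = {
        name
        for name in os.listdir(blobfiles_dir)
        if os.path.isfile(os.path.join(blobfiles_dir, name))
    }
    return sorted(on_disk - referenced)
```

Because blobs of a claim whose manifest cannot be read would also be reported, the returned list is best reviewed as a dry run before any file is actually removed.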

Reference documentation on how content is encoded in LBRY using blobs: https://lbry.tech/spec#data

belikor, May 26 '21 23:05