MedCLIP

Change all URLs to new CDN

Open Mauville opened this issue 6 months ago • 1 comments

MedPix has moved all of its content to a CDN.

Old links looked like https://medpix.nlm.nih.gov/images/full/synpic52419.jpg

Now they look like https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic17159.jpg

The dataset links need to be updated. A simple rename of the URL prefix appears to work, but if the CDN hostname keeps changing, this could become a recurring problem.
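As a rough sketch of that rename, the old links could be rewritten by keeping only the file name and prepending the new prefix. The helper name `to_cdn_url` and the `CDN_BASE` constant are hypothetical and not part of the repository; the prefix is the one observed above, kept in one place so it can be swapped if the CDN moves again.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: the CDN prefix observed in this issue. Change it here if the CDN moves.
CDN_BASE = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full"


def to_cdn_url(old_url: str, cdn_base: str = CDN_BASE) -> str:
    """Rebuild a MedPix image URL against cdn_base, keeping only the file name."""
    filename = posixpath.basename(urlsplit(old_url).path)
    if not filename:
        raise ValueError(f"no file name in URL: {old_url!r}")
    return f"{cdn_base.rstrip('/')}/{filename}"


old = "https://medpix.nlm.nih.gov/images/full/synpic52419.jpg"
print(to_cdn_url(old))
# https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic52419.jpg
```

Because only the file name is reused, running this on a link that already points at the CDN leaves it unchanged, so it should be safe to apply to a partially migrated dataset.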

A simple fix for the scraper is to add the following lines:

import urllib.request

# Keep only the image file name and rebuild the URL against the new CDN
filename = url.split("/")[-1]
url = f"https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/{filename}"
urllib.request.urlretrieve(url, f"/content/drive/Shareddrives/DeepLearning/data/output/{filename}")

Mauville, Aug 14 '24 16:08