C-MS-Celeb

Can’t download the original MS-Celeb-1M dataset?

Open • LiuJoffrey opened this issue 6 years ago • 7 comments

Hi, I would like to download the original dataset, but I can’t find a download link on the website. Could you please provide the original dataset files or a download link?

Thank you so much!

LiuJoffrey, Jun 07 '19

Unfortunately, we lost the original image data over the last two years because we were careless about preserving it. The cleaned file list in this repository is all we have now. We did not know that Microsoft Research had taken down the original data until we saw these issues.

EB-Dodo, Jun 09 '19

Hi Joffrey, did you manage to find a download link for MS-Celeb-1M? Thanks in advance.

jjsjunior, Aug 26 '19

Hi, did you find a download link for MS-Celeb-1M?

youthM, Oct 14 '19

https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech&hit=1&filelist=1

ha1990-12, Nov 14 '19

Hi, how do you process the .tsv file from this torrent? I'm not sure what each column contains or how to handle it.

ibarrond, Apr 05 '21

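This script should extract the images from the .tsv file. It reads the MID from the first column, the image search rank from the second, the face ID from the fifth, and the base64-encoded image data from the last column, and writes each face to <outputDir>/<MID>/<imgSearchRank>-<faceID>.jpg.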

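import argparse
import base64
import csv
import os
# import magic  # Detect image type from buffer contents (disabled, all are jpg)

parser = argparse.ArgumentParser(description='Extract face images from an MS-Celeb-1M .tsv file')
parser.add_argument('--croppedTSV', type=str, required=True)
parser.add_argument('--outputDir', type=str, default='raw')
args = parser.parse_args()

# Base64 image fields can exceed the csv module's default field size limit (131072 characters)
csv.field_size_limit(2**31 - 1)

with open(args.croppedTSV, 'r', newline='') as tsvF:
    # Fields are tab-separated and never quoted, so disable quote handling
    reader = csv.reader(tsvF, delimiter='\t', quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader, start=1):
        # Column 0: MID, column 1: image search rank, column 4: face ID, last column: base64 image data
        MID, imgSearchRank, faceID = row[0], row[1], row[4]
        data = base64.b64decode(row[-1])

        saveDir = os.path.join(args.outputDir, MID)
        savePath = os.path.join(saveDir, "{}-{}.jpg".format(imgSearchRank, faceID))

        # assert(magic.from_buffer(data) == 'JPEG image data, JFIF standard 1.01')

        os.makedirs(saveDir, exist_ok=True)
        with open(savePath, 'wb') as f:
            f.write(data)

        if i % 1000 == 0:
            print("Extracted {} images.".format(i))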
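To see what each column holds before extracting everything, a small stdlib-only sketch like the following could parse just the first few rows. It assumes the same column indices as the script above (MID in column 0, image search rank in column 1, face ID in column 4, base64 image data in the last column); `parse_row`, `inspect_tsv` and the dictionary keys are hypothetical helpers, not part of this repository. Instead of python-magic it only checks for the JPEG start-of-image bytes, which is a weaker test than the disabled `magic` assertion.

```python
import base64
import csv

# Base64 image fields can exceed the csv module's default field size limit
csv.field_size_limit(2**31 - 1)

# JPEG streams begin with the SOI marker FF D8, normally followed by another marker starting with FF
JPEG_SOI = b'\xff\xd8\xff'


def parse_row(row):
    """Map one TSV row to named fields (hypothetical helper).

    Uses the same column indices as the extraction script; meanings of the
    remaining columns are not confirmed here.
    """
    data = base64.b64decode(row[-1])
    return {
        'MID': row[0],
        'imgSearchRank': row[1],
        'faceID': row[4],
        'num_columns': len(row),
        'is_jpeg': data.startswith(JPEG_SOI),
        'data': data,
    }


def inspect_tsv(lines, limit=3):
    """Parse at most `limit` rows from an iterable of text lines (e.g. an open file)."""
    reader = csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE)
    records = []
    for i, row in enumerate(reader):
        if i >= limit:
            break
        records.append(parse_row(row))
    return records
```

For example, open the file with `open('path/to/file.tsv', newline='')` (a placeholder path), pass the file object to `inspect_tsv`, and print each record's `MID`, `num_columns` and `is_jpeg` to confirm the layout before running the full extraction.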

ketan-b, May 20 '21