lbann Sample list for image data reader

PR #1401 supersedes this PR

Currently, sample list loading is interleaved among the processes of each trainer. I believe this is meant to distribute the load of processing a single text input file when that load is non-trivial.
However, the data store pre-loads the labels along with the image file names without interleaving. Thus, the sample order in the sample list does not match the label order in the data store.
This can be solved by reordering the sample list after all_gather so that its order matches the order in the file. While this capability is added, a different solution is used: m_labels, which maps a sample name (file name) to its label, replaces m_image_list, the list of (image file name, label) pairs.
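The mismatch and the two fixes can be illustrated with a small simulation. This is only a sketch: it assumes "interleaved" means round-robin assignment of lines to ranks, models all_gather as concatenation in rank order, and uses made-up file names and labels. None of the function names below come from LBANN.

```python
def interleaved_load(lines, num_ranks):
    """Round-robin split (assumed): rank r keeps lines r, r + num_ranks, ..."""
    return [
        [(idx, name) for idx, name in enumerate(lines) if idx % num_ranks == rank]
        for rank in range(num_ranks)
    ]


def all_gather(per_rank):
    """Model an all_gather as concatenation of the per-rank chunks in rank order."""
    return [item for chunk in per_rank for item in chunk]


# Hypothetical file contents: (image file name, label) in file order.
image_list = [("a.JPEG", 0), ("b.JPEG", 1), ("c.JPEG", 2), ("d.JPEG", 3), ("e.JPEG", 4)]
names_in_file_order = [name for name, _ in image_list]

gathered = all_gather(interleaved_load(names_in_file_order, num_ranks=2))
gathered_names = [name for _, name in gathered]  # no longer in file order

# Fix 1: reorder after the gather using the original line index.
reordered_names = [name for _, name in sorted(gathered)]

# Fix 2 (the one used here): look labels up by sample name, so order is irrelevant.
labels_by_name = dict(image_list)
labels_in_gathered_order = [labels_by_name[name] for name in gathered_names]
```

With a name-keyed map, the label lookup stays correct regardless of how the sample list ends up ordered.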
Currently, to locate the sample list file named in the prototext input, data_reader_jag_conduit takes the file name from index_list and replaces its directory path with data_filedir, the directory of the data files. However, this approach forces users to put the sample list in the same directory as the data, which creates a mess. To allow users to put the sample list anywhere they want, I changed the reader to use the full path given in index_list.
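The difference between the two behaviors can be sketched as follows. The function names are hypothetical and only mirror the description above; the actual reader is C++ and may differ in detail.

```python
import posixpath


def resolve_sample_list_old(index_list, data_filedir):
    """Earlier behavior as described: keep only the file name of index_list
    and look for it under data_filedir."""
    return posixpath.join(data_filedir, posixpath.basename(index_list))


def resolve_sample_list_new(index_list):
    """Behavior after this change: use the path exactly as given."""
    return index_list


data_filedir = "/p/lustre2/brainusr/datasets/ILSVRC2012/resized_256x256/train/"
index_list = "/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_sample_list.txt"

old_path = resolve_sample_list_old(index_list, data_filedir)
new_path = resolve_sample_list_new(index_list)
```

Under the old rule the list would be expected next to the images; under the new rule it is read from wherever index_list points.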
The sample lists for imagenet are created under /p/lustre2/brainusr/datasets/ILSVRC2012/sample_list. *_sample_list.txt is in the sample list format. *_image_list.txt is in the original format with the directory paths corrected from lscratchf to lustre2.
Verified by two methods:
- writing out the sample list file and comparing it against the input file
- comparing the image file names obtained from m_sample_list and m_image_list in fetch_datum() over multiple epochs

Jul 31 '19 22:07 JaeseungYeom

This worked for me; see my blog post for today's mtg. Only comment: I think the "index_list:" field in the data reader should be renamed to "sample_list" for clarity.

I will make the change.

Aug 29 '19 17:08 JaeseungYeom

Unless anyone has any additional comment, I will merge this PR at the end of the day.

Sep 26 '19 17:09 JaeseungYeom

> Unless anyone has any additional comment, I will merge this PR at the end of the day.

My only concern is testing: will it break anything?

Sep 26 '19 17:09 davidHysom

I want to look at this before it merges. I know that it has been waiting for a long time, but I still want to look through it before merging.

Sep 26 '19 17:09 bvanessen

I will be waiting.

Sep 27 '19 05:09 JaeseungYeom

Rebased and compiles. Will test tomorrow.

Nov 27 '19 06:11 JaeseungYeom

This PR is still not ready to merge. There are no clear sample lists in the imagenet datasets directory.

Dec 04 '19 02:12 bvanessen

> This PR is still not ready to merge. There are no clear sample lists in the imagenet datasets directory.

/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list

Dec 04 '19 19:12 JaeseungYeom

There are many permutations of lists there with similar but unclear names. Many also don’t conform to what I thought was the standard format.

Dec 04 '19 20:12 bvanessen

Hopefully, this helps. Otherwise, we can discuss more during the meeting.

/p/lustre2/brainusr/datasets/ILSVRC2012/Single_train_c0-9_01_filenames.txt and /p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_image_list.txt are the same file in the original format, except that the former has the full path to each image file (outdated, since it contains lscratchf), while the latter has only the trailing part of the path that actually varies.

The former is one of the original imagenet label files we have been using. For the latter, the common path prefix is specified in the data reader prototext with the parameter data_filedir. This is the directory of the image data files, not of the list file itself. We still use the original-format file for labels.

For the sample list files, the path prefix is specified on the third line of the file, as usual. /p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_sample_list.txt is the corresponding sample list file. As we discussed, it does not include the label information.

data_reader {
  reader {
    ...
    data_filedir: "/p/lustre2/brainusr/datasets/ILSVRC2012/resized_256x256/train/"
    data_filename: "/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_image_list.txt"
    sample_list: "/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_sample_list.txt"
    ...
  }
}
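As a rough illustration of how data_filedir and the original-format image list fit together, the sketch below parses lines of the form "relative/path label" into a name-to-label map and joins a name with the prefix. The one-pair-per-line layout, the sample entries, and the helper names are assumptions for illustration, not taken from the LBANN sources.

```python
import posixpath


def parse_image_list(lines):
    """Build a {relative image name: label} map from 'name label' lines (assumed layout)."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, label = line.rsplit(None, 1)  # label is the last whitespace-separated token
        labels[name] = int(label)
    return labels


def full_image_path(data_filedir, name):
    """Prepend the common prefix given by data_filedir to a relative image name."""
    return posixpath.join(data_filedir, name)


# Hypothetical image-list entries; real files would hold many more lines.
example_lines = ["class_a/img_0001.JPEG 0", "class_b/img_0002.JPEG 1", ""]
labels = parse_image_list(example_lines)

data_filedir = "/p/lustre2/brainusr/datasets/ILSVRC2012/resized_256x256/train/"
first_path = full_image_path(data_filedir, "class_a/img_0001.JPEG")
```

Because the sample list carries no labels, a map like this is what lets a reader recover the label from the sample name alone.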

Dec 04 '19 23:12 JaeseungYeom
