lbann
Sample list for image data reader
PR #1401 supersedes this PR
- Currently, sample list loading is interleaved among the processes of each trainer. I believe this is to distribute the load of processing a single text input file, in case that load is non-trivial.
- However, the data store pre-loads the labels along with the image file names without interleaving. Thus, the sample order in the sample list does not match the label order in the data store.
- This can be solved by reordering the sample list after all_gather so that the order matches the order in the file. While this capability is added, another solution is used: `m_labels`, which maps a sample name (file name) to its label, replaces `m_image_list`, the list that contains the pairs of image file name and label.
- Currently, to express the location of the sample list file in the prototext input, `data_reader_jag_conduit` obtains the file name from `index_list` and replaces its directory path with the one for the data files, which is given by `data_filedir`. However, this approach forces users to put the sample list in the same directory as the data, which creates a mess. To allow users to put the sample list anywhere they want, I changed it to use the full path of `index_list`.
- The sample lists for imagenet are created under `/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list`. `*_sample_list.txt` is in the sample list format. `*_image_list.txt` is in the original format, with the directory paths corrected from `lscratchf` to `lustre2`.
- Verified by two methods:
  - writing out the sample list file and comparing it against the input file
  - comparing the image file names obtained from `m_sample_list` and `m_image_list` in `fetch_datum()` over multiple epochs
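
To make the ordering issue in the first three items concrete, here is a minimal Python sketch. It is not LBANN's C++ code: the round-robin interleaving scheme, the function names, and the file names are all assumptions chosen for illustration. It shows why position-based labels break after a gather, how reordering restores the file order, and why a name-to-label map (the role `m_labels` plays) does not depend on order at all.

```python
def interleaved_shards(samples, num_ranks):
    """Assumed scheme: rank r loads samples r, r + num_ranks, ... (round-robin)."""
    return [samples[r::num_ranks] for r in range(num_ranks)]


def gather_concatenated(shards):
    """Stand-in for all_gather: concatenate the per-rank shards in rank order."""
    return [name for shard in shards for name in shard]


def restore_file_order(shards):
    """Undo the round-robin interleaving so the result follows the file order."""
    num_ranks = len(shards)
    total = sum(len(shard) for shard in shards)
    ordered = [None] * total
    for rank, shard in enumerate(shards):
        # Rank `rank` held every num_ranks-th sample starting at index `rank`.
        ordered[rank::num_ranks] = shard
    return ordered


# Hypothetical file names and labels, listed in file order.
file_order = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
labels_in_file_order = [0, 1, 2, 3, 4]

shards = interleaved_shards(file_order, num_ranks=2)
gathered = gather_concatenated(shards)

# Option 1: reorder after the gather so positions match the file again.
reordered = restore_file_order(shards)

# Option 2: look labels up by sample name, independent of position.
labels_by_name = dict(zip(file_order, labels_in_file_order))
looked_up = [labels_by_name[name] for name in gathered]
```

With two ranks, `gathered` comes out as `a, c, e, b, d`, so indexing `labels_in_file_order` by position would pair `c.jpg` with label 1. The name-keyed lookup returns the right label for every sample regardless of how the list was gathered.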
This worked for me; see my blog post for today's meeting. Only comment: I think the `index_list` field in the data reader should be renamed to `sample_list` for clarity.
I will make the change.
Unless anyone has any additional comment, I will merge this PR at the end of the day.
My only concern is testing: will it break anything?
I want to look at this before it merges. I know that it has been waiting for a long time, but I still want to look through it before merging.
I will be waiting.
Rebased and compiles. Will test tomorrow.
This PR is still not ready to merge. There are no clear sample lists in the imagenet datasets directory.
/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list
There are many permutations of lists there, with similar but unclear names. Many also don't conform to what I thought was the standard format.
Hopefully, this helps. Otherwise, we can discuss more during the meeting.
`/p/lustre2/brainusr/datasets/ILSVRC2012/Single_train_c0-9_01_filenames.txt` and `/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_image_list.txt` are the same file in the original format, except that the former has the full path to each image file (which is outdated, as it contains `lscratchf`), while the latter has only the last part of the path, the part that actually varies.
The former is one of the original imagenet label files we have been using. For the latter, the common prefix is specified in the data reader prototext with the parameter `data_filedir`. This is the directory of the image data files, not of the list file itself. We still use the original file for labels.
For the sample list files, such a path prefix is specified on the third line of the file, as usual.
`/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_sample_list.txt` is the corresponding sample list file. It does not include the label information, as we discussed.
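
As a rough illustration of how the image list and `data_filedir` relate, the Python sketch below joins the directory prefix with each relative entry and keeps the labels in a map keyed by the relative name. It is not LBANN code. The line layout (a relative path followed by an integer label, separated by whitespace), the function name `parse_image_list`, and the example entries are assumptions; only the `data_filedir` value is taken from this thread.

```python
import posixpath


def parse_image_list(lines, data_filedir):
    """Parse assumed 'relative/path label' lines.

    Returns a name -> label map and the full image paths in list order.
    """
    labels = {}
    full_paths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace: path on the left, label on the right.
        name, label = line.rsplit(maxsplit=1)
        labels[name] = int(label)
        full_paths.append(posixpath.join(data_filedir, name))
    return labels, full_paths


data_filedir = "/p/lustre2/brainusr/datasets/ILSVRC2012/resized_256x256/train/"

# Hypothetical entries; the real list contents are not shown in this thread.
example_lines = [
    "class_a/img_0001.JPEG 0",
    "class_b/img_0002.JPEG 1",
]

labels, full_paths = parse_image_list(example_lines, data_filedir)
```

Because only the varying part of each path is stored in the list, moving the dataset (for example from `lscratchf` to `lustre2`) would only require changing `data_filedir`, not rewriting every entry.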
data_reader {
  reader {
    ...
    data_filedir: "/p/lustre2/brainusr/datasets/ILSVRC2012/resized_256x256/train/"
    data_filename: "/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_image_list.txt"
    sample_list: "/p/lustre2/brainusr/datasets/ILSVRC2012/sample_list/Single_train_c0-9_01_sample_list.txt"
    ...
  }