tfdsv4 resisc45
@davidslater do you recall why the dataset gets built to <data_dir>/resisc45_split rather than <data_dir>/resisc45? This discrepancy makes load.load() think the dataset doesn't exist locally, since it's looking for the path <data_dir>/resisc45.
build command:
python -m armory.datasets.build resisc45
It was because the original dataset did not contain splits. However, I think that we're fine to just call it resisc45.
You will want to incorporate resisc45_dataset_partition.py into your _generate_examples method, I think.
In armory/data/resisc45/resisc45_split.py, which I've copied to armory/datasets, the data is already split into 3 tar files. I'm able to build the data (with splits) fine without touching resisc45_dataset_partition.py, but the data is built to the resisc45_split dir:
I have no name!@b9bf7b1fac00:/workspace$ ls /armory/datasets/new_builds/resisc45_split/3.0.0/
dataset_info.json resisc45_split-test.tfrecord-00000-of-00001 resisc45_split-train.tfrecord-00002-of-00004
features.json resisc45_split-train.tfrecord-00000-of-00004 resisc45_split-train.tfrecord-00003-of-00004
label.labels.txt resisc45_split-train.tfrecord-00001-of-00004 resisc45_split-validation.tfrecord-00000-of-00001
I have no name!@b9bf7b1fac00:/workspace$ ls /armory/datasets/new_builds/resisc45
ls: cannot access '/armory/datasets/new_builds/resisc45': No such file or directory
If I change the dataset "name" in the config from "resisc45" to "resisc45_split", I can load the data fine. But with the name "resisc45", load.load() looks for /armory/datasets/new_builds/resisc45 and throws an error.
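To make the mismatch concrete, here is a minimal sketch of the lookup as described in this thread: a dataset counts as built only if <data_dir>/<name> exists and holds at least one version directory. This is not the actual load.load() implementation; the helper name find_built_versions is hypothetical, and the <data_dir>/<name>/<version> layout is assumed from the directory listing above.

```python
from pathlib import Path
import tempfile


def find_built_versions(data_dir, name):
    """Return sorted version directory names under <data_dir>/<name>.

    Hypothetical helper: returns [] when the dataset directory is absent,
    which is the situation a loader would report as "not built locally".
    """
    root = Path(data_dir) / name
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Reproduce the layout from the listing above in a throwaway directory.
with tempfile.TemporaryDirectory() as data_dir:
    (Path(data_dir) / "resisc45_split" / "3.0.0").mkdir(parents=True)
    print(find_built_versions(data_dir, "resisc45_split"))  # ['3.0.0']
    print(find_built_versions(data_dir, "resisc45"))        # []
```

The build succeeded, but under a different name than the one the config asks for, so a lookup keyed on "resisc45" finds nothing.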
Those URLs are what results from applying the resisc45_dataset_partition.py script to the original dataset NWPU-RESISC45.tar.gz and then breaking into separate files.
I think that we probably want to just reference armory-public-data/resisc45/NWPU-RESISC45.tar.gz
and incorporate the script into the builder loop. Thoughts?
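As a rough illustration of what "incorporating the script into the builder loop" could look like, here is a sketch of a deterministic per-class partition that a _generate_examples method could call while iterating over the single archive. The actual rule in resisc45_dataset_partition.py is not shown in this thread, so the function name partition_by_class, the split fractions, and the <class>/<file> path layout are all assumptions for illustration only.

```python
from collections import defaultdict


def partition_by_class(paths, train_frac=0.75, val_frac=0.125):
    """Deterministically assign each ".../<class>/<file>" path to a split.

    Hypothetical stand-in for resisc45_dataset_partition.py: the fractions
    are placeholders, not the values used by the real script.
    """
    by_class = defaultdict(list)
    for path in paths:
        label = path.split("/")[-2]  # parent directory is assumed to be the class
        by_class[label].append(path)

    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_class):
        files = sorted(by_class[label])  # sorting keeps the split reproducible
        n_train = int(len(files) * train_frac)
        n_val = int(len(files) * val_frac)
        splits["train"].extend(files[:n_train])
        splits["validation"].extend(files[n_train:n_train + n_val])
        splits["test"].extend(files[n_train + n_val:])
    return splits


# Illustrative paths only; the real archive contents may be laid out differently.
example = [f"NWPU-RESISC45/airport/airport_{i:03d}.jpg" for i in range(1, 9)]
result = partition_by_class(example)
print({name: len(files) for name, files in result.items()})
```

Because the assignment depends only on sorted file names, rebuilding from the original tarball would give the same splits each time without needing three pre-split files.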
I can take a stab at it if you'd like.
After the above commit, I can build to the resisc45 dir without error, and this no longer uses the 3 separate files. Not quite sure why I needed to add the hardcoded NWPU-RESISC45 in a couple places to get things working, though.
Builds fine for me.
Removing the WIP, I'm able to run a scenario and see expected benign/adv output. @davidslater ready for re-review
Done. Calling add_to_cache() also attempts to upload to s3, although this yielded an error for me. The error only came after cached_datasets.json had been updated, though. Have you been using upload() successfully? I noticed that all the url values in the json are null.
Where did it error? Do you have ARMORY_PRIVATE_S3_ID and ARMORY_PRIVATE_S3_KEY?
You can break down that operation with:
from armory.datasets import package, upload
package.package("resisc45")
package.update("resisc45")
package.verify("resisc45")
upload.upload("resisc45", public=True)
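Before running the upload step, it may help to confirm that both environment variables mentioned above are actually set. The sketch below is a standalone check, not part of armory; the helper name missing_s3_credentials is hypothetical, and whether upload() needs only these two variables is an assumption based on the question above.

```python
import os

# Variable names taken from the question above.
REQUIRED = ("ARMORY_PRIVATE_S3_ID", "ARMORY_PRIVATE_S3_KEY")


def missing_s3_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_s3_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("Both S3 credential variables are set.")
```

If either name is reported missing, the upload error is probably a credentials issue rather than a problem with the packaged dataset.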