Scene-Graph-Benchmark.pytorch

Training on custom dataset

Open nullkatar opened this issue 5 years ago • 46 comments

Hello @KaihuaTang! Thanks once again for the awesome work. I recently wanted to train one of your models on a custom dataset, but the benchmark only provides testing tools for custom images, so I decided to implement training myself. Essentially, I just want to replace the VG files (namely "VG-SGG-dicts-with-attri.json" and "VG-SGG-with-attri.json") with equivalent ones for my chosen dataset, but I can't do this entirely yet. I have already collected all the data required for "VG-SGG-dicts-with-attri.json", but I have no idea how to create the second file. Could you provide some scripts for generating these files, or something similar? By the way, I'm doing this for the GQA dataset.

nullkatar avatar Aug 29 '20 12:08 nullkatar

These files are generated based on https://github.com/danfeiX/scene-graph-TF-release/tree/master/data_tools

I just added attribute info, which is treated the same as the category.

KaihuaTang avatar Aug 30 '20 02:08 KaihuaTang

You can check generate_attribute_labels in https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/tree/master/datasets/vg for the details of how I create attribute labels.

KaihuaTang avatar Aug 30 '20 02:08 KaihuaTang

@nullkatar Hi, have you created the customized training dataset? I am going to create one, but have no idea about that.

gladcolor avatar Sep 28 '20 03:09 gladcolor

@gladcolor Yep, I did it. Can you please write me your email so I can provide you with it?

nullkatar avatar Sep 30 '20 11:09 nullkatar

@nullkatar, thanks so much! My email is [email protected].

gladcolor avatar Sep 30 '20 11:09 gladcolor

@nullkatar I was also looking for something similar for the GQA dataset. Do you think you could send me an email with any steps that might help at [email protected]?

vrdn-23 avatar Oct 26 '20 23:10 vrdn-23

@nullkatar Could you please send me a copy as well? I am also working on customized training datasets, thanks. [email protected]

luo980 avatar Oct 29 '20 07:10 luo980

Hi @nullkatar, I am trying to train this scene graph technique on a customized dataset but am facing problems. I would highly appreciate it if you could share the steps/code with me too. My email is [email protected]. Thanks in advance.

jaleedkhan avatar Dec 07 '20 06:12 jaleedkhan

Hi @nullkatar I'm trying to create a custom dataset, but not sure where to begin. My email is [email protected]. I would appreciate it if you could point me in the right direction.

BlockchainRev avatar Dec 28 '20 04:12 BlockchainRev

Hi @nullkatar, I'm also looking for something similar for the GQA dataset. Could you please send me a copy as well? My email is [email protected]. Thanks so much!

qiantianwen avatar Jan 06 '21 04:01 qiantianwen

@nullkatar Thank you for the awesome work. Could you help me create a custom dataset? My email is [email protected]. Thanks so much.

JackWhite-rwx avatar Jan 16 '21 03:01 JackWhite-rwx

@nullkatar I am also looking at the GQA dataset with this codebase. Could you also share your resources with me at [email protected]? Thanks in advance!

cramraj8 avatar Feb 01 '21 05:02 cramraj8

@nullkatar I am also trying to test the performance on the GQA dataset with this codebase. Could you please share your resources with me at [email protected]? Thanks in advance!

bashirulazam avatar Feb 04 '21 21:02 bashirulazam

Hi @nullkatar, I am trying to create a custom dataset too. Could you please share your resources with me at [email protected]? Thanks!

paridhimaheshwari2708 avatar Mar 04 '21 06:03 paridhimaheshwari2708

Hi @nullkatar, I am trying to train on my custom dataset too. Could you please share your method or code with me at [email protected]? Thanks!

D-Mer avatar Mar 16 '21 15:03 D-Mer

Hi, I am trying to test on my custom dataset too. Could you please send me a copy as well? My email is [email protected]. Thanks so much!

Lyon52222 avatar Mar 24 '21 05:03 Lyon52222

Hi @D-Mer, I am trying to test on my custom dataset too. Could you please send me a copy as well? My email is [email protected]. Thanks so much!

Lyon52222 avatar Mar 24 '21 05:03 Lyon52222

@Lyon52222 I finally did it myself, which took me about 2 days. My code may only fit my task, though, since it depends on the format of your custom dataset. Note that I used https://github.com/danfeiX/scene-graph-TF-release/tree/master/data_tools, which @KaihuaTang mentioned. In short, it takes 2 steps:

  1. Convert your dataset to the Visual Genome format; see https://visualgenome.org/api/v0/api_readme. For SGG, we need image_data.json, objects.json, and relationships.json.
  2. Use the mentioned repository to create the files this repository needs. I did step 1 myself and made some trivial changes to get the code to run. One annoying thing occurred, though: the tool changes some labels/boxes and filters out some instances/relations, so it may be better to generate the files ourselves. If you are interested, you can give me your QQ, or we can discuss via email.
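
Step 1 above could be sketched roughly as follows. This is only an illustration, not the code used in this thread: the input `samples` layout and the function name `to_visual_genome` are hypothetical, and the output field names (`image_id`, `objects`, `names`, `relationships`, `predicate`, `subject`, `object`) reflect one reading of the Visual Genome JSON layout, so they should be checked against the API readme linked above. The data_tools scripts may also expect further fields.

```python
import json


def to_visual_genome(samples):
    """Convert hypothetical custom samples into Visual Genome style records.

    Each sample is assumed to look like:
      {"id": 7, "width": 640, "height": 480,
       "objects": [{"name": "cup", "box": (x, y, w, h)}, ...],
       "relations": [(subject_pos, "on", object_pos), ...]}
    where subject_pos / object_pos index into that sample's "objects" list.
    """
    image_data, objects, relationships = [], [], []
    next_object_id = 0
    next_rel_id = 0
    for s in samples:
        image_data.append(
            {"image_id": s["id"], "width": s["width"], "height": s["height"]}
        )
        vg_objs = []
        for o in s["objects"]:
            x, y, w, h = o["box"]
            vg_objs.append({"object_id": next_object_id, "names": [o["name"]],
                            "x": x, "y": y, "w": w, "h": h})
            next_object_id += 1
        objects.append({"image_id": s["id"], "objects": vg_objs})

        vg_rels = []
        for subj, pred, obj in s["relations"]:
            # Assumption: each relationship embeds full copies of its two objects.
            vg_rels.append({"relationship_id": next_rel_id, "predicate": pred,
                            "subject": vg_objs[subj], "object": vg_objs[obj]})
            next_rel_id += 1
        relationships.append({"image_id": s["id"], "relationships": vg_rels})
    return image_data, objects, relationships
```

Each of the three returned lists can then be written with `json.dump` to image_data.json, objects.json, and relationships.json respectively.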

D-Mer avatar Mar 24 '21 06:03 D-Mer

I am interested in your process; my email is [email protected]. Can we discuss via email?

tyshiwo1 avatar Apr 15 '21 03:04 tyshiwo1

@tyshiwo1 It took me several days to analyze and generate the data files myself. The method I mentioned before leads to some unpredictable problems, so I converted my dataset directly to the format the Faster R-CNN code expects. We only need to add the "attribute" field to what Faster R-CNN needs. You can open the data files and inspect their structure; I suggest you do it yourself:

  1. VG-SGG-dicts-with-attri.json: a dict of your relation and object categories, like (only the first entries of each table are shown)
{
	"label_to_idx": {
		"PC": 1,
		"aircraft": 2
	},
	"idx_to_label": {
		"1": "PC",
		"2": "aircraft"
	},
	"predicate_to_idx": {
		"None": 1,
		"behind": 2
	},
	"idx_to_predicate": {
		"1": "None",
		"2": "behind"
	},
	"predicate_count": {
		"in front of": 48411,
		"next to": 30917
	},
	"attribute_count": {},
	"idx_to_attribute": {},
	"attribute_to_idx": {}
}
     The three attribute entries are needed by the VCTree code, but in fact they can be left empty.
  2. image_data.json: a list of image meta info, like [{"file_name": "000001.jpg", "image_id": 0, "height": 3137, "width": 4705}, {...} ...]

  3. VG-SGG-with-attri.h5 (the file downloaded via DATASET.md): an HDF5 file, like

<HDF5 dataset "attributes": shape (122174, 10), type "<i8">
 # all zeros if you don't have attributes, the second dim makes no difference
[[0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]
<HDF5 dataset "boxes_1024": shape (122174, 4), type "<i4">
[[363 397 187 559]
 [300 382  15   7]
 [696 352 112 457]
 [843 361 301 639]]
<HDF5 dataset "boxes_512": shape (122174, 4), type "<i4">
[[182 199  94 280]
 [150 191   8   4]
 [348 176  56 229]
 [421 181 151 320]]
<HDF5 dataset "img_to_first_box": shape (14630,), type "<i4">
[ 0 10 19 29]
<HDF5 dataset "img_to_first_rel": shape (14630,), type "<i4">
[ 0 17 27 42]
<HDF5 dataset "img_to_last_box": shape (14630,), type "<i4">
[ 9 18 28 31]
<HDF5 dataset "img_to_last_rel": shape (14630,), type "<i4">
[16 26 41 43]
<HDF5 dataset "labels": shape (122174, 1), type "<i4">
[[67]
 [37]
 [67]
 [67]]
<HDF5 dataset "predicates": shape (162003, 1), type "<i4">
 # the relation label index
[[ 9]
 [ 2]
 [ 2]
 [15]]
<HDF5 dataset "relationships": shape (162003, 2), type "<i4">
 # the object ids in each relation
[[0 1]
 [2 3]
 [4 2]
 [5 4]]
<HDF5 dataset "split": shape (14630,), type "<i4">
 # this is the split signal of 0 train/ 1 val/ 2 test
[0 0 0 0]
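
A minimal sketch of how the dictionaries and the parallel arrays listed above could be assembled from per-image annotations. Everything here is illustrative rather than the code used in this thread: `build_dicts`, `build_h5_arrays`, `write_h5`, and the `images` input layout are hypothetical names. It also assumes that boxes are stored as (x_center, y_center, w, h) after rescaling the longest image side to 1024, that relationship entries are global row indices into the box arrays, and that -1 marks an image without boxes or relations. Please verify these assumptions against the dataset loader before relying on them.

```python
from collections import Counter

BOX_SCALE = 1024  # longest image side after rescaling, as in "boxes_1024"


def build_dicts(object_names, predicate_names, all_predicates):
    """Category dictionaries; indices start at 1, as in the example above."""
    label_to_idx = {n: i + 1 for i, n in enumerate(object_names)}
    predicate_to_idx = {n: i + 1 for i, n in enumerate(predicate_names)}
    return {
        "label_to_idx": label_to_idx,
        "idx_to_label": {str(i): n for n, i in label_to_idx.items()},
        "predicate_to_idx": predicate_to_idx,
        "idx_to_predicate": {str(i): n for n, i in predicate_to_idx.items()},
        "predicate_count": dict(Counter(all_predicates)),
        "attribute_count": {}, "idx_to_attribute": {}, "attribute_to_idx": {},
    }


def build_h5_arrays(images, dicts):
    """Flatten per-image annotations into the parallel arrays shown above.

    `images` is a hypothetical list of dicts:
      {"width", "height", "split",
       "boxes": [(label, x1, y1, x2, y2), ...],
       "rels": [(subject_pos, object_pos, predicate), ...]}
    where subject_pos / object_pos index into that image's own "boxes" list.
    """
    keys = ("labels", "boxes_1024", "boxes_512", "attributes", "relationships",
            "predicates", "img_to_first_box", "img_to_last_box",
            "img_to_first_rel", "img_to_last_rel", "split")
    out = {k: [] for k in keys}
    for im in images:
        scale = BOX_SCALE / max(im["width"], im["height"])
        first_box, n_box = len(out["labels"]), len(im["boxes"])
        for label, x1, y1, x2, y2 in im["boxes"]:
            # Assumption: stored layout is (x_center, y_center, w, h).
            box = [(x1 + x2) / 2 * scale, (y1 + y2) / 2 * scale,
                   (x2 - x1) * scale, (y2 - y1) * scale]
            out["boxes_1024"].append([round(v) for v in box])
            out["boxes_512"].append([round(v / 2) for v in box])
            out["labels"].append([dicts["label_to_idx"][label]])
            out["attributes"].append([0] * 10)  # all zeros: no attributes
        first_rel, n_rel = len(out["predicates"]), len(im["rels"])
        for subj, obj, pred in im["rels"]:
            # Assumption: relationships hold global box row indices.
            out["relationships"].append([first_box + subj, first_box + obj])
            out["predicates"].append([dicts["predicate_to_idx"][pred]])
        # Assumption: -1 marks an image with no boxes / no relations.
        out["img_to_first_box"].append(first_box if n_box else -1)
        out["img_to_last_box"].append(first_box + n_box - 1 if n_box else -1)
        out["img_to_first_rel"].append(first_rel if n_rel else -1)
        out["img_to_last_rel"].append(first_rel + n_rel - 1 if n_rel else -1)
        out["split"].append(im["split"])
    return out


def write_h5(arrays, path):
    """Write the arrays with h5py and numpy (third-party, imported lazily)."""
    import h5py
    import numpy as np
    with h5py.File(path, "w") as f:
        for name, values in arrays.items():
            dtype = "int64" if name == "attributes" else "int32"
            f.create_dataset(name, data=np.asarray(values, dtype=dtype))
```

The dictionary returned by `build_dicts` can be saved with `json.dump` as VG-SGG-dicts-with-attri.json, and `write_h5(build_h5_arrays(images, dicts), "VG-SGG-with-attri.h5")` would produce a file with the same dataset names and dtypes as the dump above.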

D-Mer avatar Apr 15 '21 14:04 D-Mer

Hi @nullkatar, I am also having trouble creating a custom dataset. Could you please share your approach with me at [email protected]? Thank you!

szysad avatar Aug 04 '21 09:08 szysad

Hi @D-Mer, good job on making your custom dataset work with this framework. Could you go into more detail about creating a custom VG-SGG-with-attri.h5 file?

szysad avatar Aug 04 '21 09:08 szysad

Hi @D-Mer, I also have trouble training on a custom dataset. Could you please share your approach? Here is my QQ: 1037443699. Thank you!

fun1024 avatar Dec 07 '21 12:12 fun1024

Hi, I am also working on this. Could you please share your process and code? Here's my QQ: 408079378 and my email: [email protected]. Thanks so much!

Yvonne0413 avatar Feb 20 '22 07:02 Yvonne0413

Hi, I am also working on this. Could you please share your process and code? Here's my email: [email protected]. Thanks so much!

DeepaliVerma avatar Feb 22 '22 12:02 DeepaliVerma

Hi @nullkatar, I am working on a customized dataset but am facing problems. Could you also send me an email with any steps that might help? My email is [email protected]. Thanks a lot.

Vincent-luo avatar Apr 01 '22 14:04 Vincent-luo

Hi @nullkatar, I have also run into some difficulty preparing my custom dataset. Could you share a copy of your code via email? Thank you so much! [email protected]

Mingyuan1997 avatar Sep 06 '22 19:09 Mingyuan1997

Hi @nullkatar, I have also run into some difficulty preparing my custom dataset. Could you share a copy of your code via email? [email protected] Thank you so much!

Chenghao-Ding avatar Nov 08 '22 16:11 Chenghao-Ding

Thank you for your email; I have received it.

Hi, I have also run into some difficulty preparing my custom dataset. Could you share a copy of your solution with me at [email protected]? Thanks.

zhuyibing avatar Feb 16 '23 06:02 zhuyibing
