"No valid stratified examples" error

Open croshong opened this issue 1 year ago • 11 comments

I have a question related to balancing and stratify_receptor.

When I train a model with default2018, there is no problem. But when I try to train a model with dense, I get the following error:

File "train.py", line 935, in
    results = train_and_test_model(args, train_test_files[i], outname, cont)
File "train.py", line 502, in train_and_test_model
    solver.step(test_interval)
ValueError: No valid stratified examples.

The balancing and stratify_receptor settings are the same in dense.model and default2018.model,

and my training data contains both positively and negatively labeled samples.

Any clue as to what could cause this error?

Thanks

croshong avatar Jun 07 '23 08:06 croshong

This presumably means you have a receptor that has only positive or only negative examples.

dkoes avatar Jun 07 '23 13:06 dkoes

If that's the case, I think training with default2018.model should also fail, but it works well, so I guess there may be another reason.

croshong avatar Jun 07 '23 13:06 croshong

When I use the data included in https://github.com/gnina/models/tree/master/data/PDBBind2016/General_types/fixed_gen_uff_completeset_*, I get the same error with the dense model. Is there something that I should modify?

croshong avatar Jun 16 '23 08:06 croshong

I think the issue is that I changed this from being a silent failure to an error, which is why things that used to work no longer work. The quickest fix is to filter the input data to remove the problematic entries.

dkoes avatar Jun 21 '23 14:06 dkoes

In my data, every entry has at least one positive or negative example. So by "problematic entry", do you mean an entry that is too heavily biased toward positive or negative examples?

croshong avatar Jun 22 '23 14:06 croshong

There are quite a few cases in those files where a receptor has only one class of examples. This lists how many distinct labels each receptor has:

awk '{print $4,$1}' fixed_gen_uff_completeset_train0.types  | sort -u | awk '{print $1}' | uniq -c | sort -r -n
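The filtering step suggested earlier in the thread can be sketched in Python. This is a minimal sketch, not part of gnina: the function name `filter_single_class_receptors` is hypothetical, and it assumes the same layout as the awk command above (label in column 1, receptor in column 4, whitespace-separated). The example lines are made up for illustration.

```python
from collections import defaultdict


def filter_single_class_receptors(lines, label_col=0, receptor_col=3):
    """Keep only lines whose receptor appears with more than one distinct label.

    Column indices are 0-based and are assumptions matching the awk command
    ($1 = label, $4 = receptor); adjust them for other types-file layouts.
    """
    labels_by_receptor = defaultdict(set)
    parsed = []
    for line in lines:
        fields = line.split()
        if len(fields) <= max(label_col, receptor_col):
            continue  # skip blank or too-short lines
        receptor = fields[receptor_col]
        labels_by_receptor[receptor].add(fields[label_col])
        parsed.append((receptor, line))
    # A receptor with a single distinct label cannot be stratified and balanced.
    return [line for receptor, line in parsed
            if len(labels_by_receptor[receptor]) > 1]


# Hypothetical example lines: recA has both classes, recB has only negatives.
example = [
    "1 6.5 0.8 recA.gninatypes ligA1.gninatypes",
    "0 6.5 4.2 recA.gninatypes ligA2.gninatypes",
    "0 5.1 3.9 recB.gninatypes ligB1.gninatypes",
]
kept = filter_single_class_receptors(example)
```

Here `kept` holds only the two recA lines. Labels are compared as strings, as in the awk pipeline, so `1` and `1.0` would count as different classes.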

dkoes avatar Jul 24 '23 16:07 dkoes

The Dense net Caffe model architecture does not have an RMSD column; the def2018 architecture does. You need to use different types files for the Dense net, and I suspect this mismatch is the source of the error.
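One way to see which layout a types file actually uses is to count the fields on each line. The following is a rough sketch under stated assumptions: fields are whitespace-separated, and anything after a `#` is a trailing comment to ignore. The function name `field_count_histogram` and the sample lines are hypothetical, and the code makes no claim about which column holds the RMSD.

```python
from collections import Counter


def field_count_histogram(lines):
    """Return a Counter mapping number-of-fields -> number of lines."""
    counts = Counter()
    for line in lines:
        body = line.split("#", 1)[0]  # assumption: '#' starts a trailing comment
        fields = body.split()
        if fields:  # ignore blank and comment-only lines
            counts[len(fields)] += 1
    return counts


# Hypothetical lines with two different layouts mixed together.
sample = [
    "1 6.5 0.8 rec.gninatypes lig1.gninatypes # note",
    "0 6.5 4.2 rec.gninatypes lig2.gninatypes",
    "1 6.5 rec.gninatypes lig3.gninatypes",
]
histogram = field_count_histogram(sample)
```

A file with one consistent layout gives a histogram with a single key. That key can then be compared with the number of columns the model's data layer is configured to read.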

francoep avatar Jul 24 '23 18:07 francoep

> If that's the case, I think training with default2018.model should also fail, but it works well, so I guess there may be another reason.

Hi, I want to ask what your original weight file for default2018.model was. Did you use 'crossdock_2018_0.def' in gninasrc/lib/models/weights?

Kerro-junior avatar Sep 17 '23 04:09 Kerro-junior

There is no single weight file for a given model architecture, but those would be the default model weights used when requesting the crossdock_2018 single model from gnina.

dkoes avatar Sep 18 '23 16:09 dkoes

Hi, I am using some processed crystal samples to train default2018.model, so all of the samples are labeled '0'. This raises the error: No valid stratified examples

How could I fix this issue?

Dadiao-shuai avatar Oct 31 '23 11:10 Dadiao-shuai

Turn off stratification/balancing.

dkoes avatar Oct 31 '23 12:10 dkoes