
libmolgrid issues about stratifying receptors

Open YanjingLiLi opened this issue 3 years ago • 1 comments

Hi authors, I have an issue using the stratify functions in ExampleProvider.

I tried two ways:

  1. train_samples = molgrid.ExampleProvider(ligmolcache=args.trligte, recmolcache=args.trrecte, shuffle=True, default_batch_size=args.batch_size, iteration_scheme=molgrid.IterationScheme.SmallEpoch, balanced=True, stratify_pos=3, stratify_step=1, stratify_max=6, stratify_min=0)

    train_samples.populate(args.trainfile)

(for the whole dataset, stratify_max=20958)

  2. train_samples = molgrid.ExampleProvider(ligmolcache=args.trligte, recmolcache=args.trrecte, shuffle=True, default_batch_size=args.batch_size, iteration_scheme=molgrid.IterationScheme.SmallEpoch, balanced=True, stratify_receptor=True)

    train_samples.populate(args.trainfile)

But when I run them on CUDA, neither works. The GPU is not used, and after a long wait I get an error like: train_samples.populate(args.trainfile) ValueError: No valid examples found in training set. wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.

I have attached the types files for my whole dataset and for the reduced data here. Would you please take a look at what is going wrong? data.zip

YanjingLiLi, Jan 21 '23 19:01

If you want to both balance and stratify, all of your strata need to have both positive and negative examples. They don't:

$ awk '{print $1,$4}' reduced_data.types  | sort -u
0 0
0 1
0 2
0 3
0 4
0 5
0 6
1 0
1 1
1 2
$ awk '{print $1,$4}' whole_data.types  | sort -u | grep -c "^1"
17596
$ awk '{print $1,$4}' whole_data.types  | sort -u | grep -c "^0"
20763
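
In the reduced file, strata 3 through 6 appear only with label 0, so they have no positive examples to balance against. The same check can be sketched in Python to list the offending strata by name. This is a minimal sketch, not part of libmolgrid: the helper `unbalanced_strata` is hypothetical, and it assumes whitespace-separated types lines with the label in the first column and the stratum value in the fourth (matching `$1` and `$4` in the awk commands above, i.e. `stratify_pos=3`), with any label greater than 0 treated as positive.

```python
from collections import defaultdict
import sys


def unbalanced_strata(lines, label_col=0, stratum_col=3):
    """Return the strata that lack either positive or negative examples.

    Hypothetical helper; columns are 0-indexed, so stratum_col=3 is the
    fourth field. Assumes a label > 0 counts as a positive example.
    """
    labels_by_stratum = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Skip blank or short lines that do not reach both columns.
        if len(fields) <= max(label_col, stratum_col):
            continue
        label = 1 if float(fields[label_col]) > 0 else 0
        labels_by_stratum[fields[stratum_col]].add(label)
    # A stratum is usable with balanced=True only if it has both classes.
    return sorted(s for s, seen in labels_by_stratum.items() if seen != {0, 1})


if __name__ == "__main__":
    # Usage: python check_strata.py reduced_data.types
    with open(sys.argv[1]) as handle:
        for stratum in unbalanced_strata(handle):
            print(stratum)
```

Any stratum this prints would need at least one example of the missing class added, or would need to be dropped or merged into another stratum, before combining `balanced=True` with stratification.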

dkoes, Jan 26 '23 20:01