libmolgrid: issues with stratifying receptors
Hi authors, I am having trouble using the stratification options of ExampleProvider.
I tried two approaches:
-
train_samples = molgrid.ExampleProvider(ligmolcache=args.trligte, recmolcache=args.trrecte, shuffle=True, default_batch_size=args.batch_size, iteration_scheme=molgrid.IterationScheme.SmallEpoch, balanced=True, stratify_pos=3, stratify_step=1, stratify_max=6, stratify_min=0)
train_samples.populate(args.trainfile)
(For the whole dataset I used stratify_max=20958.)
-
train_samples = molgrid.ExampleProvider(ligmolcache=args.trligte, recmolcache=args.trrecte, shuffle=True, default_batch_size=args.batch_size, iteration_scheme=molgrid.IterationScheme.SmallEpoch, balanced=True, stratify_receptor=True)
train_samples.populate(args.trainfile)
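Both configurations combine balanced=True with a stratification option. As a rough illustration of how those two constraints interact, here is a toy model in plain Python. It is not libmolgrid's sampler; it only assumes that each example reduces to a (label, stratum) pair and that a balanced draw takes one positive and one negative from the current stratum. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def make_balanced_stratified_sampler(examples):
    """Toy model (hypothetical, not libmolgrid's implementation).

    `examples` is a list of (label, stratum) pairs. Returns an infinite
    iterator that visits each stratum in turn and draws one positive and
    then one negative example from it.
    """
    # stratum -> (negatives, positives)
    by_stratum = defaultdict(lambda: ([], []))
    for label, stratum in examples:
        by_stratum[stratum][1 if label else 0].append((label, stratum))

    # A balanced draw is impossible from a stratum that lacks either class.
    bad = sorted(s for s, (neg, pos) in by_stratum.items() if not neg or not pos)
    if bad:
        raise ValueError(f"strata without both classes: {bad}")

    def generate():
        # One cycling iterator per (stratum, class).
        iters = {s: (cycle(neg), cycle(pos)) for s, (neg, pos) in by_stratum.items()}
        for s in cycle(sorted(iters)):
            neg_it, pos_it = iters[s]
            yield next(pos_it)
            yield next(neg_it)

    return generate()
```

In this model a single stratum that contains only negatives is enough to make balanced, stratified sampling impossible, which is the situation the reply below describes.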
However, when I run either of them on CUDA, neither works properly: the GPU is never used, and after a long wait I get an error like:
    train_samples.populate(args.trainfile)
ValueError: No valid examples found in training set.
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
I have attached the types files for my whole dataset and for the reduced dataset. Could you take a look at what is happening? data.zip
If you want to both balance and stratify, all of your strata need to have both positive and negative examples. They don't:
$ awk '{print $1,$4}' reduced_data.types | sort -u
0 0
0 1
0 2
0 3
0 4
0 5
0 6
1 0
1 1
1 2
$ awk '{print $1,$4}' whole_data.types | sort -u | grep -c "^1"
17596
$ awk '{print $1,$4}' whole_data.types | sort -u | grep -c "^0"
20763
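The same check can be sketched in Python, with per-stratum counts instead of only the distinct (label, stratum) pairs. This is a minimal sketch, assuming whitespace-separated .types lines with the label in the first column; it treats each distinct string in the stratum column as its own stratum, exactly as the awk command does, so it does not reproduce any binning that stratify_min, stratify_max and stratify_step may apply. The function name and its column arguments are hypothetical.

```python
from collections import Counter, defaultdict


def strata_missing_a_class(lines, label_col=0, stratum_col=3):
    """Hypothetical helper: count labels per stratum in .types-style lines.

    Column indices are 0-based, so stratum_col=3 corresponds to awk's $4.
    Returns {stratum: {label: count}} for every stratum that has no
    negatives ("0") or no positives ("1").
    """
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[fields[stratum_col]][fields[label_col]] += 1
    # Counter returns 0 for missing keys without inserting them.
    return {s: dict(c) for s, c in counts.items() if c["0"] == 0 or c["1"] == 0}
```

For example, `with open("reduced_data.types") as f: print(strata_missing_a_class(f))` lists the strata that block balanced sampling, together with how many examples of the one class they do contain.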