clrs icon indicating copy to clipboard operation
clrs copied to clipboard

There's a bug in running kmp_matcher algorithm at example run.py.

Open Jeff-Huang-SHU opened this issue 9 months ago • 2 comments

we can't add val_lengths, since create_samplers() alters the train_lengthsfor both kmp_matcher and naive_string_matcher.

  (
      train_samplers,
      val_samplers,
      val_sample_counts,
      test_samplers,
      test_sample_counts,
      spec_list,
  ) = create_samplers(
      rng=rng,
      train_lengths=train_lengths,
      algorithms=FLAGS.algorithms,
      val_lengths=[np.amax(train_lengths)],
      test_lengths=[-1],
      train_batch_size=FLAGS.batch_size,
  )

In create_samplers(), it will choose val_lengths first , instead of [np.amax(train_lengths)]. Therefore, it will raise a bug in MatcherSampler.

File "numpy/random/mtrand.pyx", line 944, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken

Jeff-Huang-SHU avatar Mar 18 '25 13:03 Jeff-Huang-SHU

I am facing the same issue. It would be good to understand the settings used to generated the benchmark dataset. Current run.py has defaults only for the "bfs" algorithm. It is unclear how the other algorithm data is generated.

abhitopia avatar Apr 29 '25 14:04 abhitopia

Hi Jeff,

Thank you for your issue and I apologise for not getting around to it earlier. Somehow I did not get notified when you sent it, and I went on leave in the meantime.

I already wrote a longer response to a similar issue raised by Abhi, and I believe both will be fixed by the same correction. I will report back once we have fully looked into it (hopefully sometime next week).

As a quick additional response to Abhi's remark:

Current run.py has defaults only for the "bfs" algorithm. It is unclear how the other algorithm data is generated.

Unless I'm mistaken (it's been a while since I double checked this script!) these are supposed to be the default hyperparameters for all algorithms -- the run.py script has ['bfs'] only in the algorithms field mainly to make the initial run simpler to execute for a CPU-only setup.

In principle it should be sufficient to update the list of algorithms to all of them to reflect the environment of the Generalist paper (perhaps after we fix the identified issue first :) ).

Thanks, Petar

PetarV- avatar Aug 24 '25 12:08 PetarV-

Confirming this has now been fixed in the latest merge.

PetarV- avatar Oct 15 '25 21:10 PetarV-