Haibin Wang comments

Results: 18 comments of Haibin Wang

Does filter support batch processing, and how do I set batch_size?

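Hi, the batching here is mainly meant for the case where a mapper generates multiple samples from a single sample, so the output has to be wrapped as a batch. Currently only mappers support batching, and the input batch size is fixed at 1. It would indeed be more reasonable for every type of op to support batching, and the batch size setting should be exposed to users. However, users may need to consider the overhead of building batches: if the speedup from the batched op is not enough to cover this overhead, processing may end up slower.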
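To make the "one sample in, several samples out" case concrete, here is a minimal sketch in plain Python. It is not the library's actual API: `split_lines_mapper`, `iter_batches`, and the column-oriented `Batch` layout are hypothetical names and assumptions chosen for illustration only.

```python
from typing import Dict, Iterator, List

# Assumed layout: a column-oriented batch, field name -> list of values.
Batch = Dict[str, List[str]]


def split_lines_mapper(batch: Batch) -> Batch:
    """Hypothetical mapper: each input text may yield several output texts."""
    out: List[str] = []
    for text in batch["text"]:
        # One sample can expand into many, so the result is returned as a batch.
        out.extend(line for line in text.split("\n") if line.strip())
    return {"text": out}


def iter_batches(texts: List[str], batch_size: int = 1) -> Iterator[Batch]:
    """Group samples into batches; batch_size=1 mirrors the fixed input size."""
    for start in range(0, len(texts), batch_size):
        yield {"text": texts[start:start + batch_size]}


samples = ["a\nb", "c"]
result = [
    text
    for batch in iter_batches(samples, batch_size=1)
    for text in split_lines_mapper(batch)["text"]
]
print(result)  # ['a', 'b', 'c']
```

With a user-configurable `batch_size`, the grouping step in `iter_batches` is where the batching overhead mentioned above would be paid, so it only helps when the op itself is faster on larger batches.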

Reproduction of experiments

Hi, I have preprocessed the data by running `bash preprocessing/run.sh` and used the quality filter by running `bash preprocessing/quality_scores/run_slurm_quality_stats.sh` and `bash data_selection/run_cmds.sh` in advance. We also turned the `--qualityfilter` on...

Reproduction of experiments

Actually, I believe your work is reasonable and I have been following it for a long time. I find your algorithms are totally different between your 'v1' and 'v3' released...

Reproduction of experiments

Hi, thanks very much. I revised the `compute_domain_idxs` function as follows in my experiment.

```
def compute_domain_idxs(filter_domains):
    ds_paths = dsname_to_args['pile']['task_name']
    ds_dir = Path(ds_paths[0]).parent.parent
    domain_to_idxs = defaultdict(list)
    todo_domains = []
    ...
```

Reproduction of experiments

Thank you for clarifying my confusion. Are you saying that you use the token distributions to compute the weights in 'v1' rather than learning two generative models as 'v1' suggests?

Reproduction of experiments

BTW, I am also confused about the different results of Top-k selection and resample selection. In my experiments, the performance of resample selection often falls between the performances of Top-k...

Reproduction of experiments

Thank you very much. Yes, the number matches 1745766302. And top-k means not perturbing the importance weights with Gumbel noise. I'm excited to see the further experiments.
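For readers following the thread, the difference between the two selection modes can be sketched as below. This is only an illustration under stated assumptions, not the repository's code: `top_k`, `gumbel_top_k`, and the example weights are hypothetical, and the weights are assumed to be log importance weights.

```python
import math
import random


def top_k(log_weights, k):
    """Deterministic selection: indices of the k largest log importance weights."""
    order = sorted(range(len(log_weights)), key=lambda i: log_weights[i], reverse=True)
    return order[:k]


def gumbel_top_k(log_weights, k, seed=0):
    """Resample selection: add Gumbel(0, 1) noise to each log weight, then take top-k.

    This draws k distinct indices, favoring larger weights without always
    picking the same ones.
    """
    rng = random.Random(seed)

    def gumbel():
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        return -math.log(-math.log(u))

    perturbed = [w + gumbel() for w in log_weights]
    return top_k(perturbed, k)


log_w = [0.1, 2.0, -1.0, 1.5, 0.3]  # made-up example values
print(top_k(log_w, 2))               # [1, 3]
print(gumbel_top_k(log_w, 2, seed=0))  # two distinct indices; varies with the seed
```

Without the noise the selection always returns the same highest-weight examples, while the perturbed version can also pick lower-weight ones, which is one possible reason the two modes give different downstream results.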

Problems Encountered during Installation

Compiling detectron2 requires PyTorch > 2.0, but VBench requires PyTorch < 2.0. detectron2 has since updated its code. Please refer to https://github.com/facebookresearch/detectron2/commit/181aae36820af025eed1e33e58390f7ed9261e1a

Problems Encountered during Installation

A fair suggestion: as a benchmarking library, it should not depend on third-party libraries.

Problems Encountered during Installation

> > The compiler require PyTorch > 2.0 in detectron2, but VBench require PyTorch < 2.0. The detectron2 have updated their codes. Please refer to [facebookresearch/detectron2@181aae3](https://github.com/facebookresearch/detectron2/commit/181aae36820af025eed1e33e58390f7ed9261e1a) > > > >...