PengWenChen

14 comments by PengWenChen

@tjinjin95 Hi. I've tried to run the online code by myself and it works, but I can't exactly reproduce the MO score reported in the paper. The paper says the average...

Hi there~ I would also like to ask about adding special tokens. The issue is that for some models, such as Qwen1.5, the special tokens are not in the vocab.json...
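For context, this is a minimal sketch of how I would register such tokens, assuming the standard Hugging Face `transformers` API; the model name and token strings below are placeholders, not the exact ones from my setup:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Model name is only an example.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B")

# Register special tokens that are not already in the vocabulary.
# The token strings here are placeholders for illustration.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<extra_token_0>", "<extra_token_1>"]}
)

# Resize the embedding matrix so the newly added token ids have embedding rows.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```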

Hi @knighton. Thanks for your reply! In my working environment I cannot run as the root account. If one of my partners runs your package first and somehow generates some cache files, such...

Hi @knighton. I found another root path here: https://github.com/mosaicml/streaming/blob/main/streaming/base/stream.py#L166 After modifying both `self._filelock_root` in `dataset.py` and `root` in `stream.py`, the scripts execute successfully! But I still want to confirm...

Hi there! The shared memory does not seem to be cleaned up successfully by the first user (or the other users), and then the error occurs: `Permission denied: '/00000_locals'`. I also found another closed issue:...

Hi @Skylion007. Thanks for your reply. I haven't tried forcing destruction of the leftover shared memory. If I encounter the same problem in the future, I will give it...
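In case it is useful to others, here is a small sketch of how the stale segment could be removed with Python's standard `multiprocessing.shared_memory` module; the segment name is guessed from the error message and may differ on your machine:

```python
from multiprocessing import shared_memory

# Name of the stale segment, taken from the error message without the leading '/'.
# This is an assumption; adjust it to match the actual segment under /dev/shm.
STALE_NAME = "00000_locals"

try:
    shm = shared_memory.SharedMemory(name=STALE_NAME, create=False)
    shm.close()
    shm.unlink()  # remove the segment from the system
    print(f"Removed stale shared memory segment {STALE_NAME!r}")
except FileNotFoundError:
    print(f"No segment named {STALE_NAME!r} found")
except PermissionError:
    # The segment belongs to another user; only that user (or root) can remove it,
    # e.g. by deleting the corresponding file under /dev/shm.
    print(f"Not permitted to remove {STALE_NAME!r}; ask its owner to delete it")
```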

Regarding these path modifications: yes, my path is readable and writable, and I can also change the permissions with `chmod`. https://github.com/mosaicml/streaming/issues/546#issuecomment-1868672842 As for the `/000000_locals` issue, I failed to change the...

Hi @xiamengzhou, I also encountered this issue with the original dynamic loading setup in `pruning.sh`: `set_names=[cc,github,book,stackexchange,wiki,arxiv,c4]` `proportion=[0.67,0.045,0.045,0.02,0.045,0.025,0.15]` NaN appears in the first batch when calculating `metric/train/stackexchange_LanguageCrossEntropy`. The environment I...

Hi @xiamengzhou! Thanks for your reply. However, I cannot access Google Drive from where I am working :( Could you please upload the processed data to this repository? It would...

Hi, @xiamengzhou! The proportion update fails because of a NaN loss on the evaluation data, which is caused by missing data for some of the subdatasets. I solved this issue by increasing...
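For anyone else debugging this, the NaN can arise simply because a domain contributes no tokens to the batch, so the per-domain cross entropy averages over an empty set. A standalone PyTorch sketch of that failure mode (not the repository's actual metric code; the losses and domain ids are made up):

```python
import torch

# Toy per-token losses and domain ids for one batch (values are illustrative).
losses = torch.tensor([2.1, 1.8, 2.5, 2.0])
domains = torch.tensor([0, 0, 1, 1])  # domain 2 (e.g. stackexchange) is absent

for d in range(3):
    mask = domains == d
    # Averaging over an empty selection yields NaN,
    # which then propagates into the proportion update.
    print(d, losses[mask].mean().item())
```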