Conglong Li

18 comments by Conglong Li

The scripts in training_scripts/other_language/ are very old and haven't been updated. We will work on fixing them, but you can also simply apply these args to any English data...

@kindaQ HF datasets supports caching (https://huggingface.co/docs/datasets/cache), so if you copy the downloaded data onto a machine and set HF_DATASETS_CACHE properly, that machine can load the datasets directly without downloading....
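A minimal sketch of the cache-copy approach, assuming the cache directory has already been copied over from a machine with internet access. The helper names `point_to_copied_cache` and `load_from_copied_cache` are hypothetical, not part of any HF API. The environment variable is set before `datasets` is imported, and the path is also passed through `load_dataset`'s `cache_dir` argument, so the copied cache is used even if `datasets` happened to be imported earlier.

```python
import os
from pathlib import Path


def point_to_copied_cache(cache_dir: str) -> str:
    """Hypothetical helper: point HF datasets at a cache copied from another machine.

    Sets HF_DATASETS_CACHE, which datasets reads when it is first imported,
    and returns the absolute path that was set.
    """
    path = Path(cache_dir).expanduser().resolve()
    if not path.is_dir():
        # Fail early instead of letting datasets silently try to download.
        raise FileNotFoundError(f"cache directory not found: {path}")
    os.environ["HF_DATASETS_CACHE"] = str(path)
    return os.environ["HF_DATASETS_CACHE"]


def load_from_copied_cache(dataset_name: str, cache_dir: str):
    """Hypothetical helper: load a dataset that already sits in the copied cache."""
    path = point_to_copied_cache(cache_dir)
    # Import only after the env var is set so the library picks it up.
    from datasets import load_dataset
    # Passing cache_dir explicitly also covers the case where datasets
    # was already imported before the env var was set.
    return load_dataset(dataset_name, cache_dir=path)
```

On the offline machine, a call such as `load_from_copied_cache("<dataset name>", "/path/to/copied/cache")` (both arguments placeholders) should then read from the copied files rather than re-downloading them.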

Based on HF's docs (https://huggingface.co/docs/datasets/loading#offline), could you try setting HF_DATASETS_OFFLINE to 1 to enable full offline mode?
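A minimal sketch of enabling that offline mode from inside a Python training script, assuming the variables are set before the first `import datasets` (the library reads them when its config module loads). The function name `enable_hf_datasets_offline` and its optional `cache_dir` parameter are hypothetical; only `HF_DATASETS_OFFLINE` and `HF_DATASETS_CACHE` come from the HF docs.

```python
import os
import sys
import warnings


def enable_hf_datasets_offline(cache_dir=None):
    """Hypothetical helper: set HF_DATASETS_OFFLINE=1 (and optionally the cache dir).

    Returns the HF_DATASETS_* environment variables now set, e.g. for logging.
    """
    if "datasets" in sys.modules:
        # The library has already read its config; new values may be ignored.
        warnings.warn("datasets is already imported; set these variables "
                      "before importing it (or in the shell) for them to apply")
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    if cache_dir is not None:
        # Optional: reuse a cache copied from a machine with internet access.
        os.environ["HF_DATASETS_CACHE"] = cache_dir
    return {k: v for k, v in os.environ.items() if k.startswith("HF_DATASETS_")}
```

Equivalently, `HF_DATASETS_OFFLINE=1` can be exported in the shell before launching the training script, which avoids the import-order concern entirely.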

@kindaQ Thanks for the clarifications. Now I agree that this PR is needed. I finished my review and left some comments that need your fixes. Please also write a short...

@kindaQ this PR had formatting issues. I fixed them this time, but next time please make sure to use pre-commit to resolve them: "pre-commit install" then "pre-commit...

@chainyo @aleksandr-smechov Can you first try the datasets we used in the example scripts? If those datasets also lead to this problem, then it seems like a bug....

@chainyo Based on your original description, you have changed quite a few things from the original example we tested: dataset, embedding size, etc. We won't be able to guarantee that it...

Closing due to lack of activity; feel free to reopen or create a new issue when you have more info to share.