🌐 [i18n-KO] Translating docs to Korean
Hi!
Let's bring the documentation to the whole Korean-speaking community 🌐 (currently 9 out of 77 complete).
Would you like to translate? Please follow the 🤗 TRANSLATING guide. Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list.
Some notes:
- Please translate using an informal tone (imagine you are talking with a friend about transformers 🤗).
- Please translate in a gender-neutral way.
- Add your translations to the folder called `ko` inside the `source` folder.
- Register your translation in ko/_toctree.yml; please follow the order of the English version.
- Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @ArthurZucker, @sgugger and @eunseojo for review.
- If you'd like others to help you with the translation, you can also post in the 🤗 forums.
- With the HuggingFace Documentation l10n initiative of Pseudo Lab, full translation will be done even faster. Please give us your support! Cheers to our team: @0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo
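Since the `ko/_toctree.yml` registration step trips up many first-time contributors, here is a minimal sketch of what an entry might look like. The section titles and file names below are illustrative placeholders, not the real Korean ToC; mirror the order and nesting of the actual English `_toctree.yml` when adding real entries.

```yml
# docs/source/ko/_toctree.yml (illustrative fragment only)
- sections:
  - local: index        # corresponds to ko/index.mdx
    title: 🤗 Transformers
  - local: quicktour    # corresponds to ko/quicktour.mdx
    title: (translated "Quick tour" title goes here)
  title: (translated "Get started" section title goes here)
```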
Hello!
Let's make the docs readable for everyone who speaks Korean 🙂 (currently 9 out of 77 documents complete).
Would you like to join the translation effort? Please read the 🤗 translation guide first. The files that need translating are listed below. If there is a file you'd like to work on, let us know briefly in this issue, and we will mark it as in progress so that work is not duplicated.
Notes:
- These are technical documents, but they should read easily (as if you were explaining to a friend). Please write in polite speech.
- Grammatical gender only applies to some languages (Spanish, French, etc.); for Korean, after using a machine translator, please check that punctuation, particles, and the like fit each sentence.
- Put your translation in the `ko` folder under the `source` folder.
- Please update the table of contents (`ko/_toctree.yml`) as well; the order must match the English version.
- Once you have finished everything, open a PR and reference this issue (#20179) in the description so that tracking stays smooth. Please request reviews from @ArthurZucker, @sgugger, and @eunseojo.
- Please spread the word in the community! Posting on the 🤗 forums is great, too.
- With the localization initiative of Pseudo Lab, translation is expected to move even faster. Please give us your support! Cheers to our team: @0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo
GET STARTED
- [x] 🤗 Transformers https://github.com/huggingface/transformers/pull/20180
- [x] Quick tour https://github.com/huggingface/transformers/pull/20946
- [x] Installation https://github.com/huggingface/transformers/pull/20948
TUTORIAL
- [x] Pipelines for inference https://github.com/huggingface/transformers/pull/22508
- [x] Load pretrained instances with an AutoClass https://github.com/huggingface/transformers/pull/22533
- [x] Preprocess https://github.com/huggingface/transformers/pull/22578
- [x] Fine-tune a pretrained model https://github.com/huggingface/transformers/pull/22670
- [x] Distributed training with 🤗 Accelerate https://github.com/huggingface/transformers/pull/22830
- [ ] Share a model
HOW-TO GUIDES
GENERAL USAGE
- [x] Create a custom architecture https://github.com/huggingface/transformers/pull/22754
- [x] Sharing custom models https://github.com/huggingface/transformers/pull/22534
- [x] Train with a script https://github.com/huggingface/transformers/pull/22793
- [x] Run training on Amazon SageMaker https://github.com/huggingface/transformers/pull/22509
- [ ] Converting from TensorFlow checkpoints
- [x] Export to ONNX https://github.com/huggingface/transformers/pull/22806
- [ ] Export to TorchScript
- [ ] Troubleshoot
NATURAL LANGUAGE PROCESSING
- [x] Use tokenizers from 🤗 Tokenizers https://github.com/huggingface/transformers/pull/22956
- [ ] Inference for multilingual models
- [ ] Text generation strategies
TASK GUIDES
- [x] Text classification https://github.com/huggingface/transformers/pull/22655
- [x] Token classification https://github.com/huggingface/transformers/pull/22945
- [ ] Question answering
- [ ] Causal language modeling
- [x] Masked language modeling https://github.com/huggingface/transformers/pull/22838
- [x] Translation https://github.com/huggingface/transformers/pull/22805
- [x] Summarization https://github.com/huggingface/transformers/pull/22783
- [ ] Multiple choice
AUDIO
- [ ] Audio classification
- [ ] Automatic speech recognition
COMPUTER VISION
- [ ] Image classification
- [ ] Semantic segmentation
- [ ] Video classification
- [ ] Object detection
- [ ] Zero-shot object detection
- [ ] Zero-shot image classification
- [ ] Depth estimation
MULTIMODAL
- [x] Image captioning https://github.com/huggingface/transformers/pull/22943
- [ ] Document Question Answering
PERFORMANCE AND SCALABILITY
- [ ] Overview
- [ ] Training on one GPU
- [ ] Training on many GPUs
- [ ] Training on CPU
- [ ] Training on many CPUs
- [ ] Training on TPUs
- [ ] Training on TPU with TensorFlow
- [ ] Training on Specialized Hardware
- [ ] Inference on CPU
- [ ] Inference on one GPU
- [ ] Inference on many GPUs
- [ ] Inference on Specialized Hardware
- [ ] Custom hardware for training
- [ ] Instantiating a big model
- [ ] Debugging
- [ ] Hyperparameter Search using Trainer API
- [ ] XLA Integration for TensorFlow Models
CONTRIBUTE
- [ ] How to contribute to transformers?
- [ ] How to add a model to 🤗 Transformers?
- [ ] How to convert a 🤗 Transformers model to TensorFlow?
- [ ] How to add a pipeline to 🤗 Transformers?
- [ ] Testing
- [ ] Checks on a Pull Request
- [ ] 🤗 Transformers Notebooks
- [ ] Community resources
- [ ] Benchmarks
- [ ] Migrating from previous packages
CONCEPTUAL GUIDES
- [ ] Philosophy
- [ ] Glossary
- [ ] What 🤗 Transformers can do
- [ ] How 🤗 Transformers solve tasks
- [ ] The Transformer model family
- [ ] Summary of the tokenizers
- [ ] Attention mechanisms
- [ ] Padding and truncation
- [ ] BERTology
- [ ] Perplexity of fixed-length models
- [ ] Pipelines for webserver inference
Other relevant PRs along the way
- Enable easy Table of Contents editing https://github.com/huggingface/transformers/pull/22581
- Added forgotten internal English anchors for sagemaker.mdx https://github.com/huggingface/transformers/pull/22549
- Fixed anchor links for auto_class, training https://github.com/huggingface/transformers/pull/22796
- Update ToC from upstream https://github.com/huggingface/transformers/pull/23112
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello @sgugger, could you please add the WIP tag to this issue? Thank you so much.
For contributors and PseudoLab team members, please see a PR template gist (raw) that could ease your first PR experience. @0525hhgus, @KIHOON71, @gabrielwithappy, @jungnerd, @sim-so, @HanNayeoniee, @wonhyeongseo
Dear @sgugger, would you add the document label to this issue?
I think the other translation issues have a document label.
Thank you in advance!
@wonhyeongseo
I updated my PR with the new PR template. Would you change
Load pretrained instances with an AutoClass to [WIP]🌐[i18n-KO] Translate autoclass_tutorial to Korean and Fix the typo of quicktour #22533?
@sgugger wow! Thank you a million! :-)
@sgugger Dear HuggingFace Team,
I hope you are doing well. My name is Wonhyeong Seo from the Pseudo Lab team. As you may know, we are actively working on localizing the huggingface/transformers repository documentation into Korean. Our goal is to make this valuable resource more accessible to Korean-speaking users, thereby promoting the development of NLP and machine learning in Korea and beyond.
We are currently in the process of applying for government sponsorship to support our localization efforts. To strengthen our application, we kindly request your permission to use the documentation's Google Analytics data to include in our reports. This data will help us demonstrate the impact of our work and the potential benefits of localizing the documentation.
Additionally, we would be grateful for any feedback or suggestions from the HuggingFace team regarding our localization project. Your insights will be invaluable in ensuring our efforts align with your vision and standards, and in fostering a successful collaboration.
Thank you for considering our request. We look forward to your response and the opportunity to work together to expand the reach of the huggingface/transformers repository.
Best regards, Hyunseo Yun, Kihoon Son, Gabriel Yang, Sohyun Sim, Nayeon Han, Woojun Jung, Wonhyeong Seo The Localization Initiative members of Pseudo Lab
Hey @wonhyeongseo, thanks for all your work on translating the documentation to Korean!
Do you mind contacting me at lysandre at hf.co so we may see how best to help you?
Welcome to a simple guide on how to use ChatGPT to speed up the translation process. By following these guidelines, you can create a first draft in less than an hour. Please note that it is essential to proofread your work thoroughly before sharing it with your colleagues.
(Optional) If you want to extract only the content without code blocks, tables, and redundant new lines, you can use the command sed '/```/,/```/d' file.md | sed '/^|.*|$/d' | sed '/^$/N;/^\n$/D'. In case you are using a mobile device, you can check the link https://sed.js.org/ for using sed online.
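To see that optional sed pipeline in action, the snippet below runs it on a tiny made-up Markdown file (`sample.md` and its contents are invented for illustration): the first sed deletes fenced code blocks, the second deletes table rows, and the third squeezes runs of blank lines.

````shell
# Build a tiny Markdown sample: prose, a fenced code block, a table row,
# and a run of blank lines. (File name and contents are placeholders.)
printf '%s\n' 'Intro text.' '```py' 'print("hi")' '```' '| a | b |' '' '' 'Outro text.' > sample.md

# 1) drop everything between ``` fences (inclusive)
# 2) drop Markdown table rows (lines that start and end with |)
# 3) squeeze consecutive blank lines down to one
sed '/```/,/```/d' sample.md | sed '/^|.*|$/d' | sed '/^$/N;/^\n$/D' > prose-only.md

cat prose-only.md   # keeps only the prose lines
````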
To initiate the translation process, you need to provide your sentences as input to ChatGPT. Your first prompt should look like this:

What do these sentences about Hugging Face Transformers (a machine learning library) mean in Korean? Please do not translate the word after a 🤗 emoji as it is a product name.

```md
<your sentences>
```

After submitting the first prompt, you can use the following prefix for the next ten prompts:

```md
next-part
<your sentences>
```
Note that after ten prompts, you must remind ChatGPT of the task if you are not using LangChain.
By following these guidelines, you can create a first draft of your translation in a shorter time frame. However, it is crucial to emphasize that the quality of the final output depends on the accuracy of the input and the proofreading process.
PS: Please note that we do not have a Korean LLM that can automate the proofreading process at the moment. However, in July, Naver plans to launch their HyperCLOVA Korean LLM model, which might automate the entire process. We are optimistic that our government proposal will be accepted, allowing us to increase our talent pool and work towards achieving a more automated translation process with them.
Dear @LysandreJik ,
I hope you are doing well. I wanted to inform you that I have sent an email with the subject line "[i18n-KO] Request for Collaboration: Hugging Face Mentorship Program." Whenever you have a moment, please take a look and respond. Thank you so much for your interest in this collaboration. If you have any questions, please don't hesitate to contact me.
Best regards, Wonhyeong Seo
@gabrielwithappy @sim-so @jungnerd @HanNayeoniee @0525hhgus @KIHOON71
From this merge of model_sharing.mdx #22991, I learned that we don't have to `git rebase -i` as other open source libraries mandate. Therefore, I propose we commit in 4 steps like this:
- `docs: ko: <file-name>`: as we always do for the first commit. Copy the initial English file under `ko` and edit the ToC: both external and (soon-to-be-automated) internal.
From this point forward, you may need to squash commits in each step.
- `feat: [nmt|manual] draft`: machine-translate the entire file with dedicated translators, prompts, or any kind of automation. You may choose to translate manually, and that is OK as long as you specify it in the commit message.
- `fix: manual edits`: proofread the draft thoroughly.
- `fix: resolve suggestions`: get reviews and resolve suggestions.
With this, it will be easier for collaborators to see the original English and your changes side by side. Not to mention, we can use diffs as pre-training data for the in-house rlhf translation model.
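As a concrete sketch of the proposed four-step history, the sequence below builds a throwaway repo with a placeholder file (`accelerate.mdx`, its contents, and the `ko-demo` directory are assumptions for illustration, not the real repo):

```shell
set -e
# Throwaway demo repo; file names and contents are placeholders.
rm -rf ko-demo && git init -q ko-demo && cd ko-demo
git config user.email demo@example.com
git config user.name demo
mkdir -p docs/source/en docs/source/ko
echo '# Distributed training' > docs/source/en/accelerate.mdx

# Step 1: copy the English file under ko/ and register it in the ToC.
cp docs/source/en/accelerate.mdx docs/source/ko/accelerate.mdx
echo '- local: accelerate' > docs/source/ko/_toctree.yml
git add . && git commit -qm 'docs: ko: accelerate.mdx'

# Step 2: commit the machine (or manual) first draft as one commit.
echo '# 분산 학습' > docs/source/ko/accelerate.mdx
git commit -aqm 'feat: nmt draft'

# Step 3: commit your proofreading pass.
echo '# 🤗 Accelerate로 분산 학습하기' > docs/source/ko/accelerate.mdx
git commit -aqm 'fix: manual edits'

# Step 4: commit the edits that resolve review suggestions.
echo '# 🤗 Accelerate를 활용한 분산 학습' > docs/source/ko/accelerate.mdx
git commit -aqm 'fix: resolve suggestions'

git log --reverse --format=%s   # the four step messages, oldest first
```

Each step then shows up as its own line in the PR's commit list, which is what makes the English-to-draft and draft-to-proofread diffs easy to review side by side.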
@ArthurZucker @sgugger, when merging a PR, how is the main commit message decided if there are multiple commits? Do you have to write it manually, or is the first commit message of the PR selected? Thank you for your insights and continued support. Much love from Korea 🇰🇷
The main commit message is the title of the PR.
Hey all! As some people were interested in a place to discuss translations, we opened a category in the HF Discord server for internationalization and translation efforts, including a Korean channel!
Hi Pseudo Lab friends! I just wanted to provide a quick update on where the translation progress currently stands:
- 73% done ✅
- 6 PRs pending review; once merged, you'll be up to 81%
- 15 files left to translate before ✨ 100% ✨
Great work, and big thanks again for all your contributions to fully translate the 🤗 Transformers documentation.
Hi all! I would personally like to participate in the translation (especially the text generation part).
Once a first draft is done, I will open a PR and let you know!
I mistakenly mentioned the huggingface_hub docs as transformers. It has been fixed now; please ignore the comment immediately above. Sorry!