Translate to Italian

Open lewtun opened this issue 2 years ago • 55 comments

Hi there 👋

Let's translate the course to Italian so that the whole community can benefit from this resource 🌎!

Below are the chapters and files that need translating - let us know here if you'd like to translate any and we'll add your name to the list. Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue.

🙋 If you'd like others to help you with the translation, you can also post in our forums or tag @_lewtun on Twitter to gain some visibility.

Chapters

0 - Setup

1 - Transformer models

2 - Using 🤗 Transformers

3 - Fine-tuning a pretrained model

4 - Sharing models and tokenizers

5 - The 🤗 Datasets library

6 - The 🤗 Tokenizers library

7 - Main NLP tasks

8 - How to ask for help

Events

lewtun avatar Mar 29 '22 12:03 lewtun

Hello Lewis, I'd love to contribute! I'm a postdoctoral researcher in theoretical linguistics at the University of Cambridge, UK. Italian is my native language. I'd love to translate modules 1 and 2 for a start. The only thing that scares me a bit is that I'm new to GitHub, so I might end up needing some help...

CaterinaBi avatar Mar 29 '22 15:03 CaterinaBi

Hey @CaterinaBi thank you - we'd love to have your help with the translation! Feel free to create a post and tag me with @lewtun on our forums (https://discuss.huggingface.co/c/course/20) if you need some help on the GitHub side 🤗

lewtun avatar Mar 29 '22 15:03 lewtun

Hey @lewtun, amazing! I'll start straight away. I'll translate the 'Transformer models' and 'Using HF Transformers' then. Do you want me to take care of the Setup instructions, too?

CaterinaBi avatar Mar 29 '22 15:03 CaterinaBi

Awesome! Sorry I tagged you for the Setup section by accident 😬. On the other hand, that might be an easy way to get familiar with GitHub and pull requests, so maybe you'd like to start there?

lewtun avatar Mar 29 '22 15:03 lewtun

Yes, I'll start from there and tag you in the forum if I get lost (get ready to hear from me soon!).

CaterinaBi avatar Mar 29 '22 16:03 CaterinaBi

Hi @lewtun, I'd also like to contribute. I have a PhD in computational neuroscience from École Polytechnique Fédérale de Lausanne. Italian is my native language as well. How about I start with module 3, and see how it goes? Is that acceptable?

@lewtun, @CaterinaBi, should we think of a way to "standardize" our translations (a shared glossary/vocabulary or something similar)? To make sure we all translate common things such as "train a neural network" in the same way.

Thank you!

sharkovsky avatar Mar 29 '22 16:03 sharkovsky

Hey @sharkovsky, thanks for helping out and good idea about a shared glossary! Feel free to create a comment here which lists the core terms. I'll also add your name to module 3 :)

lewtun avatar Mar 29 '22 16:03 lewtun

Hi! I would also be happy to help. I'm a PhD student at Northeastern University and Italian is my native language. I can take chapter 4 if no one is working on that.

ClonedOne avatar Mar 29 '22 18:03 ClonedOne

Hi @sharkovsky, having a shared glossary is a terrific idea. What about we take a few days to go through the materials, then have a quick chat and publish the standardised translations here?

CaterinaBi avatar Mar 29 '22 18:03 CaterinaBi

Hi @ClonedOne thank you for offering to help! I've added your name to Chapter 4 🚀!

lewtun avatar Mar 30 '22 07:03 lewtun

Hi! I would like to help. I am a PhD student at University of Napoli "L'Orientale", and Italian is my native language. I could work on Chapter 5 if nobody's working on it!

Nolanogenn avatar Mar 30 '22 11:03 Nolanogenn

Thank you @Nolanogenn for offering to help! I've added your name to the list 🙏

lewtun avatar Mar 30 '22 13:03 lewtun

@lewtun, @CaterinaBi, @ClonedOne, @Nolanogenn maybe we can come up with a strategy for how to translate common words, for example "machine learning". Here are some options:

  1. always leave it in English
  2. always translate it into Italian (in this case, the official translation is apprendimento automatico)
  3. always translate it into Italian, but on first use also give the English term alongside it.

The third option looks something like "apprendimento automatico (machine learning in inglese)"
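
To make option 3 concrete, here is a minimal Python sketch of the "annotate only the first occurrence" rule. It is an illustration only, not part of the course tooling: the `GLOSSARY` entries, the sample sentence, and the function name `annotate_first_use` are all assumptions made up for this example.

```python
import re

# Hypothetical glossary for illustration: Italian term -> English original.
GLOSSARY = {
    "apprendimento automatico": "machine learning",
    "rete neurale": "neural network",
}


def annotate_first_use(text: str, glossary: dict[str, str]) -> str:
    """Append the English term in parentheses after the first occurrence only."""
    for italian, english in glossary.items():
        pattern = re.compile(re.escape(italian), flags=re.IGNORECASE)
        # count=1 restricts the substitution to the first match in the text.
        text = pattern.sub(
            lambda m, en=english: f"{m.group(0)} ({en} in inglese)",
            text,
            count=1,
        )
    return text


sample = (
    "L'apprendimento automatico è ovunque. "
    "Con l'apprendimento automatico si addestra una rete neurale."
)
print(annotate_first_use(sample, GLOSSARY))
```

Later occurrences of the same term are left untouched, so the English original appears once per page rather than on every mention.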

Wikipedia seems to favour option 3, and I would also vote in favour of that. I know that the Italian term always sounds a bit "weird", but I feel that since we're making the effort to do a translation anyway, it's nice to try to use as many Italian words as possible.

But I'm open to discussion, what is your opinion?

sharkovsky avatar Mar 30 '22 15:03 sharkovsky

Thanks for the insight and suggestions into how we can handle the machine learning jargon @sharkovsky! I really like the analogy with Wikipedia, so I would also favour option (3). I'm putting together a general TRANSLATING.md guide, so will add this suggestion if the other Italian speakers agree it makes sense :)

lewtun avatar Mar 30 '22 15:03 lewtun

Ah, another issue that comes up in Italian and may appear in other languages is how you want to address the reader. In English you say: "But what if you want to ...?" In Italian you have to choose between:

  1. (informal singular you) "Ma cosa fare se vuoi ....?"
  2. (informal plural you) "Ma cosa fare se volete ....?"
  3. (formal singular) "Ma cosa fare se vuole ...?"
  4. (impersonal) "Ma cosa fare se si vuole ...?"

Option 4 is equivalent to the English "But what if one wants to ...?"

I would vote for option 4, except in those rare cases where it sounds really clunky and weird, where I would fall back on option 2. But as before, I am open to discussing other ideas!

You'll probably have the same issue in other languages (French and Spanish at least, I assume), so if you want to enforce a "centralized" approach through your TRANSLATING.md, I'll be happy to follow that as well.

sharkovsky avatar Mar 30 '22 15:03 sharkovsky

@CaterinaBi, @ClonedOne, @Nolanogenn you can find my first attempt at a translation of one file in my fork. I'm happy to receive some feedback if you think some things can be improved/better expressed... I'd rather discuss as much as possible now that we're still in a preliminary phase 😄

In a provisional manner, I also created a first glossary of terms that I think could be useful. But again, I'm happy to discuss both the translations and the format of the glossary! For example, now that I think of it, putting a file in my fork is probably not the best way to share a glossary.... @lewtun do you have any suggestions for something that we could all see and edit?

sharkovsky avatar Mar 30 '22 16:03 sharkovsky

Hi @lewtun, I'm a data scientist and chatbot developer and I'd like to help with chapter 6. I'm taking the 🤗 HF course and was about to start that chapter, so it would be great to translate it while learning.

Italian is my main language!

davidemastricci avatar Mar 30 '22 17:03 davidemastricci

@sharkovsky totally agree with both your points. I really like the glossary idea! I ended up with mostly the same translations :) except for a couple of things I'd like to suggest. Maybe we should move the discussion about the glossary to a forum post, so that it's easier to access it and suggest edits?

ClonedOne avatar Mar 30 '22 17:03 ClonedOne

@sharkovsky Since there is already a small barrier to entry for the Hugging Face libraries (readers are expected to be familiar with terms like machine learning and deep learning), adding translations that sound weird in Italian (e.g. "apprendimento automatico" or "apprendimento profondo") could make the text less fluent to read.

davidemastricci avatar Mar 30 '22 17:03 davidemastricci

Thanks for this great discussion @sharkovsky - it definitely exposes some subtleties with translation projects :)

For the glossary, I suppose the simplest thing right now would be to share a Google / Notion doc that others can make suggestions to. Notion is probably easier since it supports Markdown and will make it simple to copy back to this repo.

As for how we distribute the glossary, I see two possibilities:

  1. Include it as a standalone file to help guide translators
  2. Add it as a new chapter (e.g. at the very end of the course) in an MDX file and render that on the website.

If you think a glossary would be helpful for course readers, then I would favour option 2.

lewtun avatar Mar 31 '22 14:03 lewtun

Hi guys,

sorry for the late reply but I took a day off yesterday.

I agree with the need to standardise our translations, although I am quite torn when it comes to the question of whether or not we want to translate the technical terms. I believe that if we want a clean Italian version we should use the proposed form 'apprendimento automatico (machine learning)', but at the same time it's almost a pity to do so when literally everyone in Italy says 'machine learning' (I had to google the translation myself, I wasn't even aware that 'apprendimento automatico' was a thing). So what do we do? @davidemastricci, you had a good point there.

As for the way we address the reader that @sharkovsky mentioned ('But what if you want to ...?') I believe the best translation in Italian would be with an infinitive: 'Ma cosa/come fare per...'. None of the ones that were suggested sound natural to me.

What about the glossary, are we going to add a .mdx file here?

CaterinaBi avatar Mar 31 '22 19:03 CaterinaBi

Hi @sharkovsky , I've checked out your fork and your first translation seems fine to me ;)

CaterinaBi avatar Mar 31 '22 19:03 CaterinaBi

Hi everyone, following @ClonedOne's sensible suggestion, I converted this discussion into a forum post.

@CaterinaBi, @lewtun, @davidemastricci I tried to interpret your votes, but please feel free to correct any mistakes I made.

Everyone, please let's use the forum post to discuss from now on since it will be much clearer. I will try to monitor it closely and add any words that you suggest to the vocabulary as quickly as possible.

sharkovsky avatar Apr 01 '22 08:04 sharkovsky

@sharkovsky the forum link does not work anymore.

davidemastricci avatar Apr 01 '22 14:04 davidemastricci

yeah it's been marked as spam by the automatic filter, it should be back soon I hope.

sharkovsky avatar Apr 01 '22 14:04 sharkovsky

Hello everyone, I am Michele and recently graduated in Artificial Intelligence at the University of Bologna. Italian is my native language and I would be glad to join the translation! Since Chapter 7 is still to be assigned, I propose to help with that.

michimichiamo avatar Apr 01 '22 14:04 michimichiamo

Hey @sharkovsky thanks for creating the forum post ~I've asked one of the admins to unblock it and hopefully that happens soon~ Edit: it's fixed!

@michimichiamo I've added your name to the list - welcome!

lewtun avatar Apr 01 '22 14:04 lewtun

> Hey @sharkovsky thanks for creating the forum post - I've asked one of the admins to unblock it and hopefully that happens soon 🤞

Thank you, it just got unblocked :D

sharkovsky avatar Apr 01 '22 14:04 sharkovsky

By the way, I realised from the other translations that we need the first section from Chapter 1 to be translated in order for the course to render on the website. My suggestions would be:

  • @CaterinaBi would you like to open a pull request with the first section translated?
  • Iterate on the glossary in the forum, and then add it as a new "chapter" to the course.

For the second point, we can then have a section in the _toctree.yml file with something like:

- title: Glossario
  sections:
  - local: glossary/1
    title: Glossario 

I think this way Italian readers can benefit from the great work you're doing to handle the various bits of ML jargon!
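
As a rough sanity check on the entry above, the Python sketch below lists the files that each `local` key would need. It is only an illustration under stated assumptions: the `chapters/it` directory and the rule that `local: glossary/1` is backed by `glossary/1.mdx` are guesses about the layout, not something confirmed in this thread, and `expected_files` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# The proposed _toctree.yml entry, written out as plain Python data.
glossary_entry = {
    "title": "Glossario",
    "sections": [{"local": "glossary/1", "title": "Glossario"}],
}


def expected_files(entry: dict, lang_dir: str = "chapters/it") -> list[str]:
    """Map each `local` key to the MDX file assumed to back it.

    `lang_dir` is an assumed location for the Italian chapters.
    """
    return [
        str(PurePosixPath(lang_dir) / f"{section['local']}.mdx")
        for section in entry["sections"]
    ]


print(expected_files(glossary_entry))
```

If the real layout differs, only the `lang_dir` default and the `.mdx` suffix would need to change.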

lewtun avatar Apr 01 '22 14:04 lewtun

Hi @lewtun,

just to clarify: we've been calling it a glossary, but it's actually just a vocabulary, i.e. a 1-1 mapping of English and Italian terms. Does this meet your expectations? Or did you want us to provide short explanations for each term as well? In any case, I'm happy to oblige!
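
For what it's worth, a 1-1 vocabulary like this is easy to turn into a Markdown table for a course page. The Python sketch below is just one possible way to do it: the `VOCABULARY` entries are hypothetical placeholders (not the agreed translations), and `to_markdown_table` is a made-up helper name.

```python
# Hypothetical vocabulary entries (English -> Italian), for illustration only.
VOCABULARY = {
    "neural network": "rete neurale",
    "machine learning": "apprendimento automatico",
}


def to_markdown_table(vocabulary: dict[str, str]) -> str:
    """Render the 1-1 mapping as a two-column Markdown table, sorted by English term."""
    lines = ["| English | Italiano |", "| --- | --- |"]
    for english, italian in sorted(vocabulary.items()):
        lines.append(f"| {english} | {italian} |")
    return "\n".join(lines)


print(to_markdown_table(VOCABULARY))
```

Adding short explanations later would only mean adding a third column, so starting with a plain mapping does not close off that option.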

@lewtun is there a way to make my forum post editable by anyone, so everyone can contribute to the glossary directly? Otherwise I'm happy to monitor the post and add terms as people suggest them, but of course letting everyone do it for themselves would simplify things.

To everyone: the decision on whether we should translate technical terms such as "machine learning", "training set", "labels", etc... still hasn't been finalized. Ideally, everyone should express an opinion! Please let us know what you think in the forum!

Finally, I've added a new "question" in the post: should we translate the comments in the code?

sharkovsky avatar Apr 01 '22 14:04 sharkovsky