YAML: extract keys (with scope)
The name of the parser: Yaml
The command line you used to run ctags:
ctags --options=NONE test.yaml
The content of input file:
en:
  test: 2
  other: 10
The tags output you are not satisfied with:
!_TAG_FILE_FORMAT 2 /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED 1 /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD mixed /number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP slash /slash or backslash/
!_TAG_OUTPUT_MODE u-ctags /u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT 96 /0 for no limit/
!_TAG_PROC_CWD /home/akemrir/ //
!_TAG_PROGRAM_AUTHOR Universal Ctags Team //
!_TAG_PROGRAM_NAME Universal Ctags /Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL https://ctags.io/ /official site/
!_TAG_PROGRAM_VERSION 5.9.0 /p5.9.20220828.0/
The tags output you expect:
...
en.test test.yaml /^main(void)$/;" kind:function line:2 language:Yaml signature:(void) keys
en.other test.yaml /^main(void)$/;" kind:function line:3 language:Yaml signature:(void) keys
...
Could one level be omitted? This could be tricky.
...
test test.yaml /^main(void)$/;" kind:function line:2 language:Yaml signature:(void) keys
other test.yaml /^main(void)$/;" kind:function line:3 language:Yaml signature:(void) keys
...
The version of ctags:
$ ctags --version
Universal Ctags 5.9.0(p5.9.20220828.0), Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Aug 29 2022, 21:05:02
URL: https://ctags.io/
Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript, +pcre2
How do you get the ctags binary:
Archlinux official repository extra/ctags 1:5.9.20220828.0-1
I would like to have a map of key paths from a YAML file. This is for use with the I18n Ruby gem, for translations. I could then, for example, use it in Vim to jump directly to a translation, and have a Vim add-on suggest paths.
Any hints on where to start? Maybe it would be similar to SASS?
> Any hints on where to start?
u-ctags has a Yaml parser based on libyaml. See https://github.com/universal-ctags/ctags/blob/master/parsers/yaml.c.
I wonder whether your ctags executable is linked to libyaml. Try the --list-features option:
$ ./ctags --list-features | grep yaml
yaml linked with library for parsing yaml input
Even if your ctags executable is linked to libyaml, the yaml parser doesn't extract keys.
You can implement what you want in two ways.
A. Develop a subparser specialized for the "I18n Ruby gem". https://github.com/universal-ctags/ctags/blob/master/parsers/ansibleplaybook.c and https://github.com/universal-ctags/ctags/blob/master/parsers/openapi.c take the same approach, so you can get hints from them. See also https://docs.ctags.io/en/latest/running-multi-parsers.html#base-sub-parsers .
B. Extend the Yaml parser to extract keys from a YAML file. With this approach, you have to study many things:
- the concept of scope in ctags (see man ctags)
- parsers/json.c (our JSON parser extracts all keys)
- libyaml
- the cork API: https://docs.ctags.io/en/latest/internal.html?highlight=cork#cork-api
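To illustrate the scope idea behind the cork API in a language-neutral way, here is a Python sketch of the concept as I understand it: every tag goes into one table, each entry records the index of its enclosing scope, and a qualified name such as en.test is rebuilt by following those parent links. TagTable, add, qualified_name, and NO_PARENT are hypothetical names for illustration; the real cork API is C code inside ctags, documented at the link above.

```python
# Conceptual model only (not the ctags C API): tags live in one table and
# each entry stores the index of its enclosing scope, so a fully qualified
# name can be rebuilt by following parent links.
from dataclasses import dataclass

NO_PARENT = 0  # slot 0 is reserved so that 0 can mean "no enclosing scope"


@dataclass
class Entry:
    name: str
    parent: int  # index of the enclosing entry, or NO_PARENT


class TagTable:
    def __init__(self) -> None:
        self.entries: list[Entry] = [Entry("", NO_PARENT)]  # reserved slot

    def add(self, name: str, parent: int = NO_PARENT) -> int:
        """Store an entry and return its index, usable as a later parent."""
        self.entries.append(Entry(name, parent))
        return len(self.entries) - 1

    def qualified_name(self, index: int, sep: str = ".") -> str:
        """Follow parent links up to the top and join the names with sep."""
        parts = []
        while index != NO_PARENT:
            entry = self.entries[index]
            parts.append(entry.name)
            index = entry.parent
        return sep.join(reversed(parts))


table = TagTable()
en_key = table.add("en")
test_key = table.add("test", en_key)
other_key = table.add("other", en_key)
print(table.qualified_name(test_key))   # en.test
print(table.qualified_name(other_key))  # en.other
```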
I can give you more hints, but the first step is to choose one of the two.
Thanks for the hints. I need to get into the topic :)
I think approach A is OK. This is specific functionality for grabbing full/partial paths, which fits a subparser.
> I think approach A is OK. This is specific functionality for grabbing full/partial paths, which fits a subparser.
Ok. Could you tell me more about "I18n ruby gem"? I think I can provide more specific hints.
The I18n gem uses YAML files as the source of translations. For example, when you have this:
---
en:
  sequel:
    errors:
      or: "or"
Then, when English is the current locale, this Ruby call gets the text from it:
I18n.t('sequel.errors.or') # will get "or"
I am thinking about how to approach this: index from the beginning, from a certain level, or both:
en.sequel.errors.or
sequel.errors.or
Key-level indexing, as in JSON, is very simple; it could be done the regex way, with --regex-yaml=REGEX using [\w_]+: for example.
But the whole path would be very useful for checking more precisely what was used.
And I would be able to open, in several languages, the places where a key is defined.
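To make the relationship between the two forms concrete, here is a deliberately simplified Python model of how a call like I18n.t('sequel.errors.or') resolves: the key carries no locale, and the lookup starts from the subtree of the current locale, so the path written in the file is the locale plus the key. It assumes the YAML has already been loaded into nested dicts and ignores fallbacks, pluralization, and interpolation; lookup and domainful are hypothetical helpers, not the i18n gem's API.

```python
# Simplified model of resolving a dotted key against the current locale's
# subtree. This is NOT the i18n gem's implementation.
from typing import Any, Optional

translations: dict[str, Any] = {
    "en": {"sequel": {"errors": {"or": "or"}}},
}


def lookup(tree: dict, locale: str, key: str) -> Optional[Any]:
    """Walk tree[locale] along the dot-separated components of key."""
    node: Any = tree.get(locale)
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # missing translation
        node = node[part]
    return node


def domainful(locale: str, key: str) -> str:
    """The full path as it appears in the YAML file: locale first."""
    return f"{locale}.{key}"


print(lookup(translations, "en", "sequel.errors.or"))  # or
print(domainful("en", "sequel.errors.or"))             # en.sequel.errors.or
```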
Thank you for your explanation. I have more questions.
A. I think https://github.com/svenfuchs/rails-i18n/blob/master/rails/locale/en.yml is a real example input file. Am I correct? If so, what strings do you want to extract from a .yml file that includes an array like:
---
en:
  date:
    abbr_day_names:
    - Sun
In this example, the array elements are at the leaves.
B. If an array is in the middle of the path, like:
---
en:
  syscall:
  - name: read
    arity: 3
  - name: write
    arity: 3
what kind of strings do you expect ctags to extract?
C. Other than having .yml as an extension, I think there is no rule for the file name. Am I correct?
A. Yes, it is a real-life example. I am interested only in keys, so that I can quickly jump with tags in Vim to where they are defined. So I expect:
en.date.abbr_day_names
date.abbr_day_names
en.syscall
syscall
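The expected output above can be produced by a small tree walk. A Python sketch, assuming the YAML has already been parsed into nested dicts and lists (both examples are merged into one document here for brevity): mappings are descended into, while a key whose value is a sequence or a scalar ends the path, so array elements never become part of a name. leaf_paths and tag_names are hypothetical names.

```python
# Sketch: flatten a parsed YAML document (nested dicts/lists) into tag names.
# Mappings are descended into; a sequence or scalar value ends the path, so
# keys holding arrays are tagged but the array elements are ignored.
from typing import Any, Iterator


def leaf_paths(node: Any, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the key path of every mapping entry whose value is not a mapping."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            yield from leaf_paths(value, path)
        else:
            yield path


def tag_names(doc: dict) -> list[str]:
    """Return the name with the locale and, where possible, without it."""
    names = []
    for path in leaf_paths(doc):
        names.append(".".join(path))          # e.g. en.date.abbr_day_names
        if len(path) > 1:
            names.append(".".join(path[1:]))  # e.g. date.abbr_day_names
    return names


doc = {
    "en": {
        "date": {"abbr_day_names": ["Sun"]},
        "syscall": [
            {"name": "read", "arity": 3},
            {"name": "write", "arity": 3},
        ],
    }
}
print(tag_names(doc))
# ['en.date.abbr_day_names', 'date.abbr_day_names', 'en.syscall', 'syscall']
```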
B. When they are used in code in their full form, they appear as I18n.t('date.abbr_day_names') and I18n.t('syscall').
Then the templating language (ERB/Haml) or Ruby code iterates over them.
We could also encounter this syntax, so-called block scalars:
en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install
Scalar indicators:
'|' : Block scalar indicator.
'>' : Folded scalar indicator.
'-' : Strip chomp modifier ('|-' or '>-').
'+' : Keep chomp modifier ('|+' or '>+').
1-9 : Explicit indentation modifier ('|1' or '>2').
Modifiers can be combined ('|2-', '>+1').
So, in practice, the lines to capture look like this:
key: null (before an array)
key: a
key: 2
key: "test"
key: 'test'
key: >
key: |
key: |-
etc.
This reference card will be useful later: https://yaml.org/refcard.html
C. I agree. Users could filter files by using --exclude.
I've seen both yaml and yml extensions in use.
Two questions:
D. Could we limit it with a parameter, so that gathering starts at some level (without the en: prefix), while still allowing the prefix when needed?
E. Could we gather both the full key and only its "end"? Maybe using kinds to optionally turn off each one?
Thanks for helping me out.
I have one idea: take into account only lines with a colon, i.e. lines matching ^\s*[a-zA-Z0-9_]+:
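A line-based version of this colon-matching idea needs two more pieces of state: an indentation stack to rebuild the full path, and a way to skip block-scalar bodies (the | and > forms listed earlier), whose lines can look like keys. The Python sketch below assumes space-only indentation and unquoted keys matching [A-Za-z0-9_]+; it does not handle sequences or flow collections, and it also reports parent keys such as en, which a real implementation would probably filter out. scan_keys, KEY_LINE, and BLOCK_HEADER are hypothetical names.

```python
# Line-based sketch: rebuild dotted paths from indentation and skip the
# bodies of block scalars ('|' or '>' headers), whose lines may look like
# "key: value". Not handled: sequences, flow collections, quoted keys,
# tabs, and multi-document files.
import re
from typing import Iterator, Optional

# A key line: optional indentation, an unquoted key, a colon, then either
# end of line or whitespace followed by the value.
KEY_LINE = re.compile(r"^\s*([A-Za-z0-9_]+):(?:\s+(.*))?$")
# A block scalar header: '|' or '>', an optional indentation digit and
# chomping indicator in either order, and an optional trailing comment.
BLOCK_HEADER = re.compile(r"^[|>](?:[1-9][+-]?|[+-][1-9]?)?(?:\s+#.*)?$")


def scan_keys(text: str) -> Iterator[tuple[int, str]]:
    """Yield (line_number, dotted_path) for every key line in text."""
    stack: list[tuple[int, str]] = []  # (indent, key) of enclosing keys
    block_indent: Optional[int] = None  # indent of a key owning a block scalar
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines do not change the structure here
        indent = len(line) - len(line.lstrip(" "))
        if block_indent is not None:
            if indent > block_indent:
                continue  # still inside the block scalar body
            block_indent = None
        m = KEY_LINE.match(line)
        if m is None:
            continue  # e.g. '---', comments, sequence items
        key, value = m.group(1), (m.group(2) or "").strip()
        while stack and stack[-1][0] >= indent:
            stack.pop()  # drop keys that do not enclose this one
        yield lineno, ".".join([k for _, k in stack] + [key])
        stack.append((indent, key))
        if BLOCK_HEADER.match(value):
            block_indent = indent


sample = """en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install
"""
for lineno, path in scan_keys(sample):
    print(lineno, path)
# 1 en
# 2 en.key
# 5 en.command
```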
About A: my understanding is that we can ignore arrays (sequences, in YAML terms) regardless of their position in the YAML tree.
About B: I would like to confirm that you want to extract en.key, en.command, key, and command from
en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install
Am I correct?
> Two questions:
> D. Could we limit it with a parameter, so that gathering starts at some level (without the en: prefix), while still allowing the prefix when needed?
> E. Could we gather both the full key and only its "end"? Maybe using kinds to optionally turn off each one?
Yes for both questions.
https://github.com/ruby-i18n/i18n looks like the reference.
A. Yes.
B. Yes.
And the last one: also yes.
Thank you. The last question is the name of the parser. Do you have any ideas? "I18NRubyGem" looks suitable to me. What do you think?
That's a good name.
Thank you. I will make a prototype based on this discussion.
We have one critical limitation.
You must enable the "I18NRubyGem" parser manually when you want to use it.
Automatic parser detection cannot be relied on, because .yaml and .yml are too generic to serve as hints for choosing a parser.
On the command line, you must run something like "ctags --languages=+I18NRubyGem ...".
OK, no problem. I will use a ~/.ctags.d configuration.
I found that this requires an overhaul of yaml.c.
Does that conflict with the idea of a subparser? When would I choose this way instead of changing yaml.c?
The overhaul was done. I implemented what I wrote here and made a pull request (#3895).
Very well done. I had to compile it on my own machine, and it worked in almost all cases. However, I can't make it index the key name alone, the last part of the path. I only get full i18n paths in the tags file.
---
pl:
  emails:
    title: PanTracker
    footer: Zasilane przez PanTracker
    registration:
      subject: Rejestracja konta
      title: Rejestracja
      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniżej.
      button: Aktywuj
emails.footer config/locales/emails.pl.yml /^ footer: Zasilane przez PanTracker$/;" kind:key line:5 extras:subparser,domainless
emails.registration.button config/locales/emails.pl.yml /^ button: Aktywuj$/;" kind:key line:10 extras:subparser,domainless
emails.registration.message config/locales/emails.pl.yml /^ message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniż/;" kind:key line:9 extras:subparser,domainless
emails.registration.subject config/locales/emails.pl.yml /^ subject: Rejestracja konta$/;" kind:key line:7 extras:subparser,domainless
emails.registration.title config/locales/emails.pl.yml /^ title: Rejestracja$/;" kind:key line:8 extras:subparser,domainless
emails.title config/locales/emails.pl.yml /^ title: PanTracker$/;" kind:key line:4 extras:subparser,domainless
pl.emails.footer config/locales/emails.pl.yml /^ footer: Zasilane przez PanTracker$/;" kind:key line:5 extras:subparser,domainful
pl.emails.registration.button config/locales/emails.pl.yml /^ button: Aktywuj$/;" kind:key line:10 extras:subparser,domainful
pl.emails.registration.message config/locales/emails.pl.yml /^ message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniż/;" kind:key line:9 extras:subparser,domainful
pl.emails.registration.subject config/locales/emails.pl.yml /^ subject: Rejestracja konta$/;" kind:key line:7 extras:subparser,domainful
pl.emails.registration.title config/locales/emails.pl.yml /^ title: Rejestracja$/;" kind:key line:8 extras:subparser,domainful
pl.emails.title config/locales/emails.pl.yml /^ title: PanTracker$/;" kind:key line:4 extras:subparser,domainful
But I saw it in the unit tests, so probably I am missing some setting. Very nice :)
> Very well done. I had to compile it on my own machine, and it worked in almost all cases. However, I can't make it index the key name alone, the last part of the path. I only get full i18n paths in the tags file.
Did you expect ctags to emit tags for title, footer, subject, title, message, and button?
> But I saw it in the unit tests, so probably I am missing some setting.
No, you were not missing anything. The parser doesn't emit the last components as tags.
After getting your reply, I will update the pull request.
Thank you for your feedback.
Hi, yes
All "tails" would be welcome. They would be useful when someone uses a shorter version within the scope of some context.
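Once the full path is known, emitting the tail alongside it is a small step. A Python sketch, where the names domainful and domainless mirror the extras shown in the tags output above, while tail and names_for are hypothetical names for illustration (not options of the pull request): each enabled extra contributes one name, which is also one way the optional on/off switches from question E could map onto the output.

```python
# Sketch: derive the three kinds of names discussed in this thread from one
# key path. "domainful"/"domainless" mirror the extras in the tags output;
# "tail" (last component only) is a hypothetical name.
def names_for(path: list[str], extras: set[str]) -> list[str]:
    """Return the tag names to emit for path, filtered by enabled extras."""
    names = []
    if "domainful" in extras:
        names.append(".".join(path))      # pl.emails.registration.title
    if "domainless" in extras and len(path) > 1:
        names.append(".".join(path[1:]))  # emails.registration.title
    if "tail" in extras:
        names.append(path[-1])            # title
    return names


path = ["pl", "emails", "registration", "title"]
print(names_for(path, {"domainful", "domainless", "tail"}))
# ['pl.emails.registration.title', 'emails.registration.title', 'title']
print(names_for(path, {"tail"}))  # ['title']
```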
I just checked the newest changes. Very nice, explanatory documentation. Could you add information about this?
--langmap=I18nRubyGem:.yml
This worked for me without forcing the language. Might it be useful for others? What do you think?
> --langmap=I18nRubyGem:.yml
That is understandable, but it throws the other YAML-based parsers into chaos.
This is the reason I confirmed https://github.com/universal-ctags/ctags/issues/3523#issuecomment-1302670006 .
My English is not good. So, if you can think of good sentences that balance the limitations, avoiding chaos, and usability, could you write them down here? I will merge them into my pull request.
No, it's good, but it needs more detail. Not everyone will catch up the way we did. In general, to make it usable, a user would need to do something like this:
ctags --sort=yes -f tags
ctags --sort=yes --extras=+q --language-force=I18nRubyGem --languages=+I18nRubyGem --fields=+E --exclude=tags -a -f tags
The first command does the generic preparation; the second appends to the tags file.
Am I missing something from the manual about general usage?
Secondly, if a langmap for one YAML-based parser is bad, should I do it this way? I feel that I am missing something.
--langmap=+I18nRubyGem:.yml
--langmap=+Yaml:.yml
--langmap=+AnsiblePlaybook:.yml
Writing about the -a option in the man page looks helpful. I will add some sentences to the man page.
> Secondly, if a langmap for one YAML-based parser is bad, should I do it this way? I feel that I am missing something.
> --langmap=+I18nRubyGem:.yml --langmap=+Yaml:.yml --langmap=+AnsiblePlaybook:.yml
These don't work at all. If they work, ctags may have a bug.
Is there a way to enable all of these parsers without specifying a single one?
> Is there a way to enable all of these parsers without specifying a single one?
There is no way to do that. If you think the limitation is critical, I will withdraw #3895. I had already recognized this as a critical limitation: https://github.com/universal-ctags/ctags/issues/3523#issuecomment-1302670006
YAML files using .yaml or .yml as file extensions are not self-descriptive. There is no heuristic for recognizing whether a .yaml file is for I18nRubyGem or not.
Unlike XML, YAML is not self-descriptive. ctags can do something useful for such input only with the user's intervention.
OK, thanks for your time and explanation. Very nice addition :+1:
I found an excellent heuristic; a YAML file may be an I18nRubyGem file if the top-level entries are locale names.
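A Python sketch of that heuristic, assuming the document has already been parsed: it accepts a mapping only if every top-level key looks like a locale name and maps to another mapping. The LOCALE pattern is a simplified assumption covering forms like en, pl, pt-BR, and en_US, not a full BCP 47 check, and looks_like_i18n is a hypothetical name.

```python
# Sketch of the heuristic on an already-parsed document: a mapping whose
# top-level keys all look like locale names and whose values are mappings.
# LOCALE is a simplified, hypothetical pattern, not a BCP 47 validator.
import re
from typing import Any

LOCALE = re.compile(r"^[a-z]{2,3}(?:[-_][A-Za-z]{2,4})?$")


def looks_like_i18n(doc: Any) -> bool:
    """Return True if doc plausibly is an i18n translation file."""
    if not isinstance(doc, dict) or not doc:
        return False  # e.g. a playbook whose top level is a sequence
    return all(
        isinstance(key, str)
        and LOCALE.match(key) is not None
        and isinstance(value, dict)
        for key, value in doc.items()
    )


print(looks_like_i18n({"pl": {"emails": {"title": "PanTracker"}}}))    # True
print(looks_like_i18n({"openapi": "3.0.0", "info": {"title": "API"}}))  # False
print(looks_like_i18n([{"hosts": "all"}]))                              # False
```

Any top-level key of two or three lowercase letters (such as app) also passes, so this check can produce false positives and is better treated as a hint than as proof.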