YAML: extract keys (with scope)

Open akemrir opened this issue 3 years ago • 17 comments

The name of the parser: Yaml

The command line you used to run ctags:

ctags --options=NONE test.yaml

The content of input file:

en:
  test: 2
  other: 10

The tags output you are not satisfied with:

!_TAG_FILE_FORMAT	2	/extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED	1	/0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD	mixed	/number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP	slash	/slash or backslash/
!_TAG_OUTPUT_MODE	u-ctags	/u-ctags or e-ctags/
!_TAG_PATTERN_LENGTH_LIMIT	96	/0 for no limit/
!_TAG_PROC_CWD	/home/akemrir/	//
!_TAG_PROGRAM_AUTHOR	Universal Ctags Team	//
!_TAG_PROGRAM_NAME	Universal Ctags	/Derived from Exuberant Ctags/
!_TAG_PROGRAM_URL	https://ctags.io/	/official site/
!_TAG_PROGRAM_VERSION	5.9.0	/p5.9.20220828.0/

The tags output you expect:

...
en.test	test.yaml	/^main(void)$/;"	kind:function	line:2	language:Yaml	signature:(void)	keys
en.other	test.yaml	/^main(void)$/;"	kind:function	line:3	language:Yaml	signature:(void)	keys
...

this could be tricky to omit one level?
...
test	test.yaml	/^main(void)$/;"	kind:function	line:2	language:Yaml	signature:(void)	keys
other	test.yaml	/^main(void)$/;"	kind:function	line:3	language:Yaml	signature:(void)	keys
...

The version of ctags:

$ ctags --version
Universal Ctags 5.9.0(p5.9.20220828.0), Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Aug 29 2022, 21:05:02
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript, +pcre2

How do you get ctags binary:

Archlinux official repository extra/ctags 1:5.9.20220828.0-1


I would like to have a map of key paths from a YAML file. This is for use with the I18n Ruby gem, for translations. Then I could, for example, use it with Vim to jump directly to a translation and to suggest paths from a Vim add-on.

Any hints to start off somewhere? Maybe it would be similar to SASS?

akemrir avatar Oct 31 '22 15:10 akemrir

Any hints to start off somewhere?

u-ctags has a Yaml parser based on libyaml. See https://github.com/universal-ctags/ctags/blob/master/parsers/yaml.c.

I wonder if your ctags executable is linked to libyaml. Try the --list-features option:

$ ./ctags --list-features | grep yaml
yaml              linked with library for parsing yaml input

Even if your ctags executable is linked to libyaml, the yaml parser doesn't extract keys.

You can implement what you want in two ways.

A. Developing a subparser specialized for the "I18n ruby gem". https://github.com/universal-ctags/ctags/blob/master/parsers/ansibleplaybook.c and https://github.com/universal-ctags/ctags/blob/master/parsers/openapi.c took the same approach. You can get hints from them. See also https://docs.ctags.io/en/latest/running-multi-parsers.html#base-sub-parsers .

B. Extending the yaml parser to extract keys in a yaml file. You have to study many things in this approach.

  • the concept of scope in ctags (See man ctags)
  • study parsers/json.c. Our json parser extracts all keys.
  • study libyaml
  • study corkAPI https://docs.ctags.io/en/latest/internal.html?highlight=cork#cork-api

I can give you more hints. The first step is to choose one of the two.

masatake avatar Nov 01 '22 01:11 masatake

Thanks for the hints. I need to get into the topic :)

akemrir avatar Nov 02 '22 16:11 akemrir

I think approach A is OK. This is specific functionality for grabbing full or partial paths, which fits a subparser.

akemrir avatar Nov 03 '22 07:11 akemrir

I think approach A is OK. This is specific functionality for grabbing full or partial paths, which fits a subparser.

Ok. Could you tell me more about "I18n ruby gem"? I think I can provide more specific hints.

masatake avatar Nov 03 '22 13:11 masatake

The I18n gem uses YAML files as the source of translations. For example, when you have this:

---
en:
  sequel:
    errors:
      or: "or"

Then, when the English locale is active in the session, this Ruby call will get the text from it:

I18n.t('sequel.errors.or') # will get "or"
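
To make the lookup concrete, here is a minimal Python sketch of how a dotted key could be resolved against the nested structure above. It is only an illustration: the function name `translate` and the plain-dict representation are assumptions, and this is not how the I18n gem is implemented.

```python
# Nested mapping equivalent to the YAML sample above (hypothetical in-memory form).
translations = {"en": {"sequel": {"errors": {"or": "or"}}}}


def translate(data, key, locale="en"):
    """Resolve a dotted key such as 'sequel.errors.or' under the given locale."""
    node = data[locale]
    for part in key.split("."):
        # Each path component selects one level of the nested mapping.
        node = node[part]
    return node


print(translate(translations, "sequel.errors.or"))  # prints: or
```

The locale (`en`) is supplied separately from the key, which is why both the path with the locale and the path without it are interesting as tags.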

I'm thinking about how to approach this: index from the beginning, from a certain level, or both:

en.sequel.errors.or
sequel.errors.or

Key-level indexing like in the JSON parser is very simple; it could be done the regexp way, for example with --regex-yaml=REGEX and [\w_]+:. But the whole path would be very useful for checking more precisely what was used. I would also be able to open the places where a key is defined in several languages.
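
As a rough sketch of the two forms of path described here, the following Python walks a nested mapping and produces each key path both with and without the top-level locale. The names `key_paths` and `index_entries` are hypothetical, the input is a plain dict rather than a parsed YAML file, and intermediate keys are included as well as leaves, which is an assumption rather than something settled in this thread.

```python
def key_paths(node, prefix=()):
    """Yield the path (as a tuple of components) of every key in a nested mapping."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (str(key),)
            yield path
            yield from key_paths(value, path)


def index_entries(data):
    """Return (full_path, path_without_locale) pairs for keys below the locale level."""
    entries = []
    for path in key_paths(data):
        if len(path) > 1:  # skip the bare locale key such as "en"
            entries.append((".".join(path), ".".join(path[1:])))
    return entries


data = {"en": {"sequel": {"errors": {"or": "or"}}}}
for full, short in index_entries(data):
    print(full, short)
# The last line printed is: en.sequel.errors.or sequel.errors.or
```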

akemrir avatar Nov 03 '22 15:11 akemrir

Thank you for your explanation. I found more questions.

A. I think https://github.com/svenfuchs/rails-i18n/blob/master/rails/locale/en.yml is a real example input file. Am I correct? If I'm correct, what string do you want to extract from a .yml file that includes an array like:

---
en:
  date:
    abbr_day_names:
    - Sun

In this example, an array element is at the leaf. If an array is in the middle of the path like:

---
en:
  syscall:
    - name: read
      arity: 3
    - name: write
      arity: 3

what kind of string do you expect ctags to extract?

C. Other than having .yml as an extension, I think there is no rule for the file name. Am I correct?

masatake avatar Nov 03 '22 19:11 masatake

A. Yes, it is a real-life example. I am interested only in the keys, so that I can quickly jump with tags in Vim to where they are placed. So I expect:

en.date.abbr_day_names
date.abbr_day_names

en.syscall
syscall

B. When they are used in code in full form, they appear as I18n.t('date.abbr_day_names') and I18n.t('syscall'). Then the templating language (erb/haml) or Ruby iterates over them.

We could also encounter this syntax, so-called block scalars:

en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install

Scalar indicators:
  '|' : Block scalar indicator.
  '>' : Folded scalar indicator.
  '-' : Strip chomp modifier ('|-' or '>-').
  '+' : Keep chomp modifier ('|+' or '>+').
  1-9 : Explicit indentation modifier ('|1' or '>2').
  # Modifiers can be combined ('|2-', '>+1').

So in fact the general capture line looks like this:

key: null (before array)
key: a
key: 2
key: "test"
key: 'test'
key: >
key: |
key: |-
etc.

This reference card will be useful later: https://yaml.org/refcard.html

C. I agree. Users could filter files by using --exclude. I've seen both yaml and yml extensions in use.

Two questions:
D. Could we limit it with a parameter? That is, start gathering from some level, without this en:, or include it when needed.
E. Could we gather both, the full key and only the "end" of it? Maybe with the use of kinds to optionally turn off each?

Thanks for helping me out.

akemrir avatar Nov 03 '22 20:11 akemrir

I have one idea: only take lines with a colon into account, matching only this: ^\s?[a-zA-Z0-9_]:
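
A small Python sketch of this line-based idea follows. The pattern used here adds `*` and `+` quantifiers to the one quoted above, which is an assumption about the intent, and `key_names` is a hypothetical helper. It shows that a regex of this kind finds key names but not their full paths, and it would also misfire on a block-scalar body line that happens to look like `word:`.

```python
import re

# Assumed variant of the pattern above: optional indentation, a key name, then a colon.
KEY_LINE = re.compile(r"^\s*([A-Za-z0-9_]+):")


def key_names(text):
    """Return the key names found on lines that look like 'key:'."""
    names = []
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


sample = """\
en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install
"""
print(key_names(sample))  # prints: ['en', 'key', 'command']
```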

akemrir avatar Nov 03 '22 20:11 akemrir

About A, my understanding is that we can ignore the array (or sequence in Yaml) regardless of the position in a YAML tree structure.
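
One way to picture "ignoring the sequence" is the Python sketch below, which records mapping keys and simply does not descend into lists. The function name `mapping_key_paths` and the dict/list input are assumptions made for illustration; the real parser works on libyaml events, not on Python objects.

```python
def mapping_key_paths(node, prefix=()):
    """Yield dotted paths for mapping keys; sequences are ignored wherever they occur."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (str(key),)
            yield ".".join(path)
            yield from mapping_key_paths(value, path)
    # Lists (YAML sequences) and scalars fall through here: nothing is emitted
    # for them and the walk does not descend into them.


data = {
    "en": {
        "date": {"abbr_day_names": ["Sun"]},
        "syscall": [{"name": "read", "arity": 3}, {"name": "write", "arity": 3}],
    }
}
print(list(mapping_key_paths(data)))
# prints: ['en', 'en.date', 'en.date.abbr_day_names', 'en.syscall']
```

With this rule, `en.date.abbr_day_names` and `en.syscall` are found, while `name` and `arity` inside the sequence are not.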

About B, I would like to confirm that you want to extract en.key, en.command, key, and command from

en:
  key: >
    Your long
    string here.
  command: |
    echo "--- Install gems"
    bundle install

Am I correct?

Two questions:
D. Could we limit it with a parameter? That is, start gathering from some level, without this en:, or include it when needed.
E. Could we gather both, the full key and only the "end" of it? Maybe with the use of kinds to optionally turn off each?

Yes for both questions.

masatake avatar Nov 03 '22 20:11 masatake

https://github.com/ruby-i18n/i18n looks like the reference.

masatake avatar Nov 03 '22 20:11 masatake

A. yes B. yes And the last one, also yes

akemrir avatar Nov 03 '22 20:11 akemrir

Thank you. The last question is the name of the parser. Do you have any ideas? "I18NRubyGem" looks suitable to me. What do you think?

masatake avatar Nov 03 '22 20:11 masatake

That's a good name.

akemrir avatar Nov 03 '22 21:11 akemrir

Thank you. I will make a prototype based on this discussion.

We have one critical limitation. You must enable the "I18NRubyGem" parser manually when you want to use it. Automatic parser detection may not work because .yaml and .yml are too generic as hints for choosing a parser.

On the command line, you must do something like "ctags --languages=+I18NRubyGem ...".

masatake avatar Nov 03 '22 21:11 masatake

OK, no problem. I will use a ~/.ctags.d configuration.

akemrir avatar Nov 04 '22 06:11 akemrir

I found that this needs an overhaul of yaml.c.

masatake avatar Dec 09 '22 00:12 masatake

Does it conflict with the idea of a sub-parser? What if I chose that way instead of changing yaml.c?

akemrir avatar Dec 10 '22 15:12 akemrir

The overhaul was done. I implemented what I wrote here and made a pull request (#3895).

masatake avatar Dec 23 '23 09:12 masatake

Very well done. I had to compile it on my own machine and it worked in almost all cases. I can't make it index the key name, the last part of the path. I have only full i18n paths in the tags file.

---
pl:
  emails:
    title: PanTracker
    footer: Zasilane przez PanTracker
    registration:
      subject: Rejestracja konta
      title: Rejestracja
      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniżej.
      button: Aktywuj
emails.footer	config/locales/emails.pl.yml	/^    footer: Zasilane przez PanTracker$/;"	kind:key	line:5	extras:subparser,domainless
emails.registration.button	config/locales/emails.pl.yml	/^      button: Aktywuj$/;"	kind:key	line:10	extras:subparser,domainless
emails.registration.message	config/locales/emails.pl.yml	/^      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniż/;"	kind:key	line:9	extras:subparser,domainless
emails.registration.subject	config/locales/emails.pl.yml	/^      subject: Rejestracja konta$/;"	kind:key	line:7	extras:subparser,domainless
emails.registration.title	config/locales/emails.pl.yml	/^      title: Rejestracja$/;"	kind:key	line:8	extras:subparser,domainless
emails.title	config/locales/emails.pl.yml	/^    title: PanTracker$/;"	kind:key	line:4	extras:subparser,domainless
pl.emails.footer	config/locales/emails.pl.yml	/^    footer: Zasilane przez PanTracker$/;"	kind:key	line:5	extras:subparser,domainful
pl.emails.registration.button	config/locales/emails.pl.yml	/^      button: Aktywuj$/;"	kind:key	line:10	extras:subparser,domainful
pl.emails.registration.message	config/locales/emails.pl.yml	/^      message: Twoje konto zostało pomyślnie założone, aktywuj je przy pomocy skrótu poniż/;"	kind:key	line:9	extras:subparser,domainful
pl.emails.registration.subject	config/locales/emails.pl.yml	/^      subject: Rejestracja konta$/;"	kind:key	line:7	extras:subparser,domainful
pl.emails.registration.title	config/locales/emails.pl.yml	/^      title: Rejestracja$/;"	kind:key	line:8	extras:subparser,domainful
pl.emails.title	config/locales/emails.pl.yml	/^    title: PanTracker$/;"	kind:key	line:4	extras:subparser,domainful

But I saw it in the unit tests; probably I am missing some setting. Very nice :)
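
For readers who want to inspect such output programmatically, here is a hedged Python sketch that splits one tags line into its parts and reads the `extras` field, which is where the `domainful` and `domainless` markers above appear. The helper `parse_tag_line` is hypothetical and assumes the simple tab-separated layout shown in this comment (no tabs inside the search pattern).

```python
def parse_tag_line(line):
    """Split one extended-format tags line into name, file, pattern and fields."""
    name, path, rest = line.rstrip("\n").split("\t", 2)
    # The search pattern ends at ';"', after which tab-separated key:value fields follow.
    pattern, _, field_text = rest.partition(';"\t')
    fields = dict(item.split(":", 1) for item in field_text.split("\t") if ":" in item)
    return {"name": name, "file": path, "pattern": pattern, **fields}


line = (
    "emails.title\tconfig/locales/emails.pl.yml\t"
    '/^    title: PanTracker$/;"\tkind:key\tline:4\textras:subparser,domainless'
)
tag = parse_tag_line(line)
print(tag["name"], tag["kind"], tag["extras"].split(","))
# prints: emails.title key ['subparser', 'domainless']
```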

akemrir avatar Dec 23 '23 12:12 akemrir

Very well done. I had to compile it on my own machine and it worked in almost all cases. I can't make it index the key name, the last part of the path. I have only full i18n paths in the tags file.

Did you expect ctags to emit tags for title, footer, subject, title, message, and button?

But I saw it in the unit tests; probably I am missing some setting.

No, you were not. The parser doesn't emit the last components as tags.

After getting your reply, I will update the pull request.

Thank you for your feedback.

masatake avatar Dec 23 '23 15:12 masatake

Hi, yes

All "tails" would be welcome. They would be useful when someone uses a shorter version within the scope of some context.

akemrir avatar Dec 23 '23 17:12 akemrir

Just checked the newest changes. Very nice and explanatory documentation. Could you append info about this?

--langmap=I18nRubyGem:.yml

This worked for me without force. It may be useful for others? What do you think?

akemrir avatar Dec 23 '23 20:12 akemrir

--langmap=I18nRubyGem:.yml

That is understandable, but this puts other YAML-based parsers into chaos.

This is the reason I confirmed https://github.com/universal-ctags/ctags/issues/3523#issuecomment-1302670006 .

I'm not good at English. So, if you know good sentences that balance the limitation, avoiding chaos, and usability, could you write them down here? I will merge them into my pull request.

masatake avatar Dec 23 '23 21:12 masatake

No, it's good. But it needs more detail. Not everyone will catch up like we do. So in general, to make it usable, a user would need to do something like this:

ctags --sort=yes -f tags
ctags --sort=yes --extras=+q --language-force=I18nRubyGem --languages=+I18nRubyGem --fields=+E --exclude=tags -a -f tags

The first one does the generic preparation; the second appends to the tags file.

Am I missing something from the manual about general usage?

Secondly, if langmap for one yaml based parser is bad, should I do it this way? I feel that I am missing something.

--langmap=+I18nRubyGem:.yml
--langmap=+Yaml:.yml
--langmap=+AnsiblePlaybook:.yml

akemrir avatar Dec 24 '23 11:12 akemrir

Writing about the -a option in the man page looks helpful. I will add sentences to the man page.

Secondly, if langmap for one yaml based parser is bad, should I do it this way? I feel that I am missing something.

--langmap=+I18nRubyGem:.yml --langmap=+Yaml:.yml --langmap=+AnsiblePlaybook:.yml

These don't work at all. If they work, ctags may have a bug.

masatake avatar Dec 24 '23 16:12 masatake

Is there a possibility to enable all parsers without specifying a single one?

akemrir avatar Dec 25 '23 09:12 akemrir

Is there a possibility to enable all parsers without specifying a single one?

There is no possibility. If you think the limitation is critical, I will withdraw #3895. I recognized this was a critical limitation. https://github.com/universal-ctags/ctags/issues/3523#issuecomment-1302670006

YAML files using .yaml or .yml as file extensions are not self-descriptive. There is no heuristic for recognizing whether a .yaml file is for I18nRubyGem or not.

masatake avatar Dec 25 '23 09:12 masatake

Unlike XML, YAML is not self-descriptive. Only with the user's intervention can ctags do something useful for such input.

masatake avatar Dec 25 '23 09:12 masatake

OK, thanks for your time and explanation. Very nice addition :+1:

akemrir avatar Dec 25 '23 10:12 akemrir

I found an excellent heuristic; a YAML file may be an I18nRubyGem file if the top-level entries are locale names.
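
The idea can be sketched in Python as follows. This is not the code in ctags: the regular expression for what counts as a "locale name" and the function name `looks_like_i18n_file` are assumptions made for illustration, and the check is only a hint, since an unrelated YAML file could also have a top-level key such as `en`.

```python
import re

# Assumption: a locale name looks like "en", "pl", "pt-BR" or "zh_CN".
LOCALE_NAME = re.compile(r"^[a-z]{2,3}([-_][A-Za-z]{2,4})?$")


def looks_like_i18n_file(top_level_keys):
    """Return True if every top-level key of a YAML document looks like a locale name."""
    keys = [str(key) for key in top_level_keys]
    return bool(keys) and all(LOCALE_NAME.match(key) for key in keys)


print(looks_like_i18n_file(["en"]))                # prints: True
print(looks_like_i18n_file(["pl", "pt-BR"]))       # prints: True
print(looks_like_i18n_file(["openapi", "paths"]))  # prints: False
```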

masatake avatar Dec 25 '23 13:12 masatake