difftastic icon indicating copy to clipboard operation
difftastic copied to clipboard

Failure to identify diff in YAML files if line with key is longer than 128 chars

Open marianosimone opened this issue 2 years ago • 5 comments

Once a yaml has a line with a key that is longer than 128 characters (note that this might be the key itself, or indentation!), difft fails to see any differences afterwards:

Inputs

file1.yaml:

---
test:
 
 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'a'

file2.yaml:

---
test:
 
 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'b'
  fail: true

Run

$ difft file1.yaml file2.yaml
file2.yaml --- YAML
No syntactic changes.

Initial notes

simple keys are restricted to a single line and must not span more than 1024 stream characters (hence the need for the flow-key context). Note the 1024 character limit is in terms of Unicode characters rather than stream octets, and that it includes the separation following the key itself.

  • pyyaml also seems to have trouble with 128 characters. Might be unrelated, but might be that there's some underlying common assumption. See https://github.com/yaml/pyyaml/issues/157

Versions

  • OS: macOS Sierra 10.12.6
  • Difftastic: 0.26.3

marianosimone avatar Apr 24 '22 00:04 marianosimone

I'm not able to reproduce this. Could you attach the exact files you're having an issue with?

If I copy-paste your examples, I'm seeing this:

Screenshot from 2022-04-23 18-34-26

Difftastic uses purple to highlight missing position information. It's not ideal in this case, but I believe the YAML syntax is wrong (1 space indent on aaa..., 2 space indent on fail).

If I change fail: true to use a 1 space indent, I get the correct result.

Wilfred avatar Apr 24 '22 01:04 Wilfred

Sure thing, find both of them attached.

Given that GitHub didn't let me upload yamls, I renamed to csv.... and when I did that, difft correctly figured out the differences... so there's definitely something fishy going on with the yaml parsing ;)

Rename to yaml before testing:

file1.csv file2.csv

image

marianosimone avatar Apr 24 '22 04:04 marianosimone

Thanks. This looks like a tree-sitter-yaml parsing issue: difftastic is getting exactly the same parse tree for both.

$ difft --dump-ts /tmp/file2.yaml
stream (0, 0) - (2, 128)
  document (0, 0) - (2, 128)
    --- (0, 0) - (0, 3) "---"
    block_node (1, 0) - (2, 128)
      block_mapping (1, 0) - (2, 128)
        block_mapping_pair (1, 0) - (2, 128)
          flow_node (1, 0) - (1, 4)
            plain_scalar (1, 0) - (1, 4)
              string_scalar (1, 0) - (1, 4) "test"
          : (1, 4) - (1, 5) ":"
          block_node (2, 2) - (2, 128)
            block_mapping (2, 2) - (2, 128)
              block_mapping_pair (2, 2) - (2, 128)
                flow_node (2, 2) - (2, 127)
                  plain_scalar (2, 2) - (2, 127)
                    string_scalar (2, 2) - (2, 127) "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
                : (2, 127) - (2, 128) ":"

it's missing everything after the long key name.

Wilfred avatar Apr 24 '22 07:04 Wilfred

Another interesting thing that might hint you to what's going on: if you add a difference before the long key, everything works just fine again:

e.g.

file1.yaml

---
test1: true
test:
  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'b'
  fail: true

file2.yaml

---
test2: true
test:
  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'a'
image

marianosimone avatar Apr 24 '22 16:04 marianosimone

I am not really sure if this is the same issue, but I am having a similar trouble with comparing two helm resources:

file1.yml

{{- if and (.Values.airflow.users) (not .Values.airflow.usersUpdate) }}
{{- $podNodeSelector := include "airflow.podNodeSelector" (dict "Release" .Release "Values" .Values "nodeSelector" .Values.airflow.sync.nodeSelector) }}
{{- $podAffinity := include "airflow.podAffinity" (dict "Release" .Release "Values" .Values "affinity" .Values.airflow.sync.affinity) }}
{{- $podTolerations := include "airflow.podTolerations" (dict "Release" .Release "Values" .Values "tolerations" .Values.airflow.sync.tolerations) }}
{{- $podSecurityContext := include "airflow.podSecurityContext" (dict "Release" .Release "Values" .Values "securityContext" .Values.airflow.sync.securityContext) }}
{{- $extraPipPackages := .Values.airflow.extraPipPackages }}
{{- $volumeMounts := include "airflow.volumeMounts" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
{{- $volumes := include "airflow.volumes" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "airflow.fullname" . }}-sync-users
  labels:
    app: {{ include "airflow.labels.app" . }}
    component: sync-users
    chart: {{ include "airflow.labels.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    {{- if .Values.airflow.sync.labels }}
    {{- toYaml .Values.airflow.sync.labels | nindent 4 }}
    {{- end }}
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-weight: "0"
    helm.sh/hook-delete-policy: before-hook-creation
    {{- if .Values.airflow.sync.annotations }}
    {{- toYaml .Values.airflow.sync.annotations | nindent 4 }}
    {{- end }}
spec:
  template:
    metadata:
      annotations:
        checksum/secret-config-envs: {{ include (print $.Template.BasePath "/config/secret-config-envs.yaml") . | sha256sum }}
        checksum/secret-local-settings: {{ include (print $.Template.BasePath "/config/secret-local-settings.yaml") . | sha256sum }}
        checksum/sync-users-script: {{ include "airflow.sync.sync_users.py" . | sha256sum }}
        {{- if .Values.airflow.podAnnotations }}
        {{- toYaml .Values.airflow.podAnnotations | nindent 8 }}
        {{- end }}
        {{- if .Values.airflow.sync.podAnnotations }}
        {{- toYaml .Values.airflow.sync.podAnnotations | nindent 8 }}
        {{- end }}
        {{- if .Values.airflow.sync.safeToEvict }}
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
        {{- end }}
      labels:
        app: {{ include "airflow.labels.app" . }}
        component: sync-users
        release: {{ .Release.Name }}
        {{- if .Values.airflow.sync.podLabels }}
        {{- toYaml .Values.airflow.sync.podLabels | nindent 8 }}
        {{- end }}
    spec:
      restartPolicy: OnFailure
      {{- if .Values.airflow.image.pullSecret }}
      imagePullSecrets:
        - name: {{ .Values.airflow.image.pullSecret }}
      {{- end }}
      {{- if $podNodeSelector }}
      nodeSelector:
        {{- $podNodeSelector | nindent 8 }}
      {{- end }}
      {{- if $podAffinity }}
      affinity:
        {{- $podAffinity | nindent 8 }}
      {{- end }}
      {{- if $podTolerations }}
      tolerations:
        {{- $podTolerations | nindent 8 }}
      {{- end }}
      {{- if $podSecurityContext }}
      securityContext:
        {{- $podSecurityContext | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "airflow.serviceAccountName" . }}
      initContainers:
        {{- if $extraPipPackages }}
        {{- include "airflow.init_container.install_pip_packages" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) | indent 8 }}
        {{- end }}
        {{- if and (.Values.dags.gitSync.enabled)  (not .Values.dags.persistence.enabled)  }}
        ## git-sync is included so "airflow plugins" & "python packages" can be stored in the dags repo
        {{- include "airflow.container.git_sync" (dict "Release" .Release "Values" .Values "sync_one_time" "true") | indent 8 }}
        {{- end }}
        {{- include "airflow.init_container.check_db" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
        {{- include "airflow.init_container.wait_for_db_migrations" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
      containers:
        - name: sync-airflow-users
          {{- include "airflow.image" . | indent 10 }}
          resources:
            {{- toYaml .Values.airflow.sync.resources | nindent 12 }}
          envFrom:
            {{- include "airflow.envFrom" . | indent 12 }}
          env:
            {{- include "airflow.env" . | indent 12 }}
          command:
            {{- include "airflow.command" . | indent 12 }}
          args:
            - "python"
            - "-u"
            - "/mnt/scripts/sync_users.py"
          volumeMounts:
            {{- $volumeMounts | indent 12 }}
            - name: scripts
              mountPath: /mnt/scripts
              readOnly: true
            {{- if .Values.airflow.usersTemplates }}
            - name: templates
              mountPath: "/mnt/templates"
              readOnly: true
            {{- end }}
      volumes:
        {{- $volumes | indent 8 }}
        - name: scripts
          secret:
            secretName: {{ include "airflow.fullname" . }}-sync-users
        {{- if .Values.airflow.usersTemplates }}
        - name: templates
          projected:
            sources:
              {{- range $k, $v := .Values.airflow.usersTemplates }}
              {{- if not (regexMatch "^[a-zA-Z_][a-zA-Z0-9_]*$" $k) }}
              {{ required "each key in `airflow.usersTemplates` must match the regex: ^[a-zA-Z_][a-zA-Z0-9_]*$" nil }}
              {{- end }}
              {{- if eq ($v.kind | lower) "configmap" }}
              - configMap:
                  name: {{ $v.name | quote }}
                  items:
                    - key: {{ $v.key | quote }}
                      path: {{ $k | quote }}
              {{- else if eq ($v.kind | lower) "secret" }}
              - secret:
                  name: {{ $v.name | quote }}
                  items:
                    - key: {{ $v.key | quote }}
                      path: {{ $k | quote }}
              {{- else }}
              {{ required "each `kind` in `airflow.usersTemplates` must be one of ['configmap', 'secret']!" nil }}
              {{- end }}
              {{- end }}
        {{- end }}
{{- end }}

file2.yml

{{- if and (.Values.airflow.users) (not .Values.airflow.usersUpdate) }}
{{- $podNodeSelector := include "airflow.podNodeSelector" (dict "Release" .Release "Values" .Values "nodeSelector" .Values.airflow.sync.nodeSelector) }}
{{- $podAffinity := include "airflow.podAffinity" (dict "Release" .Release "Values" .Values "affinity" .Values.airflow.sync.affinity) }}
{{- $podTolerations := include "airflow.podTolerations" (dict "Release" .Release "Values" .Values "tolerations" .Values.airflow.sync.tolerations) }}
{{- $podSecurityContext := include "airflow.podSecurityContext" (dict "Release" .Release "Values" .Values "securityContext" .Values.airflow.sync.securityContext) }}
{{- $extraPipPackages := .Values.airflow.extraPipPackages }}
{{- $volumeMounts := include "airflow.volumeMounts" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
{{- $volumes := include "airflow.volumes" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "airflow.fullname" . }}-sync-users
  labels:
    app: {{ include "airflow.labels.app" . }}
    component: sync-users
    chart: {{ include "airflow.labels.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
    {{- if .Values.airflow.sync.labels }}
    {{- toYaml .Values.airflow.sync.labels | nindent 4 }}
    {{- end }}
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-weight: "0"
    helm.sh/hook-delete-policy: before-hook-creation
    {{- if .Values.airflow.sync.annotations }}
    {{- toYaml .Values.airflow.sync.annotations | nindent 4 }}
    {{- end }}
spec:
  template:
    metadata:
      annotations:
        checksum/secret-config-envs: {{ include (print $.Template.BasePath "/config/secret-config-envs.yaml") . | sha256sum }}
        checksum/secret-local-settings: {{ include (print $.Template.BasePath "/config/secret-local-settings.yaml") . | sha256sum }}
        checksum/sync-users-script: {{ include "airflow.sync.sync_users.py" . | sha256sum }}
        {{- if .Values.airflow.podAnnotations }}
        {{- toYaml .Values.airflow.podAnnotations | nindent 8 }}
        {{- end }}
        {{- if .Values.airflow.sync.podAnnotations }}
        {{- toYaml .Values.airflow.sync.podAnnotations | nindent 8 }}
        {{- end }}
        {{- if .Values.airflow.sync.safeToEvict }}
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
        {{- end }}
      labels:
        app: {{ include "airflow.labels.app" . }}
        component: sync-users
        release: {{ .Release.Name }}
        {{- if .Values.airflow.sync.podLabels }}
        {{- toYaml .Values.airflow.sync.podLabels | nindent 8 }}
        {{- end }}
    spec:
      restartPolicy: OnFailure
      {{- if .Values.airflow.image.pullSecret }}
      imagePullSecrets:
        - name: {{ .Values.airflow.image.pullSecret }}
      {{- end }}
      {{- if $podNodeSelector }}
      nodeSelector:
        {{- $podNodeSelector | nindent 8 }}
      {{- end }}
      {{- if $podAffinity }}
      affinity:
        {{- $podAffinity | nindent 8 }}
      {{- end }}
      {{- if $podTolerations }}
      tolerations:
        {{- $podTolerations | nindent 8 }}
      {{- end }}
      {{- if $podSecurityContext }}
      securityContext:
        {{- $podSecurityContext | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "airflow.serviceAccountName" . }}
      initContainers:
        {{- if $extraPipPackages }}
        {{- include "airflow.init_container.install_pip_packages" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) | indent 8 }}
        {{- end }}
        {{- if .Values.dags.persistence.enabled  }}
        {{- include "airflow.container.dag_pvc_copy" (dict "Release" .Release "Values" .Values) | indent 8 }}
        {{ else }}
        ## git-sync is included so "airflow plugins" & "python packages" can be stored in the dags repo
        {{- include "airflow.container.git_sync" (dict "Release" .Release "Values" .Values "sync_one_time" "true") | indent 8 }}
        {{- end }}
        {{- include "airflow.init_container.check_db" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
        {{- include "airflow.init_container.wait_for_db_migrations" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
      containers:
        - name: sync-airflow-users
          {{- include "airflow.image" . | indent 10 }}
          resources:
            {{- toYaml .Values.airflow.sync.resources | nindent 12 }}
          envFrom:
            {{- include "airflow.envFrom" . | indent 12 }}
          env:
            {{- include "airflow.env" . | indent 12 }}
          command:
            {{- include "airflow.command" . | indent 12 }}
          args:
            - "python"
            - "-u"
            - "/mnt/scripts/sync_users.py"
          volumeMounts:
            {{- $volumeMounts | indent 12 }}
            - name: scripts
              mountPath: /mnt/scripts
              readOnly: true
            {{- if .Values.airflow.usersTemplates }}
            - name: templates
              mountPath: "/mnt/templates"
              readOnly: true
            {{- end }}
      volumes:
        {{- $volumes | indent 8 }}
        - name: scripts
          secret:
            secretName: {{ include "airflow.fullname" . }}-sync-users
        {{- if .Values.airflow.usersTemplates }}
        - name: templates
          projected:
            sources:
              {{- range $k, $v := .Values.airflow.usersTemplates }}
              {{- if not (regexMatch "^[a-zA-Z_][a-zA-Z0-9_]*$" $k) }}
              {{ required "each key in `airflow.usersTemplates` must match the regex: ^[a-zA-Z_][a-zA-Z0-9_]*$" nil }}
              {{- end }}
              {{- if eq ($v.kind | lower) "configmap" }}
              - configMap:
                  name: {{ $v.name | quote }}
                  items:
                    - key: {{ $v.key | quote }}
                      path: {{ $k | quote }}
              {{- else if eq ($v.kind | lower) "secret" }}
              - secret:
                  name: {{ $v.name | quote }}
                  items:
                    - key: {{ $v.key | quote }}
                      path: {{ $k | quote }}
              {{- else }}
              {{ required "each `kind` in `airflow.usersTemplates` must be one of ['configmap', 'secret']!" nil }}
              {{- end }}
              {{- end }}
        {{- end }}
{{- end }}
❯ difft file1.yml file2.yml             
file2.yml --- YAML
No syntactic changes.

The problem seems to have sth to do with YAML because when I rename the files as txt files the diff works correctly:

❯ difft file1.txt file2.txt
file2.txt --- Text
76         {{- if $extraPipPackages }}          76         {{- if $extraPipPackages }}
77         {{- include "airflow.init_container. 77         {{- include "airflow.init_container.i
.. install_pip_packages" (dict "Release" .Relea .. nstall_pip_packages" (dict "Release" .Release
.. se "Values" .Values "extraPipPackages" $extr ..  "Values" .Values "extraPipPackages" $extraPi
.. aPipPackages) | indent 8 }}                  .. pPackages) | indent 8 }}
78         {{- end }}                           78         {{- end }}
79         {{- if and (.Values.dags.gitSync.ena 79         {{- if .Values.dags.persistence.enabl
.. bled)  (not .Values.dags.persistence.enabled .. ed  }}
.. )  }}                                        .. 
..                                              80         {{- include "airflow.container.dag_pv
..                                              .. c_copy" (dict "Release" .Release "Values" .Va
..                                              .. lues) | indent 8 }}
..                                              81         {{ else }}
80         ## git-sync is included so "airflow  82         ## git-sync is included so "airflow p
.. plugins" & "python packages" can be stored i .. lugins" & "python packages" can be stored in 
.. n the dags repo                              .. the dags repo
81         {{- include "airflow.container.git_s 83         {{- include "airflow.container.git_sy
.. ync" (dict "Release" .Release "Values" .Valu .. nc" (dict "Release" .Release "Values" .Values
.. es "sync_one_time" "true") | indent 8 }}     ..  "sync_one_time" "true") | indent 8 }}
82         {{- end }}                           84         {{- end }}

karakanb avatar May 12 '22 23:05 karakanb

@karakanb your case looks like Helm not being valid YAML, see #377.

Wilfred avatar Oct 31 '22 07:10 Wilfred

Looking at the YAML parser, it has some suspiciously small integer types in the scanner:

struct Scanner {
  int16_t row;
  int16_t col;
  int16_t blk_imp_row;
  int16_t blk_imp_col;
  int16_t blk_imp_tab;
  vector<int16_t> ind_typ_stk;
  vector<int16_t> ind_len_stk;

  // temp
  int16_t end_row;
  int16_t end_col;
  int16_t cur_row;
  int16_t cur_col;
  int32_t cur_chr;
  int8_t sch_stt;

Wilfred avatar Oct 31 '22 07:10 Wilfred