difftastic
difftastic copied to clipboard
Failure to identify diff in YAML files if line with key is longer than 128 chars
Once a yaml has a line with a key that is longer than 128 characters (note that this might be the key itself, or indentation!), difft
fails to see any differences afterwards:
Inputs
file1.yaml
:
---
test:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'a'
file2.yaml
:
---
test:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'b'
fail: true
Run
$ difft file1.yaml file2.yaml
file2.yaml --- YAML
No syntactic changes.
Initial notes
- The YAML spec says:
simple keys are restricted to a single line and must not span more than 1024 stream characters (hence the need for the flow-key context). Note the 1024 character limit is in terms of Unicode characters rather than stream octets, and that it includes the separation following the key itself.
- pyyaml also seems to have trouble with 128 characters. Might be unrelated, but might be that there's some underlying common assumption. See https://github.com/yaml/pyyaml/issues/157
Versions
- OS: macOS Sierra 10.12.6
- Difftastic: 0.26.3
I'm not able to reproduce this. Could you attach the exact files you're having an issue with?
If I copy-paste your examples, I'm seeing this:
Difftastic uses purple to highlight missing position information. It's not ideal in this case, but I believe the YAML syntax is wrong (1 space indent on aaa...
, 2 space indent on fail
).
If I change fail: true
to use a 1 space indent, I get the correct result.
Sure thing, find both of them attached.
Given that GitHub didn't let me upload yaml
s, I renamed to csv
.... and when I did that, difft
correctly figured out the differences... so there's definitely something fishy going on with the yaml parsing ;)
Rename to yaml
before testing:

Thanks. This looks like a tree-sitter-yaml parsing issue: difftastic is getting exactly the same parse tree for both.
$ difft --dump-ts /tmp/file2.yaml
stream (0, 0) - (2, 128)
document (0, 0) - (2, 128)
--- (0, 0) - (0, 3) "---"
block_node (1, 0) - (2, 128)
block_mapping (1, 0) - (2, 128)
block_mapping_pair (1, 0) - (2, 128)
flow_node (1, 0) - (1, 4)
plain_scalar (1, 0) - (1, 4)
string_scalar (1, 0) - (1, 4) "test"
: (1, 4) - (1, 5) ":"
block_node (2, 2) - (2, 128)
block_mapping (2, 2) - (2, 128)
block_mapping_pair (2, 2) - (2, 128)
flow_node (2, 2) - (2, 127)
plain_scalar (2, 2) - (2, 127)
string_scalar (2, 2) - (2, 127) "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
: (2, 127) - (2, 128) ":"
it's missing everything after the long key name.
Another interesting thing that might hint you to what's going on: if you add a difference before the long key, everything works just fine again:
e.g.
file1.yaml
---
test1: true
test:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'b'
fail: true
file2.yaml
---
test2: true
test:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 'a'

I am not really sure if this is the same issue, but I am having a similar trouble with comparing two helm resources:
file1.yml
{{- if and (.Values.airflow.users) (not .Values.airflow.usersUpdate) }}
{{- $podNodeSelector := include "airflow.podNodeSelector" (dict "Release" .Release "Values" .Values "nodeSelector" .Values.airflow.sync.nodeSelector) }}
{{- $podAffinity := include "airflow.podAffinity" (dict "Release" .Release "Values" .Values "affinity" .Values.airflow.sync.affinity) }}
{{- $podTolerations := include "airflow.podTolerations" (dict "Release" .Release "Values" .Values "tolerations" .Values.airflow.sync.tolerations) }}
{{- $podSecurityContext := include "airflow.podSecurityContext" (dict "Release" .Release "Values" .Values "securityContext" .Values.airflow.sync.securityContext) }}
{{- $extraPipPackages := .Values.airflow.extraPipPackages }}
{{- $volumeMounts := include "airflow.volumeMounts" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
{{- $volumes := include "airflow.volumes" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "airflow.fullname" . }}-sync-users
labels:
app: {{ include "airflow.labels.app" . }}
component: sync-users
chart: {{ include "airflow.labels.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
{{- if .Values.airflow.sync.labels }}
{{- toYaml .Values.airflow.sync.labels | nindent 4 }}
{{- end }}
annotations:
helm.sh/hook: post-install,post-upgrade
helm.sh/hook-weight: "0"
helm.sh/hook-delete-policy: before-hook-creation
{{- if .Values.airflow.sync.annotations }}
{{- toYaml .Values.airflow.sync.annotations | nindent 4 }}
{{- end }}
spec:
template:
metadata:
annotations:
checksum/secret-config-envs: {{ include (print $.Template.BasePath "/config/secret-config-envs.yaml") . | sha256sum }}
checksum/secret-local-settings: {{ include (print $.Template.BasePath "/config/secret-local-settings.yaml") . | sha256sum }}
checksum/sync-users-script: {{ include "airflow.sync.sync_users.py" . | sha256sum }}
{{- if .Values.airflow.podAnnotations }}
{{- toYaml .Values.airflow.podAnnotations | nindent 8 }}
{{- end }}
{{- if .Values.airflow.sync.podAnnotations }}
{{- toYaml .Values.airflow.sync.podAnnotations | nindent 8 }}
{{- end }}
{{- if .Values.airflow.sync.safeToEvict }}
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
{{- end }}
labels:
app: {{ include "airflow.labels.app" . }}
component: sync-users
release: {{ .Release.Name }}
{{- if .Values.airflow.sync.podLabels }}
{{- toYaml .Values.airflow.sync.podLabels | nindent 8 }}
{{- end }}
spec:
restartPolicy: OnFailure
{{- if .Values.airflow.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.airflow.image.pullSecret }}
{{- end }}
{{- if $podNodeSelector }}
nodeSelector:
{{- $podNodeSelector | nindent 8 }}
{{- end }}
{{- if $podAffinity }}
affinity:
{{- $podAffinity | nindent 8 }}
{{- end }}
{{- if $podTolerations }}
tolerations:
{{- $podTolerations | nindent 8 }}
{{- end }}
{{- if $podSecurityContext }}
securityContext:
{{- $podSecurityContext | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "airflow.serviceAccountName" . }}
initContainers:
{{- if $extraPipPackages }}
{{- include "airflow.init_container.install_pip_packages" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) | indent 8 }}
{{- end }}
{{- if and (.Values.dags.gitSync.enabled) (not .Values.dags.persistence.enabled) }}
## git-sync is included so "airflow plugins" & "python packages" can be stored in the dags repo
{{- include "airflow.container.git_sync" (dict "Release" .Release "Values" .Values "sync_one_time" "true") | indent 8 }}
{{- end }}
{{- include "airflow.init_container.check_db" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
{{- include "airflow.init_container.wait_for_db_migrations" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
containers:
- name: sync-airflow-users
{{- include "airflow.image" . | indent 10 }}
resources:
{{- toYaml .Values.airflow.sync.resources | nindent 12 }}
envFrom:
{{- include "airflow.envFrom" . | indent 12 }}
env:
{{- include "airflow.env" . | indent 12 }}
command:
{{- include "airflow.command" . | indent 12 }}
args:
- "python"
- "-u"
- "/mnt/scripts/sync_users.py"
volumeMounts:
{{- $volumeMounts | indent 12 }}
- name: scripts
mountPath: /mnt/scripts
readOnly: true
{{- if .Values.airflow.usersTemplates }}
- name: templates
mountPath: "/mnt/templates"
readOnly: true
{{- end }}
volumes:
{{- $volumes | indent 8 }}
- name: scripts
secret:
secretName: {{ include "airflow.fullname" . }}-sync-users
{{- if .Values.airflow.usersTemplates }}
- name: templates
projected:
sources:
{{- range $k, $v := .Values.airflow.usersTemplates }}
{{- if not (regexMatch "^[a-zA-Z_][a-zA-Z0-9_]*$" $k) }}
{{ required "each key in `airflow.usersTemplates` must match the regex: ^[a-zA-Z_][a-zA-Z0-9_]*$" nil }}
{{- end }}
{{- if eq ($v.kind | lower) "configmap" }}
- configMap:
name: {{ $v.name | quote }}
items:
- key: {{ $v.key | quote }}
path: {{ $k | quote }}
{{- else if eq ($v.kind | lower) "secret" }}
- secret:
name: {{ $v.name | quote }}
items:
- key: {{ $v.key | quote }}
path: {{ $k | quote }}
{{- else }}
{{ required "each `kind` in `airflow.usersTemplates` must be one of ['configmap', 'secret']!" nil }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
file2.yml
{{- if and (.Values.airflow.users) (not .Values.airflow.usersUpdate) }}
{{- $podNodeSelector := include "airflow.podNodeSelector" (dict "Release" .Release "Values" .Values "nodeSelector" .Values.airflow.sync.nodeSelector) }}
{{- $podAffinity := include "airflow.podAffinity" (dict "Release" .Release "Values" .Values "affinity" .Values.airflow.sync.affinity) }}
{{- $podTolerations := include "airflow.podTolerations" (dict "Release" .Release "Values" .Values "tolerations" .Values.airflow.sync.tolerations) }}
{{- $podSecurityContext := include "airflow.podSecurityContext" (dict "Release" .Release "Values" .Values "securityContext" .Values.airflow.sync.securityContext) }}
{{- $extraPipPackages := .Values.airflow.extraPipPackages }}
{{- $volumeMounts := include "airflow.volumeMounts" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
{{- $volumes := include "airflow.volumes" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) }}
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "airflow.fullname" . }}-sync-users
labels:
app: {{ include "airflow.labels.app" . }}
component: sync-users
chart: {{ include "airflow.labels.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
{{- if .Values.airflow.sync.labels }}
{{- toYaml .Values.airflow.sync.labels | nindent 4 }}
{{- end }}
annotations:
helm.sh/hook: post-install,post-upgrade
helm.sh/hook-weight: "0"
helm.sh/hook-delete-policy: before-hook-creation
{{- if .Values.airflow.sync.annotations }}
{{- toYaml .Values.airflow.sync.annotations | nindent 4 }}
{{- end }}
spec:
template:
metadata:
annotations:
checksum/secret-config-envs: {{ include (print $.Template.BasePath "/config/secret-config-envs.yaml") . | sha256sum }}
checksum/secret-local-settings: {{ include (print $.Template.BasePath "/config/secret-local-settings.yaml") . | sha256sum }}
checksum/sync-users-script: {{ include "airflow.sync.sync_users.py" . | sha256sum }}
{{- if .Values.airflow.podAnnotations }}
{{- toYaml .Values.airflow.podAnnotations | nindent 8 }}
{{- end }}
{{- if .Values.airflow.sync.podAnnotations }}
{{- toYaml .Values.airflow.sync.podAnnotations | nindent 8 }}
{{- end }}
{{- if .Values.airflow.sync.safeToEvict }}
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
{{- end }}
labels:
app: {{ include "airflow.labels.app" . }}
component: sync-users
release: {{ .Release.Name }}
{{- if .Values.airflow.sync.podLabels }}
{{- toYaml .Values.airflow.sync.podLabels | nindent 8 }}
{{- end }}
spec:
restartPolicy: OnFailure
{{- if .Values.airflow.image.pullSecret }}
imagePullSecrets:
- name: {{ .Values.airflow.image.pullSecret }}
{{- end }}
{{- if $podNodeSelector }}
nodeSelector:
{{- $podNodeSelector | nindent 8 }}
{{- end }}
{{- if $podAffinity }}
affinity:
{{- $podAffinity | nindent 8 }}
{{- end }}
{{- if $podTolerations }}
tolerations:
{{- $podTolerations | nindent 8 }}
{{- end }}
{{- if $podSecurityContext }}
securityContext:
{{- $podSecurityContext | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "airflow.serviceAccountName" . }}
initContainers:
{{- if $extraPipPackages }}
{{- include "airflow.init_container.install_pip_packages" (dict "Release" .Release "Values" .Values "extraPipPackages" $extraPipPackages) | indent 8 }}
{{- end }}
{{- if .Values.dags.persistence.enabled }}
{{- include "airflow.container.dag_pvc_copy" (dict "Release" .Release "Values" .Values) | indent 8 }}
{{ else }}
## git-sync is included so "airflow plugins" & "python packages" can be stored in the dags repo
{{- include "airflow.container.git_sync" (dict "Release" .Release "Values" .Values "sync_one_time" "true") | indent 8 }}
{{- end }}
{{- include "airflow.init_container.check_db" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
{{- include "airflow.init_container.wait_for_db_migrations" (dict "Release" .Release "Values" .Values "volumeMounts" $volumeMounts) | indent 8 }}
containers:
- name: sync-airflow-users
{{- include "airflow.image" . | indent 10 }}
resources:
{{- toYaml .Values.airflow.sync.resources | nindent 12 }}
envFrom:
{{- include "airflow.envFrom" . | indent 12 }}
env:
{{- include "airflow.env" . | indent 12 }}
command:
{{- include "airflow.command" . | indent 12 }}
args:
- "python"
- "-u"
- "/mnt/scripts/sync_users.py"
volumeMounts:
{{- $volumeMounts | indent 12 }}
- name: scripts
mountPath: /mnt/scripts
readOnly: true
{{- if .Values.airflow.usersTemplates }}
- name: templates
mountPath: "/mnt/templates"
readOnly: true
{{- end }}
volumes:
{{- $volumes | indent 8 }}
- name: scripts
secret:
secretName: {{ include "airflow.fullname" . }}-sync-users
{{- if .Values.airflow.usersTemplates }}
- name: templates
projected:
sources:
{{- range $k, $v := .Values.airflow.usersTemplates }}
{{- if not (regexMatch "^[a-zA-Z_][a-zA-Z0-9_]*$" $k) }}
{{ required "each key in `airflow.usersTemplates` must match the regex: ^[a-zA-Z_][a-zA-Z0-9_]*$" nil }}
{{- end }}
{{- if eq ($v.kind | lower) "configmap" }}
- configMap:
name: {{ $v.name | quote }}
items:
- key: {{ $v.key | quote }}
path: {{ $k | quote }}
{{- else if eq ($v.kind | lower) "secret" }}
- secret:
name: {{ $v.name | quote }}
items:
- key: {{ $v.key | quote }}
path: {{ $k | quote }}
{{- else }}
{{ required "each `kind` in `airflow.usersTemplates` must be one of ['configmap', 'secret']!" nil }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
❯ difft file1.yml file2.yml
file2.yml --- YAML
No syntactic changes.
The problem seems to have sth to do with YAML because when I rename the files as txt
files the diff works correctly:
❯ difft file1.txt file2.txt
file2.txt --- Text
76 {{- if $extraPipPackages }} 76 {{- if $extraPipPackages }}
77 {{- include "airflow.init_container. 77 {{- include "airflow.init_container.i
.. install_pip_packages" (dict "Release" .Relea .. nstall_pip_packages" (dict "Release" .Release
.. se "Values" .Values "extraPipPackages" $extr .. "Values" .Values "extraPipPackages" $extraPi
.. aPipPackages) | indent 8 }} .. pPackages) | indent 8 }}
78 {{- end }} 78 {{- end }}
79 {{- if and (.Values.dags.gitSync.ena 79 {{- if .Values.dags.persistence.enabl
.. bled) (not .Values.dags.persistence.enabled .. ed }}
.. ) }} ..
.. 80 {{- include "airflow.container.dag_pv
.. .. c_copy" (dict "Release" .Release "Values" .Va
.. .. lues) | indent 8 }}
.. 81 {{ else }}
80 ## git-sync is included so "airflow 82 ## git-sync is included so "airflow p
.. plugins" & "python packages" can be stored i .. lugins" & "python packages" can be stored in
.. n the dags repo .. the dags repo
81 {{- include "airflow.container.git_s 83 {{- include "airflow.container.git_sy
.. ync" (dict "Release" .Release "Values" .Valu .. nc" (dict "Release" .Release "Values" .Values
.. es "sync_one_time" "true") | indent 8 }} .. "sync_one_time" "true") | indent 8 }}
82 {{- end }} 84 {{- end }}
@karakanb your case looks like Helm not being valid YAML, see #377.
Looking at the YAML parser, it has some suspiciously small integer types in the scanner:
struct Scanner {
int16_t row;
int16_t col;
int16_t blk_imp_row;
int16_t blk_imp_col;
int16_t blk_imp_tab;
vector<int16_t> ind_typ_stk;
vector<int16_t> ind_len_stk;
// temp
int16_t end_row;
int16_t end_col;
int16_t cur_row;
int16_t cur_col;
int32_t cur_chr;
int8_t sch_stt;