Reqnroll

Inconsistent code file encoding in this repo

Open gasparnagy opened this issue 5 months ago • 14 comments

Reqnroll Version

n/a

Which test runner are you using?

MSTest

Test Runner Version Number

n/a

.NET Implementation

.NET 9.0

Test Execution Method

Visual Studio Test Explorer

Content of reqnroll.json configuration file

No response

Issue Description

I just noticed that some of the diffs are not displayed because GitHub thinks the file is binary. For me the problem seemed to be that the file was Unicode (UTF-16) encoded. I also checked a few other files: some were UTF-8 without BOM, others UTF-8 with BOM.

Somehow we would need to ensure in CI that the file encoding is consistent.

Is there an easy/standard way to do that?
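A possible answer, sketched under assumptions: a CI step that checks every tracked source file and fails when one looks like UTF-16, the encoding that made the diffs render as binary. The function names `looks_like_utf16` and `check_tracked_files` and the `*.cs`/`*.feature` patterns are hypothetical. Detection relies on a UTF-16 byte-order mark (FF FE or FE FF) or on NUL bytes, which UTF-16 text almost always contains and UTF-8 source code does not.

```bash
#!/bin/bash
# Hypothetical CI check: fail if any tracked source file looks like UTF-16.

# Succeeds (exit 0) when the file starts with a UTF-16 BOM (FF FE or FE FF)
# or contains at least one NUL byte.
looks_like_utf16() {
    local file="$1" bom nuls
    # First two bytes as lowercase hex, e.g. "fffe"
    bom=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$bom" = "fffe" ] || [ "$bom" = "feff" ]; then
        return 0
    fi
    # Count NUL bytes by deleting every byte that is not NUL
    nuls=$(( $(LC_ALL=C tr -cd '\000' < "$file" | wc -c) ))
    [ "$nuls" -gt 0 ]
}

# Reports offending files on stderr and returns 1 if any were found.
# The path patterns are an assumption; adjust them to the repo's text file types.
check_tracked_files() {
    local file status=0
    while IFS= read -r -d '' file; do
        if looks_like_utf16 "$file"; then
            echo "UTF-16 detected: $file" >&2
            status=1
        fi
    done < <(git ls-files -z -- '*.cs' '*.feature')
    return "$status"
}
```

In a CI job this would run from the repository root, e.g. `check_tracked_files || exit 1`, so the build fails as soon as anything is reported.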

Steps to Reproduce

See for example: https://github.com/reqnroll/Reqnroll/pull/725/commits/a3107def71974808092448fa42d1725513093d4f

Link to Repro Project

No response

gasparnagy • Aug 07 '25 17:08

Is the only difference with and without BOM? They all are UTF8?

Also curious, how did you do the check?

304NotModified • Aug 07 '25 18:08

> Is the only difference with and without BOM? They all are UTF8?

No, sorry, I wasn't clear. The problematic files (treated as binary) were Unicode encoded (UTF-16).

But when I checked other random files, I found some that are UTF-8 with BOM (e.g. Reqnroll\Formatters\ExecutionTracking\StepTrackerFactory.cs) and some that are plain UTF-8 without BOM (e.g. Reqnroll\ScenarioContext.cs).

> Also curious, how did you do the check?

Opened them with Notepad2... Not very scalable. 😉


gasparnagy • Aug 07 '25 18:08

Related to #718?

gasparnagy • Aug 07 '25 19:08

> Related to #718?

I think just coincidence. Or I missed it's character encoding week 😆

304NotModified • Aug 07 '25 20:08

@gasparnagy should we really label this a bug?

304NotModified • Aug 07 '25 20:08

> @gasparnagy should we really label this a bug?

No, that's a mistake.

gasparnagy • Aug 07 '25 20:08

> Related to #718?
>
> I think just coincidence. Or I missed it's character encoding week 😆

😊 My reasoning was: @clrudolphi had problems getting his VS to save the files as UTF-8, and both the UTF-16 and the UTF-8-with-BOM files were related to Cucumber Messages. Of course, this might still be a coincidence.

In any case, with this issue I would like to find a way to check for consistent encoding in CI, so that we get an early notification if something is wrong.

gasparnagy • Aug 07 '25 20:08

What is not valid in your opinion? The BOM of UTF8 is optional. Some editors write it, others don't. And ASCII is a subset of UTF8 (without BOM).
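To illustrate the distinction: the UTF-8 BOM is just the three bytes EF BB BF at the start of a file, and a file with no byte above 0x7F is plain ASCII, which is also valid UTF-8. A minimal sketch of a classifier, assuming the file is already known to be valid UTF-8 (`classify_utf8` is a hypothetical name):

```bash
#!/bin/bash
# Hypothetical helper: print "utf-8-bom", "ascii" or "utf-8" for a file
# that is assumed to already be valid UTF-8 (no validation is done here).
classify_utf8() {
    local file="$1" first3 nonascii
    # First three bytes as lowercase hex; the UTF-8 BOM is EF BB BF
    first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        echo "utf-8-bom"
        return
    fi
    # Count bytes outside the ASCII range 0x00-0x7F (byte-wise, hence LC_ALL=C)
    nonascii=$(( $(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c) ))
    if [ "$nonascii" -eq 0 ]; then
        echo "ascii"
    else
        echo "utf-8"
    fi
}
```

Both "ascii" and "utf-8" are fine under the rule discussed here; only the BOM case differs, and it is optional.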

I think we should prevent UTF-16, as it's not needed, but we will see it in the PR ;) Windows encodings are also something we should prevent (like Windows-1252, AKA ANSI).
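One way to express "UTF-8 with or without BOM, nothing else" as a check is to let `iconv` decode each file as UTF-8: a UTF-16 file with a BOM, or a Windows-1252 file containing non-ASCII characters such as é, is not valid UTF-8, so the conversion fails. A sketch, assuming `iconv` is installed on the CI agent; the function names are hypothetical. Two limits: a Windows-1252 file that only contains ASCII passes (harmless, since the bytes are identical), and a mostly-ASCII UTF-16 file without a BOM can also pass, so this complements a UTF-16 check rather than replacing it.

```bash
#!/bin/bash
# Hypothetical check: succeed only if the file decodes as UTF-8.
# An optional UTF-8 BOM (EF BB BF) is itself valid UTF-8 and passes.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Print every argument that is not valid UTF-8; return 1 if any were found
report_invalid_utf8() {
    local file status=0
    for file in "$@"; do
        if ! is_valid_utf8 "$file"; then
            echo "not valid UTF-8: $file"
            status=1
        fi
    done
    return "$status"
}
```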

Detecting character encoding is unfortunately hard and sometimes nothing more than an educated guess :)

304NotModified • Aug 07 '25 20:08

I ran a Bash script to get the encodings of all files; it uses file -i.

It checked all files (1682) and found 6 files encoded as UTF-16LE:

Reqnroll/Formatters/ExecutionTracking/PickleExecutionTracker.cs
Reqnroll/Formatters/ExecutionTracking/PickleExecutionTrackerFactory.cs
Reqnroll/Formatters/ExecutionTracking/TestCaseExecutionTracker.cs
Reqnroll/Formatters/RuntimeSupport/TraceListenerFormatterLog.cs
Tests/Reqnroll.RuntimeTests/Formatters/ExecutionTracking/HookStepExecutionTrackerTests.cs
Tests/Reqnroll.RuntimeTests/Formatters/ExecutionTracking/OrderFixingMessagePublisherTests.cs

All other files are ASCII (a subset of UTF-8) or UTF-8.

Script used (generated by an LLM):

```bash
#!/bin/bash

# Print a file's path (without the leading ./) and its encoding
get_encoding() {
    local file="$1"
    local mime
    # GNU `file -b -i` prints e.g. "text/plain; charset=utf-8" (on macOS use -I)
    mime=$(file -b -i "$file")
    echo "${file#./} - Encoding: ${mime##*charset=}"
}

# Find all files in the current directory and subdirectories (recursively),
# excluding hidden directories (such as .git) and the bin/obj build output.
# -print0 with read -d '' keeps file names containing spaces intact.
find . -type f ! -path '*/.*/*' ! -path '*/bin/*' ! -path '*/obj/*' -print0 |
    while IFS= read -r -d '' file; do
        # Only report text files; -b keeps the file name itself from matching 'text'
        if file -b "$file" | grep -q 'text'; then
            get_encoding "$file"
        fi
    done
```
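With 1682 files the per-file output is long. A small sketch (the `summarize_encodings` name is hypothetical) that reads the script's `path - Encoding: <charset>` lines on standard input and counts files per encoding, so outliers such as the UTF-16LE files stand out:

```bash
#!/bin/bash
# Hypothetical helper: count files per encoding from lines of the form
# "path - Encoding: <charset>" read on standard input, most common first.
summarize_encodings() {
    # Keep only the charset after " - Encoding: ", then count identical values
    sed -n 's/.* - Encoding: //p' | sort | uniq -c | sort -rn
}
```

For example, `./encodings.sh | summarize_encodings`, where `encodings.sh` is whatever name the script above was saved under.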

304NotModified • Aug 07 '25 21:08

See https://github.com/reqnroll/Reqnroll/pull/729 for the fix of the 6 files.

304NotModified • Aug 07 '25 21:08

~This looks like the solution~

~working-tree-encoding in .gitattributes~

~https://git-scm.com/docs/gitattributes#_working_tree_encoding~

Update: nope, that is the encoding in your working tree, so it's local, not in the repo.

304NotModified • Aug 07 '25 21:08

Apologies, team. This was my fault. Somehow my Visual Studio editor was configured for UTF-16.

Thanks @304NotModified for cleaning up my mess.

clrudolphi • Aug 07 '25 21:08

> Somehow my VisualStudio editor was configured for UTF-16.

Was this a test for https://github.com/reqnroll/Reqnroll/issues/718? 👼

304NotModified • Aug 07 '25 21:08

So we missed that in review; I can see them in https://github.com/reqnroll/Reqnroll/pull/685 now.

Too bad there isn't a GitHub App for this check.
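Short of a GitHub App, one local safeguard is a Git pre-commit hook that refuses to commit files starting with a UTF-16 byte-order mark, so a misconfigured editor is caught before the PR is even opened. A sketch, assuming it is saved as `.git/hooks/pre-commit` and marked executable; the function names are hypothetical. It reads the staged content (`git show :<path>`) rather than the working-tree file, so it checks exactly what would be committed.

```bash
#!/bin/bash
# Hypothetical pre-commit hook: reject staged files that start with a
# UTF-16 byte-order mark (FF FE or FE FF).

# Succeeds when the bytes read from stdin start with a UTF-16 BOM
starts_with_utf16_bom() {
    local bom
    bom=$(head -c 2 | od -An -tx1 | tr -d ' \n')
    [ "$bom" = "fffe" ] || [ "$bom" = "feff" ]
}

check_staged_files() {
    local file status=0
    # Added, copied, modified or renamed files in the index, NUL-separated
    while IFS= read -r -d '' file; do
        if git show ":$file" | starts_with_utf16_bom; then
            echo "pre-commit: $file is UTF-16 encoded, please save it as UTF-8" >&2
            status=1
        fi
    done < <(git diff --cached --name-only -z --diff-filter=ACMR)
    return "$status"
}

# The hook's exit status is the result of the check; non-zero aborts the commit
check_staged_files
```

Hooks in `.git/hooks` are not versioned, so each contributor would have to install this; a CI check is still the only guarantee.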

304NotModified • Aug 07 '25 21:08