Xunit doesn't correctly serialize unicode strings in theory data
This is related to https://developercommunity.visualstudio.com/content/problem/696704/visual-studio-2019-1622-test-explorer-and-live-uni.html
Theories that include Unicode characters in the theory data fail when run from VS. This happens because the serialization helper doesn't round-trip Unicode strings correctly.
The example below demonstrates the behaviour. The ClassWithUnicodeCharacters class contains the test that fails the serialize/deserialize round trip.
class ClassWithUnicodeCharacters
{
    public static IEnumerable<object[]> StringTestData = new[] { new object[] { "\uD800" } };

    [Theory]
    [MemberData("StringTestData")]
    public void Test(string x) { }
}
I verified the behaviour by adding a new unit test to SerializationTests.cs and building xunit. Here's the unit test that I think should be passing. If the StringTestData contains regular characters such as "str" the test passes.
[Fact]
public static void TheoryWithUnicode()
{
    var sourceProvider = new NullSourceInformationProvider();
    var assemblyInfo = Reflector.Wrap(Assembly.GetExecutingAssembly());
    var discoverer = new XunitTestFrameworkDiscoverer(assemblyInfo, sourceProvider, SpyMessageSink.Create());
    var sink = new TestDiscoverySink();

    discoverer.Find(typeof(ClassWithUnicodeCharacters).FullName, false, sink, TestFrameworkOptions.ForDiscovery());
    sink.Finished.WaitOne();

    var test = sink.TestCases[0];
    var roundTripped = SerializationHelper.Deserialize<ITestCase>(SerializationHelper.Serialize(test));
    Assert.Equal("\uD800", roundTripped.TestMethodArguments[0]);
}
Are there any workarounds while we wait for a solution?
The workaround today would be to disable theory pre-enumeration ([MemberData(..., DisableDiscoveryEnumeration = true)]) for any problematic data.
I am facing this issue now with InlineData. Oddly, it causes my test to fail on a remote machine but succeed locally. I'm using the é character in a parameter string. Has this still not been fixed?
This seems like a framework/unicode issue because this test fails:
[Fact]
public static void ExampleDoesNotRoundtrip()
{
    var i = "\ud800";
    var d = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(i));
    Assert.Equal(i, d);
}
If we run the original test with a data point that does round-trip (e.g. `"\ua800"`), everything works out fine.
@bradwilson I suggest closing this issue.
There is a related discussion here: https://github.com/xunit/xunit/discussions/2626
Strings that are not legal Unicode will get "mangled" during the serialization process because we convert to UTF-8. A single D800 is, by itself, not legal Unicode: it is a high surrogate, which is only valid when followed by a low surrogate.
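To make the "mangling" concrete, here is a minimal sketch that uses only the .NET base class library, not xunit's own serializer. It assumes the default `Encoding.UTF8` instance, which substitutes U+FFFD for an unpaired surrogate, and contrasts it with a strict `UTF8Encoding` that throws instead. The variable names are illustrative only.

```csharp
using System;
using System.Text;

// A lone high surrogate: a valid .NET string, but not well-formed Unicode.
string lone = "\uD800";

// The default UTF-8 encoding uses replacement fallback, so the unpaired
// surrogate is encoded as U+FFFD (bytes EF BF BD) rather than rejected.
byte[] bytes = Encoding.UTF8.GetBytes(lone);
string roundTripped = Encoding.UTF8.GetString(bytes);

Console.WriteLine(BitConverter.ToString(bytes));  // EF-BF-BD
Console.WriteLine(roundTripped == "\uFFFD");       // True
Console.WriteLine(roundTripped == lone);           // False

// A strict encoder reports the problem instead of silently replacing it.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool threw = false;
try { strict.GetBytes(lone); }
catch (EncoderFallbackException) { threw = true; }
Console.WriteLine(threw);                          // True

// A well-formed string such as "\uA800" survives the same round trip.
string legal = "\uA800";
bool legalOk = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(legal)) == legal;
Console.WriteLine(legalOk);                        // True
```

The replacement happens at encode time, so nothing on the deserializing side can recover the original code unit.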
A second workaround is to use character arrays for non-Unicode data, and then convert them back into non-Unicode strings yourself in the test:
public static TheoryData<char[]> CharArrayTestData = new() { "\uD800".ToCharArray() };

[Theory]
[MemberData(nameof(CharArrayTestData))]
public void MyTest(char[] data)
{
    var dataAsString = new string(data);
    // ...
}
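The reason a character array can survive is that each `char` is just a 16-bit UTF-16 code unit, so it can be carried as a number without ever being transcoded to UTF-8. The sketch below illustrates that principle with plain .NET types. It is not a description of how xunit actually serializes `char[]`, and the `codeUnits` representation is an assumption made for the example.

```csharp
using System;
using System.Linq;

string original = "\uD800";

// Treat every char as its raw 16-bit value: no Unicode validation is involved.
ushort[] codeUnits = original.ToCharArray().Select(c => (ushort)c).ToArray();

// Rebuild the string from the raw values, as the test body does with new string(data).
string rebuilt = new string(codeUnits.Select(u => (char)u).ToArray());

Console.WriteLine(codeUnits[0].ToString("X4"));  // D800
Console.WriteLine(rebuilt == original);           // True
```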
As noted in the discussion, I'm leaning towards "by design" for the existing behavior because of the serialization costs associated with us using character arrays full time (the size of the serialized data roughly doubles, assuming most of your string's characters would fit inside a single 8-bit value in UTF-8, which is true for the majority of Latin-based languages).
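The size argument can be checked with a rough comparison of raw byte counts. This is only a sketch: it measures the UTF-8 and UTF-16 sizes of a sample string and ignores whatever framing or encoding xunit's serializer adds on top, so treat the ratio as indicative rather than exact. The sample string is arbitrary.

```csharp
using System;
using System.Text;

string sample = "hello world";  // 11 characters, all ASCII

// UTF-8 needs one byte for each ASCII character.
int utf8Bytes = Encoding.UTF8.GetByteCount(sample);

// A char array stores every character as a 2-byte UTF-16 code unit.
int charArrayBytes = sample.Length * sizeof(char);

Console.WriteLine($"UTF-8: {utf8Bytes} bytes, char[]: {charArrayBytes} bytes");
// UTF-8: 11 bytes, char[]: 22 bytes
```

For text dominated by characters outside the ASCII range the gap narrows, since those characters take two or more bytes in UTF-8 as well.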
Closing as "by design".