Fixed-Width Text File Support
The Rust ecosystem already has this through the fixed_width crate, which integrates with Serde: https://docs.rs/fixed_width/latest/fixed_width/
I'd be willing to take a stab at this. I keep running into these kinds of files, and while deserializing by slicing ranges in a constructor that accepts a string works, it gets time-consuming for large types with many fields.
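For context, the hand-written approach described above might look roughly like the sketch below. `ManualPerson` is a hypothetical type for illustration, and the column layout is assumed to match the sample input in this issue.

```csharp
using System;
using System.Globalization;

// Every field needs its own slice, trim, and conversion, written by hand.
var person = new ManualPerson("John      Doe       1234502301011970");
Console.WriteLine($"{person.FirstName} {person.LastName} {person.EmployeeId} {person.Age}");
// prints "John Doe 12345 23"

// Hypothetical type for illustration only.
public sealed class ManualPerson
{
    public string FirstName { get; }
    public string LastName { get; }
    public int EmployeeId { get; }
    public int Age { get; }

    public ManualPerson(string line)
    {
        // Each range is start..end (end exclusive), i.e. offset..(offset + length).
        FirstName = line[0..10].Trim();
        LastName = line[10..20].Trim();
        EmployeeId = int.Parse(line[20..25], CultureInfo.InvariantCulture);
        Age = int.Parse(line[25..28], CultureInfo.InvariantCulture);
    }
}
```

With a handful of fields this is manageable; with dozens, the offsets become easy to get wrong, which is the motivation for declaring them once in attributes.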
Seems like a cool idea — do you have a particular implementation in mind?
Sample Input:
John      Doe       1234502301011970
Janette   Doe       6789002201011969
Model:
// Unlike JSON/XML, the explicit field length cannot be discovered based on the layout of the file.
public sealed class FixedFieldInfoAttribute(int offset, int length) : Attribute
{
    public int Offset => offset;
    public int Length => length;
}

[GenerateSerde]
public record Person
(
    [property: FixedFieldInfo(0, 10)] string FirstName,
    [property: FixedFieldInfo(10, 10)] string LastName,
    [property: FixedFieldInfo(20, 5)] int EmployeeId,
    [property: FixedFieldInfo(25, 3)] int Age,
    [property: FixedFieldInfo(28, 8)] DateOnly Birthday
);
Usage:
// String fields are always trimmed. Parsed fields like int and Date/Time have their input trimmed.
// We should probably have some sort of attribute for Date/Time types so that we can specify their expected format to pass to `ParseExact()`.
IEnumerable<Person> people = File.ReadAllLines("input.txt").Select(FixedWidthDeserializer.Deserialize<Person>);
foreach (var person in people)
{
    Console.WriteLine(person);
}
// Person { FirstName = John, LastName = Doe, EmployeeId = 12345, Age = 23, Birthday = 1/1/1970 }
// Person { FirstName = Janette, LastName = Doe, EmployeeId = 67890, Age = 22, Birthday = 1/1/1969 }
// (Record ToString() output; the date rendering depends on the current culture, en-US shown.)
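To make the intended behavior concrete, here is a minimal reflection-based sketch of how the offsets and lengths could drive deserialization. It is not the source-generated Serde approach; `NaiveFixedWidth` is a hypothetical helper, the optional format argument on the attribute is an assumption, and so is reading the sample birthday as `MMddyyyy`.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;

var person = NaiveFixedWidth.Deserialize<Person>("John      Doe       1234502301011970");
Console.WriteLine($"{person.FirstName}|{person.LastName}|{person.EmployeeId}|{person.Age}");
// prints "John|Doe|12345|23"

[AttributeUsage(AttributeTargets.Property)]
public sealed class FixedFieldInfoAttribute(int offset, int length, string? format = null) : Attribute
{
    public int Offset => offset;
    public int Length => length;
    public string? Format => format; // assumption: format string for ParseExact
}

public record Person(
    [property: FixedFieldInfo(0, 10)] string FirstName,
    [property: FixedFieldInfo(10, 10)] string LastName,
    [property: FixedFieldInfo(20, 5)] int EmployeeId,
    [property: FixedFieldInfo(25, 3)] int Age,
    [property: FixedFieldInfo(28, 8, "MMddyyyy")] DateOnly Birthday);

// Hypothetical helper: reflection only, supports string, int, and DateOnly.
public static class NaiveFixedWidth
{
    public static T Deserialize<T>(string line)
    {
        // A positional record exposes exactly one public constructor.
        ConstructorInfo ctor = typeof(T).GetConstructors().Single();
        object?[] args = ctor.GetParameters().Select(p =>
        {
            // Positional parameters share their names with the generated properties.
            PropertyInfo prop = typeof(T).GetProperty(p.Name!)!;
            var info = prop.GetCustomAttribute<FixedFieldInfoAttribute>()!;
            string text = line.Substring(info.Offset, info.Length).Trim();
            return ParseField(text, p.ParameterType, info.Format);
        }).ToArray();
        return (T)ctor.Invoke(args);
    }

    private static object ParseField(string text, Type type, string? format)
    {
        if (type == typeof(string)) return text;
        if (type == typeof(int)) return int.Parse(text, CultureInfo.InvariantCulture);
        if (type == typeof(DateOnly))
            return DateOnly.ParseExact(text, format ?? "yyyyMMdd", CultureInfo.InvariantCulture);
        throw new NotSupportedException($"No parser for {type}.");
    }
}
```

A source generator would emit the equivalent slicing code at compile time instead of using reflection at run time, but the mapping from attribute to substring is the same.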
I do wonder whether a range parameter would be better for the attribute. Something like [property: FixedFieldInfo([0..10])] string FirstName might be more consumer-friendly, though most engines I see that consume fixed-width text files expect an offset and length like the model above.
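One caveat with the range idea: `System.Range` is not a valid attribute parameter type in C#, so the attribute itself would still need two integers. A possible middle ground is to keep offset and length as the inputs and expose a computed `Range`. `FieldSpan` below is a hypothetical name used only for this sketch.

```csharp
using System;

var lastName = new FieldSpan(Offset: 10, Length: 10);
Console.WriteLine("John      Doe       12345"[lastName.Range].Trim());
// prints "Doe"

// Hypothetical helper: offset/length in, Range out.
public readonly record struct FieldSpan(int Offset, int Length)
{
    // Range ends are exclusive, so the end is offset + length.
    public Range Range => Offset..(Offset + Length);
}
```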
In theory, a custom proxy should be able to do this, and you probably wouldn't need custom syntax. It's interesting that Rust does it with a trait; I haven't seen that before.
I've been experimenting with an implementation using a proxy. So far, only serialization exists, but you can see a working example on my .NET Lab.
Is this along the lines of what you would imagine or would you take a different approach?
I've changed things up in my fork. Following the json example, I've built a serializer/deserializer, plus a few supporting types.
Everything should be working on the serialization side, but I'm running into issues with getting test versions of the serde package to actually run their source generators, so I get CS0311 at call sites.
[GenerateSerde]
public partial record class Person
(
    [property: FixedFieldInfo(0, 10)] string FirstName,
    [property: FixedFieldInfo(10, 10)] string LastName,
    [property: FixedFieldInfo(20, 5)] int EmployeeId,
    [property: FixedFieldInfo(25, 8, "yyyyMMdd")] DateTime EmploymentStartDate,
    [property: FixedFieldInfo(33, 8, "yyyyMMdd")] DateTime Birthday
);

Person[] people =
[
    new ("John", "Doe", 12345, new DateTime(1990, 1, 1), new DateTime(1970, 1, 1)),
    new ("Janette", "Doe", 67890, new DateTime(1989, 1, 1), new DateTime(1969, 1, 1))
];
Console.WriteLine("Serialized Output:");
string serialized = string.Join(Environment.NewLine, SerializeMany(people)); // CS0311
Console.WriteLine(serialized);
private static IEnumerable<string> SerializeMany<T>(IEnumerable<T> values)
    where T : ISerializeProvider<T>
{
    foreach (var value in values)
    {
        yield return FixedWidthSerializer.Serialize(value);
    }
}
CS0311: The type 'Person' cannot be used as type parameter 'T' in the generic type or method 'Program.SerializeMany<T>(IEnumerable<T>)'. There is no implicit reference conversion from 'Person' to 'Serde.ISerializeProvider<Person>'.
Once I get the generation issue sorted (I'm very confident it's just something on my end) and wrap up the deserialization portion, I'll submit it as a PR.
The FixedFieldInfoAttribute accepts an offset, field length, and an optional format parameter.
If a format is supplied and the field type implements IFormattable, the format is passed to ToString(format). Booleans also get special handling, since it's not uncommon for "Y", "T", or "1" to be considered truthy and "N", "F", or "0" to be considered falsy.
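The IFormattable path described above might look something like this sketch. `FieldWriter` is a hypothetical helper, and two behaviors are assumptions rather than anything confirmed by the fork: values are left-aligned and space-padded, and a value that is too long throws instead of being truncated.

```csharp
#nullable enable
using System;
using System.Globalization;

Console.WriteLine($"[{FieldWriter.Format("John", 10, null)}]");
// prints "[John      ]"
Console.WriteLine($"[{FieldWriter.Format(new DateTime(1970, 1, 1), 8, "yyyyMMdd")}]");
// prints "[19700101]"

// Hypothetical helper for illustration only.
public static class FieldWriter
{
    public static string Format(object value, int length, string? format)
    {
        // Use the format string only when the value knows how to interpret it.
        string text = value is IFormattable formattable && format is not null
            ? formattable.ToString(format, CultureInfo.InvariantCulture)
            : value.ToString() ?? string.Empty;

        if (text.Length > length)
            throw new InvalidOperationException($"'{text}' exceeds the field length of {length}.");

        // Assumption: left-aligned, space-padded fields.
        return text.PadRight(length);
    }
}
```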
With booleans, it expects a format string in the form of "true/false", where the values on either side of the slash represent the text to serialize to or deserialize from. If no format string is provided, it just uses bool.ToString().
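As a rough illustration of the "true/false" format convention, the sketch below splits the format on the slash and maps each side to a boolean. `BoolFormat` is a hypothetical helper, and the exact-match comparison is an assumption; the real implementation may be more lenient.

```csharp
#nullable enable
using System;

Console.WriteLine(BoolFormat.Format(true, "Y/N"));   // prints "Y"
Console.WriteLine(BoolFormat.Parse("N", "Y/N"));     // prints "False"
Console.WriteLine(BoolFormat.Format(false, null));   // prints "False"

// Hypothetical helper for illustration only.
public static class BoolFormat
{
    public static string Format(bool value, string? format)
    {
        // No format string: fall back to bool.ToString().
        if (format is null) return value.ToString();
        var (trueText, falseText) = Split(format);
        return value ? trueText : falseText;
    }

    public static bool Parse(string text, string? format)
    {
        if (format is null) return bool.Parse(text);
        var (trueText, falseText) = Split(format);
        if (text == trueText) return true;
        if (text == falseText) return false;
        throw new FormatException($"'{text}' matches neither side of '{format}'.");
    }

    private static (string True, string False) Split(string format)
    {
        string[] parts = format.Split('/');
        if (parts.Length != 2)
            throw new FormatException($"Expected a format like \"Y/N\", got '{format}'.");
        return (parts[0], parts[1]);
    }
}
```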
I'll make sure all of this is added to the documentation should we get to the point where the PR is accepted.
My original expectation was that you could use a proxy like in https://github.com/serdedotnet/serde/blob/main/samples/ProxySerialize.cs
The problem is that you can't specify the parameters inside the attribute, because attribute arguments must be compile-time constants, typeof expressions, or arrays of those, not arbitrary invocations. I wonder if the fix is adding a way to pass parameters to the proxy via attributes. Basically, the proxy type would look like:
partial record FixedWidthString
    // Specify that this proxy is a provider for the FixedWidthString Serde object,
    // which is used to serialize and deserialize the string type.
    : ISerdeProvider<FixedWidthString, FixedWidthStringSerdeObj, string>
{
    // Open question: the length argument would have to come from the attribute.
    public static FixedWidthStringSerdeObj Instance { get; } = new FixedWidthStringSerdeObj();
}

// Serde object for the string type
sealed partial class FixedWidthStringSerdeObj(int len) : ISerde<string>
{
    public ISerdeInfo SerdeInfo => StringProxy.SerdeInfo;

    public void Serialize(string value, ISerializer serializer)
    {
        if (value.Length > len) throw new InvalidOperationException("Too long");
        serializer.SerializeString(value);
    }

    public string Deserialize(IDeserializer deserializer)
    {
        var str = deserializer.ReadString();
        if (str.Length > len) throw new InvalidOperationException("Too long");
        return str;
    }
}
Then to use it maybe we would have
class MyType
{
    [SerdeMemberOptions(Proxy = typeof(FixedWidthStringSerdeObj), ProxyParameters = new[] { ... })]
    public string F { get; init; }
}
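To show that the idea is workable in principle, here is a small standalone sketch of passing constructor parameters through an attribute. Nothing in it is Serde API: `ProxyOptionsAttribute` and `LengthChecker` are hypothetical, and reflection stands in for what a source generator would presumably emit at compile time.

```csharp
using System;
using System.Reflection;

PropertyInfo prop = typeof(MyType).GetProperty(nameof(MyType.F))!;
var options = prop.GetCustomAttribute<ProxyOptionsAttribute>()!;

// Build the parameterized object from the constants stored in the attribute.
var checker = (LengthChecker)Activator.CreateInstance(options.Proxy, options.Parameters)!;
Console.WriteLine(checker.Fits("hello"));         // prints "True"
Console.WriteLine(checker.Fits("hello world!"));  // prints "False"

// Hypothetical attribute: a type plus constant constructor arguments.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ProxyOptionsAttribute : Attribute
{
    public ProxyOptionsAttribute(Type proxy, params object[] parameters)
    {
        Proxy = proxy;
        Parameters = parameters;
    }

    public Type Proxy { get; }
    public object[] Parameters { get; }
}

// Hypothetical stand-in for a parameterized Serde object.
public sealed class LengthChecker(int len)
{
    public bool Fits(string value) => value.Length <= len;
}

public class MyType
{
    [ProxyOptions(typeof(LengthChecker), 10)]
    public string F { get; init; } = "";
}
```

The attribute only ever holds constants and a typeof expression, which is what the language allows; the invocation happens later, outside the attribute.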
My concern with this solution is that every property of the type, except those ignored by (de)serialization, will require a very lengthy SerdeMemberOptionsAttribute annotation. There are two possible workarounds:
- Define a generic SerdeObj, like FixedWidthSerdeObj<T>, where T is the type you want to (de)serialize, then ignore the serializer/deserializer parameters in the corresponding method and serialize/deserialize directly there. Implementors would be required to implement the generic object with a static type, so something like public class PersonSerdeObj : FixedWidthSerdeObj<Person>. This is the least amount of effort, but leads to poor discoverability because you would need to consume it through something like JsonSerializer.
- Define a serializer/deserializer specifically for fixed-width files. This is a bit more complex and a lot more work, but the end result is better discoverability.
Did I miss anything?