Fixed-Width Text File Support

Open TheBrambleShark opened this issue 2 months ago • 9 comments

Rust's Serde ecosystem already supports this through the fixed_width crate: https://docs.rs/fixed_width/latest/fixed_width/

I'd be willing to take a stab at this. I keep running into these files, and while deserializing them with ranges in a constructor that accepts a string works fine, it gets time-consuming for large types with lots of fields.
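For comparison, here is a rough sketch of the hand-written approach: a constructor that takes the raw line and slices each field with ranges. The PersonLine type is hypothetical, the columns follow the sample layout posted later in this thread, and the MMddyyyy birthday format is an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical hand-written parser for the layout used in this thread:
// FirstName 0-9, LastName 10-19, EmployeeId 20-24, Age 25-27, Birthday 28-35.
public sealed class PersonLine
{
    public string FirstName { get; }
    public string LastName { get; }
    public int EmployeeId { get; }
    public int Age { get; }
    public DateOnly Birthday { get; }

    public PersonLine(string line)
    {
        // C# ranges are half-open, so line[0..10] covers offsets 0 through 9.
        // Every field repeats the same slice/trim/parse boilerplate.
        FirstName = line[0..10].Trim();
        LastName = line[10..20].Trim();
        EmployeeId = int.Parse(line[20..25].Trim(), CultureInfo.InvariantCulture);
        Age = int.Parse(line[25..28].Trim(), CultureInfo.InvariantCulture);
        // Assumption: the 8-digit birthday is MMddyyyy.
        Birthday = DateOnly.ParseExact(line[28..36], "MMddyyyy", CultureInfo.InvariantCulture);
    }
}
```

Every new field means another slice, trim, and parse call. That repetition is what an attribute-driven serializer would remove.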

TheBrambleShark, Oct 27 '25 14:10

Seems like a cool idea — do you have a particular implementation in mind?

agocke, Oct 27 '25 15:10

Sample Input:

John      Doe       1234502301011970
Janette   Doe       6789002201011969

Model:

// Unlike JSON/XML, field boundaries can't be discovered from the file's layout, so each field's offset and length must be declared explicitly.
public sealed class FixedFieldInfoAttribute(int offset, int length) : Attribute
{
    public int Offset => offset;
    public int Length => length;
}

[GenerateSerde]
public partial record Person
(
    [property: FixedFieldInfo(0, 10)] string FirstName,
    [property: FixedFieldInfo(10, 10)] string LastName,
    [property: FixedFieldInfo(20, 5)] int EmployeeId,
    [property: FixedFieldInfo(25, 3)] int Age,
    [property: FixedFieldInfo(28, 8)] DateOnly Birthday
);

Usage:


// String fields are always trimmed, and fields that need parsing (int, date/time) are trimmed before they're parsed.
// We should probably have some sort of attribute for Date/Time types so that we can specify their expected format to pass to `ParseExact()`.
IEnumerable<Person> people = File.ReadAllLines("input.txt").Select(FixedWidthDeserializer.Deserialize<Person>);

foreach (var person in people)
{
    Console.WriteLine(person);
}

// Person { FirstName: John, LastName: Doe, EmployeeId: 12345, Age: 23, Birthday: 01011970 }
// Person { FirstName: Janette, LastName: Doe, EmployeeId: 67890, Age: 22, Birthday: 01011969 }
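A minimal runtime sketch of what FixedWidthDeserializer.Deserialize<T> could do with the attributes above. It assumes reflection rather than the source generator the proposal would actually use. The ParseField helper, the set of supported types, and the MMddyyyy date format are all assumptions for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public sealed class FixedFieldInfoAttribute(int offset, int length) : Attribute
{
    public int Offset => offset;
    public int Length => length;
}

// Hypothetical runtime sketch of FixedWidthDeserializer.Deserialize<T>. The proposal would
// have the Serde source generator produce equivalent code instead of using reflection.
public static class FixedWidthDeserializer
{
    public static T Deserialize<T>(string line)
    {
        // Assumes T is a positional record: a single public constructor whose
        // parameter names match properties annotated with [FixedFieldInfo].
        ConstructorInfo ctor = typeof(T).GetConstructors().Single();
        object[] args = ctor.GetParameters()
            .Select(parameter =>
            {
                PropertyInfo property = typeof(T).GetProperty(parameter.Name!)!;
                FixedFieldInfoAttribute field = property.GetCustomAttribute<FixedFieldInfoAttribute>()!;
                // Slice the column and trim its padding before converting.
                string raw = line.Substring(field.Offset, field.Length).Trim();
                return ParseField(raw, parameter.ParameterType);
            })
            .ToArray();
        return (T)ctor.Invoke(args);
    }

    // Hypothetical converter covering only the types used in the sample model.
    private static object ParseField(string raw, Type type)
    {
        if (type == typeof(string)) return raw;
        if (type == typeof(int)) return int.Parse(raw, CultureInfo.InvariantCulture);
        // Assumed date format; the comment above suggests making this configurable per field.
        if (type == typeof(DateOnly))
            return DateOnly.ParseExact(raw, "MMddyyyy", CultureInfo.InvariantCulture);
        throw new NotSupportedException($"No parser for field type {type}.");
    }
}

public record Person
(
    [property: FixedFieldInfo(0, 10)] string FirstName,
    [property: FixedFieldInfo(10, 10)] string LastName,
    [property: FixedFieldInfo(20, 5)] int EmployeeId,
    [property: FixedFieldInfo(25, 3)] int Age,
    [property: FixedFieldInfo(28, 8)] DateOnly Birthday
);
```

Because each attribute carries both offset and length, the deserializer doesn't depend on field order. It only slices, trims, and converts. Generated code could do the same slicing without reflection.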

TheBrambleShark, Oct 27 '25 15:10

I do wonder whether a range parameter would make the attribute more consumer-friendly, e.g. [property: FixedFieldInfo([0..10])] string FirstName, though most engines I've seen that consume fixed-width text files expect an offset and length, as in the model above.
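One catch with the range idea is that C# attribute arguments must be constants, typeof expressions, or one-dimensional arrays of those. A System.Range value such as 0..10 therefore can't be passed to an attribute. A hedged alternative that keeps the start/end reading is to take two ints and derive the length. FixedFieldRangeAttribute below is a hypothetical name, not part of any existing API.

```csharp
using System;

// Hypothetical attribute: a half-open [start, end) column span, matching C# range semantics,
// expressed as two ints because System.Range isn't a valid attribute argument type.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FixedFieldRangeAttribute : Attribute
{
    public FixedFieldRangeAttribute(int start, int end)
    {
        if (start < 0 || end < start)
            throw new ArgumentOutOfRangeException(nameof(end), "Expected 0 <= start <= end.");
        Start = start;
        End = end;
    }

    public int Start { get; }
    public int End { get; }

    // Offset/length view for code that expects the conventional layout description.
    public int Offset => Start;
    public int Length => End - Start;

    // Rebuild a Range at runtime for slicing, e.g. line[attribute.ToRange()].
    public Range ToRange() => Start..End;
}

public record Person
(
    [property: FixedFieldRange(0, 10)] string FirstName,
    [property: FixedFieldRange(10, 20)] string LastName
);
```

Start/end reads like range syntax at the call site, while the Offset/Length properties still expose the offset-and-length convention most fixed-width engines use.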

TheBrambleShark, Oct 27 '25 15:10

In theory a custom proxy should be able to do this, and you probably wouldn't need custom syntax. Interesting that Rust does it with a trait; I haven't seen that before.

agocke, Oct 27 '25 15:10

I've been experimenting with an implementation using a proxy. So far, only serialization exists, but you can see a working example on my .NET Lab.

Is this along the lines of what you would imagine or would you take a different approach?

TheBrambleShark, Oct 28 '25 14:10

I've changed things up in my fork. Following the JSON example, I've built a serializer/deserializer, plus a few supporting types.

Everything should be working on the serialization side, but I'm having trouble getting test builds of the Serde package to actually run their source generators, so I get CS0311 at call sites.

[GenerateSerde]
public partial record class Person
(
    [property: FixedFieldInfo(0, 10)] string FirstName,
    [property: FixedFieldInfo(10, 10)] string LastName,
    [property: FixedFieldInfo(20, 5)] int EmployeeId,
    [property: FixedFieldInfo(25, 8, "yyyyMMdd")] DateTime EmploymentStartDate,
    [property: FixedFieldInfo(33, 8, "yyyyMMdd")] DateTime Birthday
);

Person[] people =
[
    new ("John", "Doe", 12345, new DateTime(1990, 1, 1), new DateTime(1970, 1, 1)),
    new ("Janette", "Doe", 67890, new DateTime(1989, 1, 1), new DateTime(1969, 1, 1))
];

Console.WriteLine("Serialized Output:");
string serialized = string.Join(Environment.NewLine, SerializeMany(people)); // CS0311
Console.WriteLine(serialized);

private static IEnumerable<string> SerializeMany<T>(IEnumerable<T> values)
    where T : ISerializeProvider<T>
{
    foreach (var value in values)
    {
        yield return FixedWidthSerializer.Serialize(value);
    }
}

CS0311: The type 'Person' cannot be used as type parameter 'T' in the generic type or method 'Program.SerializeMany<T>(IEnumerable<T>)'. There is no implicit reference conversion from 'Person' to 'Serde.ISerializeProvider<Person>'.

Once I get the generation issue sorted (I'm very confident it's just something on my end) and wrap up the deserialization portion, I'll submit it as a PR.
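Once each field is formatted to its declared width, something like FixedWidthSerializer.Serialize still has to assemble the line. Here is a hypothetical sketch of that step, assuming the fields arrive as (Offset, Text) pairs. The name FixedWidthLine and the overlap check are illustrative and are not taken from the fork.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical final step of a fixed-width serializer: fields arrive already formatted
// to their declared widths and are copied into a space-filled line at their offsets.
public static class FixedWidthLine
{
    public static string Compose(IReadOnlyList<(int Offset, string Text)> fields)
    {
        int width = fields.Count == 0 ? 0 : fields.Max(f => f.Offset + f.Text.Length);
        char[] buffer = new char[width];
        Array.Fill(buffer, ' ');          // columns no field covers stay blank
        bool[] used = new bool[width];

        foreach (var (offset, text) in fields)
        {
            for (int i = 0; i < text.Length; i++)
            {
                // Two fields claiming the same column means the layout is wrong.
                if (used[offset + i])
                    throw new InvalidOperationException($"Field at offset {offset} overlaps another field.");
                used[offset + i] = true;
                buffer[offset + i] = text[i];
            }
        }

        return new string(buffer);
    }
}
```

The fields don't need to be sorted because each one is written at its own offset. The overlap check catches attributes whose offset/length spans collide.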

TheBrambleShark, Oct 29 '25 20:10

The FixedFieldInfoAttribute accepts an offset, a field length, and an optional format string.

If a format is supplied and the field type implements IFormattable, the format is passed to ToString(format). Booleans also get special handling, since it's common for "Y", "T", or "1" to mean true and "N", "F", or "0" to mean false.
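Here is a sketch of how a single field might be written under these rules. The format is applied through IFormattable when the type supports it, then the text is padded to the declared length, and values that don't fit are rejected. The helper name FormatField, the invariant culture, and left-aligned space padding are assumptions, not details from the fork.

```csharp
using System;
using System.Globalization;

public static class FixedWidthFields
{
    // Hypothetical helper: render one field into exactly `length` characters.
    public static string FormatField(object? value, int length, string? format = null)
    {
        string text = value switch
        {
            null => "",
            // int, DateTime, decimal, etc.: pass the format (or null for the default) through.
            IFormattable formattable => formattable.ToString(format, CultureInfo.InvariantCulture),
            // string, bool, and other types without format support use plain ToString().
            { } other => other.ToString() ?? ""
        };

        if (text.Length > length)
            throw new InvalidOperationException($"\"{text}\" does not fit in {length} characters.");

        // Assumed convention: left-aligned, padded with spaces.
        return text.PadRight(length);
    }
}
```

For example, FormatField(23, 3, "D3") produces "023", matching the Age column in the earlier sample input. A DateTime formatted with "yyyyMMdd" fills an 8-character date column exactly.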

For booleans, the format string takes the form "true/false", where the text on either side of the slash is what true and false serialize to and deserialize from (for example, "Y/N"). If no format string is provided, it falls back to bool.ToString().
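System.Boolean doesn't implement IFormattable, which is presumably why it needs its own path. Below is a minimal sketch of the "true/false" convention described above. The BoolFormat name and the trimmed, case-insensitive comparison on read are assumptions.

```csharp
using System;

// Hypothetical helper for the "true/false" format convention, e.g. "Y/N" or "1/0".
public static class BoolFormat
{
    // Split "Y/N" into ("Y", "N"); reject formats without exactly one slash separating two tokens.
    private static (string WhenTrue, string WhenFalse) Split(string format)
    {
        int slash = format.IndexOf('/');
        if (slash <= 0 || slash == format.Length - 1 || format.IndexOf('/', slash + 1) >= 0)
            throw new FormatException($"Expected a \"true/false\" format, got \"{format}\".");
        return (format[..slash], format[(slash + 1)..]);
    }

    public static string Serialize(bool value, string? format)
    {
        // No format: fall back to bool.ToString(), i.e. "True" or "False".
        if (format is null) return value.ToString();
        var (whenTrue, whenFalse) = Split(format);
        return value ? whenTrue : whenFalse;
    }

    public static bool Deserialize(string text, string? format)
    {
        text = text.Trim();
        if (format is null) return bool.Parse(text);
        var (whenTrue, whenFalse) = Split(format);
        // Assumed: matching is case-insensitive, so "y" reads as true for "Y/N".
        if (string.Equals(text, whenTrue, StringComparison.OrdinalIgnoreCase)) return true;
        if (string.Equals(text, whenFalse, StringComparison.OrdinalIgnoreCase)) return false;
        throw new FormatException($"\"{text}\" matches neither \"{whenTrue}\" nor \"{whenFalse}\".");
    }
}
```

With "Y/N", true is written as Y. A padded "n " reads back as false.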

I'll make sure all of this is added to the documentation should we get to the point where the PR is accepted.

TheBrambleShark, Oct 29 '25 20:10

My original expectation was that you could use a proxy like in https://github.com/serdedotnet/serde/blob/main/samples/ProxySerialize.cs

The problem is that you can't specify the parameters inside the attribute, because attribute arguments are limited to constants, typeof expressions, and arrays of those, not full invocations like a constructor call. I wonder if the fix is adding a way to pass parameters to the proxy via attributes. Basically, the proxy type would look like:

partial record FixedWidthString
    // Declare this proxy as the provider of FixedWidthStringSerdeObj,
    // which is used to serialize and deserialize the string type.
    : ISerdeProvider<FixedWidthString, FixedWidthStringSerdeObj, string>
{
    // The length has to be hard-coded here (10 is a placeholder): there's no way to pass it
    // in from the member's attribute, which is the gap described above.
    public static FixedWidthStringSerdeObj Instance { get; } = new FixedWidthStringSerdeObj(10);
}

// Serde object for the string type, parameterized by the maximum field length
sealed partial class FixedWidthStringSerdeObj(int len) : ISerde<string>
{
    public ISerdeInfo SerdeInfo => StringProxy.SerdeInfo;

    public void Serialize(string value, ISerializer serializer)
    {
        if (value.Length > len) throw new InvalidOperationException("Too long");
        serializer.SerializeString(value);
    }

    public string Deserialize(IDeserializer deserializer)
    {
        var str = deserializer.ReadString();
        if (str.Length > len) throw new InvalidOperationException("Too long");
        return str;
    }
}

Then to use it maybe we would have

class MyType
{
    [SerdeMemberOptions(Proxy = typeof(FixedWidthStringSerdeObj), ProxyParameters = new[] { ... })]
    public string F { get; init; }
}
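As a rough runtime analogy, attribute arguments can already carry a proxy Type plus a params object[] of constants, and those are enough to construct a parameterized proxy. A source generator would presumably emit the constructor call directly rather than use reflection. ProxyWithArgsAttribute, LengthCheckedString, and ProxyFactory are hypothetical names, and none of this uses Serde's actual types.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for SerdeMemberOptions(Proxy = ..., ProxyParameters = ...).
// Attribute arguments may be a Type plus constants, so this much is expressible today.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ProxyWithArgsAttribute : Attribute
{
    public ProxyWithArgsAttribute(Type proxy, params object[] parameters)
    {
        Proxy = proxy;
        Parameters = parameters;
    }

    public Type Proxy { get; }
    public object[] Parameters { get; }
}

// Plain stand-in for FixedWidthStringSerdeObj: only the length check, no Serde types.
public sealed class LengthCheckedString(int len)
{
    public string Check(string value) =>
        value.Length <= len ? value : throw new InvalidOperationException("Too long");
}

public class MyType
{
    [ProxyWithArgs(typeof(LengthCheckedString), 10)]
    public string F { get; init; } = "";
}

public static class ProxyFactory
{
    // Construct the member's proxy by calling the constructor that matches the stored arguments.
    public static object Create(PropertyInfo member)
    {
        ProxyWithArgsAttribute options = member.GetCustomAttribute<ProxyWithArgsAttribute>()
            ?? throw new InvalidOperationException($"{member.Name} has no proxy attribute.");
        return Activator.CreateInstance(options.Proxy, options.Parameters)!;
    }
}
```

The constructor arguments are only checked when the proxy is created. A generator could instead report a mismatch at compile time.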

agocke, Oct 29 '25 20:10

My concern with this solution is that every property of the type, except those excluded from (de)serialization, would need a lengthy SerdeMemberOptionsAttribute annotation. I see two workarounds:

  1. Define a generic SerdeObj, such as FixedWidthSerdeObj<T> where T is the type being (de)serialized, whose methods ignore the serializer/deserializer parameters and do the fixed-width reading and writing directly. Implementors would then derive a concrete type from it, e.g. public class PersonSerdeObj : FixedWidthSerdeObj<Person>. This takes the least effort, but discoverability is poor, because you'd have to consume it through something like JsonSerializer.
  2. Define a serializer/deserializer pair specifically for fixed-width files. This is more complex and a lot more work, but the result is more discoverable.

Did I miss anything?

TheBrambleShark, Oct 30 '25 14:10