serde icon indicating copy to clipboard operation
serde copied to clipboard

Implement support for FixedWidth Text Files

Open TheBrambleShark opened this issue 2 months ago • 5 comments

Fixes #276

The FixedWidthReader does not implement IByteReader as I didn't feel it was particularly necessary for this type. Instead, it works directly on the text of the line.

~~This does mean that I had no idea how to implement TryReadIndexWithName(ISerdeInfo). Any guidance here would be appreciated, even if that's changing FixedWidthReader to implement IByteReader.~~

The FixedWidthWriter is pretty straightforward- it just writes whatever object is passed to it, conditionally using the format string. When IFixedWidthSerializer.End(ISerdeInfo info) is called, a new line is written to the text writer. The intent is this can be chained to produce fixed width documents with one record on each line.

Overall I'm pretty happy with the implementation, but I'm sure there are things that I can clean up, so reviews are definitely appreciated.

The change to Directory.Build.props was to facilitate testing in my own assemblies, ~~but source generators don't appear to run so types annoted with GenerateSerdeAttribute don't end up implementing the interfaces implied by that annotation. I can change that to whatever we need before merging.~~


I've fixed the source generation and implemented TryReadIndexWithName(ISerdeInfo).

Only thing left is tests, though I'm not sure what specific fixed-width tests I should add.

TheBrambleShark avatar Oct 30 '25 19:10 TheBrambleShark

Thanks I’ll take a look this weekend

agocke avatar Oct 31 '25 15:10 agocke

Ah, I completely misunderstood the initial intent of this feature! I thought it was about fixed-width fields, not fixed-width file format. This makes so much more sense now.

The general approach makes sense -- a custom ISerializer and IDeserializer implementation. I have yet to look at the details.

agocke avatar Nov 02 '25 07:11 agocke

This looks like it's almost complete -- you implemented it pretty much how I expected:

  1. Write a new Deserializer implementation for a custom format.
  2. Create a new entry point for that format's Serialize/Deserialize methods.
  3. Add a new attribute that lets the type author specify the width of the field a. Your particular implementation is a little surprising to me -- I would have though the field length would be in bytes, not characters.
  4. On Serialize/Deserialize, extract the attributes needed for each field from the ISerdeInfo.
  5. For deserialization, just grab a span and deserialize from that.

I agree that the main thing left is tests. Also, I'm not sure this is the right place to put this. Generally I've split all non-JSON formats into their own repo, e.g. https://github.com/serdedotnet/serde.msgpack. Is there a particular reason that you need to put it in this repo? Since it doesn't require any source generation changes, I think it could live in its own NuGet package that people pull in as-necessary.

agocke avatar Nov 02 '25 23:11 agocke

Your particular implementation is a little surprising to me -- I would have though the field length would be in bytes, not characters.

This is due to how this type will be used. Implementers will know the character width of the column, likely by opening the file in a text editor and measuring it. To then need to convert that into a number of bytes would be tedious and would likely lead to comments specifying the field width in characters.

I would be ok with adding an alternate attribute that handles field length in bytes.

Is there a particular reason that you need to put it in this repo?

Earlier implementations of this relied on internal types, but that is no longer the case. If you'd like to create a repo for this library I can PR against that instead.

TheBrambleShark avatar Nov 03 '25 14:11 TheBrambleShark

Created a new repo here: https://github.com/serdedotnet/fixed-width

agocke avatar Nov 04 '25 18:11 agocke