Implement support for FixedWidth Text Files
Fixes #276
The FixedWidthReader does not implement IByteReader as I didn't feel it was particularly necessary for this type. Instead, it works directly on the text of the line.
~~This does mean that I had no idea how to implement TryReadIndexWithName(ISerdeInfo). Any guidance here would be appreciated, even if that's changing FixedWidthReader to implement IByteReader.~~
The FixedWidthWriter is pretty straightforward- it just writes whatever object is passed to it, conditionally using the format string. When IFixedWidthSerializer.End(ISerdeInfo info) is called, a new line is written to the text writer. The intent is this can be chained to produce fixed width documents with one record on each line.
Overall I'm pretty happy with the implementation, but I'm sure there are things that I can clean up, so reviews are definitely appreciated.
The change to Directory.Build.props was to facilitate testing in my own assemblies, ~~but source generators don't appear to run so types annoted with GenerateSerdeAttribute don't end up implementing the interfaces implied by that annotation. I can change that to whatever we need before merging.~~
I've fixed the source generation and implemented TryReadIndexWithName(ISerdeInfo).
Only thing left is tests, though I'm not sure what specific fixed-width tests I should add.
Thanks I’ll take a look this weekend
Ah, I completely misunderstood the initial intent of this feature! I thought it was about fixed-width fields, not fixed-width file format. This makes so much more sense now.
The general approach makes sense -- a custom ISerializer and IDeserializer implementation. I have yet to look at the details.
This looks like it's almost complete -- you implemented it pretty much how I expected:
- Write a new Deserializer implementation for a custom format.
- Create a new entry point for that format's Serialize/Deserialize methods.
- Add a new attribute that lets the type author specify the width of the field a. Your particular implementation is a little surprising to me -- I would have though the field length would be in bytes, not characters.
- On Serialize/Deserialize, extract the attributes needed for each field from the ISerdeInfo.
- For deserialization, just grab a span and deserialize from that.
I agree that the main thing left is tests. Also, I'm not sure this is the right place to put this. Generally I've split all non-JSON formats into their own repo, e.g. https://github.com/serdedotnet/serde.msgpack. Is there a particular reason that you need to put it in this repo? Since it doesn't require any source generation changes, I think it could live in its own NuGet package that people pull in as-necessary.
Your particular implementation is a little surprising to me -- I would have though the field length would be in bytes, not characters.
This is due to how this type will be used. Implementers will know the character width of the column, likely by opening the file in a text editor and measuring it. To then need to convert that into a number of bytes would be tedious and would likely lead to comments specifying the field width in characters.
I would be ok with adding an alternate attribute that handles field length in bytes.
Is there a particular reason that you need to put it in this repo?
Earlier implementations of this relied on internal types, but that is no longer the case. If you'd like to create a repo for this library I can PR against that instead.
Created a new repo here: https://github.com/serdedotnet/fixed-width