fo-dicom icon indicating copy to clipboard operation
fo-dicom copied to clipboard

Add support for saving new strings with multi-encoding

Open mrbean-bremen opened this issue 1 year ago • 10 comments

See #1789.

This implements the most relevant use case (only one encoding per value/component). I did not think much about optimization of the case that the first encoding cannot encode the string, as it is relatively rare. The standard case should almost not be impacted.

The problem with JIS X 201 and JIS X 208 mentioned in the issue is not handled here.

Checklist

  • [x] The pull request branch is in sync with latest commit on the fo-dicom/development branch
  • [x] I have updated API documentation
  • [x] I have included unit tests
  • [x] I have updated the change log
  • [x] I am listed in the CONTRIBUTORS file

mrbean-bremen avatar May 20 '24 19:05 mrbean-bremen

This is ready for review, except for the release notes (which I will probably add after the release, provided that this will be merged after the release).

mrbean-bremen avatar May 23 '24 16:05 mrbean-bremen

So this will stay draft until the 5.1.3 release? Just making sure I understand the reason for the "Draft" status.

amoerie avatar Jun 17 '24 07:06 amoerie

Well, I understood that the release should be very soon (though that was a few weeks ago), and didn't want to introduce potential regressions just before the release. Though I can of course add this to the current release (e.g. add the release notes and make it ready for review). What do you think?

mrbean-bremen avatar Jun 17 '24 17:06 mrbean-bremen

Given that fo-dicom.Codecs is also waiting for a release, I'd rather wait until after the release is out. This PR isn't that important - the feature has never been there, and that seems not to have been a real problem.

mrbean-bremen avatar Jun 19 '24 18:06 mrbean-bremen

Codecov Report

Attention: Patch coverage is 98.68421% with 1 line in your changes missing coverage. Please review.

Project coverage is 76.40%. Comparing base (552a142) to head (6914e7c). Report is 9 commits behind head on development.

Files with missing lines Patch % Lines
FO-DICOM.Core/DicomEncoding.cs 98.61% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@               Coverage Diff               @@
##           development    #1791      +/-   ##
===============================================
+ Coverage        76.39%   76.40%   +0.01%     
===============================================
  Files              275      275              
  Lines            25486    25550      +64     
  Branches          3056     3064       +8     
===============================================
+ Hits             19469    19521      +52     
- Misses            5073     5082       +9     
- Partials           944      947       +3     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jun 28 '24 19:06 codecov[bot]

@gofal, @amoerie - I forgot about this PR... Did you have time to check it?

mrbean-bremen avatar Aug 08 '24 13:08 mrbean-bremen

I already reviewed it once and the changes seemed okay back then. The tests seem very nice, so I'm not too worried about correctness. I just hope we won't see serious performance regressions somewhere since you're touching some sensitive hot paths, but we will find out soon quickly haha. :-)

amoerie avatar Aug 08 '24 18:08 amoerie

Looks great to me, thanks! When doing performance checks, I noticed a bug in LazyByteBuffer, maybe you could add this to this PR. Usually when fo-dicom reads a file or receives a file via network, then the original binary data is kept in memory and only convertet to string if the tag is accessed. But when writing the file to disk then the original binary data is reused. So this conversion is only called, when a string-value of the DicomDataset is changed or a completely new DicomDataset is created. And in these scenarios - in order to avoid unnecessary conversion - the GetBytes is not called until it is first needed when writing the dataset to file or network. Therefore there is the class LazyByteBuffer. This class implements IByteBuffer, but it gets a factory method, which is called implicitly when accessing the Data property.

Now the DicomWriter.OnElement method

    public bool OnElement(DicomElement element)
    {
        WriteTagHeader(element.Tag, element.ValueRepresentation, element.Length);

        var buffer = element.Buffer;
        if (buffer is EndianByteBuffer ebb)
        {
            if (ebb.Endian != Endian.LocalMachine && ebb.Endian == _target.Endian)
            {
                buffer = ebb.Internal;
            }
        }
        else if (_target.Endian != Endian.LocalMachine && element.ValueRepresentation.UnitSize > 1)
        {
            buffer = new SwapByteBuffer(buffer, element.ValueRepresentation.UnitSize);
        }

        WriteBuffer(_target, buffer);

        return true;
    }

accesses the element.Buffer.Data twice, once in WriteTagHeader, where the length of the data is taken, and a second time finally in WriteBuffer, where the acutal data is copied to the output stream.

And now there is the bug in LazyByte Buffer: everytime the Bytes-property is accessed, the factory method is called.

    private byte[] Bytes => _bytes();

This is why each string is encoded twice. But this can easily be solved, by caching the output of the factory method. So if you could change the Lazy ByteBuffer to

public sealed class LazyByteBuffer : IByteBuffer
{
    private readonly Func<byte[]> _bytes;
    // add this field
    private byte[] _byteData = null;

    public LazyByteBuffer(Func<byte[]> bytes)
    {
        _bytes = bytes ?? throw new ArgumentNullException(nameof(bytes));
    }

   // change this property to access the field, and only to call the factory method if the field is null and store the result of the factory method into the field.
    private byte[] Bytes => _byteData ??= _bytes();
    

I would have done this myself, but the branch is not in fo-dicom repository, but in your personal repository and I cannot push there.

gofal avatar Aug 13 '24 10:08 gofal

I would have done this myself, but the branch is not in fo-dicom repository, but in your personal repository and I cannot push there.

This should not be a problem, I have done this several times. By default maintainers are allowed to edit PRs, this can be seen at the bottom of the right column of this page: "Maintainers are allowed to edit this pull request."

Though of course I can do that myself...

mrbean-bremen avatar Aug 13 '24 11:08 mrbean-bremen

Ok, done!

mrbean-bremen avatar Aug 13 '24 19:08 mrbean-bremen