Add support for saving new strings with multi-encoding
See #1789.
This implements the most relevant use case (only one encoding per value/component). I did not think much about optimization of the case that the first encoding cannot encode the string, as it is relatively rare. The standard case should almost not be impacted.
The problem with JIS X 201 and JIS X 208 mentioned in the issue is not handled here.
Checklist
- [x] The pull request branch is in sync with latest commit on the fo-dicom/development branch
- [x] I have updated API documentation
- [x] I have included unit tests
- [x] I have updated the change log
- [x] I am listed in the CONTRIBUTORS file
This is ready for review, except for the release notes (which I will probably add after the release, provided that this will be merged after the release).
So this will stay draft until the 5.1.3 release? Just making sure I understand the reason for the "Draft" status.
Well, I understood that the release should be very soon (though that was a few weeks ago), and didn't want to introduce potential regressions just before the release. Though I can of course add this to the current release (e.g. add the release notes and make it ready for review). What do you think?
Given that fo-dicom.Codecs is also waiting for a release, I'd rather wait until after the release is out. This PR isn't that important - the feature has never been there, and that seems not to have been a real problem.
Codecov Report
Attention: Patch coverage is 98.68421% with 1 line in your changes missing coverage. Please review.
Project coverage is 76.40%. Comparing base (
552a142) to head (6914e7c). Report is 9 commits behind head on development.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| FO-DICOM.Core/DicomEncoding.cs | 98.61% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## development #1791 +/- ##
===============================================
+ Coverage 76.39% 76.40% +0.01%
===============================================
Files 275 275
Lines 25486 25550 +64
Branches 3056 3064 +8
===============================================
+ Hits 19469 19521 +52
- Misses 5073 5082 +9
- Partials 944 947 +3
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@gofal, @amoerie - I forgot about this PR... Did you have time to check it?
I already reviewed it once and the changes seemed okay back then. The tests seem very nice, so I'm not too worried about correctness. I just hope we won't see serious performance regressions somewhere since you're touching some sensitive hot paths, but we will find out soon quickly haha. :-)
Looks great to me, thanks! When doing performance checks, I noticed a bug in LazyByteBuffer, maybe you could add this to this PR. Usually when fo-dicom reads a file or receives a file via network, then the original binary data is kept in memory and only convertet to string if the tag is accessed. But when writing the file to disk then the original binary data is reused. So this conversion is only called, when a string-value of the DicomDataset is changed or a completely new DicomDataset is created. And in these scenarios - in order to avoid unnecessary conversion - the GetBytes is not called until it is first needed when writing the dataset to file or network. Therefore there is the class LazyByteBuffer. This class implements IByteBuffer, but it gets a factory method, which is called implicitly when accessing the Data property.
Now the DicomWriter.OnElement method
public bool OnElement(DicomElement element)
{
WriteTagHeader(element.Tag, element.ValueRepresentation, element.Length);
var buffer = element.Buffer;
if (buffer is EndianByteBuffer ebb)
{
if (ebb.Endian != Endian.LocalMachine && ebb.Endian == _target.Endian)
{
buffer = ebb.Internal;
}
}
else if (_target.Endian != Endian.LocalMachine && element.ValueRepresentation.UnitSize > 1)
{
buffer = new SwapByteBuffer(buffer, element.ValueRepresentation.UnitSize);
}
WriteBuffer(_target, buffer);
return true;
}
accesses the element.Buffer.Data twice, once in WriteTagHeader, where the length of the data is taken, and a second time finally in WriteBuffer, where the acutal data is copied to the output stream.
And now there is the bug in LazyByte Buffer: everytime the Bytes-property is accessed, the factory method is called.
private byte[] Bytes => _bytes();
This is why each string is encoded twice. But this can easily be solved, by caching the output of the factory method. So if you could change the Lazy ByteBuffer to
public sealed class LazyByteBuffer : IByteBuffer
{
private readonly Func<byte[]> _bytes;
// add this field
private byte[] _byteData = null;
public LazyByteBuffer(Func<byte[]> bytes)
{
_bytes = bytes ?? throw new ArgumentNullException(nameof(bytes));
}
// change this property to access the field, and only to call the factory method if the field is null and store the result of the factory method into the field.
private byte[] Bytes => _byteData ??= _bytes();
I would have done this myself, but the branch is not in fo-dicom repository, but in your personal repository and I cannot push there.
I would have done this myself, but the branch is not in fo-dicom repository, but in your personal repository and I cannot push there.
This should not be a problem, I have done this several times. By default maintainers are allowed to edit PRs, this can be seen at the bottom of the right column of this page: "Maintainers are allowed to edit this pull request."
Though of course I can do that myself...
Ok, done!