quick-xml icon indicating copy to clipboard operation
quick-xml copied to clipboard

Async support

Open Mingun opened this issue 1 year ago • 3 comments

First of all, I would like to thank @999eagle for her work. It was very useful for quickly obtaining a working result and its iterative improvement.

After week of experimenting and prepare work I think, that the result suits me. I tried to focus on those points:

  • maximum reuse the code. For that all key methods rewritten using macros, so the sync and async code is the same
  • try to minimize intermediate wrappers, because it is painful to pass long named types to functions, like in that example: https://github.com/tafia/quick-xml/blob/9ccc68620fb4f525d0316abe8e3cfac624780b56/src/reader/mod.rs#L324-L326

I have considered several options

  • original suggestion with three intermediate structs: SliceReader, BufferedReader and AsyncReader:
    impl                  Reader<SliceReader> { ... }// for &[u8]
    impl<R: BufRead>      Reader<BufferedReader<R>> { ... }
    impl<R: AsyncBufRead> Reader<AsyncReader<R>> { ... }
    
    I have realized that we have enough only two of them therefore...
  • two intermediate newtype structs, SyncReader and AsyncReader
    impl<R>               Reader<SyncReader<R>> { ... }// for &[u8] and BufRead
    impl<R: AsyncBufRead> Reader<AsyncReader<R>> { ... }
    
    After further experiments, it became clear that we can do without SyncReader, so...
  • (this one) sync code stays the same. Async code is implemented using an adapter struct, which I've name TokioAdapter (former AsyncReader)
    impl                  Reader<&[u8]> { ... }
    impl<R: BufRead>      Reader<R> { ... }
    impl<R: AsyncBufRead> Reader<AsyncReader<R>> { ... }
    
  • do not introduce intermediate structs and just implement everything for Reader<R>
    impl                  Reader<&[u8]> { ... }
    impl<R: BufRead>      Reader<R> { ... }
    impl<R: AsyncBufRead> Reader<R> { ... }
    
    In this option the conflicts of names for internal methods would appear. In principle, there is nothing unsolvable, but if we decide to realize support for other asynchronous library, it will be perhaps not really convenient. However, my attempts to implement support for futures were unsuccessful -- API differs a little. Besides, there is nothing difficult in principle, but so far it seems that it is possible to do without it as as there are adapters

As it seemed, all realization differs slightly. I especially like the 4th option, as it eliminates the need for a long type if you need to pass Reader to the function. I think, I'll make also another PR with that variant.

I decided to not implement from_file for async reader, because it is required additional tokio feature and it seems that it can be easely done outside the crate.

Also, I did not implement read_text_into_async API, because it available not on all readers and I think, it should be reworked to return all intermediate notes as text as required for #257 and also was discussed in #154.

I'm not sure, whether we have to Send for inner type, as was originally implemented by @999eagle in #417, currently I do not require that. It seems, that compiler should automatically guarantee that when it can, but how to check that?

@999eagle, could you also review that?

Closes #417 Closes #425

Mingun avatar Jul 31 '22 20:07 Mingun

Codecov Report

Merging #450 (c00bc06) into master (9ccc686) will increase coverage by 0.16%. The diff coverage is 78.40%.

@@            Coverage Diff             @@
##           master     #450      +/-   ##
==========================================
+ Coverage   51.33%   51.49%   +0.16%     
==========================================
  Files          28       29       +1     
  Lines       13312    13435     +123     
==========================================
+ Hits         6834     6919      +85     
- Misses       6478     6516      +38     
Flag Coverage Δ
unittests 51.49% <78.40%> (+0.16%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
benches/macrobenches.rs 0.00% <ø> (ø)
benches/microbenches.rs 0.00% <ø> (ø)
examples/custom_entities.rs 0.00% <ø> (ø)
examples/nested_readers.rs 0.00% <ø> (ø)
examples/read_buffered.rs 0.00% <ø> (ø)
examples/read_texts.rs 0.00% <0.00%> (ø)
src/de/mod.rs 74.59% <ø> (+0.47%) :arrow_up:
src/events/mod.rs 68.60% <ø> (ø)
src/lib.rs 12.66% <0.00%> (+0.38%) :arrow_up:
src/se/mod.rs 93.81% <ø> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9ccc686...c00bc06. Read the comment docs.

codecov-commenter avatar Jul 31 '22 20:07 codecov-commenter

Will need a few days to fully review.. just leaving a few comments for now.

dralley avatar Aug 01 '22 01:08 dralley

I really wish we could try out the internal buffering idea (no more read_event_into(), &str returned in all cases) before introducing all these macros, as I think there may be an opportunity there for macro-less deduplication. The macros do make it a lot harder to actually read and understand the code. However I'm also tied up with work again for the next week and a half though so I don't have much time to spend prototyping.

So I would say I'm -0, I won't block it but I'm not thrilled about that particular change to be honest.

dralley avatar Aug 05 '22 01:08 dralley

I really wish we could try out the internal buffering idea

I think that the idea to allow the user to manage the buffer with the event content as he wish has they own benefits. Actually, that approach very similar to how Read trait in the standard library is defined: you provide a buffer that will filled with some content.

So we could investigate that idea, and maybe even provide a special buffering reader for users that do not want to manage the buffer himself, but we should keep the current API. That API allows to apply the reader as an operator to a buffer and create events over it, which does not tied to the reader itself, so it can be even destroyed if it is not required anymore.


Since there are no fundamental objections, I think I will merge it in a few hours.

Mingun avatar Aug 14 '22 11:08 Mingun

@999eagle Do you feel like this feature is living up to your expectations? Any issues with it in practice?

dralley avatar Sep 05 '22 16:09 dralley

Hi @dralley, I just now got around to doing so, but I just published my crate using this feature to implement async xml deserialization with derive-macros.

The actual crate I'm working on with this is also coming along fine, so (at least for now) this is fulfilling my expectations.

999eagle avatar Sep 09 '22 08:09 999eagle