quick-xml

Expose underlying Cow in Event data

Open outfoxxed opened this issue 2 years ago • 4 comments

Would it be possible to expose the underlying Cow in event datatypes like BytesStart and BytesEnd? I have to match only specific tags, and have to keep track of the pushed tag stack.

Ideally I would keep a Vec<&'datasource [u8]> as the tag stack, but I am forced to copy since a BytesStart<'datasource> cannot return the underlying Cow, which I would then match to Cow::Borrowed.
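To make the request concrete, here is a minimal sketch of the pattern described above. It uses only the standard library: `Tag` is a hypothetical stand-in for an event type that wraps a `Cow`, not quick-xml's actual `BytesStart`, and `into_inner` is an assumed accessor of the kind being asked for, not an existing quick-xml method.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an event type such as `BytesStart`.
/// This is NOT quick-xml's real definition, only a type wrapping a `Cow`.
pub struct Tag<'a> {
    buf: Cow<'a, [u8]>,
}

impl<'a> Tag<'a> {
    /// Builds a tag that borrows its name from the input.
    pub fn borrowed(name: &'a [u8]) -> Self {
        Tag { buf: Cow::Borrowed(name) }
    }

    /// Builds a tag that owns its name.
    pub fn owned(name: Vec<u8>) -> Self {
        Tag { buf: Cow::Owned(name) }
    }

    /// Assumed accessor (hypothetical): hands out the underlying `Cow`.
    pub fn into_inner(self) -> Cow<'a, [u8]> {
        self.buf
    }
}

/// Pushes the tag name onto the stack without copying when it is borrowed.
/// Returns `false` when the tag owns its bytes, so nothing was pushed.
pub fn push_borrowed<'a>(stack: &mut Vec<&'a [u8]>, tag: Tag<'a>) -> bool {
    match tag.into_inner() {
        Cow::Borrowed(name) => {
            // `name` carries the input lifetime `'a`, not the lifetime of `tag`.
            stack.push(name);
            true
        }
        Cow::Owned(_) => false,
    }
}
```

The point of the sketch is the lifetime: the slice taken out of `Cow::Borrowed` outlives the `Tag` value itself, which is what allows a `Vec<&'datasource [u8]>` tag stack.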

outfoxxed avatar Apr 14 '23 11:04 outfoxxed

I would prefer to hide the fact that we store Cow<[u8]> internally and make you think that we store a str. If you are sure that you will be able to store Vec<&'datasource [u8]> (i.e. data borrowed from the input), then you could also store BytesStart / BytesEnd: in that case they will store only offsets (or store only BytesEnd if you don't need attributes; use .to_end() to convert).

Would that solution be acceptable for you? Otherwise, feel free to submit a PR.

Mingun avatar Apr 14 '23 15:04 Mingun

It would add a fair amount of complication to my code, so I'd prefer to use the reference with the same lifetime. I'll submit a PR.

outfoxxed avatar Apr 15 '23 01:04 outfoxxed

Note that all events will borrow when they come from a Reader. Also, cloning them is cheap: the underlying Cow stays in the Borrowed state if you clone a borrowed Cow.
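The claim about cloning can be checked with the standard library alone. The sketch below does not touch quick-xml; the helper names `is_borrowed` and `clone_shares_memory` are made up for illustration.

```rust
use std::borrow::Cow;

/// Returns `true` if the `Cow` currently borrows its data.
pub fn is_borrowed(cow: &Cow<'_, [u8]>) -> bool {
    matches!(cow, Cow::Borrowed(_))
}

/// Clones the `Cow` and reports whether the clone points at the same bytes
/// in memory as the original, i.e. whether the clone avoided a new allocation.
pub fn clone_shares_memory(cow: &Cow<'_, [u8]>) -> bool {
    let copy = cow.clone();
    copy.as_ptr() == cow.as_ptr()
}
```

Cloning a `Cow::Borrowed` copies only the reference, while cloning a `Cow::Owned` with non-empty contents allocates a fresh buffer.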

Mingun avatar Apr 15 '23 16:04 Mingun

hello, just to chime in and provide another example / motivation ...

i also felt the need to take ownership of the underlying Cow. Wanting to store the names of "start/end elements" in a HashMap/Set for later reference outside my "event reader loop" required me to either clone the underlying byte slice or write my own wrappers around BytesStart/BytesEnd in order to provide a Hash implementation (merely delegating to the underlying byte slice). later, when looking up values in the collection, i faced the situation that while BytesStart/BytesEnd expose the names as &[u8] (not as a str), they do not allow construction from such a type. i had to ...

let name: &[u8] = ...; // coming from some other event
let key = MyWrapper(quick_xml::events::BytesEnd::new(unsafe {
  std::str::from_utf8_unchecked(name)
}));
match map.get(&key) {...}

... which I'd like to avoid of course.
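For the lookup side, one possible way around constructing a key from raw bytes is to implement `Borrow<[u8]>` on the wrapper, so the collection can be queried with a plain `&[u8]`. This is only a sketch under stated assumptions: `NameKey` is a hypothetical wrapper over a `Cow<[u8]>` standing in for a wrapper around BytesStart/BytesEnd, and `is_known` is a made-up helper; neither is part of quick-xml.

```rust
use std::borrow::{Borrow, Cow};
use std::collections::HashSet;

/// Hypothetical key type; stands in for a wrapper around an event name.
/// The derived `Hash` and `Eq` go through `Cow`, which hashes and compares
/// exactly like the `[u8]` it holds, as the `Borrow` contract requires.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NameKey<'a>(pub Cow<'a, [u8]>);

impl Borrow<[u8]> for NameKey<'_> {
    fn borrow(&self) -> &[u8] {
        &self.0
    }
}

/// Looks up a raw name without building a `NameKey` and without `unsafe`.
pub fn is_known(names: &HashSet<NameKey<'_>>, name: &[u8]) -> bool {
    names.contains(name)
}
```

With this in place, a name taken from some other event can be passed straight to `is_known`, so no `from_utf8_unchecked` round trip is needed to build a key.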

i can imagine somebody else might face similar difficulties when needing the standard Ord implementation over the names, for example. so the trouble is rather the (limited) utility of BytesStart/BytesEnd.

  1. Could providing these standard, derived impls (e.g. PartialOrd, Ord, Hash) help to solve the original issue? (for my use case that would do it.)
  2. Is there anything (semantically) speaking against deriving at least Hash for BytesStart/BytesEnd? If not, I'd set up a PR.

the trouble of course is that one can never foresee future use-cases, so exposing the name as a Cow (e.g. BytesStart#into_bytes/_raw_name(self)) might make sense after all.

btw: many thanks for the amazing work on this library! believe it or not, but quick-xml allows me to do stuff that i can't manage to do with standard parsers in java :)

xitep avatar Jul 19 '24 10:07 xitep