multipart Feature request: support MultipartSegment subclasses in PushMultipartParser

I'm interested in parsing multipart/byteranges response data using this library. I think it is a good match: this format requires no changes to the parsing logic in PushMultipartParser, and it is not deprecated in any way (just rarely used).

I understand if direct support would be considered out of scope for this library, given its focus on multipart/form-data, but it is quite a shame to be unable to reuse the high-quality core parsing implementation.

The simplest way to allow this usage I can think of would be to support passing a MultipartSegment subclass to the PushMultipartParser constructor, and then yielding instances of that class instead of the hard-coded MultipartSegment in PushMultipartParser.parse. As a further (optional) improvement, MultipartSegment could be refactored into BaseMultipartSegment and MultipartSegment, where BaseMultipartSegment would implement only the header list handling, and the multipart/form-data-specific implementations of the state transition callbacks would live in MultipartSegment proper.

An external implementation of multipart/byteranges would then create a custom subclass of MultipartSegment (or BaseMultipartSegment), implement the state transition callbacks differently (parsing Content-Range instead of Content-Disposition, for example) and provide a different interface to access the parsed metadata.
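To make the byteranges side concrete, here is a minimal, self-contained sketch of the metadata such a segment type could expose. `ByteRange`, `parse_content_range` and `ByteRangeSegment` are hypothetical names, not part of the library, and the header list is assumed to be a list of `(name, value)` tuples. The sketch only accepts the satisfied form `bytes first-last/length` (length may be `*`), following the `Content-Range` syntax in RFC 9110.

```python
import re
from typing import NamedTuple, Optional

# Satisfied byte range as carried by each part of a 206 multipart/byteranges
# response, e.g. "bytes 0-499/1234" or "bytes 500-999/*".
# The range unit is matched case-insensitively.
_CONTENT_RANGE = re.compile(r"bytes ([0-9]+)-([0-9]+)/([0-9]+|\*)", re.IGNORECASE)


class ByteRange(NamedTuple):
    start: int              # first byte position, inclusive
    end: int                # last byte position, inclusive
    total: Optional[int]    # complete length, or None if sent as "*"


def parse_content_range(value: str) -> ByteRange:
    """Parse the Content-Range header value of one byteranges part."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if end < start or (total is not None and end >= total):
        raise ValueError(f"invalid byte range: {value!r}")
    return ByteRange(start, end, total)


class ByteRangeSegment:
    """Hypothetical segment type exposing Content-Range instead of a field name."""

    def __init__(self, headerlist):
        # Assumed shape: a list of (name, value) tuples, in arrival order.
        self.headerlist = list(headerlist)
        value = self.header("Content-Range")
        if value is None:
            raise ValueError("byteranges part without a Content-Range header")
        self.range = parse_content_range(value)
        self.content_type = self.header("Content-Type")

    def header(self, name, default=None):
        """Return the first header with this name (case-insensitive)."""
        name = name.lower()
        for key, value in self.headerlist:
            if key.lower() == name:
                return value
        return default
```

For `Content-Range: bytes 0-499/1234`, `parse_content_range` returns `ByteRange(start=0, end=499, total=1234)`; a form-data style `name` attribute simply has no equivalent here.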

What do you think of this suggestion? I could prepare a PR if the overall direction is acceptable.

Feb 19 '25 11:02 himikof

Yes, this would be out of scope for this library, as it focuses on the server-side parsing of multipart/form-data. But this does not mean we cannot open it up for more use cases, as long as it does not hurt core functionality and does not add too much complexity. multipart/byteranges is a living standard used by actual servers and clients, so it's worth some consideration.

Modifying PushMultipartParser to allow a custom MultipartSegment subclass should be easy and straightforward, either via a constructor argument or via a class attribute or method that a subclass can override.
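Both options could look roughly like this. The classes below are stripped-down stand-ins, not the library's code: `segment_class` and `_new_segment` are hypothetical names, and a real `parse()` would call the hook wherever it currently constructs `MultipartSegment` directly.

```python
class MultipartSegment:
    """Stand-in for the library's segment class (illustration only)."""

    def __init__(self, headerlist):
        self.headerlist = headerlist


class PushMultipartParser:
    """Stand-in that models only the segment factory hook, not parsing."""

    # Option 1: a class attribute that subclasses can override.
    segment_class = MultipartSegment

    def __init__(self, boundary, segment_class=None):
        self.boundary = boundary
        # Option 2: a constructor argument that overrides it per instance.
        if segment_class is not None:
            self.segment_class = segment_class

    def _new_segment(self, headerlist):
        # parse() would call this instead of hard-coding MultipartSegment(...).
        return self.segment_class(headerlist)


class ByteRangeSegment(MultipartSegment):
    """Hypothetical byteranges segment type."""


class ByteRangeParser(PushMultipartParser):
    # Option 1 in use: no constructor change needed.
    segment_class = ByteRangeSegment
```

The class attribute keeps the constructor signature unchanged, while the constructor argument avoids writing a parser subclass for one-off use; supporting both costs only the `if` in `__init__`.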

MultipartSegment is a bit tricky, though. It contains both essential header parsing logic (that could be re-used) as well as multipart/form-data specific validity checks (that need to be bypassed). Some of the parsing logic is also bound to form-data semantics (e.g. the internal self.name is None checks) and some of its public APIs do not make sense for byteranges (e.g. MultipartSegment.name). Subclassing would work, but it would not be as clean and stable as it could be.

Splitting it up into MultipartSegmentBase and MultipartSegment may be a better option, but it may also affect performance. I'm not sure; that needs to be tested. The _close_headers() logic and the form-data specific properties would then live in the subclass, and MultipartSegmentBase would not care about header semantics at all. Sounds good, actually.
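A rough sketch of that split, under the assumption that headers arrive one at a time and `_close_headers()` runs once after the last one. `_add_header` and the exact checks are hypothetical; `email.message.Message` from the standard library is used here only to extract the `name` parameter from `Content-Disposition`. Whether the extra level of inheritance costs anything measurable is what the benchmark would have to show.

```python
from email.message import Message


class MultipartSegmentBase:
    """Hypothetical base class: collects headers, ignores their meaning."""

    def __init__(self):
        self.headerlist = []  # list of (name, value) tuples

    def _add_header(self, name, value):
        self.headerlist.append((name, value))

    def header(self, name, default=None):
        """Return the first header with this name (case-insensitive)."""
        name = name.lower()
        for key, value in self.headerlist:
            if key.lower() == name:
                return value
        return default

    def _close_headers(self):
        """Called after the last header of a part; no checks by default."""


class MultipartSegment(MultipartSegmentBase):
    """Hypothetical form-data subclass: all form-data checks live here."""

    name = None

    def _close_headers(self):
        disposition = self.header("Content-Disposition")
        if disposition is None or disposition.split(";", 1)[0].strip().lower() != "form-data":
            raise ValueError("missing or invalid Content-Disposition header")
        # Reuse the stdlib header parameter parser to extract name="...".
        parsed = Message()
        parsed["Content-Disposition"] = disposition
        self.name = parsed.get_param("name", header="content-disposition")
        if self.name is None:
            raise ValueError("form-data part without a name")
```

A byteranges segment would then derive from `MultipartSegmentBase` directly and put its own `Content-Range` checks in `_close_headers()`, without inheriting any form-data behavior.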

Let me think about it for a bit and benchmark the base class split idea. I may come up with a clean API that can be officially supported.

Feb 19 '25 12:02 defnull

I know a lot of time has passed. If you are still interested, could you have a look at #77 and check whether it works for you?

MultipartSegment is now very basic and no longer implements any parsing logic or form-data specific checks. Most of the general parsing logic moved back into the parser itself, where it belongs, and the form-data specific checks now all live in PushMultipartParser._create_segment(self, headerlist), which can be overridden. This should allow you to subclass PushMultipartParser and perform different checks, or return a MultipartSegment subclass if you need to.
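With that hook, a byteranges parser might look roughly like the sketch below. To keep it runnable on its own, `PushMultipartParser` and `MultipartSegment` are minimal stand-ins for the library classes; the real `MultipartSegment` constructor arguments and the exact shape of `headerlist` (assumed here to be `(name, value)` tuples) should be checked against #77. `ByteRangeSegment` and `ByteRangeParser` are hypothetical names.

```python
class MultipartSegment:
    """Stand-in for the library class: just keeps the parsed headers."""

    def __init__(self, headerlist):
        self.headerlist = headerlist


class PushMultipartParser:
    """Stand-in: models only the overridable _create_segment() hook."""

    def _create_segment(self, headerlist):
        # In the real parser, the form-data specific checks happen here.
        return MultipartSegment(headerlist)


class ByteRangeSegment(MultipartSegment):
    def __init__(self, headerlist, content_range):
        super().__init__(headerlist)
        self.content_range = content_range  # raw header value


class ByteRangeParser(PushMultipartParser):
    def _create_segment(self, headerlist):
        # Replace the form-data checks with a byteranges-specific one.
        content_range = None
        for name, value in headerlist:
            if name.lower() == "content-range":
                content_range = value
                break
        if content_range is None:
            raise ValueError("byteranges part without a Content-Range header")
        return ByteRangeSegment(headerlist, content_range)
```

Because the override never calls `super()._create_segment()`, none of the form-data checks run for byteranges parts.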

Jul 07 '25 12:07 defnull

The changes will be part of version 1.4, once released.

Jul 27 '25 00:07 defnull