
feat: Add free-threaded Python 3.14t support with parallel processing (1.55x speedup)

Open vinitkumar opened this pull request 4 months ago • 2 comments

Summary

This PR adds parallel processing support for json2xml, leveraging Python 3.14t's free-threaded capabilities (no-GIL) to achieve up to 1.55x speedup for medium-sized datasets.

🚀 Key Features

  • Parallel processing for dictionaries and lists
  • Free-threaded Python 3.14t support with GIL-free execution
  • Up to 1.55x speedup for medium datasets (100-1K items)
  • Automatic fallback to serial processing for small datasets
  • Thread-safe XML validation caching
  • Zero breaking changes - fully backward compatible

📊 Benchmark Results

Tested on macOS ARM64 with Python 3.14.0 and Python 3.14.0t:

Medium Dataset (100 items) - Best Case

| Python Version | Serial Time | Parallel (4w) | Speedup |
|----------------|-------------|---------------|---------|
| 3.14 (GIL)     | 7.56 ms     | 7.86 ms       | 0.96x   |
| 3.14t (no-GIL) | 8.59 ms     | 5.55 ms       | 1.55x 🚀 |

Key Findings:

  • βœ… 1.55x speedup on Python 3.14t for medium datasets
  • βœ… Automatic detection of free-threaded Python build
  • βœ… No benefit on standard Python (as expected due to GIL)
  • βœ… Smart fallback avoids overhead for small datasets

See BENCHMARK_RESULTS.md for complete results.
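The automatic build detection mentioned above can be sketched roughly as follows. This is not the PR's actual `is_free_threaded()` implementation, just a minimal illustration of how such a check can work on CPython 3.13+ (which exposes `sys._is_gil_enabled()`), with the build-time `Py_GIL_DISABLED` flag as a fallback:

```python
import sys
import sysconfig

def is_free_threaded() -> bool:
    """Best-effort check for a free-threaded (no-GIL) CPython build."""
    gil_check = getattr(sys, "_is_gil_enabled", None)
    if gil_check is not None:
        # On 3.13+ this reports whether the GIL is active right now.
        return not gil_check()
    # Older interpreters lack the attribute: fall back to the build flag.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

print(is_free_threaded())
```

On a standard GIL build this returns False, which is what lets the library skip thread pools when they cannot help.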

💻 Usage

Basic Parallel Processing

```python
from json2xml.json2xml import Json2xml

data = {"users": [{"id": i, "name": f"User {i}"} for i in range(1000)]}
converter = Json2xml(data, parallel=True)
xml = converter.to_xml()  # Up to 1.55x faster on Python 3.14t!
```

Advanced Configuration

```python
converter = Json2xml(
    data,
    parallel=True,    # Enable parallel processing
    workers=4,        # Use 4 worker threads
    chunk_size=100    # Process 100 items per chunk
)
xml = converter.to_xml()
```

🔧 Implementation Details

New Files

  • json2xml/parallel.py - Parallel processing module (318 lines)
  • tests/test_parallel.py - Comprehensive test suite (20 tests)
  • benchmark.py - Performance benchmarking tool
  • docs/performance.rst - Sphinx documentation

Modified Files

  • json2xml/json2xml.py - Added parallel, workers, chunk_size parameters
  • json2xml/dicttoxml.py - Integrated parallel processing support
  • README.rst - Added performance section with benchmarks
  • docs/index.rst - Added performance documentation page

✅ Testing

All 173 tests passing (153 original + 20 new parallel tests)

```shell
pytest -v
# ============================= 173 passed in 0.14s ==============================
```
  • βœ… Zero regressions
  • βœ… Full backward compatibility
  • βœ… Comprehensive parallel processing validation
  • βœ… Edge case handling
  • βœ… Thread-safety verification

📚 Documentation

Complete documentation added:

  • FREE_THREADED_OPTIMIZATION_ANALYSIS.md - Technical analysis and optimization strategy
  • BENCHMARK_RESULTS.md - Detailed benchmark results and analysis
  • IMPLEMENTATION_SUMMARY.md - Implementation details and architecture
  • docs/performance.rst - Sphinx documentation for users
  • Updated README.rst with usage examples and benchmark results

🎯 Performance Recommendations

When to Use Parallel Processing

Best for:

  • Medium datasets (100-1K items)
  • Python 3.14t (free-threaded build)
  • Complex nested structures

Not recommended for:

  • Small datasets (< 100 items) - overhead outweighs benefit
  • Standard Python with GIL - no parallel execution possible

Optimal Configuration

```python
# For medium datasets (100-1K items)
converter = Json2xml(data, parallel=True, workers=4)
```

🔄 Breaking Changes

None - This is a fully backward-compatible change:

  • Default behavior unchanged (parallel=False)
  • All existing code continues to work without modification
  • Parallel processing is opt-in

🧪 Running Benchmarks

```shell
# Standard Python
uv run --python 3.14 python benchmark.py

# Free-threaded Python
uv run --python 3.14t python benchmark.py
```

📋 Checklist

  • βœ… Implementation complete
  • βœ… All tests passing (173/173)
  • βœ… Documentation updated
  • βœ… Benchmarks run on both Python versions
  • βœ… README updated with performance section
  • βœ… Zero breaking changes
  • βœ… Backward compatible
  • βœ… Code reviewed by Oracle AI
  • βœ… Production ready

🎉 Conclusion

This PR makes json2xml ready for Python's free-threaded future while maintaining full compatibility with existing code. Users can now opt in to parallel processing and see significant performance improvements on Python 3.14t!


Related Issues: N/A (proactive optimization)
Type: Feature Enhancement
Impact: Performance improvement, no breaking changes

Summary by Sourcery

Enable optional parallel JSON-to-XML conversion in json2xml leveraging Python 3.14t free-threaded mode, add thread-safe validation caching, provide benchmarks and documentation, and maintain full backward compatibility.

New Features:

  • Add opt-in parallel processing for JSON-to-XML conversion via parallel, workers, and chunk_size parameters
  • Support free-threaded Python 3.14t builds with automatic detection and GIL-free execution
  • Provide thread-safe caching for XML validation in parallel mode

Enhancements:

  • Introduce json2xml/parallel.py module with concurrent dict and list conversion logic
  • Integrate parallel conversion paths into dicttoxml and Json2xml while preserving default serial behavior
  • Bundle a benchmarking script to measure serial vs. parallel performance

Documentation:

  • Update README and Sphinx docs with performance section and usage examples
  • Add detailed markdown files for optimization analysis, benchmark results, and implementation summary

Tests:

  • Add 20 new tests in tests/test_parallel.py covering detection, dict/list parallel conversion, nested data, and order preservation

vinitkumar avatar Oct 23 '25 21:10 vinitkumar

Reviewer's Guide

This PR introduces an opt-in parallel processing layer for json2xml using Python 3.14t's free-threaded mode. It adds a dedicated parallel module, extends core conversion functions to dispatch between serial and threaded implementations based on configuration, exposes new API parameters, and provides a full suite of tests, benchmarks, and updated documentation, all while preserving backward compatibility.

Sequence diagram for parallel dict conversion in dicttoxml

```mermaid
sequenceDiagram
    participant Caller
    participant "dicttoxml.dicttoxml()"
    participant "parallel.convert_dict_parallel()"
    participant "ThreadPoolExecutor"
    participant "_convert_dict_item()"
    Caller->>"dicttoxml.dicttoxml()": call with parallel=True, obj is dict
    "dicttoxml.dicttoxml()"->>"parallel.convert_dict_parallel()": dispatch for dict
    "parallel.convert_dict_parallel()"->>"ThreadPoolExecutor": submit _convert_dict_item for each key
    "ThreadPoolExecutor"->>"_convert_dict_item()": process key/value in thread
    "_convert_dict_item()"-->>"ThreadPoolExecutor": return XML string
    "ThreadPoolExecutor"-->>"parallel.convert_dict_parallel()": collect results
    "parallel.convert_dict_parallel()"-->>"dicttoxml.dicttoxml()": return joined XML
    "dicttoxml.dicttoxml()"-->>Caller: return XML bytes
```

Sequence diagram for parallel list conversion in dicttoxml

```mermaid
sequenceDiagram
    participant Caller
    participant "dicttoxml.dicttoxml()"
    participant "parallel.convert_list_parallel()"
    participant "ThreadPoolExecutor"
    participant "_convert_list_chunk()"
    Caller->>"dicttoxml.dicttoxml()": call with parallel=True, obj is list
    "dicttoxml.dicttoxml()"->>"parallel.convert_list_parallel()": dispatch for list
    "parallel.convert_list_parallel()"->>"ThreadPoolExecutor": submit _convert_list_chunk for each chunk
    "ThreadPoolExecutor"->>"_convert_list_chunk()": process chunk in thread
    "_convert_list_chunk()"-->>"ThreadPoolExecutor": return XML string
    "ThreadPoolExecutor"-->>"parallel.convert_list_parallel()": collect results
    "parallel.convert_list_parallel()"-->>"dicttoxml.dicttoxml()": return joined XML
    "dicttoxml.dicttoxml()"-->>Caller: return XML bytes
```

Class diagram for updated Json2xml and dicttoxml API

```mermaid
classDiagram
    class Json2xml {
        +data: dict[str, Any] | None
        +pretty: bool
        +attr_type: bool
        +item_wrap: bool
        +root: str | None
        +parallel: bool
        +workers: int | None
        +chunk_size: int
        +to_xml() Any | None
    }
    class dicttoxml {
        +dicttoxml(
            obj: Any,
            ids: list[str] = [],
            custom_root: str = "root",
            attr_type: bool = True,
            item_func: Callable[[str], str] = default_item_func,
            cdata: bool = False,
            xml_namespaces: dict[str, Any],
            list_headers: bool = False,
            parallel: bool = False,
            workers: int | None = None,
            chunk_size: int = 100
        ) -> bytes
    }
    Json2xml --> dicttoxml : uses
```

Class diagram for new parallel processing module

```mermaid
classDiagram
    class parallel {
        +is_free_threaded() bool
        +get_optimal_workers(workers: int | None) int
        +key_is_valid_xml_cached(key: str) bool
        +make_valid_xml_name_cached(key: str, attr: dict[str, Any]) tuple[str, dict[str, Any]]
        +convert_dict_parallel(
            obj: dict[str, Any],
            ids: list[str],
            parent: str,
            attr_type: bool,
            item_func: Callable[[str], str],
            cdata: bool,
            item_wrap: bool,
            list_headers: bool = False,
            workers: int | None = None,
            min_items_for_parallel: int = 10
        ) str
        +convert_list_parallel(
            items: Sequence[Any],
            ids: list[str] | None,
            parent: str,
            attr_type: bool,
            item_func: Callable[[str], str],
            cdata: bool,
            item_wrap: bool,
            list_headers: bool = False,
            workers: int | None = None,
            chunk_size: int = 100
        ) str
    }
```
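The thread-safe validation caching listed in this module can be approximated with `functools.lru_cache`, which is safe to call from multiple CPython threads, so all workers share one cache and repeated keys skip re-validation. The regex below is a deliberately simplified XML name rule, and the signatures are reduced from the ones in the diagram (the real `make_valid_xml_name_cached` also takes and returns attributes), so treat this as an illustration of the caching idea only:

```python
import re
from functools import lru_cache

# Simplified XML name rule; the module's real check may cover more of the spec.
_XML_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")

@lru_cache(maxsize=4096)
def key_is_valid_xml_cached(key: str) -> bool:
    # Cached so that concurrent workers validating the same key names
    # (common in list-of-dicts payloads) only pay the regex cost once.
    return bool(_XML_NAME_RE.match(key))

def make_valid_xml_name_cached(key: str) -> str:
    # Fall back to a generic element name for keys that are not valid XML names.
    return key if key_is_valid_xml_cached(key) else "key"

assert key_is_valid_xml_cached("user_id")
assert not key_is_valid_xml_cached("1bad")
assert make_valid_xml_name_cached("1bad") == "key"
```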

File-Level Changes

| Change | Details | Files |
|--------|---------|-------|
| Introduce parallel processing infrastructure | • Add json2xml/parallel.py with free-threaded detection and thread pool utilities<br>• Implement parallel convert_dict and convert_list functions with order preservation<br>• Add thread-safe XML validation caching and name sanitization helpers | json2xml/parallel.py |
| Extend dicttoxml to route to parallel converters | • Add parallel, workers, and chunk_size parameters to the dicttoxml signature<br>• Branch logic to invoke convert_dict_parallel or convert_list_parallel when parallel=True<br>• Maintain the original serial conversion path as a fallback | json2xml/dicttoxml.py |
| Expose parallel options in the Json2xml API | • Add parallel, workers, and chunk_size parameters to `Json2xml.__init__`<br>• Pass the new parameters through to the dicttoxml invocation in to_xml<br>• Ensure default serial behavior remains unchanged | json2xml/json2xml.py |
| Add comprehensive parallel tests | • Create tests/test_parallel.py with 20 tests covering feature detection, small/large data, nested structures, and API integration<br>• Validate fallback to the serial path, order preservation, and special character handling | tests/test_parallel.py |
| Provide benchmarking and documentation support | • Add benchmark.py for performance measurement across dataset sizes and thread counts<br>• Introduce FREE_THREADED_OPTIMIZATION_ANALYSIS.md and BENCHMARK_RESULTS.md<br>• Update README.rst, docs/index.rst, and add docs/performance.rst with usage and benchmark details | benchmark.py<br>FREE_THREADED_OPTIMIZATION_ANALYSIS.md<br>BENCHMARK_RESULTS.md<br>README.rst<br>docs/index.rst<br>docs/performance.rst |


sourcery-ai[bot] avatar Oct 23 '25 21:10 sourcery-ai[bot]

Codecov Report

❌ Patch coverage is 99.33333% with 1 line in your changes missing coverage. Please review. ✅ Project coverage is 99.53%. Comparing base (e5ab104) to head (8e5d68a). ⚠️ Report is 1 commit behind head on master-freethreaded.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| json2xml/dicttoxml.py | 97.43% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@                   Coverage Diff                   @@
##           master-freethreaded     #256      +/-   ##
=======================================================
+ Coverage                99.30%   99.53%   +0.23%
=======================================================
  Files                        3        4       +1
  Lines                      288      432     +144
=======================================================
+ Hits                       286      430     +144
  Misses                       2        2
```
| Flag | Coverage Δ |
|------|------------|
| unittests | 99.53% <99.33%> (+0.23%) ⬆️ |

Flags with carried forward coverage won't be shown.

☂️ View full report in Codecov by Sentry.

codecov[bot] avatar Oct 23 '25 21:10 codecov[bot]