Repeated text due to chap2text
The following partial output from "--scan" shows the issue. The text that precedes the actual section is repeated in every part. The part description is correct, and debugging shows that the start and end ids are correct.
Part: 1. INTRODUCTION TO SOLUTION ARCHITECTURE Part No: 8 Length: 791 1 INTRODUCTION TO SOLUTION ARCHITECTURE
This book is a foundation-level introduction to the discipline of solution architecture which uses a holistic approach to analyse problems and design solutions using the best available evidence from all relevant s
Part: 1.1 Architecture Part No: 9 Length: 5279 1 INTRODUCTION TO SOLUTION ARCHITECTURE
This book is a foundation-level introduction to the discipline of solution architecture which uses a holistic approach to analyse problems and design solutions using the best available evidence from all relevant s
Part: 1.2 Solution architecture Part No: 10 Length: 9735 1 INTRODUCTION TO SOLUTION ARCHITECTURE
This book is a foundation-level introduction to the discipline of solution architecture which uses a holistic approach to analyse problems and design solutions using the best available evidence from all relevant s
The following diff shows how the issue can be resolved in 'chap2text':
556c556,566
< remove = False
---
> '''
> There was an assumption that no elements would occur before the element_id.
> This resulted in repeated text.
> '''
> remove=True
558,559c568,571
< if not remove and end_element_id is not None and elm.get('id') == end_element_id:
< remove = True
---
> if elm.get('id') == element_id:
> remove=False
> if end_element_id is not None and elm.get('id') == end_element_id:
> remove=True
In the existing code, only the elements at end_element_id and after were removed, so any elements before element_id were left in place.
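To make the difference concrete, here is a small standalone model of the two loops from the diff. It is a hypothetical sketch, not the actual chap2text code: elements are plain dicts (so `elm.get('id')` returns None for an element without an id, like ElementTree's `Element.get`), "removing" an element just means not keeping it, and the ids `ch1`, `sec1-1` and `sec1-2` are made up for illustration.

```python
# Hypothetical model of the chap2text filtering loop (not the real code):
# each element is a dict, elm.get('id') returns None when there is no id,
# and a "removed" element is simply not added to the kept list.


def keep_original(elements, element_id, end_element_id):
    """Original logic: only end_element_id and everything after it is dropped.

    element_id is accepted but never used, which is the source of the bug.
    """
    kept = []
    remove = False
    for elm in elements:
        if not remove and end_element_id is not None and elm.get('id') == end_element_id:
            remove = True
        if not remove:
            kept.append(elm)
    return kept


def keep_proposed(elements, element_id, end_element_id):
    """Proposed logic: drop everything until element_id, then again from end_element_id."""
    kept = []
    remove = True
    for elm in elements:
        if elm.get('id') == element_id:
            remove = False
        if end_element_id is not None and elm.get('id') == end_element_id:
            remove = True
        if not remove:
            kept.append(elm)
    return kept


# Hypothetical chapter: a heading, its introduction, then two sections.
chapter = [
    {'id': 'ch1', 'text': '1 INTRODUCTION TO SOLUTION ARCHITECTURE'},
    {'text': 'Chapter introduction text'},
    {'id': 'sec1-1', 'text': '1.1 Architecture'},
    {'text': 'Body of 1.1'},
    {'id': 'sec1-2', 'text': '1.2 Solution architecture'},
    {'text': 'Body of 1.2'},
]

print([e['text'] for e in keep_original(chapter, 'sec1-1', 'sec1-2')])
# ['1 INTRODUCTION TO SOLUTION ARCHITECTURE', 'Chapter introduction text', '1.1 Architecture', 'Body of 1.1']
print([e['text'] for e in keep_proposed(chapter, 'sec1-1', 'sec1-2')])
# ['1.1 Architecture', 'Body of 1.1']
```

With the original loop, the chapter heading and introduction leak into the 1.1 section, which matches the repeated text in the "--scan" output above.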
The code contains a bug where it will skip all text if element_id is None.
Fix that and submit a PR for it. Looks good.
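One possible shape for that fix, sketched against the same kind of hypothetical dict-based model rather than the real chap2text code: start in "remove" mode only when there is an element_id to look for, and only compare against element_id when it is not None. The assumption here is that element_id=None should mean "start at the beginning of the chapter", which is what the original loop effectively did since it never looked at element_id. The ids and texts below are made up for illustration.

```python
# Hypothetical sketch, not the actual chap2text code: elements are dicts
# standing in for the tree elements, and elm.get('id') returns None when an
# element has no id. Assumption: element_id=None means "from the start".


def keep_section(elements, element_id, end_element_id):
    """Keep the elements from element_id up to (not including) end_element_id."""
    # Only start in "remove" mode when there is a start id to look for;
    # otherwise nothing would switch it off and all text would be skipped.
    remove = element_id is not None
    kept = []
    for elm in elements:
        elm_id = elm.get('id')
        # The explicit None check also stops id-less elements (whose
        # get('id') is None) from matching a missing element_id.
        if element_id is not None and elm_id == element_id:
            remove = False
        if end_element_id is not None and elm_id == end_element_id:
            remove = True
        if not remove:
            kept.append(elm)
    return kept


chapter = [
    {'id': 'ch1', 'text': 'Chapter heading'},
    {'text': 'Chapter introduction text'},
    {'id': 'sec1-1', 'text': '1.1 Architecture'},
    {'text': 'Body of 1.1'},
]

print([e['text'] for e in keep_section(chapter, None, 'sec1-1')])
# ['Chapter heading', 'Chapter introduction text']
print([e['text'] for e in keep_section(chapter, 'sec1-1', None)])
# ['1.1 Architecture', 'Body of 1.1']
```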