FastSoup
========

.. image:: https://travis-ci.org/spumer/FastSoup.svg
   :target: https://travis-ci.org/spumer/FastSoup
   :alt: Build Status

.. image:: https://coveralls.io/repos/github/spumer/FastSoup/badge.svg
   :target: https://coveralls.io/github/spumer/FastSoup

BeautifulSoup interface for lxml

Key features
------------

- FAST search in tree
- FAST serialize to str
- BeautifulSoup4 interface to interact with object:

  - Search: ``find``, ``find_all``, ``find_next``, ``find_next_sibling``
  - Text: ``get_text``, ``string``
  - Tag: ``name``, ``get``, ``clear``, ``__getitem__``, ``__str__``, ``__repr__``, ``append``, ``new_tag``, ``extract``, ``replace_with``


Install
-------

.. code-block:: bash

    pip install fast-soup==1.1.0


How to use
----------

.. code-block:: python

    from fast_soup import FastSoup

    content = ...  # read some html content
    soup = FastSoup(content)

    # interact like BS4 object
    result = soup.find('a', id='my_link')

    # interact like lxml object
    el = result.unwrap()


FAQ
---

Q: BS4 already supports lxml as a parser. Why should I use FastSoup?

A: Yes, BS4 can use the lxml parser, but the parser only builds the tree. All further interactions (searching, serialization) proceed at "Python speed". FastSoup uses lxml internally for those operations too and guarantees "C speed".

Q: How does the FastSoup speedup work?

A: FastSoup just builds XPath expressions and executes them. An LRU cache is used to avoid rebuilding them.

Q: Why don't you support the whole interface? Will this come soon?

A: I wrote the functions that speed up parsing in my projects. Just create an issue or pull request and I think we will find a solution ;)

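
The idea from the FAQ of building XPath expressions and caching them can be sketched roughly as follows. This is only an illustration, not FastSoup's actual implementation: ``build_xpath`` and ``find`` are hypothetical helpers, and the standard library's ``xml.etree.ElementTree`` (with its limited XPath subset) stands in for lxml so the snippet runs without extra dependencies.

.. code-block:: python

    import xml.etree.ElementTree as ET
    from functools import lru_cache


    @lru_cache(maxsize=128)
    def build_xpath(name, attrs):
        """Build an XPath string such as .//a[@id='my_link'].

        `attrs` is a tuple of (key, value) pairs, so the arguments are
        hashable and lru_cache can reuse a previously built expression.
        Hypothetical helper, not part of fast_soup.
        """
        predicates = "".join(f"[@{key}='{value}']" for key, value in attrs)
        return f".//{name}{predicates}"


    def find(root, name, **attrs):
        """Return the first descendant matching a BS4-like query, or None."""
        xpath = build_xpath(name, tuple(sorted(attrs.items())))
        return root.find(xpath)


    root = ET.fromstring(
        "<html><body><a id='other'>x</a><a id='my_link'>FastSoup</a></body></html>"
    )

    first = find(root, "a", id="my_link")
    second = find(root, "a", id="my_link")  # same query: the XPath string is served from the cache

Sorting the keyword arguments before caching means ``find(root, 'a', id='x', href='y')`` and ``find(root, 'a', href='y', id='x')`` share one cache entry.
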

Miscellaneous
-------------

You can get the power of BeautifulSoup by wrapping your lxml objects, e.g.:

.. code-block:: python

    import io

    import lxml.etree
    from fast_soup import Tag

    content = ...  # some bytes ready to parse
    context = lxml.etree.iterparse(
        io.BytesIO(content),
        ...
    )
    for event, elem in context:
        tag = Tag(elem)
        tag_text = tag.get_text()
        tag_attr = tag['attribute']
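
To show what wrapping an element during streaming parsing amounts to, here is a small self-contained sketch. It makes several assumptions: ``MiniTag`` is a hypothetical stand-in written for this example (it is not ``fast_soup.Tag``), the sample XML is invented, and the standard library's ``xml.etree.ElementTree.iterparse`` replaces ``lxml.etree.iterparse`` so it runs without third-party packages.

.. code-block:: python

    import io
    import xml.etree.ElementTree as ET


    class MiniTag:
        """Hypothetical BS4-style wrapper around an element (not fast_soup.Tag)."""

        def __init__(self, elem):
            self._elem = elem

        def get_text(self):
            # Concatenate the element's own text with the text of all descendants.
            return "".join(self._elem.itertext())

        def __getitem__(self, key):
            # Attribute access like tag['attribute']; raises KeyError if missing.
            return self._elem.attrib[key]

        def unwrap(self):
            # Hand back the underlying element for direct tree access.
            return self._elem


    # Invented sample document for illustration only.
    content = (
        b"<root>"
        b"<item attribute='a'>one <b>two</b></item>"
        b"<item attribute='b'>three</item>"
        b"</root>"
    )

    texts = []
    attrs = []
    for event, elem in ET.iterparse(io.BytesIO(content), events=("end",)):
        if elem.tag != "item":
            continue
        tag = MiniTag(elem)
        texts.append(tag.get_text())
        attrs.append(tag["attribute"])
        elem.clear()  # drop the element's children, text and attributes once processed

The wrapper holds only a reference to the element, so it adds little overhead per iteration and the usual streaming pattern of clearing processed elements still applies.
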