Pythonista-Issues

lxml module

Status: Open • scj643 opened this issue 8 years ago • 46 comments

Implementing lxml would make web scraping a whole lot easier and faster.

scj643 • Jan 04 '17

I may not be understanding correctly, but couldn't you just pip install it via StaSh?

Hum4n01d • Jan 04 '17

@Hum4n01d That wouldn't work for lxml, as it's not pure Python.

@scj643 I usually use BeautifulSoup for web scraping, what makes lxml better/easier?

omz • Jan 04 '17
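omz's point about lxml not being pure Python can be checked on any machine. A pure-Python installer such as StaSh's pip can copy `.py` files, but it cannot build a compiled extension module on the device. The helper below is a hypothetical sketch (the name `is_extension_module` is made up for illustration) that uses `importlib.util.find_spec` and `importlib.machinery.EXTENSION_SUFFIXES` to report whether a module would load from a compiled shared library. Note that lxml's top-level package is plain Python; the compiled part lives in submodules such as `lxml.etree`.

```python
import importlib.machinery
import importlib.util


def is_extension_module(name):
    """Classify how the module `name` would be loaded (illustrative helper).

    True  -> from a compiled extension file (e.g. .so / .pyd)
    False -> from Python source/bytecode, or built into the interpreter
    None  -> not found (or no file origin, e.g. a namespace package)
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # find_spec imports parent packages first, so "lxml.etree"
        # raises here when the "lxml" package itself is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    # EXTENSION_SUFFIXES lists this platform's shared-library suffixes.
    return spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


for mod in ("json", "lxml.etree"):
    print(mod, is_extension_module(mod))
```

`json` should be reported as `False`; `lxml.etree` is `True` wherever a compiled lxml is installed and `None` where lxml is missing.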

@omz

"The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API."

scj643 • Jan 04 '17
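The ElementTree compatibility mentioned in the quote means that code written against the common subset of the API runs on either library. A minimal sketch, assuming only the standard library is guaranteed to be present (as in Pythonista today): try `lxml.etree` first and fall back to `xml.etree.ElementTree`. The calls used here (`fromstring`, `iter`, `findall`, `findtext`, `get`) exist in both; the feed document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefer lxml where it has been compiled for the platform; otherwise use
# the standard library's ElementTree, which shares this part of the API.
try:
    from lxml import etree
except ImportError:
    etree = ET

doc = etree.fromstring(
    "<feed><entry id='1'><title>First</title></entry>"
    "<entry id='2'><title>Second</title></entry></feed>"
)

# Same calls, same results, whichever backend was imported.
titles = [entry.findtext("title") for entry in doc.iter("entry")]
ids = [entry.get("id") for entry in doc.findall("entry")]
print(titles, ids)
```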

Hi @omz, since you're here, I just want to say thanks so much for building Pythonista! It's really cool and fun for side projects that I can work on from anywhere!

Hum4n01d • Jan 04 '17

@omz As far as I can tell, lxml has to be installed for BeautifulSoup to be able to parse in XML mode (i.e. bs4.BeautifulSoup(source, "xml")). By default, BS4 parses the source as HTML, which doesn't work properly for all XML, because an HTML parser makes assumptions about the meanings of some tags (where they may appear, whether they can/must be empty, etc.), whereas in XML, tags have no predefined meaning.

dgelessus • Jan 04 '17
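Here is one concrete instance of the HTML-vs-XML assumptions dgelessus describes, reproducible with the standard library alone. Python's `html.parser` (which also backs BeautifulSoup's "html.parser" mode) folds tag names to lower case, whereas XML is case-sensitive, so `<Item>` and `<item>` are different elements. The sample document and the `TagCollector` class are made up for illustration. Case folding is only one of several such assumptions.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start-tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


source = "<Catalog><Item>Widget</Item><item>Gadget</item></Catalog>"

# HTML parsing: tag names arrive lower-cased, so the two items look identical.
html_parser = TagCollector()
html_parser.feed(source)
html_parser.close()
html_tags = html_parser.tags

# XML parsing: tag names keep their case, so the two items stay distinct.
xml_tags = [el.tag for el in ET.fromstring(source).iter()]

print("HTML:", html_tags)
print("XML: ", xml_tags)
```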

Also, I use lxml to neuter certain pages when making documentation.

scj643 • Jan 04 '17
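The comment doesn't say how the pages are neutered. As a hypothetical illustration, "neutering" is taken here to mean stripping scripts, styles, iframes, and inline `on*` event-handler attributes; the tag list and the helper names are assumptions. The sketch uses the standard library's ElementTree, so it only accepts well-formed (X)HTML. Applying the same idea to real-world tag soup is where lxml's tolerant HTML parser would come in. ElementTree elements also have no parent pointer (lxml's have `getparent()`), so the code walks parents explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy: elements treated as "active" content to remove.
STRIP_TAGS = {"script", "style", "iframe"}


def _local_name(tag):
    # "{http://www.w3.org/1999/xhtml}script" -> "script"
    return tag.rsplit("}", 1)[-1]


def _remove_keep_tail(parent, child):
    """Remove child, moving its tail text so surrounding prose survives."""
    if child.tail:
        siblings = list(parent)
        idx = siblings.index(child)
        if idx > 0:
            prev = siblings[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def neuter(xhtml):
    """Return `xhtml` with scripts, styles, iframes and on* handlers removed.

    Assumes well-formed input; ElementTree cannot parse broken HTML.
    """
    root = ET.fromstring(xhtml)
    # Visit every element as a potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if _local_name(child.tag) in STRIP_TAGS:
                _remove_keep_tail(parent, child)
    # Drop inline event handlers such as onload="..." or onclick="...".
    for el in root.iter():
        for attr in [a for a in el.attrib if a.lower().startswith("on")]:
            del el.attrib[attr]
    return ET.tostring(root, encoding="unicode")


print(neuter('<html><body onload="track()"><p>Hi<script>evil()</script>'
             ' there</p><style>p{}</style></body></html>'))
```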

libxml2 is part of the iOS SDK, so the headers are provided by Apple.

scj643 • Feb 07 '17

Having lxml would be great!

somelinguist • Aug 19 '17

It looks like Kivy for iOS previously had a recipe for getting lxml into apps, but it hasn't been ported to their current toolchain. Still, it looks like it should be possible to include it in Pythonista, at least.

https://github.com/kivy/kivy-ios

https://groups.google.com/forum/?nomobile=true#!topic/kivy-dev/86W5bPqrEUw

somelinguist • Aug 27 '17

+1 for lxml. It's a dependency for many libraries I want to use with Pythonista.

ghost • Aug 27 '17

If you use the lxml module outside of Pythonista, check out the performance boosts in version 4... http://lxml.de/4.0/changes-4.0.0.html

cclauss • Sep 17 '17

+1 for lxml. Is there an update on when we can expect this to be included in Pythonista? It's been almost a year now since this issue/ticket was opened up. Is it still in the works to be included or has it been rejected?

osuskates • Dec 26 '17

+1 lxml. Please include the lxml module in Pythonista.

vitofly • Jan 30 '18

+1 lxml. I have a package with a big dependency I can't work around.

1minus1 • Feb 25 '18

+1 lxml. I have also been stuck several times trying to install packages on iPhone and iPad that depend on lxml. It would be great.

victordomingos • Mar 05 '18

+1 lxml. I use the docx-mailmerge module which depends on it and I would like to be able to use it on my iPhone.

briantkatch • Apr 12 '18

+1 lxml

mrjakewalter • Aug 03 '18

I want to use some libraries for working with networking gear. Many of them use lxml to parse structured data returned from the networking devices.

lampwins • Sep 17 '18

+1

yjqiang • Nov 27 '18

+1, so that I can use python-pptx to process PowerPoint files.

goldengrape • Mar 11 '19

https://develobile.com/pyto has lxml if you need it.

cclauss • Mar 11 '19

> https://develobile.com/pyto has lxml if you need it.

"@ColdGrub1384 Remove lxml (see lxml branch and #25)" https://github.com/ColdGrub1384/Pyto/issues/25

@cclauss They removed lxml 9 hours after your post.

Maybe it is impossible to have lxml on iOS (forever?).

https://github.com/ColdGrub1384/Pyto/issues/25

Maybe lxml will not be possible on the App Store. lxml depends on the libxml, libxslt, and libexslt C libraries. They are already included in Xcode, so there is no need to compile them. But lxml calls many libxslt and libexslt functions that aren't declared in the SDK header files (they are declared by headers inside lxml), and Apple says they are private APIs.

goldengrape • Mar 12 '19

@goldengrape I created a branch with lxml, since it works perfectly; it's just the App Store that rejects it.

ColdGrub1384 • Mar 12 '19

For BeautifulSoup, the html5lib module might be slower than lxml but it performs adequately for web scraping on both Pythonista and Pyto.

cclauss • Mar 12 '19
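cclauss's fallback can be made automatic, so the same script uses lxml where it exists (e.g. on a desktop) and a pure-Python parser elsewhere. A small sketch, with a hypothetical helper name `pick_bs4_parser`: it prefers lxml for speed, then html5lib for browser-like leniency, then the always-available `html.parser`. It only checks that a package can be found with `importlib.util.find_spec`; it does not confirm that lxml's compiled extension actually loads.

```python
import importlib.util

# Candidate parsers in order of preference: (module to look for,
# feature name BeautifulSoup expects for it).
PREFERRED_PARSERS = [("lxml", "lxml"), ("html5lib", "html5lib")]


def pick_bs4_parser():
    """Return a BeautifulSoup `features` string for the best parser found.

    Falls back to "html.parser", which ships with Python itself.
    """
    for module_name, feature in PREFERRED_PARSERS:
        if importlib.util.find_spec(module_name) is not None:
            return feature
    return "html.parser"


print(pick_bs4_parser())
```

The returned string is meant to be passed as BeautifulSoup's second (`features`) argument, e.g. `BeautifulSoup(markup, pick_bs4_parser())`.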

A potential workaround would probably involve rebuilding lxml from the ground up so that all of its dependencies get linked into the Xcode project.

scj643 • Mar 13 '19

I have to rebuild libxslt and libexslt (lxml dependencies) and rename the functions that Apple flags as private APIs. Then I link these libraries and recompile lxml with the renamed functions.

ColdGrub1384 • Mar 13 '19

Also known as refactoring :)

scj643 • Mar 13 '19

Another big plus for lxml that I haven’t seen in the replies here yet is its XPath support. Carefully crafted XPath expressions make extracting fragments of HTML/XML sources a lot easier.

boisei0 • Apr 18 '19
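Without lxml, the standard library's ElementTree supports only a limited XPath subset. That subset covers descendant search (`.//`), child steps, and `[@attr='value']` predicates, which is sometimes enough for simple scraping of well-formed markup. The page below is a made-up example.

```python
import xml.etree.ElementTree as ET

page = """
<html><body>
  <div class="post"><a href="/one">One</a></div>
  <div class="ad"><a href="/buy">Buy now</a></div>
  <div class="post"><a href="/two">Two</a></div>
</body></html>
"""

root = ET.fromstring(page.strip())

# Limited XPath: every <a> that is a direct child of a <div class="post">.
links = [a.get("href") for a in root.findall(".//div[@class='post']/a")]
print(links)
```

With lxml, the full XPath 1.0 language is available through an element's `xpath()` method, so the same query can select the attribute values directly: `root.xpath("//div[@class='post']/a/@href")`.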

Any progress with lxml ? 😁

hmelino • May 19 '19

+1 for lxml

robert-heath • May 24 '19