
Python 3 support

Open tom-de-smedt opened this issue 10 years ago • 70 comments

Pattern should start supporting Python 3. Looking at the amount of code, it is a non-trivial task and any help is much appreciated.

tom-de-smedt avatar Dec 19 '13 00:12 tom-de-smedt

Hi Tom,

I'm a graduate in computational linguistics and would like to contribute to Pattern. Can you be more explicit about how Pattern should support Python 3? That is, do you want to maintain two different branches in parallel, one for Python 2 and one for 3? Or do you want to have a single code base that works both with 2 and 3? In the latter case, a library such as six would be useful.
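For readers unfamiliar with the single-code-base approach, here is a minimal sketch of the kind of shim a library such as six provides. The names `string_types` and `iteritems` mirror helpers that exist in six, but the shim is hand-rolled with the standard library only, and `count_string_values` is a hypothetical function used purely for illustration.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    string_types = (str,)

    def iteritems(mapping):
        """Iterate over (key, value) pairs on Python 3."""
        return iter(mapping.items())
else:
    string_types = (basestring,)  # only evaluated on Python 2

    def iteritems(mapping):
        """Iterate over (key, value) pairs on Python 2 without building a list."""
        return mapping.iteritems()


def count_string_values(mapping):
    """Hypothetical example: the same body runs unchanged on 2 and 3."""
    return sum(1 for _, value in iteritems(mapping)
               if isinstance(value, string_types))
```

The version check lives in one place, so the rest of the code never has to branch on the interpreter version.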

Let me know what you think.

Cheers, Peter

pemistahl avatar Feb 08 '14 17:02 pemistahl

Hi Peter,

My goal would be to have a single code base that works with 2 and 3, but I have little experience with Python 3 so I don't know how feasible it is. In any case, the task is becoming more urgent so I will start looking into it more. I took a look at six which seems very useful. It's MIT-licensed so it could be included in Pattern.

Any help is appreciated! Let me know what you think.

Best, Tom

tom-de-smedt avatar Feb 12 '14 01:02 tom-de-smedt

:+1: on a single codebase.

I think the first stage is to add travis for testing (it looks like you're missing a requirements.txt file, so I'm unsure which deps it needs (?)). Travis will really help with conversion (and ensuring it continues to work on multiple platforms).

  • Do you want to continue supporting python 2.5? (Travis no longer supports it, 2.5 usage is pretty low and not sure which deps still support it...)
  • Check which deps support python 3 (things in requirements.txt), fix them up if need be.
  • Once that's hooked up, you could run a modernizer/futurize (see http://python-future.org/automatic_conversion.html#automatic-conversion ) on your code (or use six) and see what happens...

Happy to help if you can pass a requirements.txt.

hayd avatar Apr 16 '14 02:04 hayd

Got through the first two steps outlined by @hayd in this fork (repo has a requirements.txt and .travis.yml).

Some of the tests need to be excluded.

from test.py

# pattern.db tests require a valid username and password for MySQL.
# pattern.web tests require a working internet connection 
# and API license keys (see pattern.web.api.py) for Google and Yahoo API's.

Travis is just running python -m unittest discover -s test right now.

waylonflinn avatar Aug 17 '14 21:08 waylonflinn

Ran futurize on the codebase. Here are some preliminary findings:

  1. Some of the bundled dependencies appear to have already been futurized (they contain from __future__ import and from future import statements). Now that we have pip and virtualenv does it make sense to unbundle these?
  2. Unicode is used extensively throughout the codebase. I used from __future__ import unicode_literals in several places (mostly for raw string literals), but this should probably be handled more carefully in the long term.
  3. Does it make sense to replace web.json.encoder with the standard library module? There was a section starting with the comment ## HACK: hand-optimized bytecode; turn globals into locals that I wasn't sure how to deal with and had to comment out.
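On point 2, one way to handle literals more carefully than a blanket `unicode_literals` import is to mark each literal explicitly. This is only a sketch: the `u""` prefix is accepted on Python 2 and again on Python 3.3+, and `b""` on Python 2.6+; `describe` is a hypothetical helper, not something from Pattern.

```python
# Explicit prefixes mean the same thing on Python 2.6+ and Python 3.3+.
TEXT = u"caf\u00e9"      # always a text (unicode) string
DATA = b"caf\xc3\xa9"    # always a byte string (the UTF-8 encoding of TEXT)


def describe(value):
    """Name the kind of string without relying on the bare ``str`` type."""
    if isinstance(value, type(u"")):
        return "text"
    if isinstance(value, type(b"")):
        return "bytes"
    return "other"
```

Because bare `str` means bytes on Python 2 and text on Python 3, checks written against `type(u"")` and `type(b"")` keep their meaning on both.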

I'm a bit new to python, so any feedback is appreciated. This is a beautiful library and I'd love to see it get the unicode love from python 3.

waylonflinn avatar Aug 19 '14 15:08 waylonflinn

My 2cents:

  • from __future__ and from future are perfectly fine.
  • there may be a reason for the byte hack (is it faster than std library?)
  • mysql tests are fine, you have to set up the db with travis (one line in .travis)
  • travis internet tests also fine, although they may fail sometimes... worry about that later (when/if it becomes a problem e.g. decorator to skip test if there is connection exception or something).
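The skip decorator mentioned in the last bullet could look roughly like this. It is a sketch under assumptions: it catches the standard library's `URLError`, whereas pattern.web may raise its own exception types, and `skip_if_offline` and `ExampleWebTest` are hypothetical names. `unittest.SkipTest` needs Python 2.7 or later.

```python
import functools
import unittest

try:
    from urllib.error import URLError   # Python 3
except ImportError:
    from urllib2 import URLError        # Python 2


def skip_if_offline(test_method):
    """Report a network failure as a skipped test instead of an error."""
    @functools.wraps(test_method)
    def wrapper(*args, **kwargs):
        try:
            return test_method(*args, **kwargs)
        except URLError as error:
            raise unittest.SkipTest("network unavailable: %s" % error)
    return wrapper


class ExampleWebTest(unittest.TestCase):

    @skip_if_offline
    def test_download(self):
        # Stand-in for a real request; simulates a connection failure.
        raise URLError("simulated outage")
```

Skipped tests still show up in the test report, so a long run of skips on travis stays visible rather than silently passing.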

Not sure what to do about API keys, was wondering what other modules e.g. pandas did for those parts... IIRC there may be keys you can use for testing of clipped results...

Perhaps it makes sense to create a PR for this and comment there, then you can comment on specific bits of code :) ... first pass tests then make pretty

hayd avatar Aug 28 '14 01:08 hayd

There's an "official" fork of Pattern with the specific aim of making it compatible with Python 3: https://github.com/pattern3

The wiki has some more information: https://github.com/pattern3/pattern/wiki

The compatibility update is supported by a grant from the Python Software Foundation. This money is to be divided among contributors. You can read the grant proposal here: http://www.clips.ua.ac.be/media/Pattern-3-grant-proposal.pdf

The fork is initiated by myself, Waylon Flinn and David Branner. Everyone (Peter & hayd?) is welcome to join as admin of the project. As admin, you'll be able to edit anything so feel free to take initiative! (we do encourage pull requests, so we can keep track of who did what)

tom-de-smedt avatar Oct 28 '14 00:10 tom-de-smedt

Happy to help with this, however when I tried (and trying again just now) running the tests I get a load of exceptions (python 2.7). I suspect this is just initial set up on my machine...

What do I need installed / setup to run the test suite (locally)?

Assuming fresh python install (or env) the following is failing:

git clone ...
cd pattern
python setup.py install  # this *ought* to install dependencies, but I don't think it does
nosetests  # this should sniff out and run all the tests, and does.

See the travis run in the above fork: https://travis-ci.org/pinleague/pattern/builds/32799385 (this is the kind of thing that's failing, though that run is a couple of months old).

hayd avatar Oct 28 '14 01:10 hayd

Hi Andy,

My knowledge of Travis is zero, but different people including yourself have suggested it as a first step so I will examine it more closely. Looking at the output of the link you provided, these look like typical Python 2 vs 3 errors, e.g., using print stuff instead of print(stuff) and except Exception, e instead of except Exception as e. These are easy to fix, I previously used regular expressions to update them in the source code, but not yet in the unit tests. I'll look at updating the unit tests and push it to pattern3.
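A rough sketch of what such a regular-expression pass might look like (not the actual expressions used on Pattern; `modernize` is a hypothetical name). It only handles the simplest forms: it misses `except (A, B), e:`, and it would mangle `print >>stream, x` or a trailing-comma `print x,`, which is why tools like futurize or 2to3 are the more robust route.

```python
import re

# "except SomeError, e:"  ->  "except SomeError as e:"
EXCEPT_COMMA = re.compile(
    r"^([ \t]*except[ \t]+[\w.]+)[ \t]*,[ \t]*(\w+)[ \t]*:", re.MULTILINE)

# "print stuff"  ->  "print(stuff)"; lines already using parentheses are left alone.
PRINT_STATEMENT = re.compile(
    r"^([ \t]*)print[ \t]+(?![ \t(])(.+)$", re.MULTILINE)


def modernize(source):
    """Rewrite the two simplest Python 2-only constructs in a source string."""
    source = EXCEPT_COMMA.sub(r"\1 as \2:", source)
    source = PRINT_STATEMENT.sub(r"\1print(\2)", source)
    return source


old = "try:\n    print stuff\nexcept Exception, e:\n    print e\n"
new = modernize(old)
# new == "try:\n    print(stuff)\nexcept Exception as e:\n    print(e)\n"
```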

Best, Tom

tom-de-smedt avatar Oct 28 '14 02:10 tom-de-smedt

@tom-de-smedt Lots of stuff to migrate to python 3, but this can really only be done with confidence once tests pass (and at the moment I can't get them passing either locally or on travis on python 2.7!!!).

At the moment they (the python 2.7 tests) fail with errors from the bottom of this page: https://travis-ci.org/pinleague/pattern/jobs/32799386. Any ideas why?

hayd avatar Oct 28 '14 02:10 hayd

Hi Tom,

as I wrote at the beginning of this year, I'm still interested in contributing to pattern. However, I have not started yet because I didn't really know where to start. But now there exists a concrete plan and I would like to be part of it. I haven't written Python code for more than a year now but it should be easy for me to get into it again (I wrote a lot of Python code during my studies and I like the language very much). Last but not least, I have been out of the computational linguistics area since I started my current job a year ago, but it would be great to deal with that stuff again.

Some things are not yet clear to me:

  1. You wrote that the fork should be made compatible with Python 3.3 but 3.4 has been released already. Shouldn't it be compatible with 3.4 then?
  2. In the fork's wiki you wrote that the fork should be made compatible with both Python 2.7 and 3.3. But what's the point of creating a fork explicitly named pattern3 if it should still support Python 2.7? In my opinion, we can provide a much cleaner and better optimized code base if we completely drop 2.7 support. Then, libraries such as six would no longer be needed. Of course, the downside of this approach is having to maintain two separate code bases.

I cannot tell you yet which module I would prefer to work on. First, I need to take a look at the code again. I'm not sure though whether it's a good idea to have a lot of admins for the fork. Working with pull requests is much better anyway due to the reasons you mentioned.

pemistahl avatar Oct 28 '14 09:10 pemistahl

This was partially my misunderstanding (!), just running nosetests ran the abstract test methods, which fail (at least that's part of it). cleaning these classes is probably a good thing to do anyways (they are in an "interesting" style... e.g. IMO the suite functions should go), I've cleaned up a little...
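For anyone hitting the same thing: one common way to stop a runner from collecting abstract test classes is to keep the shared checks in a plain mixin that does not inherit from `unittest.TestCase`. This is a sketch with hypothetical names (`NormalizerChecks`, `TestLowerNormalizer`), not the layout of Pattern's test suite.

```python
import unittest


class NormalizerChecks(object):
    """Shared checks. Not a TestCase and not named Test*, so nose, py.test
    and unittest discovery do not collect it on its own."""

    normalize = None  # concrete subclasses supply a callable

    def test_lowercases(self):
        self.assertEqual(self.normalize("Cats"), "cats")


class TestLowerNormalizer(NormalizerChecks, unittest.TestCase):
    # staticmethod keeps the function from being bound to the test instance.
    normalize = staticmethod(lambda word: word.lower())
```

Only the concrete class is collected, so the abstract checks never run without an implementation to test.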

I had to capture a few actual test failures and some HTTP403Forbidden and HTTP404NotFound errors. There are also a couple of proper errors (in python 2); for now I'm skipping those tests, but they really need looking at. I've labelled them FIXME in my branch (should I PR to pattern3 or here once passing?)...

As I said above, it is ~~worth making~~ necessary that these tests pass reliably in python 2 before even attempting to migrate to python 3 (otherwise it's shooting in the dark). That said, I think the issues I've found (and labelled FIXME) are minor (or at least I'm hopeful that's the case if someone can look at them who understands the codebase!).

See https://github.com/hayd/pattern/compare/c5d9c2358...ce1fe8103ccb (and on travis https://travis-ci.org/hayd/pattern/builds/39245044, unfortunately not quite passing python 2.6 and 2.7, I may have to skip/fix a couple more? Some tests seem flaky - especially those that compare e.g. to 0.771!).
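If the flakiness in those comparisons is small numerical noise (an assumption; it could equally come from unseeded randomness), comparing with a tolerance is the usual remedy. A minimal sketch with a made-up score value and a hypothetical test class:

```python
import unittest


class TestScore(unittest.TestCase):

    score = 0.7712  # stand-in for a computed metric that wobbles slightly

    def test_close_by_places(self):
        # Passes when the difference rounds to zero at 3 decimal places.
        self.assertAlmostEqual(self.score, 0.771, places=3)

    def test_close_by_delta(self):
        # Passes when the absolute difference is at most 0.001.
        self.assertAlmostEqual(self.score, 0.771, delta=0.001)
```

Note that the `delta` argument requires Python 2.7 or later, while `places` also works on 2.6.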

Note1: This allows the test suite to be run by simply calling nosetests (or py.test).

Note2: I'm skipping the mysql tests atm, but that's no biggie to fix just an install in the yml (our objective is for no tests to be skipped on travis), the others are more important, but I'm afraid I need a patterns expert to look at the FIXMEs!!


Just to clarify the objectives here:

  • skip tests which fail (on python 2) and label them FIXME - this is mostly done
  • have travis running successfully (with some skips) on python 2
  • remove the skips (by fixing the bugs (?) for the FIXMEs)
  • once all tests are running on travis (and locally on tox), migration can begin safely

hayd avatar Oct 28 '14 10:10 hayd

To answer @pemistahl: I don't think going fully py3 (and dropping support for py27) is a good option for a library... for the next decade! I would like to see a shared code base and drop support for python <= 2.5 (nearly every library is dropping python 2.5 support).

I'd really like to see pattern3 (once ready) merge upstream into pattern.

hayd avatar Oct 28 '14 10:10 hayd

@hayd OK, I get your point. I'm okay with that. It just reminds me again of how unhappy I am about the Python 3.* transition in general across the Python community.

Another question @tom-de-smedt : If working with pull requests is the preferred way for contribution, then why did you create the pattern3 fork? Anyone who wants to contribute would create their own fork anyway. Wouldn't it be sufficient to simply create a branch here in the main repo for this purpose?

pemistahl avatar Oct 28 '14 19:10 pemistahl

I've submitted a couple of PRs to the pattern3 branch. I think it makes sense to fix that up and then merge back here (it's going to be easier to keep track of things if they are in separate repos, separate issues/PRs etc). I would strongly recommend downing tools for a short while (here on clips/pattern), hopefully only for a few weeks, and concentrating on the pattern3 branch/repo.

I'm "somewhat hopeful" it's not a massive job (famous last words). Once the python3 imports are working it should be clearer where the hit list is going to be (I suspect the toughest are the str/bytes handling).

hayd avatar Oct 29 '14 01:10 hayd

Just to update those following at home, last night I got python 3 running all tests without syntax or import errors (of course, half those tests are failing), python 2 is still passing all the tests (except those tests which failed before migration which are skipped).

https://github.com/pattern3/pattern/pull/6

(It did require ripping out the bundled (vendorized) packages and making them dependencies - I think this is a good idea anyway... so, more "home-testing" in python 2 may be a good idea before this update is merged back into clips/pattern? esp. where there is poor coverage.)

This means there is a more obvious hitlist of things to do. For those who want to help I recommend (once this is merged), attempting to make all the tests pass on specific testing files you're interested in (e.g. for database):

$ nosetests test/test_db.py
$ nosetests test/test_db.py:TestClass
$ nosetests test/test_db.py:TestClass.test_method

$ nosetests test/test_db.py --pdb --pdb-fail  # drop in when there's a failure/exception

A more complete todo list issue: https://github.com/pattern3/pattern/issues/5

I haven't really thought about how six fits here, IMO if it makes fixing a test easier then use it ?

hayd avatar Oct 30 '14 21:10 hayd

Hello,

I'm looking forward to using Pattern with Python 3, because my work is written in it. I'm kind of confused about the current state of Python 3 support. This package is not installable (at least, not through pip - I'm getting Python 2 errors) and pattern3 doesn't contain the whole code base (at first sight).

By the way, Python 3 is getting more and more focus today and it's a very good idea to follow this trend. You embed a lot of packages, which is definitely not a good idea for the future (e.g. BeautifulSoup_v3.2.1 has not been supported for years).

hnykda avatar Dec 09 '14 22:12 hnykda

@kotrfa pattern3/pattern isn't on pip yet (so not installable), the tests aren't passing for python 3 either so it's not ready for release yet - though quite a bit of work has been done. I think the plan is for this fork to become the pattern on pip (at least that's my understanding), and it'll support both python 2 and 3.

In pattern3/pattern I've ripped out a load of the vendorised deps (which is perhaps why it looks like the code base is so different), for example beautiful soup. The tests from clips/pattern are still all there and all pass (in python 2), so nothing was removed in this process (I claim).

If you'd like to help out, which would be fantastic, please clone pattern3/pattern and see if you can help with anything in the todo list (maybe pick a test file and get it passing in both python 2 and 3, perhaps the section you need in your work?). I have a few of the areas of the codebase passing already (in both python 2 and 3), IMO it's not a huge amount of work to go :) mostly fiddly unicode stuff, then we can get it out on pip...

hayd avatar Dec 09 '14 23:12 hayd

Hello,

yeah - I was speaking about installing this fork, not Pattern3, which is, as you said, not available on pip.

I don't really need any part of pattern currently - my work is almost done and I found Pattern too late, unfortunately. Nevertheless, maybe I could replace some parts of my current code using Pattern and simplify it. In that case, I would definitely like to help. But it doesn't seem likely I'll do it in the following weeks, since the end of the semester is coming.

You have done quite a lot of amazing work, by the way, thank you!

hnykda avatar Dec 10 '14 07:12 hnykda

FYI all, I did a little work over the last couple of days; now test_db and test_web are the only remaining test files failing on py3 (also test_examples, but that's IMO a special case). I don't think they should be too bad to fix... e.g. the main things:

  • the web stuff has one infinite loop (when crawling), which I'm not sure how to debug (!)
  • the db stuff complains about already created tables (in the tests), hopefully this won't be too bad

Surprisingly these are py3 only failures (the py2 still passes)...

That said, there are some hacks - especially the unicode workflow - which could be cleaned up.

Edit: Too hasty in victory, I've nearly got vector working https://travis-ci.org/hayd/pattern/jobs/43751620

hayd avatar Dec 11 '14 19:12 hayd

Thanks for the information! It is really promising. :+1:

hnykda avatar Dec 11 '14 19:12 hnykda

@tom-de-smedt actually the vector thing is a little weird: it looks like the vector test fails about 50% of the time on python 3, although it passes all the time on python 2 (from running the test 10 times on both). In a way it's good that I think we've reached a place where expertise is needed! :) see https://github.com/pattern3/pattern/pull/17

hayd avatar Dec 12 '14 02:12 hayd

+1 for Python 3 support.

I realize the need to support a mature, powerful, and loyal community of legacy Python users, but Python 3 is only going to get more relevant with time, not less.

More importantly, Python 3 is just better. Its standard library organization is much cleaner, its syntax is more readable, and in many common cases it performs significantly better than Python 2 (speed and/or memory footprint).

That said, it’s often trickier to port to Python 3 than it “feels” like it should be. For a while, six has helped make this a little easier, but it only went so far.

To make the transition as painless as possible, I strongly recommend the Python-Future package. It is way more powerful than six; it has tools focused on automating as much of the transition as possible; and it has truly excellent documentation.

I believe it was mentioned earlier in this thread, but I just wanted to reiterate its awesomeness for anyone that might have missed it. Seriously—just browsing its documentation can evoke the inspiration to transition to a 2-3 compatible codebase.


I haven’t used Pattern yet, but it also has excellent documentation (great job!). Unfortunately, my current research is in Python 3. That’s how I found my way to this page. I hope Pattern gets to Python 3 soon!

Keep up the excellent work, and May The Source™ Be With You!

Zearin avatar Jul 14 '15 13:07 Zearin

@Zearin I used future to do the majority of the heavy lifting in the python 3 port, see the pattern3 repo. Please do try it out.

hayd avatar Jul 14 '15 15:07 hayd

How could you define the "state" of the project for porting Pattern into Python 3?

I used it two years ago with Python 2.7 and it was awesome; now I'm going to work with Python 3 and I would love to use it (Pattern) again!

Thanks!

MarcosGinel avatar Nov 16 '15 02:11 MarcosGinel

Greetings, we came across this from here, and I just noticed that while a lot of the build looks stable, support for Python 3.3 seems not to be working? At least that is how I would interpret the Travis CI page. Thanks.

legel avatar Aug 11 '16 15:08 legel

I just quickly tested it on Python 3.4, by creating a conda virtual environment with python 3.4 (using conda create -n python3 python=3.4 anaconda) and running the following:

    git clone https://github.com/pattern3/pattern.git
    cd pattern
    python setup.py install

However, unfortunately, upon testing, text parsing functions at least for the web module do not seem to work... In the test folder I ran python test_web.py which is what we are using, and the following is a sample of what I got back...

======================================================================
FAIL: test_plaintext (__main__.TestPlaintext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_web.py", line 455, in test_plaintext
    u"<a href=\"http://www.domain.com\">link</a>\n\n* item1 xxx\n* item2")
AssertionError: 'tags amp; things\n\ntitle1\n\ntitle2\n\nparagr[93 chars]tem2' != 'tags & things\n\ntitle1\n\ntitle2\n\nparagraph[76 chars]tem2'
- tags amp; things

======================================================================
FAIL: test_encode_utf8 (__main__.TestUnicode)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_web.py", line 53, in test_encode_utf8
    self.assertTrue(isinstance(web.encode_utf8(s), str))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 91 tests in 0.500s

FAILED (failures=4, errors=40, skipped=1)

(python3)
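The test_encode_utf8 failure above is consistent with a general Python 3 change: encoding a text string returns `bytes`, which is no longer the same type as `str`. The sketch below uses a stand-in `encode_utf8` (hypothetical; it assumes the real helper returns the result of `.encode("utf-8")`, which this thread does not confirm).

```python
def encode_utf8(value):
    """Stand-in helper, not Pattern's actual implementation."""
    if isinstance(value, type(u"")):
        return value.encode("utf-8")
    return value


encoded = encode_utf8(u"caf\u00e9")

# True on Python 2 (where str is the byte string type), False on Python 3.
is_str = isinstance(encoded, str)

# True on both: bytes is an alias of str on Python 2.6+, a distinct type on 3.
is_bytes = isinstance(encoded, bytes)
```

Under that assumption, an assertion written against `bytes` rather than `str` would express the same intent on both versions.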

legel avatar Aug 11 '16 16:08 legel

I used pattern with Python 2 before and I loved it, but now I switched to Python 3. What is the status of porting Pattern to Python 3?

davidhorat avatar Sep 08 '16 23:09 davidhorat

It is astonishing to me that no one has completed a full update to get a Python 3 version of pattern working. I guess I will fork pattern3 and try to finish it myself.

james-see avatar Sep 19 '16 02:09 james-see

never mind. too many recursion errors, encoding errors, etc. someone who knows the actual codebase should really update it.

james-see avatar Sep 19 '16 03:09 james-see