PythonScript icon indicating copy to clipboard operation
PythonScript copied to clipboard

Move to Python 3.8

Open dhowland opened this issue 7 years ago • 38 comments
trafficstars

Python 2.7 is about to become obsolete. Could the scripting host be upgraded?

dhowland avatar Jan 05 '18 03:01 dhowland

See https://github.com/sergey-shandar/getboost/issues/43, currently no nuget packet seems to be available for boost python which is correctly build against python 3.

Initial test branch created at https://github.com/chcg/PythonScript/tree/python36_test.

See also http://python3porting.com/cextensions.html

chcg avatar May 27 '18 06:05 chcg

boostorg/python#129 is closed and apparently fixed as of April. That should fix sergey-shandar/getboost#43.

Edit: sergey-shandar/getboost#43 is closed now.

jacktose avatar Aug 09 '18 17:08 jacktose

Ideas on this task from Claudia:

Hast du noch ein paar nette Ideen zum Umstieg auf Python 3? Mit Boost 1.68 sollte der möglich sein. Ich habe scintilla 4.0.1 mit python3 getestet und bis dato ist mir nur aufgefallen das Python-Text nach oder von bytes konvertiert werden muß. Heisst, mit Python Version2 war ein String standardmäßig ein Byte-Object, mit Python3 ist dies nun ein Unicode-Object, wenn also ein Script nun etwas wie editor.addText('some text') macht, wird dies nicht funktionieren da es eigentlich editor.addText(b'some text')heißen müsste. Das ist aber unschön aus Sicht des Users so dass ein automatisches prüfen und konvertieren nötig wäre. Gleiches gilt für die PS-Console.

Oder sonst noch Gedanken, was geändert/verbessert werden könnte? Man könnte ein Feature eibauen, welches dynamische generierte dialog erlaubt. Siehe hierzu https://blogs.msdn.microsoft.com/oldnewthing/20050429-00/?p=35743 Wenn man das auf static, radio/checkbox und normale button sowie ein text control beschränkt sollte es auch mit überschaubaren Aufwand realisierbar sein. Die Idee war in etwa sowas:

editor.customDialog('Custom Dialog Title', (x,y,w,h), [('static', 'Convert document', x,y,w,h), ('radio', 'ALL TO UPPERCASE', x,y, w,h), ('radio', 'all to lowercase', x,y, w,h), ('button', 'OK - let's do it', x,y, w,h), ('button', 'Ignore and close', x,y, w,h),])

Das skript übergibt also den Titel, x,y Koordinaten sowie die Breite und Höhe des Dialogssowie eine Liste von dynamisch generierten Controls. Deren x,y sollten relative zum Dialog sein. Also ein x,y = 0 von static wäre der erste linke obere Punkt innerhalb des Dialogs und nicht auf dem Monitor.

Bezüglich Beispielskripten, die werde ich Dir noch zukommen lassen werde sie nur ein bisschen überarbeiten, heißt, vorallem, dokumentieren.

chcg avatar Oct 03 '18 18:10 chcg

Great to see that moving to py3 is going to be done. I know it is early alpha state but should I be able to open the console already? Or running some very basic scripts like notepad.new()? When it comes to testing or migrating the scripts to python3 I can offer my help.

Ekopalypse avatar May 07 '20 19:05 Ekopalypse

Most of the unit tests are already running. I just added the correction to use the python modules provided with the plugin installation. So I think you could already start migrating the scripts.

The major drawback I think is just that non UTF-8 files are not working correctly yet as could be seen with still failing https://github.com/bruderstein/PythonScript/blob/master/PythonScript/python_tests/tests/ReplaceAnsiTestCase.py . Text is written as b'sample text'. This is happening due to the change that for python 3 the internal representation of a string is utf-8 now and not a bytearray. So it is not yet clear to me, if it is easier to convert unicode back to byte array and have the same handling within pythonscript like before with python2.7 or convert the bytearrays coming e.g. from scintilla to unicode which propably causes some problems how to know the encoding.

chcg avatar May 09 '20 06:05 chcg

Yes, this is a area which needs to be addressed. What I found is needed is something like this

    cdef inline _encode(self, unicode text):
        return text.encode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')


    cdef inline unicode _decode(self, char* text):
        return text.decode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')


    cdef inline unicode _decode_const(self, const char* text):
        return text.decode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')


    cdef inline unicode _decode_with_length(self, const char* text, size_t length):
        return text[:length].decode('mbcs' if self.call(self.sci_pointer, SCI_GETCODEPAGE, 0, 0) == 0 else 'utf-8')

the encode returns a bytestring whereas the decode, obviously, returns unicode. That would mean, that each scintilla call which expects an char array would need such an conversion. As an example the usage from appendText from the point of a script

editor.appendText('Hello World') <-- I would suggest using this instead of forcing users to do editor.appendText(b'Hello World')

but that would mean that the appendText call needs to do something like

cpdef void appendText(self, unicode text):
        ''' Append a string to the end of the document without changing the selection. '''
        encoded = self._encode(text)
        self.call(<sptr_t>self.sci_pointer, SCI_APPENDTEXT, <uptr_t>(len(encoded)), <sptr_t><char*>encoded)

Yes, this is a little overhead but using b'' strings doesn't feel natural for python3 developers

and of course getText is something like

    cpdef unicode getText(self):
        ''' Retrieve all the text in the document. Returns number of characters retrieved. Result is NUL-terminated. '''
        buffer_size = self.call(<sptr_t>self.sci_pointer, SCI_GETTEXT, 0, 0)
        if buffer_size:
            buffer = bytes(buffer_size+1)
            self.call(<sptr_t>self.sci_pointer, SCI_GETTEXT, <uptr_t>buffer_size, <sptr_t><char*>buffer)
            return self._decode(buffer[:-1])
        return ''

Unfortunately I don't know if boost::python provides a way to make this work easily.

Ekopalypse avatar May 09 '20 12:05 Ekopalypse

I tried the latest build with a fresh npp 7.8.6 and it crashed. Then I copied, from the old 2.7 installation, the PythonScript related files and directories from .\plugins\Config to the new npp and it still crashed. Finally I renamed lib from .\plugins\PythonScript to no_lib and copied an embeddable python382 into .\plugins\PythonScript and was able to start npp.

image

The string(unicode/bytestring) conversions seems already to work.

Concerning the crash I assume something is missing in lib. I will check this out. I'm going to work on the provided example scripts.

Ekopalypse avatar May 09 '20 15:05 Ekopalypse

@Ekopalypse Did you use 3.0.0 or already 3.0.1? 3.0.1 should fix the problem starting with the locally provided python modules from the pythonscript zip file.

chcg avatar May 10 '20 08:05 chcg

What I've described was done with 3.0.1, sorry for not being clear. Btw. most of the samples are done. The remaining are depending on the text encoding conversion. Do you want me to make PR for the fixed ones already or should we wait until the encoding stuff got resolved?

Ekopalypse avatar May 10 '20 10:05 Ekopalypse

@chcg - I don't know why but the first time I tested 3.0.1 it didn't work with the package I've downloaded but today with a clean npp and PS setup (the zipped x86 and x64) everything was ok, I mean from setup point of view. I rechecked the lib directory, the one I renamed to no_lib and found that all of the python extension modules are missing, only the python files are there and most of the directories weren't created. Can't say why this happened but, as already said, today x86 and x64 zip worked.

By the way, with Python3 there is, although not officially supported by python, a way to use pip, pythons package manager, together with an embedded distribution. This could reduce the number of release packages to 2 (or 4 if the two source packages are counted as well), a x86 minimum and x64 minimum package. Let me know if this is something which is worth investigating.

Ekopalypse avatar May 12 '20 16:05 Ekopalypse

@Ekopalypse Good news that 3.0.1 fixed what was expected to be fixed.

  • pip within the provided package would be of interest I think. The only issue I see is that an update of the PythonScript plugin by PluginAdmin probably removes also the modules installed via pip.

  • if you like to provide the corrected scripts already now that would be fine for me. Could you also add a list which scripts still need to be adapted and are now blocked due to the encoding problem.

chcg avatar May 12 '20 20:05 chcg

@chcg I will do a final test with all samples using utf8 and ansi encoded buffers, just to be sure I haven't overlooked something. Do you think it is needed to test utf-8 bom and ucs2 ... as well? After that I will create the list of the failing sample scripts.

Concerning pip, do you see any chance that I can simulate the PluginAdmin workflow?

Ekopalypse avatar May 14 '20 10:05 Ekopalypse

@Ekopalypse it's possible to simulate the pluginAdmin workflow. You have to build notepad++ in debug mode. Then you can customize the json file to point to a temporarily github release
json file:

<NPP_INST_DIR>\plugins\Config\nppPluginList.json

More info here: https://npp-user-manual.org/docs/plugins/#plugins-admin

cmeriaux avatar May 14 '20 10:05 cmeriaux

By the way, your work on this feature request is awesome and very much appreciated. It's not possible for my limited software skill to help on the dev, but I'm volonture to be beta tester.

Good job guys

cmeriaux avatar May 14 '20 10:05 cmeriaux

See https://github.com/notepad-plus-plus/notepad-plus-plus/blob/ade01204c8800c6117bdc72eb476de4226adf4fa/PowerEditor/src/WinControls/PluginsAdmin/pluginsAdmin.cpp#L455

and

https://github.com/notepad-plus-plus/wingup/blob/master/src/winmain.cpp#L957

what happens on a update. So from my understanding the existing dir is moved to backup, replaced by the zip and on success the backup is deleted. This would make pip data disappear on each update, if it is installed in the pythonscript plugin dir.

chcg avatar May 14 '20 11:05 chcg

So the obvious solution would be to have pip and all installed modules in the plugins\config directory, which means site.py module needs to be changed/adapted I guess. Let me think about it.

Ekopalypse avatar May 14 '20 12:05 Ekopalypse

My current state is the following:

  • all scripts are python3 ready
  • most scripts will work with unicode and ansi files if those use the chars from the ascii table only. exception is HideLines which does have a problem with the markers

image

whereas it should look like this

image

That means, that all scripts, except of Disable Virtual Space, Enable Virtual Space, GotoLineCol_Barebones, InsertRuler and StartWithLocalPthon, do have the issues that

with utf8 enocded files they might run into an offset error

image

Note, the rectangle should include the ending bracket.

and with ansi files they throw an UnicodeDecodeError: 'utf-8' codec can't decode byte

beside this, the following has been observed.

CTags Based Autocompletion utf8 encoded tags file: utf8: completion list shows different encoding, offset issue ansi: completion list shows different encoding, offset issue ansi encoded tags file: utf8: offset issue ansi: completion list shows different encoding, offset issue

Formatter utf8: last line wrong ansi: decoding error

MultiEdit
utf8: ok ansi: decoding error

Multiples_SR
utf8: doesn't work when using special chars ansi: doesn't work when using special chars

RegexTester
utf8: ok ansi: doesn't find special ansi characters

GotoLineCol
utf8: offset error ansi: special char cannot be detected correctly

Ekopalypse avatar May 16 '20 22:05 Ekopalypse

I'm trying to find out what the issue with the HideLines and the Formatter scripts, corrupts the last line to format, are because these errors doesn't seem to be encoding related.

Ekopalypse avatar May 16 '20 22:05 Ekopalypse

Issue with Formatter script has been identified. Here and here

it must be

endpos = editor.getLineEndPosition(...)

Ekopalypse avatar May 17 '20 12:05 Ekopalypse

I assume the problem with HideLines is not within the script but within the getStringFromObject function. From what I understand it returns the raw string, as it is typed, instead of the bytes.

Ekopalypse avatar May 24 '20 11:05 Ekopalypse

I guess a picture explains it better than what I wrote image

Ekopalypse avatar May 24 '20 18:05 Ekopalypse

By changing the code for getStringFromObject to

std::string ScintillaWrapper::getStringFromObject(boost::python::object o)
{
	std::string raw;
	if (PyUnicode_Check(o.ptr()))
	{
		boost::python::object utf8Text = o.attr("__str__")();
		raw = std::string(boost::python::extract<const char *>(utf8Text));
	}
	else if(PyBytes_Check(o.ptr()))
	{
		raw.assign(PyBytes_AsString(o.ptr()), PyBytes_Size(o.ptr()));
	}
	else
	{
		boost::python::object rawString = o.attr("__str__")();
		raw = std::string(boost::python::extract<const char *>(rawString), _len(rawString));
	}

	return raw;
}

HideLines script works as well as adding byte strings

image

But I'm not sure if this is the way to go as this is working around the boost::python lib, isn't it?

Ekopalypse avatar May 24 '20 19:05 Ekopalypse

I did a similar change:

raw = std::string(PyBytes_AsString(o.ptr()));

for the byte case. The else case is probably not working any more. The issues which I faced with this are happening on reading back data from scintilla and setting the right ANSI encoding. At least it would fix some of the remaining problems. I will add it.

chcg avatar May 26 '20 06:05 chcg

If I understand the documentation correctly, using PyBytes_AsString without a length argument breaks HideLines, because the string would only be returned until the first \0 is received.

The problem with reading text as in getText, getTextRange ... could be solved by using PyUnicode_FromEncodedObject, I guess. The second argument is the encoding.

The question I ask myself is whether this is not already somehow covered in boost::python. I mean, this is something everyone has to deal with. I can't find an example of that. I found some complaints and a [PR] (https://github.com/boostorg/python/pull/54), which is still open, but I don't understand enough to see if this would solve the problem.

By the way, are you trying to solve it for every possible ANSI encoding? I mean, isn't it enough to use ANSI aka what is the current system locale aka what Python calls mbcs?

Update: PyUnicode_FromEncodedObject might not be the right choice because if the encoding is set to, lets say, cp1252 then a script would expect a string like ö to be encoded as b'\xf6' but I assume PyUnicode_FromEncodedObject would return b'\xc3\xb6', the utf8 representation.

Ekopalypse avatar May 26 '20 10:05 Ekopalypse

How can we start using python3x ? These edge cases will not affect me.

pete312 avatar Jul 07 '20 14:07 pete312

@pete312 download a zip from https://github.com/bruderstein/PythonScript/releases/tag/v3.0.1. Depending on your notepad++ configuration you either have to unpack into %APPDATA%\notepad++ or into the installation directory of notepad++.

Ekopalypse avatar Jul 07 '20 14:07 Ekopalypse

why need manual unpack?

download msi and right click - install. it will automatically put files in the correct location, wherever npp needs them? or is there some catch?

Thanks.

Rawat

On 7/7/2020 7:56 PM, Ekopalypse wrote:

@pete312 https://github.com/pete312 download a zip from https://github.com/bruderstein/PythonScript/releases/tag/v3.0.1. Depending on your notepad++ configuration you either have to unpack into %APPDATA%\notepad++ or into the installation directory of notepad++.

vsrawat avatar Jul 07 '20 16:07 vsrawat

@vsrawat I don't know as I don't use the installed version.

Ekopalypse avatar Jul 07 '20 16:07 Ekopalypse