mysqlclient icon indicating copy to clipboard operation
mysqlclient copied to clipboard

Implement #372: Support session state tracking for GTIDs, schema, variables, TX state

Open beamerblvd opened this issue 5 years ago • 11 comments

Add several methods and two new types for session state tracking:

  • session_state_changed indicates whether session state has changed
  • get_session_gtids returns a list of GTIDs created by the just-committed transaction, if applicable
  • get_session_new_schema returns the new schema, if the schema changed
  • get_session_changed_variables returns any changed session variables, if any
  • get_session_transaction_state returns the transaction state, if any and available
  • get_session_transaction_characteristics returns the transaction characteristics, if any and available

Manual testing output:

In [1]: from MySQLdb import connections 
   ...: connection = connections.Connection(host='127.0.0.1', user='root', port=3306, database='eb')

In [2]: connection.session_state_changed()
Out[2]: True

In [3]: connection.get_session_gtids()
Out[3]: []

In [4]: connection.get_session_new_schema()

In [5]: connection.get_session_changed_variables()
Out[5]: [SessionVariableChange(variable_name=b'autocommit', value=b'OFF')]

In [6]: connection.get_session_transaction_state() 

In [7]: connection.get_session_transaction_characteristics()

In [8]: cursor = connection.cursor()

In [9]: connection.session_state_changed()
Out[9]: True

In [10]: connection.get_session_gtids()
Out[10]: []

In [11]: cursor.execute("SELECT @@global.server_uuid;")
Out[11]: 1

In [12]: connection.session_state_changed()
Out[12]: False

In [13]: connection.get_session_gtids()
Out[13]: []

In [14]: connection.get_session_new_schema()

In [15]: connection.get_session_changed_variables()
Out[15]: []

In [16]: connection.get_session_transaction_state()

In [17]: connection.get_session_transaction_characteristics()

In [18]: cursor.execute("USE eb_history;")
Out[18]: 0

In [19]: connection.session_state_changed()
Out[19]: True

In [20]: connection.get_session_gtids()
Out[20]: []

In [21]: connection.get_session_new_schema()
Out[21]: b'eb_history'

In [22]: connection.get_session_changed_variables()
Out[22]: []

In [23]: connection.get_session_transaction_state()

In [24]: connection.get_session_transaction_characteristics()

In [25]: cursor.execute("USE eb;")
Out[25]: 0

In [26]: connection.session_state_changed()
Out[26]: True

In [27]: connection.get_session_gtids()
Out[27]: []

In [28]: connection.get_session_new_schema()
Out[28]: b'eb'

In [29]: connection.get_session_changed_variables()
Out[29]: []

In [30]: connection.get_session_transaction_state()

In [31]: connection.get_session_transaction_characteristics()

In [32]: cursor.execute("SELECT * FROM nick_test;")
Out[32]: 5

In [33]: connection.session_state_changed()
Out[33]: False

In [34]: connection.get_session_gtids()
Out[34]: []

In [35]: connection.get_session_new_schema()

In [36]: connection.get_session_changed_variables()
Out[36]: []

In [37]: connection.get_session_transaction_state()
Out[37]: TransactionState(transaction_active=True, transaction_explicit=False, nontransactional_tables_read=False, transactional_tables_read=True, unsafe_writes=False, safe_writes=False, unsafe_statements=False, result_sent=True, locked_tables=False)

In [38]: connection.get_session_transaction_characteristics()

In [40]: cursor.execute("SET @@session.time_zone = 'America/Los_Angeles';")
Out[40]: 0

In [41]: connection.session_state_changed()
Out[41]: True

In [42]: connection.get_session_gtids()
Out[42]: []

In [43]: connection.get_session_new_schema()

In [44]: connection.get_session_changed_variables()
Out[44]: [SessionVariableChange(variable_name=b'time_zone', value=b'America/Los_Angeles')]

In [45]: connection.get_session_transaction_state()

In [46]: connection.get_session_transaction_characteristics()

In [48]: cursor.execute("COMMIT;")
Out[48]: 0

In [49]: cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; START TRANSACTION READ WRITE;") 
Out[49]: 0

In [50]: cursor.execute("SELECT * FROM nick_test;")
Out[50]: 5

In [51]: cursor.execute("INSERT INTO nick_test (name) VALUES ('hello 6');")
Out[51]: 1

In [52]: connection.session_state_changed()
Out[52]: False

In [53]: connection.get_session_gtids()
Out[53]: []

In [54]: connection.get_session_new_schema()

In [55]: connection.get_session_changed_variables()
Out[55]: []

In [56]: connection.get_session_transaction_state()
Out[56]: TransactionState(transaction_active=True, transaction_explicit=True, nontransactional_tables_read=False, transactional_tables_read=True, unsafe_writes=False, safe_writes=True, unsafe_statements=False, result_sent=True, locked_tables=False)

In [57]: connection.get_session_transaction_characteristics()

In [58]: cursor.execute("COMMIT;")
Out[58]: 0

In [59]: connection.session_state_changed()
Out[59]: False

In [60]: connection.get_session_gtids()
Out[60]: [b'ef77999f-4f6d-11ea-b0e8-0242ac120002:7']

In [61]: connection.get_session_new_schema()

In [62]: connection.get_session_changed_variables()
Out[62]: []

In [63]: connection.get_session_transaction_state()
Out[63]: TransactionState(transaction_active=False, transaction_explicit=False, nontransactional_tables_read=False, transactional_tables_read=False, unsafe_writes=False, safe_writes=False, unsafe_statements=False, result_sent=False, locked_tables=False)

In [64]: connection.get_session_transaction_characteristics()

beamerblvd avatar Jul 27 '19 02:07 beamerblvd

Please support other than SESSION_TRACK_GTIDS too.

methane avatar Nov 18 '19 12:11 methane

Please support other than SESSION_TRACK_GTIDS too.

Hello, @methane. For whatever reason, I did not receive a notification of your reply. I will reload this page more often so that I don't miss any more notifications. Apologies for that.

I want to make sure I understand your comment correctly. First, are you a contributor on this project, asking me to make a change before you'll accept this for merging? If so:

  • Can you comment as to the rest of the code? Is the pull request otherwise completely satisfactory for merging, you just want to see more? You don't see any errors or problems that need to be fixed?
  • My only concern is that I don't understand the other parts of session state tracking nearly as well as I understand the GTID parts (because I understand those only by necessity). I wonder if it would be better for someone with an actual use case for these other features, who likewise understands them better, to expand upon this initial implementation after it is merged.

beamerblvd avatar Feb 13 '20 22:02 beamerblvd

First, are you a contributor on this project, asking me to make a change before you'll accept this for merging?

Yes, I am the only active maintainer of this project.

* Can you comment as to the rest of the code? Is the pull request otherwise completely satisfactory for merging, you just want to see more? You don't see any errors or problems that need to be fixed?
  • Method name should be changed to get_session_track_info.
  • I am not sure that returned data is always UTF-8. Please use PyBytes_FromStringAndSize.
* My only concern is that I don't understand the other parts of session state tracking nearly as well as I understand the GTID parts (because I understand those only by necessity). I wonder if it would be better for someone with an actual use case for these other features, who likewise understands them better, to expand upon this initial implementation after it is merged.

I have no plan to use this feature. My only concern is exposing the underlying API to users. This is the low level library. User can implement higher-level API using this low-level API.

methane avatar Feb 14 '20 08:02 methane

Okay. I'll make these changes today/this weekend and update the pull request this weekend. Thanks for the review! Less back-and-forth this way. :-)

beamerblvd avatar Feb 14 '20 20:02 beamerblvd

Codecov Report

Merging #373 into master will increase coverage by 0.55%. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #373      +/-   ##
==========================================
+ Coverage   86.59%   87.15%   +0.55%     
==========================================
  Files          12       13       +1     
  Lines        1537     1557      +20     
==========================================
+ Hits         1331     1357      +26     
+ Misses        206      200       -6
Impacted Files Coverage Δ
MySQLdb/constants/CLIENT.py 100% <100%> (ø) :arrow_up:
MySQLdb/times.py 100% <0%> (ø) :arrow_up:
MySQLdb/_exceptions.py 100% <0%> (ø) :arrow_up:
MySQLdb/constants/CR.py 0% <0%> (ø) :arrow_up:
MySQLdb/compat.py 100% <0%> (ø)
MySQLdb/constants/ER.py 97.17% <0%> (+0.11%) :arrow_up:
MySQLdb/converters.py 82.85% <0%> (+0.5%) :arrow_up:
MySQLdb/cursors.py 81.46% <0%> (+0.53%) :arrow_up:
MySQLdb/connections.py 82.99% <0%> (+1.74%) :arrow_up:
MySQLdb/__init__.py 84.78% <0%> (+2.96%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c67dbd4...be77e9d. Read the comment docs.

codecov[bot] avatar Feb 17 '20 03:02 codecov[bot]

@methane: I have updated the pull request to support all facets of session tracking and to use PyBytes_FromStringAndSize instead of assuming unicode.

beamerblvd avatar Feb 17 '20 03:02 beamerblvd

Oh, I didn't mean adding bunch of APIs. I meant:

infos = conn.get_session_track_info(MySQLdb.constants.session_state_type.SESSION_TRACK_GTIDS)

Only one method is needed.

methane avatar Feb 17 '20 06:02 methane

Oh, I didn't mean adding bunch of APIs. I meant:

infos = conn.get_session_track_info(MySQLdb.constants.session_state_type.SESSION_TRACK_GTIDS)

Only one method is needed.

Well, I looked at doing it that way, but the problem is that the session track info you get is all so different. Some of it is in different data formats than others. For some of it, you call mysql_session_track_get_first and then don't call anything else. For others, you call mysql_session_track_get_first and then mysql_session_track_get_next until the latter stops returning anything. For others, you call mysql_session_track_get_first and then raise an error if mysql_session_track_get_next doesn't return something the first time but then stop looping when mysql_session_track_get_next doesn't return something the second time.

In my opinion, that's all just two different to put in one method called get_session_track_info. We would basically have to just add wrappers around mysql_session_track_get_first and mysql_session_track_get_next and let the library callers do the heavy lifting of determining which ones to call and when and how, and that is just too low-level to put that responsibility in Python code, from my perspective.

There is another option that would be a fairly easy change and wouldn't muddy up the interface for Connection: I could add a single method Connection.get_session_state() that returns a native ConnectionSessionState object, and then all the new methods I've added could belong to ConnectionSessionState instead of Connection.

beamerblvd avatar Feb 17 '20 13:02 beamerblvd

I'm sorry but please notice I don't want to review and maintain much code I never use. I want to minimize the maintenance cost of this project.

For some of it, you call mysql_session_track_get_first and then don't call anything else.

What's wrong about assert len(res) == 1?

For others, you call mysql_session_track_get_first and then mysql_session_track_get_next until the latter stops returning anything.

It looks no problem.

For others, you call mysql_session_track_get_first and then raise an error if mysql_session_track_get_next doesn't return something the first time but then stop looping when mysql_session_track_get_next doesn't return something the second time.

What is the problem of just returning list?

methane avatar Feb 19 '20 06:02 methane

I'm sorry but please notice I don't want to review and maintain much code I never use. I want to minimize the maintenance cost of this project.

I can understand that, but I hope you can likewise keep in mind that literally thousands of other projects use this library, and I'm trying to contribute the best-quality enhancement I can, not just the barest-metal enhancement I can.

I am happily volunteering to maintain this specific contribution and fix any bugs that arise from it. I know maintaining open source software is hard, and I would not contribute this and then abandon it and expect you to maintain it in perpetuity. That's the spirit of open source software. :-)

For some of it, you call mysql_session_track_get_first and then don't call anything else.

What's wrong about assert len(res) == 1?

For others, you call mysql_session_track_get_first and then mysql_session_track_get_next until the latter stops returning anything.

It looks no problem.

For others, you call mysql_session_track_get_first and then raise an error if mysql_session_track_get_next doesn't return something the first time but then stop looping when mysql_session_track_get_next doesn't return something the second time.

What is the problem of just returning list?

It is a problem because it's not just about returning a list. It's about knowing whether to call just _first or also call _next and how many times to call _next, and whether a failed call to _next means an error or just means no-more-data, all of which depends on which type of session track info you're attempting to return. So even if I had a single method that just returned a list as you suggest (which is possible), that method would still have to have dedicated logic for each of the session track info types (if GTID do this, if VARIABLE do that, if TRANSACTION_STATE do this other thing, etc.). It could not behave one way that would work for all types.

beamerblvd avatar Feb 19 '20 23:02 beamerblvd

I can understand that, but I hope you can likewise keep in mind that literally thousands of other projects use this library,

Of course. And you can keep in mind that most of the thousands of projects using this library doesn't need this feature. Reducing maintenance cost is much better for much users. Note that this project and PyMySQL had lost maintainers.

and I'm trying to contribute the best-quality enhancement I can, not just the barest-metal enhancement I can.

Except DB-API 2.0 interface, I want to keep this library is "binding" of the libmysqlclient. So "barest-metal" is OK to me.

It is a problem because it's not just about returning a list. It's about knowing whether to call just _first or also call _next and how many times to call _next, and whether a failed call to _next means an error or just means no-more-data, all of which depends on which type of session track info you're attempting to return.

Caller must know the type of session track info. What's wrong about calling _next when just calling _first is enough? Doesn't it create just one-length list?

methane avatar Feb 20 '20 01:02 methane