sqlparse icon indicating copy to clipboard operation
sqlparse copied to clipboard

Give indication about position of each token seen

Open ChaosEternal opened this issue 11 years ago • 20 comments

Suggestion is add methods like col and row on class Token , which would make life like sql conversion easier.

ChaosEternal avatar Dec 04 '12 05:12 ChaosEternal

I think I don't understand... column number can be fetch from the position of the identifier inside the IdentifiersList and this number would change if you add or remove them (but it's true that would be able to be added a property to get it easier), but row? What do you want to say with "row"? The order of a statement inside a StatementsList?

piranna avatar Dec 04 '12 06:12 piranna

sorry, let me explain it.

the column number is the position of the token from beginning of it's line. the row number is line number of the token from begining of file.

think about thousands of sql queries, each has thousands of lines, and a tool is wanted to rewrite some parts of the queries and leave other parts intact.

ChaosEternal avatar Dec 04 '12 06:12 ChaosEternal

I still don't get the catch of the rows and columns, as I told you this number can change, and also sqlparse allow to change specific fragments of the SQL queries (and in fact is better to do it in a relatively form). Maybe the functionality that you are requesting correspond to a higher level library?

@andialbrecht, can you be able to give us light on this topic? :-)

piranna avatar Dec 04 '12 08:12 piranna

sorry for disturb , and I wrote some dirty code to explain this. look attachement

the 2.sql is as following:

WITH ATBL AS (SELECT

  • FROM CTBL ) , BTBL AS ( SELECT * FROM DTBL,ETBL -- ) ) SELECT * FROM ATBL,BTBL ; if run the python like this:

sed -e "python convwith2temptable2.py 2.sql |sed 's/$/;/'" 2.sql

it will give us this:

WITH DROP TABLE IF EXISTS ATBL; CREATE TABLE ATBL AS (SELECT

  • FROM CTBL ) ; DROP TABLE IF EXISTS BTBL; CREATE TABLE BTBL AS ( SELECT * FROM DTBL,ETBL -- ) ); SELECT * FROM ATBL,BTBL ;

Thanks.

On Tue, Dec 4, 2012 at 4:19 PM, Jesús Leganés Combarro < [email protected]> wrote:

I still don't get the catch of the rows and columns, as I told you this number can change, and also sqlparse allow to change specific fragments of the SQL queries (and in fact is better to do it in a relatively form). Maybe the functionality that you are requesting correspond to a higher level library?

@andialbrecht https://github.com/andialbrecht, can you be able to give us light on this topic? :-)

— Reply to this email directly or view it on GitHubhttps://github.com/andialbrecht/sqlparse/issues/82#issuecomment-10987939.

ChaosEternal avatar Dec 04 '12 08:12 ChaosEternal

as you mentioned, the row number and column number will change, and so the code is unsafe.

ChaosEternal avatar Dec 04 '12 08:12 ChaosEternal

I could think of another use case: In CrunchyFrog, the db frontend/SQL editor where sqlparse originated from, I need to know where a statement starts and ends to highlight it in the GUI. So col/row attributes on tokens would be helpful here too. Also when jumping to specific parts within a statement ("go to WHERE clause/JOIN definitions" would be a nice feature in the editor).

But as @piranna already noted, the values in the hypothetical cols/rows attributes would only be valid after initial parsing. During initial parsing all whitespaces are preserved so that the lexer should be able to calculate the initial positions (I'd throw char offset in the ring then too)

But as soon as filters are applied those properties could be invalid and we have no easy way to tell if they're still valid. OTOH, maybe that's just a documentation / naming issue...

andialbrecht avatar Dec 04 '12 21:12 andialbrecht

See also #102

andialbrecht avatar Nov 30 '14 06:11 andialbrecht

Throwing another comment here to say I'd like to see this implemented, for a tool I'm working on. It seems pretty basic to have position information in a parser. :)

idank avatar Feb 15 '16 00:02 idank

Pull-request are welcome :-) El 15/2/2016 1:52, "Idan Kamara" [email protected] escribió:

Throwing another comment here to say I'd like to see this implemented, for a tool I'm working on. It seems pretty basic to have position information in a parser. :)

— Reply to this email directly or view it on GitHub https://github.com/andialbrecht/sqlparse/issues/82#issuecomment-184015281 .

piranna avatar Feb 15 '16 03:02 piranna

@idank do you mean absolute position? nth token from the beginning of statement?

vmuriart avatar Jun 12 '16 20:06 vmuriart

I meant absolute position. I ended up with this hack, for those interested:

def calcpositions(root, index=0):
    positions = {}

    if root.is_group():
        startindex = index
        for token in root.tokens:
            nestedpos, index = calcpositions(token, index)
            positions.update(nestedpos)
        positions[root] = (startindex, index)
    else:
        positions[root] = (index, index+len(root.value))
        index += len(root.value)

    return positions, index

It returns a {token: (start, end)} dictionary and should be called with sqlparse.parse(text)[0].

idank avatar Jun 12 '16 20:06 idank

it's around the lines of what I was going to suggest, though I had misunderstood and thought you meant index as position of "whole" token rather than start and end characters.

I modified my would be suggestion but its essentially the same, with the exception of mine skipping the groups and just does the individual tokens.

from __future__ import print_function
import sqlparse


def calcpositions(root, index=0):
    positions = {}

    if root.is_group():
        startindex = index
        for token in root.tokens:
            nestedpos, index = calcpositions(token, index)
            positions.update(nestedpos)
        positions[root] = (startindex, index)
    else:
        positions[root] = (index, index + len(root.value))
        index += len(root.value)

    return positions, index


def calcpositions2(stmt):
    index = 0
    positions = {}
    for token in stmt.flatten():
        positions[token] = (index, index + len(str(token)))
        index += len(str(token))
    return positions

if __name__ == '__main__':
    sql = "CREATE TABLE test();"
    stmt = sqlparse.parse(sql)[0]

    print(calcpositions(stmt))
    print("##############")
    print(calcpositions2(stmt))

({<Identifier 'test' at 0x1E9BC10>: (13, 17), <DDL 'CREATE' at 0x1EA0458>: (0, 6), <Whitespace ' ' at 0x1EA04A8>: (6, 7), <Keyword 'TABLE' at 0x1EA04F8>: (7, 12), <Whitespace ' ' at 0x1EA0548>: (12, 13), <Name 'test' at 0x1EA0598>: (13, 17), <Punctuation '(' at 0x1EA05E8>: (17, 18), <Punctuation ')' at 0x1EA0638>: (18, 19), <Punctuation ';' at 0x1EA0688>: (19, 20), <Parenthesis '()' at 0x1E9BB40>: (17, 19), <Statement 'CREATE...' at 0x1E9BBA8>: (0, 20)}, 20)
##############
{<Punctuation ')' at 0x1EA0638>: (18, 19), <DDL 'CREATE' at 0x1EA0458>: (0, 6), <Punctuation ';' at 0x1EA0688>: (19, 20), <Whitespace ' ' at 0x1EA04A8>: (6, 7), <Keyword 'TABLE' at 0x1EA04F8>: (7, 12), <Whitespace ' ' at 0x1EA0548>: (12, 13), <Name 'test' at 0x1EA0598>: (13, 17), <Punctuation '(' at 0x1EA05E8>: (17, 18)}

vmuriart avatar Jun 12 '16 21:06 vmuriart

though nicely compliments the token_at_offeset function.

vmuriart avatar Jun 12 '16 21:06 vmuriart

Yeah I happen to need position of groups too.

This whole thing falls on its face if the parser decides to suppress parts of the input one day, say whitespace (like most parsers do). It works for now though.

idank avatar Jun 12 '16 22:06 idank

The parser should never suppress whitespaces. IOW during parsing it should be possible to restore the original input. This is required in some use cases.

andialbrecht avatar Jun 13 '16 05:06 andialbrecht

Funny thing is that the position is already known in Lexer.py. this information gets ignored again. Why not add it to the token instead?

Of course, calculating row & column might make it a bit harder. But it should be reasonably trivial to add in the lexer.

sjoerdjob avatar Jun 13 '16 13:06 sjoerdjob

The problem was that once filters are applied to the lexer their row/columns are modified. At the moment parse doesn't take any filters that would make this a problem; but in the case it was allowed to take filters (i was planning on adding the capability later) their position would shift.

Thinking about it though, it shouldnt be too complicated to add a component to the filters to update this information before it finishes processing

vmuriart avatar Jun 13 '16 13:06 vmuriart

Depends on the information you want. If what you want (most likely) the position where it originally occurred in the stream/file/string, then doing it in the lexer makes sense.

In reailty, sqlparse can also be used for reformating and other position-changing purposes. In that case, yes, the whole concept of position breaks down in general. Still, I think adding the functionality with that disclaimer makes sense.

sjoerdjob avatar Jun 13 '16 13:06 sjoerdjob

I second the need to have (back) a way to get the position of each token. In sqlparse > 0.2 the Lexer class has been simplified (it is currently a mere namespace for a single classmethod) and the tokenize() function that uses it does not return the position anymore.

One of my tools implements poor man's fake domains on databases missing the domain concept, recognizing CREATE DOMAIN foo bar and mechanically replacing foo with bar in the statements when this is the case. I had to reimplement the old behaviour to be able to do that.

lelit avatar Oct 25 '16 07:10 lelit

I also need this for something I'm working on!

IMO, it makes complete sense to provide this for parse, and then simply invalidate the information as and when it becomes invalidated.

muhmud avatar Jan 10 '21 14:01 muhmud