scancode-toolkit

Improve splitting lists of copyright holders

Open pombredanne opened this issue 1 year ago • 0 comments

With this input:

        (c) 1999                Terrehon Bowden <[email protected]>
                                Bodo Bauer <[email protected]>

we get:

       copyrights:
           -   copyright: (c) 1999 Terrehon Bowden <[email protected]> Bodo Bauer <[email protected]>
       holders:
           -   holder: Terrehon Bowden Bodo Bauer

Ideally, we should get this instead:

       holders:
           -   holder: Terrehon Bowden
           -   holder: Bodo Bauer

The parse tree looks like this, so we have distinct name groups to guide splitting:

(label='ROOT', children=(
    (label='COPYRIGHT', children=(
        (label='COPY', value='(c)')
        (label='NAME-YEAR', children=(
            (label='NAME-YEAR', children=(
                (label='NAME-YEAR', children=(
                    (label='YR-RANGE', children=(
                        (label='YR-RANGE', children=(
                            (label='YR', value='1999')
                        ))
                    ))
                    (label='NNP', value='Terrehon')
                    (label='NNP', value='Bowden')
                ))
            ))
        ))
        (label='NAME-EMAIL', children=(
            (label='NAME', children=(
                (label='EMAIL', value='<[email protected]>')
                (label='NAME', children=(
                    (label='NNP', value='Bodo')
                    (label='NNP', value='Bauer')
                ))
            ))
            (label='EMAIL', value='<[email protected]>')
        ))
    ))
))
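
One way to use those name groups might look like the sketch below. It is only an illustration of the idea, not ScanCode's actual detection code: the `Node` class, `leaves()` and `split_holders()` are hypothetical names made up for this example, the nesting of the first `NAME-YEAR` group is flattened for brevity, and it assumes that every direct child of `COPYRIGHT` whose label starts with `NAME` holds exactly one holder.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a parse tree node (not ScanCode's own class)."""
    label: str
    value: Optional[str] = None
    children: Tuple["Node", ...] = ()


def leaves(node: Node) -> Iterator[Node]:
    """Yield the leaf nodes under `node`, left to right."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from leaves(child)


def split_holders(copyright_node: Node) -> List[str]:
    """Return one holder string per top-level NAME* group.

    Assumption: each direct child whose label starts with "NAME"
    describes a single holder, and only NNP leaves are name words.
    """
    holders = []
    for group in copyright_node.children:
        if not group.label.startswith("NAME"):
            continue
        words = [leaf.value for leaf in leaves(group)
                 if leaf.label == "NNP" and leaf.value]
        if words:
            holders.append(" ".join(words))
    return holders


# The COPYRIGHT subtree from this issue, with the NAME-YEAR nesting flattened.
# Email values are kept exactly as they appear in the issue text.
tree = Node("COPYRIGHT", children=(
    Node("COPY", "(c)"),
    Node("NAME-YEAR", children=(
        Node("YR-RANGE", children=(Node("YR", "1999"),)),
        Node("NNP", "Terrehon"),
        Node("NNP", "Bowden"),
    )),
    Node("NAME-EMAIL", children=(
        Node("NAME", children=(
            Node("EMAIL", "<[email protected]>"),
            Node("NAME", children=(Node("NNP", "Bodo"), Node("NNP", "Bauer"))),
        )),
        Node("EMAIL", "<[email protected]>"),
    )),
))

print(split_holders(tree))  # ['Terrehon Bowden', 'Bodo Bauer']
```

Splitting only on the direct children of `COPYRIGHT` keeps multi-word names together. A real fix would probably also need to handle a single group that lists several holders, which this sketch does not attempt.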

pombredanne, Sep 12 '24 09:09