scancode-toolkit
Improve splitting lists of copyright holders
With this input:
(c) 1999 Terrehon Bowden <[email protected]>
Bodo Bauer <[email protected]>
we currently get:
copyrights:
- copyright: (c) 1999 Terrehon Bowden <[email protected]> Bodo Bauer <[email protected]>
holders:
- holder: Terrehon Bowden Bodo Bauer
Ideally, we should get this instead:
holders:
- holder: Terrehon Bowden
- holder: Bodo Bauer
The parse tree looks like this, so there are distinct name groups that could guide the splitting:
(label='ROOT', children=(
  (label='COPYRIGHT', children=(
    (label='COPY', value='(c)')
    (label='NAME-YEAR', children=(
      (label='NAME-YEAR', children=(
        (label='NAME-YEAR', children=(
          (label='YR-RANGE', children=(
            (label='YR-RANGE', children=(
              (label='YR', value='1999')
            ))
          ))
          (label='NNP', value='Terrehon')
          (label='NNP', value='Bowden')
        ))
      ))
    ))
    (label='NAME-EMAIL', children=(
      (label='NAME', children=(
        (label='EMAIL', value='<[email protected]>')
        (label='NAME', children=(
          (label='NNP', value='Bodo')
          (label='NNP', value='Bauer')
        ))
      ))
      (label='EMAIL', value='<[email protected]>')
    ))
  ))
))
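Here is a minimal sketch of how the name groups in this tree could drive the split, assuming a simplified tree model. The `Node` class, the `tree`/`leaf` helpers, `flat_holder` and `split_holders` are all hypothetical and are not ScanCode's actual API. The tree above is transcribed by hand, with the unrecoverable email addresses replaced by placeholders. `flat_holder` joins every `NNP` token in document order, which reproduces the single merged holder shown earlier. `split_holders` instead emits one holder for each node that directly owns `NNP` tokens.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a parse tree node (not ScanCode's class)."""
    label: str
    value: Optional[str] = None          # set only on leaf tokens
    children: Tuple["Node", ...] = ()    # set only on groups


def leaf(label: str, value: str) -> Node:
    return Node(label, value=value)


def tree(label: str, *children: Node) -> Node:
    return Node(label, children=tuple(children))


# The parse tree from this issue; email values are placeholders.
ROOT = tree("ROOT",
    tree("COPYRIGHT",
        leaf("COPY", "(c)"),
        tree("NAME-YEAR",
            tree("NAME-YEAR",
                tree("NAME-YEAR",
                    tree("YR-RANGE", tree("YR-RANGE", leaf("YR", "1999"))),
                    leaf("NNP", "Terrehon"),
                    leaf("NNP", "Bowden"),
                ),
            ),
        ),
        tree("NAME-EMAIL",
            tree("NAME",
                leaf("EMAIL", "<email-1>"),
                tree("NAME", leaf("NNP", "Bodo"), leaf("NNP", "Bauer")),
            ),
            leaf("EMAIL", "<email-2>"),
        ),
    ),
)

# Token labels treated as parts of a holder name (assumption: NNP only).
NAME_TOKENS = {"NNP"}


def flat_holder(node: Node) -> str:
    """Join all name tokens in document order into one merged holder."""
    if node.value is not None:
        return node.value if node.label in NAME_TOKENS else ""
    parts = [flat_holder(child) for child in node.children]
    return " ".join(part for part in parts if part)


def split_holders(node: Node) -> List[str]:
    """Emit one holder per group node that directly owns name tokens."""
    own = [c.value for c in node.children if c.label in NAME_TOKENS and c.value]
    holders = [" ".join(own)] if own else []
    for child in node.children:
        if child.value is None:  # recurse into sub-groups, skip leaf tokens
            holders.extend(split_holders(child))
    return holders


print(flat_holder(ROOT))    # Terrehon Bowden Bodo Bauer
print(split_holders(ROOT))  # ['Terrehon Bowden', 'Bodo Bauer']
```

This approach relies on the grammar already grouping each name's tokens under a single node, which is the case in this tree. A node's own tokens are emitted before those of its sub-groups, so names interleaved with nested groups could be reordered. Also, in this tree the first email sits inside the `NAME-EMAIL` group next to Bodo Bauer. Splitting on name groups therefore separates the holders correctly but would not, by itself, pair each email with the right holder.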