tabula-java
Bug: empty cells dropped and cells below shifted up when using area option
Issue summary:
When the area option is used, empty cells in the table are removed and the cells below them are shifted up. Without the area option, the output matches the input. This suggests that the area option triggers a rule that removes empty cells and shifts the cells below them up.
The issue has been confirmed in the issue I raised against tabula-py:
https://github.com/chezou/tabula-py/issues/245
Expected behavior:
I expect the area option not to remove or shift cells. Empty cells should remain where they are instead of being removed.
Code to reproduce the issue:
java -jar tabula/tabula-1.0.3-jar-with-dependencies.jar -a 0,0.1,606,68 -p 1 --stream ~/Downloads/test_convert.pdf
To produce the comparison result, run the same command without the area option (-a 0,0.1,606,68).
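For anyone who wants to script the comparison, here is a minimal sketch that builds both command lines (with and without the area option) from the command above. The helper names build_tabula_command and run are hypothetical, introduced only for illustration. The jar path, PDF path, and flags are copied from the command in this report.

```python
import os
import subprocess

def build_tabula_command(jar, pdf, page, area=None):
    """Build the tabula-java command line; omit -a when area is None."""
    cmd = ["java", "-jar", jar]
    if area is not None:
        cmd += ["-a", area]
    # expanduser is needed because no shell expands "~" for us here
    cmd += ["-p", str(page), "--stream", os.path.expanduser(pdf)]
    return cmd

# Paths and area copied from the command in this report
JAR = "tabula/tabula-1.0.3-jar-with-dependencies.jar"
PDF = "~/Downloads/test_convert.pdf"

with_area_cmd = build_tabula_command(JAR, PDF, 1, area="0,0.1,606,68")
without_area_cmd = build_tabula_command(JAR, PDF, 1)

def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Calling run(with_area_cmd) and run(without_area_cmd) should give the two outputs to compare, assuming Java and the jar are available at those paths.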
Example file:
Actual behavior:
Output without the area option:
ISIN
NaN
GB00xxxyy409
IExxxxBNyy 34
Uxxxx24Fy012
US0xxxW1y27
USxxxx9Kyy59
USxxxx35yy67
USxxxx2Qyy58
USxxxx331yy05
USxxxx761yy7
NaN
US1xxx37yy96
NaN
US1xxx2Q1yy8
CA2xxxx1yy6
US2xxx21yy9
JPxxxx400yy6
USxxxx161yy6
NaN
GB0xx540yy86
US45xx6F1yy9
US46xx02yy34
GBxxx128J450
JPxxxx90yy06
JPxxx320yy09
US4xxx5Hyy005
JP3yyyy00006
GBxxx92yyP64
NaN
GBxxxx09yy53
GBxxxx909H978
GxxxxY yZyy79
GBxxxxJyyy12
GBxxxXDZyy23
GB0xxx9yyF68
BMxxx8yyy068
NaN
USxxxx6Qyy40
USxxxx18yy45
USxxxx6yyy59
NaN
Uxxxx66Gyy40
NaN
NaN
NaN
GxxxxKBF4yy3
Output with the area option:
ISIN
GB00xxxyy409
IExxxxBNyy 34
Uxxxx24Fy012
US0xxxW1y27
USxxxx9Kyy59
USxxxx35yy67
USxxxx2Qyy58
USxxxx331yy05
USxxxx761yy7
US1xxx37yy96
US1xxx2Q1yy8
CA2xxxx1yy6
US2xxx21yy9
JPxxxx400yy6
USxxxx161yy6
GB0xx540yy86
US45xx6F1yy9
US46xx02yy34
GBxxx128J450
JPxxxx90yy06
JPxxx320yy09
US4xxx5Hyy005
JP3yyyy00006
GBxxx92yyP64
GBxxxx09yy53
GBxxxx909H978
GxxxxY yZyy79
GBxxxxJyyy12
GBxxxXDZyy23
GB0xxx9yyF68
BMxxx8yyy068
USxxxx6Qyy40
USxxxx18yy45
USxxxx6yyy59
Uxxxx66Gyy40
GxxxxKBF4yy3
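The difference between the two outputs can be checked mechanically. The sketch below tests whether the with-area column is exactly the without-area column with its empty cells removed, and if so reports which row positions were dropped. The function names are hypothetical, and treating the string "NaN" as an empty cell is an assumption based on how the output is printed here.

```python
import math

def is_empty(cell):
    """Treat None, float NaN, blank strings, and the literal 'NaN' as empty."""
    if cell is None:
        return True
    if isinstance(cell, float):
        return math.isnan(cell)
    return str(cell).strip() in ("", "NaN")

def dropped_empty_rows(full_column, area_column):
    """Return the indices of empty cells in full_column if area_column equals
    full_column with those cells removed; return None if they differ otherwise."""
    kept = [c for c in full_column if not is_empty(c)]
    if kept != list(area_column):
        return None
    return [i for i, c in enumerate(full_column) if is_empty(c)]

# First rows of the two outputs reported above
without_area = ["ISIN", "NaN", "GB00xxxyy409", "IExxxxBNyy 34"]
with_area = ["ISIN", "GB00xxxyy409", "IExxxxBNyy 34"]

print(dropped_empty_rows(without_area, with_area))  # [1]
```

A non-None result means the only difference is the missing empty cells, which is the behavior described in this report.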
What I am trying to understand:
Why would a 0.1 pixel shift cause the empty cells to be removed and the cells below them to be shifted up?