STEPBible-Data

Badly formatted data in TTESV

Open dlee opened this issue 3 years ago • 3 comments

Some lines in TTESV do not conform to the specified format. I couldn't figure out how to fix them, but they generally seem to involve a word index ending in +00 followed by a long list of Strong's numbers.

Some examples:

$Num 1:43	02=<06485>	05=<04294>	07=<05321>	53+00=<07969>+<02572>+<00505>+<00702>+<03967>	
$Num 2:30	02=<06635>	04=<06485>	53+00=<07969>+<02572>+<00505>+<00702>+<03967>	
$Num 4:44	02=<06485>	04=<04940>	3+00=<07969>+<00505>+<03967>	
$Num 26:47	01=<00428>	04=<04940>	07=<01121>	09=<00836>	13=<06485>	53+00=<07969>+<02572>+<00505>+<00702>+<03967>	
$Num 26:62	03=<06485>	23+00=<07969>+<06242>+<00505>	05=<03605>	06=<02145>	09=<02320>	10=<01121>	12=<04605>	13=<03588>	16=<03808>	17=<06485>	18=<08432>	20=<01121>	22=<03478>	23=<03588>	26=<03808>	27=<05159>	28=<05414>	31=<08432>	33=<01121>	35=<03478>	
$Jdg 15:11	3+00=<07969>+<00505>	02=<00376>	04=<03063>	05+06=<03381>	09=<05585>	12=<05553>	14=<05862>	16=<00559>	18=<08123>	22=<03045>	25=<06430>	27=<04910>	30=<04100>	33=<02088>	37=<06213>	42=<00559>	47=<06213>	53=<06213>	
$Jdg 16:27	03=<01004>	05=<04390>	07=<00582>	09=<00802>	10=<03605>	12=<05633>	15=<06430>	17=<08033>	21=<01406>	3+00=<07969>+<00505>	25=<00376>	27=<00802>	29=<07200>	32=<08123>	33=<07832>	
$1Ki 4:32	03=<01696>	3+00=<07969>+<00505>	04=<04912>	07=<07892>	1+05=<00505>+<02568>	
$1Ki 5:16	01=<00905>	02+03=<08010>	3+00=<07969>+<00505>+<07969>+<03967>	04=<08269>	05=<05324>	06=<00834>	08=<05921>	10=<04399>	13=<07287>	16=<05971>	18+19=<06213>	21=<04399>	
$1Ch 12:27	02=<05057>	03=<03077>	08=<00175>	3+00.	<07969>+<00505>+<07651>+<03967>	
$1Ch 12:29	03=<01121>+<01144>	05=<00251>	07=<07586>	3+00=<07969>+<00505>	11=<04768>	16=<08104>	18=<04931>	21=<01004>	23=<07586>	
$1Ch 29:4	3+00=<07969>+<00505>	01=<03603>	03=<02091>	06=<02091>	08=<00211>	7+00=<07651>+<00505>	10=<03603>	12=<02212>	13=<03701>	15=<02902>	17=<07023>	20=<01004>	
$2Ch 2:2	02=<08010>	03=<05608>	70+00=<07657>+<00505>	04=<00376>	06+07=<05449>	80+00=<08084>+<00505>+<00376>	10=<02672>	13+14=<02022>	3+00=<07969>+<00505>+<08337>+<03967>	17=<05329>	
$2Ch 2:17	02=<08010>	03=<05608>	04=<03605>	06+07=<00582>+<01616>	08=<00834>	12=<00776>	14=<03478>	15=<00310>	17=<05610>	21=<01732>	23=<00001>	25=<05608>	29=<04672>	153+00=<03967>+<02572>+<00505>+<07969>+<00505>+<08337>+<03967>	
$2Ch 2:18	01=<07657>	02=<00505>	06=<06213>	08+09=<05449>	80+00=<08084>+<00505>	11=<02672>	14+15=<02022>	3+00=<07969>+<00505>+<08337>+<03967>	18=<05329>	22=<05971>	23=<05647>	
$2Ch 4:5	02=<05672>	05=<02947>	08=<08193>	10=<04639>	13=<08193>	16=<03563>	19=<06525>	22=<07799>	24=<02388>+<03557>	3+00=<07969>+<00505>	25=<01324>	
$2Ch 25:13	03=<01121>	06=<01416>	08=<00558>	10=<07725>	14=<01980>	18=<04421>	19=<06584>	21=<05892>	23=<03063>	25=<08111>	27+28=<01032>	30+31=<05221>	3+00=<07969>+<00505>	36=<00962>	37=<07227>	38=<00961>	
$2Ch 29:33	03+04=<06944>	600=<08337>+<03967>	06=<01241>	3+00=<07969>+<00505>	08=<06629>	
$2Ch 35:7	02=<02977>	03=<07311>	06=<01121>	07=<05971>	09+10=<06453>	12=<03605>	15=<04672>	16=<03532>	18=<01121>	19=<05795>	20=<04480>	22=<06629>	25=<04557>	30+00=<07970>+<00505>	3+00=<07969>+<00505>	28=<01241>	29=<00428>	31=<04480>	33+34=<04428>	35=<07399>	
$Job 1:3	02=<04735>	7+00=<07651>+<00505>	03=<06629>	3+00=<07969>+<00505>	04=<01581>	500=<02568>+<03967>	05=<06776>	07=<01241>	500=<02568>+<03967>	09+10=<00860>	12=<03966>	13=<07227>	14=<05657>	18=<00376>	21=<01419>	23=<03605>	25=<01121>	28=<06924>	

There's also this line that has a word index of 601:

$Num 26:51	04=<06485>	07=<01121>	09=<03478>	601+30=<08337>+<03967>+<00505>+<00505>+<07651>+<03967>+<07970>

There's also a line that has an invalid Strong's number (0100419):

$2Sa 15:17	03=<04428>	04+05=<03318>	07=<03605>	09=<05971>	10=<07272>	14=<05975>	17=<04801>	18=<01004>	 <0100419+04801>

I think the last entry is supposed to be 19=<01004+04801>
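For anyone who wants to scan the file for similar lines, here is a rough sketch of a checker. The token pattern is an assumption inferred from the well-formed entries above (two-digit word indices joined by `+`, then `=`, then five-digit Strong's numbers in angle brackets joined by `+`); it is not taken from the official format description, and `find_suspect_tokens` is a hypothetical helper name.

```python
import re

# Assumed token shape, inferred from the well-formed entries in this issue:
# two-digit word indices joined by "+", then "=", then five-digit
# Strong's numbers in angle brackets joined by "+".
TOKEN_RE = re.compile(r"(\d{2}(?:\+\d{2})*)=<\d{5}>(?:\+<\d{5}>)*")


def find_suspect_tokens(line):
    """Return the tab-separated tokens of one TTESV line that look malformed."""
    fields = line.rstrip("\n").split("\t")
    suspects = []
    for token in fields[1:]:  # fields[0] is the verse reference, e.g. "$Num 4:44"
        token = token.strip()
        if not token:
            continue  # trailing tabs leave empty fields
        match = TOKEN_RE.fullmatch(token)
        # Flag tokens that break the assumed shape or use a word index of 00.
        if match is None or "00" in match.group(1).split("+"):
            suspects.append(token)
    return suspects


if __name__ == "__main__":
    sample = "$Num 4:44\t02=<06485>\t04=<04940>\t3+00=<07969>+<00505>+<03967>\t"
    print(find_suspect_tokens(sample))
```

Under these assumptions the sketch flags the `+00` entries, the `601+30` entry, and the `<0100419+04801>` entry. It would also flag three-digit indices such as `600=` and `500=`, which may or may not be intended by the format, so the pattern would need adjusting against the real specification.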

dlee avatar Mar 13 '21 04:03 dlee

Thanks for taking the time to point these out. This dataset is due for a complete revamp. In the future I plan to link to the individual tagged words in the ESV, i.e. I'll avoid copyright issues by not including the untagged words in between. The dataset is also being updated to tagging that includes all Hebrew prefixes and suffixes. This means that, in the short term, I won't be fixing these issues. Sorry!

DavidIB avatar Mar 13 '21 11:03 DavidIB

Thank you for the update. Do you have an estimated timeline for the updated dataset?

dlee avatar Mar 13 '21 11:03 dlee

ASAP

David IB

DavidIB avatar Mar 13 '21 12:03 DavidIB