spec add 1st draft line GT/training specs

Jan 29 '19 10:01 kba

> I strongly recommend the introduction of a third GT subset devel.

@wrznr Can you elaborate?

Jan 31 '19 09:01 kba

@kba

> Can you elaborate?

Most training procedures allow for three different sets of GT: train, eval and devel. While the purpose of the first two is presumably clear, the third is used during training for parameter tuning and error estimation. E.g., ocropus-rtrain has the parameter --tests for this purpose. Note that, strictly speaking, you must not reuse your evaluation data as development data, since that would bias the final evaluation.
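To make the three-way partition concrete, here is a minimal sketch of a reproducible split of line GT into disjoint train, devel and eval subsets. The function name `split_gt`, the default fractions and the file names are assumptions for illustration only; nothing here is prescribed by the spec or by any training engine.

```python
import random


def split_gt(items, devel_frac=0.1, eval_frac=0.1, seed=0):
    """Partition items into disjoint train, devel and eval lists.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    if devel_frac < 0 or eval_frac < 0 or devel_frac + eval_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Sort first so the result does not depend on the input order,
    # then shuffle with a fixed seed so the split is reproducible.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)

    n_devel = int(len(shuffled) * devel_frac)
    n_eval = int(len(shuffled) * eval_frac)

    devel = shuffled[:n_devel]
    evaluation = shuffled[n_devel:n_devel + n_eval]
    train = shuffled[n_devel + n_eval:]
    return train, devel, evaluation


if __name__ == "__main__":
    # Made-up line image names standing in for real line GT.
    lines = [f"line_{i:04d}.png" for i in range(100)]
    train, devel, evaluation = split_gt(lines)
    print(len(train), len(devel), len(evaluation))
```

The devel subset would then be the one passed to the engine for error estimation during training, while the eval subset stays untouched until the final evaluation.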

Mar 19 '19 11:03 wrznr

@kba Cf. https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets

Mar 19 '19 11:03 wrznr

@wrznr So far I have mainly applied k-fold cross-validation. Would you still see added benefits over this from partitioning into three sets?

May 21 '19 23:05 cneud

@wrznr Do your remaining requested changes relate to this comment only, or is there other stuff that needs changing (for the time being)?

Aug 08 '19 16:08 cneud

@Doreenruirui's work on okralact has diverged significantly from these specs. It makes little sense to publish these specs when the only implementation implements them differently.

@doreenruirui can you compare your schemas and documentation with this so we can integrate that part of okralact into the specs?

Aug 08 '19 17:08 kba

> @Doreenruirui's work on okralact has diverged significantly from these specs. It makes little sense to publish these specs with the only implementation implementing it differently.
>
> @Doreenruirui can you compare your schemas and documentation with this so we can integrate that part of okralact into the specs?

@kba I am sorry, I am not very familiar with GitHub. Can you point me to the document I should compare my schemas with?

Aug 08 '19 17:08 Doreenruirui

@Doreenruirui We're discussing these changes/new files: https://github.com/OCR-D/spec/pull/105/files.

In particular, I would like to harmonize the proposal here (https://github.com/OCR-D/spec/pull/105/files?file-filters%5B%5D=.md#diff-2ae93b1f468c44b9f7e195133a0fb539) to use BagIt for the line GT with your approach in okralact with respect to the input format.

It would also be interesting to compare okralact's engine schemas with the schemas proposed here: https://github.com/OCR-D/spec/pull/105/files?file-filters%5B%5D=.md&file-filters%5B%5D=.yml#diff-690d5874f98dfbd6737bc0168b6084d8 and https://github.com/OCR-D/spec/pull/105/files?file-filters%5B%5D=.md&file-filters%5B%5D=.yml#diff-a1f62fd4dd219fc5c5d5f0ccb419c88b

Aug 08 '19 17:08 kba