spec
add 1st draft line GT/training specs
I strongly recommend introducing a third GT subset, `devel`.
@wrznr Can you elaborate?
@kba
> Can you elaborate?
Most training procedures allow for three different sets of GT: `train`, `eval` and `devel`. While the purpose of the first two is presumably clear, the latter is used during training for parameter fixing and error estimation. E.g. `ocropus-rtrain` has the parameter `--tests` for this purpose. Note that, strictly speaking, you must not reuse your evaluation data as development data.
@kba Cf. https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets
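To make the three-way partition concrete, here is a minimal sketch in Python. It is not taken from the spec or from any OCR engine: the function name `split_ground_truth`, the 80/10/10 ratio and the file names are hypothetical and only illustrate keeping `train`, `devel` and `eval` disjoint.

```python
import random


def split_ground_truth(lines, devel_frac=0.1, eval_frac=0.1, seed=0):
    """Partition GT line identifiers into disjoint train/devel/eval subsets.

    The fractions and the seed are illustrative assumptions, not values
    prescribed by the spec.
    """
    if not 0 <= devel_frac + eval_frac < 1:
        raise ValueError("devel_frac + eval_frac must be in [0, 1)")
    # Sort first so the result depends only on the seed, not on input order.
    items = sorted(lines)
    random.Random(seed).shuffle(items)
    n_devel = int(len(items) * devel_frac)
    n_eval = int(len(items) * eval_frac)
    return {
        # devel: used during training for parameter fixing and error estimation
        "devel": items[:n_devel],
        # eval: held out and only touched for the final evaluation
        "eval": items[n_devel:n_devel + n_eval],
        # train: everything else
        "train": items[n_devel + n_eval:],
    }


# Hypothetical line image names standing in for real line GT.
gt_lines = [f"line_{i:04d}.png" for i in range(100)]
subsets = split_ground_truth(gt_lines)
```

With a split like this, the `devel` subset would be the one passed to something like the `--tests` parameter mentioned above, while `eval` stays untouched until training is finished.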
@wrznr So far I have mainly applied k-fold cross-validation. Would you still see added benefits over this from partitioning into three sets?
@wrznr Do your remaining requested changes relate to this comment only, or is there other stuff that needs changing (for the time being)?
@Doreenruirui's work on okralact has diverged significantly from these specs. It makes little sense to publish these specs with the only implementation implementing it differently.
@Doreenruirui can you compare your schemas and documentation with this so we can integrate that part of okralact into the specs?
@kba I am sorry, I am not very familiar with GitHub. Can you point me to the document I should compare with my schemas?
@Doreenruirui We're discussing these changes/new files: https://github.com/OCR-D/spec/pull/105/files.
In particular, I would like to harmonize the proposal here (https://github.com/OCR-D/spec/pull/105/files?file-filters%5B%5D=.md#diff-2ae93b1f468c44b9f7e195133a0fb539) of using BagIt for the line GT with your approach in okralact with respect to the input format.
Also interesting would be to compare okralact's engine schemas with the schema proposed here https://github.com/OCR-D/spec/pull/105/files?file-filters%5B%5D=.md&file-filters%5B%5D=.yml#diff-690d5874f98dfbd6737bc0168b6084d8 and https://github.com/OCR-D/spec/pull/105/files?file-filters%5B%5D=.md&file-filters%5B%5D=.yml#diff-a1f62fd4dd219fc5c5d5f0ccb419c88b