Always use auto-generated doc values as a back-up for Avro doc-related metadata retrieval.

Open rulle-io opened this issue 3 years ago • 4 comments

This PR is meant to be a solution for issue #579.

It also makes the schema generation process less dependent on a user-provided schema and more fault-tolerant.

Current implementation

dbeam always generates doc-related properties for an Avro schema based on input parameters and the ResultSet. Optionally, a user can provide a custom "handwritten" schema. A user-provided schema is used only for Avro doc values; field names, types and type lengths are always taken from the auto-generated schema.

Drawback(s)

One drawback of this behaviour is that when a new field appears in a DB table, and consequently in the source SQL ResultSet (e.g. when SELECT * is used), and the user-provided schema doesn't contain this field, the process throws an error.

Solution

dbeam's auto-generated schema is always used as a back-up when the user-provided schema doesn't contain the field in question.
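
To illustrate the fallback, here is a minimal sketch. It is not dbeam's actual code: it assumes each schema can be modelled as a plain map from field name to doc string, and the class name DocFallbackSketch, the method resolveDocs and the sample doc strings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: schemas are modelled as field-name -> doc maps, not Avro Schema objects. */
final class DocFallbackSketch {

  /**
   * Walks the fields of the auto-generated schema and picks a doc for each one:
   * the user-provided doc if the user schema has that field, otherwise the
   * auto-generated doc. Entries that exist only in the user schema are ignored.
   */
  static Map<String, String> resolveDocs(
      Map<String, String> generatedDocs, Map<String, String> userDocs) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (Map.Entry<String, String> field : generatedDocs.entrySet()) {
      String name = field.getKey();
      // Fall back to the auto-generated doc instead of failing on a missing field.
      resolved.put(name, userDocs.getOrDefault(name, field.getValue()));
    }
    return resolved;
  }

  public static void main(String[] args) {
    // Made-up sample data: "created_at" is new in the table and absent from the user schema.
    Map<String, String> generated = new LinkedHashMap<>();
    generated.put("id", "auto-generated doc for id");
    generated.put("created_at", "auto-generated doc for created_at");

    Map<String, String> user = new LinkedHashMap<>();
    user.put("id", "Primary key of the record");

    resolveDocs(generated, user).forEach((name, doc) -> System.out.println(name + ": " + doc));
  }
}
```

Because the loop is driven by the auto-generated fields, a field missing from the user-provided map no longer causes a failure, and extra entries in the user-provided map have no effect on the result.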

Additional use-case

An unplanned positive side effect is that a user-provided schema can serve as a dictionary of descriptions (docs) for various fields, so one schema file can be used for multiple tables. We are going to use this side effect.

  • "Unit tests are included"

Checklist for PR author(s)

  • [x] Changes are covered by unit tests (no major decrease in code coverage %) and/or integration tests.
  • [x] Ensure code formatting (use mvn com.coveo:fmt-maven-plugin:format org.codehaus.mojo:license-maven-plugin:update-file-header)
  • [x] Document any relevant additions/changes in the appropriate spot in javadocs/docs/README.

rulle-io avatar Sep 01 '21 20:09 rulle-io

Codecov Report

Merging #377 (7bd0191) into master (2646c35) will increase coverage by 0.42%. The diff coverage is 92.75%.

@@             Coverage Diff              @@
##             master     #377      +/-   ##
============================================
+ Coverage     91.47%   91.90%   +0.42%     
- Complexity      243      258      +15     
============================================
  Files            26       27       +1     
  Lines           927      963      +36     
  Branches         67       71       +4     
============================================
+ Hits            848      885      +37     
+ Misses           52       50       -2     
- Partials         27       28       +1     

codecov[bot] avatar Sep 01 '21 20:09 codecov[bot]

@labianchin

rulle-io avatar Sep 16 '21 21:09 rulle-io

Hi. Sorry it took me a while to get here, as I am putting little time into this project...

Is this PR still relevant? It has some conflicts with the just-merged #380.

If so, can you elaborate a bit further on the need for these changes? Specifically: what do we mean by "more fault-tolerant"? And what problem does "less dependant on a user-supplied schema" solve?

labianchin avatar Mar 07 '22 14:03 labianchin

Updated the description.

rulle-io avatar Mar 13 '22 14:03 rulle-io