dbeam
Always use auto-generated doc values as a backup for Avro doc-related metadata retrieval.
This PR is meant to solve issue #579.
It also makes the schema generation process less dependent on a user-provided schema and more fault-tolerant.
Current implementation
dbeam always generates doc-related properties for an Avro schema based on input parameters and the ResultSet.
Optionally, a user can provide a custom "handwritten" schema.
A user-provided schema is only used for Avro doc values.
Thus field names, types, and type lengths are always taken from the auto-generated schema.
Drawback(s)
One drawback of this behaviour is that when a new field appears in a DB table, and consequently in the source SQL ResultSet (e.g. when SELECT * is used), and the user-provided schema doesn't contain this field, the process throws an error.
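A minimal sketch of the current behaviour, to make the drawback concrete. It assumes a simplified model in which a field is just a name, a type and a doc, and the user-provided schema is reduced to a name-to-doc map. `FieldSpec` and `StrictDocMerge` are hypothetical names; dbeam itself works with `org.apache.avro.Schema` objects, and the exact exception it raises may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an Avro field (not dbeam's real types).
record FieldSpec(String name, String type, String doc) {}

final class StrictDocMerge {
  private StrictDocMerge() {}

  /**
   * Takes names and types from the auto-generated fields and docs from the
   * user-provided schema (modelled as name -> doc). Fails as soon as the
   * user-provided schema lacks a field, e.g. a column newly added to the table.
   */
  static List<FieldSpec> merge(List<FieldSpec> generated, Map<String, String> userDocs) {
    List<FieldSpec> result = new ArrayList<>(generated.size());
    for (FieldSpec field : generated) {
      if (!userDocs.containsKey(field.name())) {
        throw new IllegalArgumentException(
            "Field '" + field.name() + "' not found in user-provided schema");
      }
      result.add(new FieldSpec(field.name(), field.type(), userDocs.get(field.name())));
    }
    return result;
  }
}
```

With this strict lookup, a single unexpected column from `SELECT *` is enough to abort the whole schema generation.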
Solution
dbeam's auto-generated schema is always used as a backup when the user-provided schema doesn't contain the field in question.
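The fallback can be sketched in the same simplified model: prefer the user-provided doc when the field is present, otherwise keep the auto-generated one. As before, `FieldSpec` and `FallbackDocMerge` are hypothetical names standing in for dbeam's real Avro-based code, not a copy of the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an Avro field (not dbeam's real types).
record FieldSpec(String name, String type, String doc) {}

final class FallbackDocMerge {
  private FallbackDocMerge() {}

  /**
   * Keeps names and types from the auto-generated fields. The doc comes from
   * the user-provided schema when it has the field, and falls back to the
   * auto-generated doc otherwise, so unknown fields no longer cause a failure.
   */
  static List<FieldSpec> merge(List<FieldSpec> generated, Map<String, String> userDocs) {
    List<FieldSpec> result = new ArrayList<>(generated.size());
    for (FieldSpec field : generated) {
      String doc = userDocs.getOrDefault(field.name(), field.doc());
      result.add(new FieldSpec(field.name(), field.type(), doc));
    }
    return result;
  }
}
```

Using `getOrDefault` keeps the field order and types of the auto-generated schema; only the doc lookup changes.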
Additional use-case
An unplanned positive side-effect is that a user-provided schema can serve as a dictionary of descriptions (docs) for various fields, so one schema file can be used for multiple tables. We are going to use this side-effect.
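A sketch of the dictionary use-case, again in the hypothetical `FieldSpec` model: docs from one user-provided schema are collected once and then applied to the auto-generated fields of several tables, each of which only partially overlaps with the dictionary. `DocDictionary` is an illustrative name, and the "first entry wins" rule on name clashes is an assumption, not something the PR specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an Avro field (not dbeam's real types).
record FieldSpec(String name, String type, String doc) {}

/** Hypothetical shared dictionary of field docs, reused across tables. */
final class DocDictionary {
  private final Map<String, String> docsByName = new HashMap<>();

  /** Registers the docs of a user-provided schema; earlier entries win on name clashes. */
  DocDictionary addAll(List<FieldSpec> userFields) {
    for (FieldSpec field : userFields) {
      docsByName.putIfAbsent(field.name(), field.doc());
    }
    return this;
  }

  /** Applies the dictionary to one table's auto-generated fields, falling back to their docs. */
  List<FieldSpec> describe(List<FieldSpec> generated) {
    List<FieldSpec> out = new ArrayList<>(generated.size());
    for (FieldSpec field : generated) {
      String doc = docsByName.getOrDefault(field.name(), field.doc());
      out.add(new FieldSpec(field.name(), field.type(), doc));
    }
    return out;
  }
}
```

Whether earlier or later entries should win on a name clash is a design choice; `putIfAbsent` is only one option.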
- "Unit tests are included"
Checklist for PR author(s)
- [x] Changes are covered by unit tests (no major decrease in code coverage %) and/or integration tests.
- [x] Ensure code formatting (use mvn com.coveo:fmt-maven-plugin:format org.codehaus.mojo:license-maven-plugin:update-file-header)
- [x] Document any relevant additions/changes in the appropriate spot in javadocs/docs/README.
Codecov Report
Merging #377 (7bd0191) into master (2646c35) will increase coverage by 0.42%. The diff coverage is 92.75%.
@@             Coverage Diff              @@
##             master     #377      +/-   ##
============================================
+ Coverage     91.47%   91.90%   +0.42%
- Complexity      243      258      +15
============================================
  Files            26       27       +1
  Lines           927      963      +36
  Branches         67       71       +4
============================================
+ Hits            848      885      +37
+ Misses           52       50       -2
- Partials         27       28       +1
@labianchin
Hi. Sorry it took me a while to get here, as I am spending little time on this project...
Is this PR still relevant? It has some conflicts with the just-merged #380.
If so, can you elaborate a bit further on the need for these changes? Specifically: what do we mean by "more fault-tolerant"? And what problem does "less dependent on a user-supplied schema" solve?
Updated the description.