data-api-builder
Cosmos DB: GQL Schema Generation with Sampling
Refer to this document for details: Schema Inference Design.docx
What is this change?
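Adds a utility to generate a GraphQL schema for a NoSQL database. Generation is best-effort, because the schema is inferred from a sample of documents rather than from the whole container.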
PR Code Change Summary
Main Code
Samplers:
- [ ] src/Core/Generator/Sampler/PartitionBasedSampler.cs
- [ ] src/Core/Generator/Sampler/TimeBasedSampler.cs
- [ ] src/Core/Generator/Sampler/TopNSampler.cs
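The three samplers differ mainly in which documents they fetch. As a hypothetical sketch (not the code in these files), the following shows the kind of Cosmos DB NoSQL queries each mode could issue. The function names, the partition key property passed in as `pk_property`, and ordering TopN by `_ts` are all assumptions; parameters use the `{"name": "@x", "value": ...}` shape that Cosmos DB SDKs accept.

```python
from typing import Any, Dict, List, Tuple

# A query is its text plus its parameters (hypothetical representation).
Query = Tuple[str, List[Dict[str, Any]]]


def top_n_query(n: int) -> Query:
    """TopN (assumed): the n most recently modified documents."""
    return (f"SELECT TOP {int(n)} * FROM c ORDER BY c._ts DESC", [])


def distinct_partition_keys_query(pk_property: str) -> Query:
    """Partition-based, step 1 (assumed): list the distinct partition key values."""
    return (f"SELECT DISTINCT VALUE c.{pk_property} FROM c", [])


def per_partition_query(pk_property: str, pk_value: Any, n: int) -> Query:
    """Partition-based, step 2 (assumed): the n newest documents of one partition."""
    # The partition value is passed as a parameter, never interpolated.
    return (
        f"SELECT TOP {int(n)} * FROM c WHERE c.{pk_property} = @pk ORDER BY c._ts DESC",
        [{"name": "@pk", "value": pk_value}],
    )


def time_window_query(start_ts: int, end_ts: int) -> Query:
    """Time-based (assumed): documents whose _ts (epoch seconds) is in [start, end)."""
    return (
        "SELECT * FROM c WHERE c._ts >= @start AND c._ts < @end",
        [{"name": "@start", "value": start_ts}, {"name": "@end", "value": end_ts}],
    )
```

These helpers only build query text; running them is the job CosmosExecutor.cs is described as doing, which needs a live Cosmos DB client.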
Schema Generator
- [ ] src/Core/Generator/SchemaGenerator.cs : Generates GQL from a given JSON array. It:
a) adds an alias name
b) tags the container entity, or an entity with an alias, with `@model`
c) marks each attribute as nullable or non-nullable (non-nullable attributes get `!`)
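A minimal sketch of this inference step, assuming a simplified type mapping; it is not the logic of SchemaGenerator.cs. It merges sampled JSON documents into one GraphQL type, emits nested types for nested objects, marks a field `!` only when every sampled document has a non-null value for it, skips Cosmos DB system properties, and tags only the container type with `@model`. The function names, the scalar mapping, the nested-type naming, and the fallback to `String` for conflicting types are illustrative assumptions; alias handling is omitted.

```python
from typing import Any, Dict, List

# System properties Cosmos DB adds to every item; skipped in this sketch.
SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def _scalar(value: Any) -> str:
    """Simplified JSON-to-GraphQL scalar mapping (assumed, not DAB's)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    return "String"


def _infer_type(values: List[Any], nested_name: str, types: Dict[str, List[str]]) -> str:
    """GraphQL type for the non-null values observed for one field."""
    if not values:
        return "String"  # never seen with a value: fall back to String
    if all(isinstance(v, dict) for v in values):
        _infer_object(values, nested_name, types)  # nested object -> nested type
        return nested_name
    if all(isinstance(v, list) for v in values):
        items = [item for v in values for item in v if item is not None]
        return f"[{_infer_type(items, nested_name, types)}]"
    kinds = {_scalar(v) for v in values}
    return kinds.pop() if len(kinds) == 1 else "String"  # conflict -> String


def _infer_object(objs: List[Dict[str, Any]], name: str, types: Dict[str, List[str]]) -> None:
    """Merge the keys of all sampled objects into one GraphQL object type."""
    keys: List[str] = []
    for obj in objs:
        for key in obj:
            if key not in keys and key not in SYSTEM_PROPERTIES:
                keys.append(key)
    types[name] = []  # reserve the slot so parent types print before nested ones
    fields = []
    for key in keys:
        present = [o[key] for o in objs if o.get(key) is not None]
        gql_type = _infer_type(present, name + key[:1].upper() + key[1:], types)
        # Non-nullable only if every sampled object has a non-null value.
        bang = "!" if len(present) == len(objs) else ""
        fields.append(f"{key}: {gql_type}{bang}")
    types[name] = fields


def infer_schema(docs: List[Dict[str, Any]], type_name: str) -> str:
    """Infer GraphQL SDL for one container from a sample of its documents."""
    types: Dict[str, List[str]] = {}
    _infer_object(docs, type_name, types)
    blocks = []
    for name, fields in types.items():
        tag = " @model" if name == type_name else ""  # only the container entity
        body = "\n".join(f"  {field}" for field in fields)
        blocks.append(f"type {name}{tag} {{\n{body}\n}}")
    return "\n\n".join(blocks)
```

For example, `infer_schema(docs, "Book")` over two sampled books where only one has `pages` yields `pages: Int` (nullable) next to `title: String!`. Requiring a value in every sample before emitting `!` errs toward nullable fields, which is the safer default when the sample may miss documents.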
Utility Classes
- [ ] src/Core/Generator/Sampler/CosmosExecutor.cs : Responsible for running a Cosmos DB query
- [ ] src/Core/Generator/SchemaGeneratorFactory.cs : a) creates a connection to Cosmos DB, b) runs the required sampler, c) generates the schema
Export Command
- [ ] src/Cli/Commands/ExportOptions.cs
- [ ] src/Cli/Exporter.cs
Test Coverage
- [ ] Samplers: src/Service.Tests/CosmosTests/SamplerTests.cs
a) You might notice that I create items with a 1-second gap. The sampler queries use `_ts`, which Cosmos DB generates automatically (there is no way to control it) and which has one-second resolution, so the wait ensures that items get different values for this column.
- [ ] SchemaGenerator: src/Service.Tests/CosmosTests/SchemaGeneratorTest.cs
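Why the 1-second gap matters: `_ts` is a Unix timestamp in whole seconds, so items written within the same second share a value. The hypothetical sketch below (not the PR's TimeBasedSampler.cs) splits the observed `_ts` range into equal windows and keeps up to `n` of the newest items per window; `sample_by_time`, `windows`, and `n` are illustrative names. When every item has the same `_ts`, all of them fall into one window and the time-based spread disappears, which is what the delay in the tests avoids.

```python
from typing import Any, Dict, List


def sample_by_time(items: List[Dict[str, Any]], windows: int, n: int) -> List[List[Dict[str, Any]]]:
    """Split [min _ts, max _ts] into equal windows; keep up to n newest items per window."""
    if not items:
        return []
    lo = min(i["_ts"] for i in items)
    hi = max(i["_ts"] for i in items)
    width = (hi - lo) / windows or 1  # all _ts equal: avoid dividing by zero
    buckets: List[List[Dict[str, Any]]] = [[] for _ in range(windows)]
    for item in sorted(items, key=lambda i: i["_ts"], reverse=True):  # newest first
        index = min(int((item["_ts"] - lo) / width), windows - 1)  # clamp max _ts
        if len(buckets[index]) < n:
            buckets[index].append(item)
    return [b for b in buckets if b]  # drop empty windows
```

With four items sharing one `_ts`, two windows, and `n=1`, only a single item is sampled; spacing the same items one second apart yields one item from each window.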
Other Changes
- [ ] src/Cli/CustomLoggerProvider.cs : Sets the minimum logger level to `Debug`, as I added a few debug logs to give more insight (if required) into what is happening in the sampler.
How can you test it?
- Go to this location, where you will find the Data API builder executable:
`--help` lists all the available options.
- Below is the minimal command required to run this feature, for each sampling mode:
TopNSampler
PartitionBasedSampler
TimeBasedSampler
How was this tested?
- [x] Integration Tests
- [x] Unit Tests
Please update the PR description
Just curious, since I haven't fully looked through it: make sure this feature is feature-flagged and doesn't just execute by default.
This feature is not part of the default DAB flow. The customer has to run it explicitly to generate the GQL schema.
/azp run
@sourabh1007 can you add samples for all 3 modes? I have some feedback on the CLI command variables and on the PR in general; I will add it once the samples are in the PR.
@sourabh1007 What is the sequence of commands here? I see that all the sampling methods require a config file as input; however, `dab init`, which is the initial command to generate the config, cannot be executed without a schema file.
Generating the schema file is the very first command (if required); after that, it is the normal flow.
What is the experience if customers don't want to use the schema auto-generation part?
Also, can you please create separate PRs for the changes of CLI and the DAB engine?
I’ve already provided an explanation of the files included in the PR. Could you please let me know if there's anything specific that’s unclear or if there's an issue you'd like me to address? I'm happy to help.
Please separate the CLI behavior changes and the DAB engine changes for Cosmos DB into different PRs. This will make them easier to manage. I’ll leave the decision up to you and @seantleonard. Also, could you address the comment on the first question?
PR is fine as is; no need to break it up at this point. I just need time to go in and review the latest changes. In the future, big changes need to be broken up, even with change descriptions.
@sourabh1007 can we rename the param names as below for schema extraction?
- TopNExtractor
- TimePartitionedSampler
- EligibleDataSampler