Reduce size of DataDictionary endpoint
The DataDictionary can be fairly large, which causes issues in browsers. One idea is to add a filter parameter to the REST endpoints to reduce the data being returned. Another idea is to provide a configurable paging mechanism within the returned pages.
I'll work this ticket. There are several ways this can be implemented. For filtering fields, we could try integrating a third-party library such as bohnman/squiggly or something similar. If using a third-party library is not desired, then we need to decide how many features to support in a filter expression, such as wildcards, negations, nested fields, and regex. Some examples:
- select=name - Return single field 'name'.
- select=id,name - Return multiple fields, 'name' and 'id'.
- select=-name - Return all fields except 'name'.
- select=name* - Return all fields starting with 'name'.
- select=record.name - Return nested field 'name' in 'record' objects.
- select=record[name,id] - Return multiple nested fields 'name' and 'id' in 'record' objects.
- select=record[name*] - Return all nested fields in 'record' objects starting with 'name'.
- select=/^name[a-z]/ - Return fields matching regex.
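To get a feel for how much work a hand-rolled filter would be, here is a minimal sketch of the flat cases only (single field, multiple fields, negation, and wildcards). It is not squiggly's syntax or an existing endpoint implementation. The function names `parse_select` and `filter_fields` are hypothetical, and nested fields and regex are deliberately left out.

```python
from fnmatch import fnmatchcase


def parse_select(expr):
    """Split a select expression into include and exclude patterns.

    Terms are comma-separated; a leading '-' marks an exclusion.
    Nested fields and /regex/ terms are NOT handled in this sketch.
    """
    includes, excludes = [], []
    for term in expr.split(","):
        term = term.strip()
        if not term:
            continue
        if term.startswith("-"):
            excludes.append(term[1:])
        else:
            includes.append(term)
    return includes, excludes


def filter_fields(record, expr):
    """Return a copy of `record` holding only the fields selected by `expr`."""
    includes, excludes = parse_select(expr)

    def keep(key):
        # Exclusions always win over inclusions.
        if any(fnmatchcase(key, pattern) for pattern in excludes):
            return False
        # With no include terms, everything not excluded is kept.
        return not includes or any(fnmatchcase(key, p) for p in includes)

    return {key: value for key, value in record.items() if keep(key)}
```

For example, `filter_fields(row, "name*")` keeps every top-level field whose name starts with `name`. Note that `fnmatchcase` also gives `?` and `[...]` special meaning, so a real implementation would need its own matcher before adding the `record[name,id]` form.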
Pagination is relatively straightforward to implement with a combination of limit and offset parameters.
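As a rough illustration of the limit/offset idea, the sketch below pages an in-memory list. The function name `paginate` and the shape of the returned envelope (`total`, `offset`, `limit`, `results`) are assumptions for illustration, not the dictionary service's actual response format.

```python
def paginate(rows, limit, offset=0):
    """Return one page of `rows` plus the metadata a client needs to page on."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    return {
        "total": len(rows),      # size of the full result set
        "offset": offset,        # index of the first returned row
        "limit": limit,          # maximum rows per page
        "results": rows[offset:offset + limit],
    }
```

A client would request `limit=100&offset=0`, then `offset=100`, and so on until `offset` reaches `total`. An offset past the end simply yields an empty `results` list.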
@lbschanno using squiggly may be a little overkill here. Pagination may be the easier way to go.
Created a separate ticket (#788) for the query model endpoint
@ivakegg agreed re. squiggly. I went with straightforward pagination for my pull request: https://github.com/NationalSecurityAgency/datawave-dictionary-service/pull/7
Closed the PR due to the need for a different approach.
Currently, when text/html is returned, the data is rendered as an HTML table enhanced by jquery.dataTables, which supports in-memory pagination, sorting, and regex filtering via client-side processing. This becomes an issue when datasets are too large for the available memory to handle.
One option would be to switch to server-side processing and bind the dataTables' pagination, sorting, and filtering to ajax calls that would subsequently call the server to receive the modified table. We could also reduce the number of server calls (and improve the table's responsiveness) by establishing an internal max threshold of results: if the total number of results is less than this threshold, a table configured for client-side processing is returned; otherwise, a table configured for server-side processing is returned.
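A minimal sketch of the threshold idea and of what a server-side page handler might return follows. The option names `serverSide`, `processing`, `ajax`, and `data`, and the `draw`/`start`/`length` request fields with the `recordsTotal`/`recordsFiltered` response fields, follow the DataTables server-side processing convention. The function names, the threshold value, and the way rows are searched are assumptions for illustration only; sorting is omitted.

```python
import re

# Hypothetical cut-off; the real value would need tuning against browser memory.
MAX_CLIENT_SIDE_ROWS = 10_000


def table_config(rows, ajax_url, threshold=MAX_CLIENT_SIDE_ROWS):
    """Choose client-side or server-side processing based on result count."""
    if len(rows) < threshold:
        # Small result set: ship all rows and let the browser page and filter.
        return {"serverSide": False, "data": rows}
    # Large result set: the table fetches each page from the server via ajax.
    return {"serverSide": True, "processing": True, "ajax": ajax_url}


def server_side_page(rows, draw, start, length, search=""):
    """Build the response for one server-side page request.

    `rows` is a list of dicts; `search` is treated as a regex applied to
    every value in a row.
    """
    pattern = re.compile(search) if search else None
    matched = [
        row for row in rows
        if pattern is None or any(pattern.search(str(v)) for v in row.values())
    ]
    # A negative length is the convention for "return all remaining rows".
    page = matched[start:] if length < 0 else matched[start:start + length]
    return {
        "draw": draw,                    # echoed so the client can order replies
        "recordsTotal": len(rows),       # count before filtering
        "recordsFiltered": len(matched), # count after filtering
        "data": page,
    }
```

With this split, small dictionaries keep the current in-browser behavior, and only large ones pay the cost of a server round trip per page.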
Adding support for server-side processing would be somewhat involved but not complex.
@alerman I believe you updated an interface to allow a paging mechanism on an internal sub-system. Could you comment on what you did that may help Laura work this issue. Thanks.
@lbschanno take a look at ag-grid. I used it with AngularJS, and I find it to be far superior to DataTables in terms of pagination support, and it's far better at resource consumption. One of the main issues with DataTables is that it computes the rendering even for items not yet displayed. I've found ag-grid to be easy to use and much better about resource consumption.
@alerman thank you, I will give ag-grid a try.
@alerman were you using the client-side model or the server-side model from ag-grid? Unfortunately it looks like the server-side model that supports loading data from a server is only included under the enterprise version of ag-grid.
OK, here is the plan. @alerman will implement client-side pagination using ag-grid, which should get us over the initial hump of being unable to load the data dictionary with a 5M json size. We should still work on a server-side pagination mechanism, but that is no longer as high a priority.
@dws4 does DD v2 address this issue?
@foster33 - Unfortunately not at the moment for a large number of rows. If a lot of rows are populated, I just have it stay in the loading state until ready.