Enriching supported date formats
Description
Date without time and date with time but with no millis are valid formats for all the date fields. Fixes https://github.com/Graylog2/graylog2-server/issues/12137
/nocl (yet)
**The failed tests are ignored at this point. There is no reason to fix them before the solution is accepted. **
Motivation and Context
See https://github.com/Graylog2/graylog2-server/issues/12137 Query string query does not allow to customize/set date formats. The only way to achieve a requested behavior seems to be by extending date-type field formats in the mapping.
Screenshots (if appropriate):
-
When using full datetime, with millis, queries used to work and work correctly.
-
If you remove any part of a datetime, i.e. millis, the query does not work. The reason is difficult to figure out by a user, you simply do not get results.
-
With the proposed change, shorter date and datetime formats can be used, but they only work on the latest data, where new data format has been used on mapping.
- The change does not influence data ingestion.
As seen in point 3, solution works only for the indices created after the change is applied. For the older ones old behavior would still be visible(format field in existing mapping cannot be changed). The question is if it is acceptable.
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactoring (non-breaking change)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
- [x] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I have read the CONTRIBUTING document.
- [ ] I have added tests to cover my changes.
@luk-kaminski @dennisoelkers Can we adjust the query instead of adjusting the schema? I think we are already parsing the query for validation. Is there a way to modify the date to be in the correct format before we send it to OpenSearch?
@luk-kaminski @dennisoelkers Can we adjust the query instead of adjusting the schema? I think we are already parsing the query for validation. Is there a way to modify the date to be in the correct format before we send it to OpenSearch?
@bernd - The is a chance to do so. We can i.e. try to write a new normalizer (org.graylog.plugins.views.search.engine.normalization.SearchNormalizer), but it would be much more complex task (as query is just a string) probably requiring something similar to what Tomas has done around QueryValidationServiceImpl...
What are you afraid of with the mapping change?
What are you afraid of with the mapping change?
- As you pointed out, it doesn't work for existing indices. That means the search experience will be inconsistent until all older indices are deleted from the OpenSearch cluster. (including in the warm tier)
- If we parse the timestamp strings, we can consider the user's timezone and adjust the timestamp before executing the query. The timestamps in the index are stored in UTC.
- Mapping changes need to be done very carefully, and we need to think through all implications, especially if it's one of our core fields.
- We have a time picker that allows the selection of time ranges. Using the time picker (you cannot opt out of that) and a range query can be confusing. So, I consider using the query language to specify a time range advanced.
I want to make sure that the change of a core field mapping with all its implications is justified. That's why I am asking for possible alternatives. :slightly_smiling_face:
One issue that comes to mind:
If we index a timestamp like 2024-06-06T12:12:12 or 2024-06-06 in OpenSearch, we also get that value back in the response message. That means the message list would show 2024-06-06 instead of the full timestamp.
Can the same be achieved by specifying a search_analyzer for timestamp in the mapping?
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html
This still has the drawback that it won't apply to old indices, but it won't risk storing different date formats in indices that subsequent code might be surprised by, because traditionally we have always written full ISO dates.
Can the same be achieved by specifying a
search_analyzerfortimestampin the mapping? https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.htmlThis still has the drawback that it won't apply to old indices, but it won't risk storing different date formats in indices that subsequent code might be surprised by, because traditionally we have always written full ISO dates.
We would also still have the timezone issue for the user.
I see this as a UX issue and would like to avoid changing the index schema settings for it. (and everything else that would need to be adjusted to deal with different timestamps) We made the mistake a decade ago to expose the query language directly. Now, we have to invest more work (parsing and fixing the query) to handle it. :smile: