Add representation for KeywordSearchConfiguration and KeywordIndexingConfiguration

Open chrishargreaves opened this issue 4 months ago • 0 comments

Background

Keyword searching is proposed in #179. That issue also discusses KeywordIndex, which was requested to be broken out into a separate request (https://github.com/casework/CASE/issues/179#issuecomment-3201676650). This has evolved into a more specific suggestion to add KeywordIndexingConfiguration and KeywordSearchConfiguration.

Both are included here as they share many concepts.

My previous comments relating to this are here: https://github.com/casework/CASE/issues/179#issuecomment-3199705336 and https://github.com/casework/CASE/issues/179#issuecomment-3204773116

Requirements

Requirement 1

Add KeywordIndexingConfiguration

Requirement 2

Add KeywordSearchConfiguration

Risk / Benefit analysis

Benefits

If keyword results are modelled in CASE, a comparison could reveal that two tools produce different results given the same keyword or keyword list.

Capturing the search and indexing configuration would potentially allow the reasons for the difference to be determined quickly and programmatically, e.g. different character encodings used for search (KeywordSearchConfiguration) or indexing (KeywordIndexingConfiguration).

Risks

The submitter is unaware of any risks associated with this change.

Competencies demonstrated

Competency 1

Competency Question 1.1

What configurations could explain differences in keyword search results when the same term is used on the same data with different tools?

Result 1.1

Example 1:

Tool 1, configuration 1: character encoding UTF-8

Tool 2, configuration 2: character encoding UTF-8, UTF-16

Competency Question 2.1

What configurations could explain differences in keyword search results when the same term is used on the same data with the same tool?

Result 2.1

Example 2:

Tool 1, configuration 3: search artefacts only

Tool 1, configuration 4: search all data

Result 2.2

Example 3:

Tool 1, configuration 5: include substrings

Tool 1, configuration 6: do not include substrings
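To illustrate how Examples 1 to 3 could be answered programmatically, here is a minimal Python sketch. The class and its three property names (`character_encodings`, `search_scope`, `include_substrings`) are hypothetical placeholders taken from the examples above, not proposed CASE terms; the real property list would still need to be derived from the tools.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class KeywordSearchConfiguration:
    """Hypothetical stand-in for the proposed class; property names are assumptions."""
    character_encodings: frozenset = frozenset({"UTF-8"})
    search_scope: str = "all data"
    include_substrings: bool = True


def diff_configurations(a, b):
    """Return {property: (value_in_a, value_in_b)} for every property that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


# Example 1: two tools, different character encodings
config_1 = KeywordSearchConfiguration(character_encodings=frozenset({"UTF-8"}))
config_2 = KeywordSearchConfiguration(character_encodings=frozenset({"UTF-8", "UTF-16"}))

# Example 2: same tool, different search scope
config_3 = KeywordSearchConfiguration(search_scope="artefacts only")
config_4 = KeywordSearchConfiguration(search_scope="all data")

# Example 3: same tool, substring matching on or off
config_5 = KeywordSearchConfiguration(include_substrings=True)
config_6 = KeywordSearchConfiguration(include_substrings=False)

for left, right in [(config_1, config_2), (config_3, config_4), (config_5, config_6)]:
    print(diff_configurations(left, right))
```

Each printed dictionary names only the properties that differ, which is the information needed to explain a discrepancy between two result sets.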

Competency Question 3.1

Did the actions carried out differ from the Standard Operating Procedures (SOPs)?

(This might also be useful to flag that data may be missing due to indexing settings when viewing a case file originally processed by someone else.)

Result 3.1

Settings match SOP for keyword indexing

Result 3.2

WARNING: Search indexing settings for this case differ from SOP configuration
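As a rough sketch of how Results 3.1 and 3.2 might be produced, the following Python compares recorded indexing settings against an SOP baseline. The setting names and values are invented for illustration and are not proposed CASE properties; the function name is likewise an assumption.

```python
def check_against_sop(case_settings, sop_settings):
    """Compare recorded indexing settings with the SOP baseline.

    Only settings named in the SOP are checked; a setting missing from the
    case record counts as a deviation.
    """
    deviations = {
        key: (case_settings.get(key), expected)
        for key, expected in sop_settings.items()
        if case_settings.get(key) != expected
    }
    if not deviations:
        return "Settings match SOP for keyword indexing"
    details = "; ".join(
        f"{key}: case={actual!r}, SOP={expected!r}"
        for key, (actual, expected) in sorted(deviations.items())
    )
    return (
        "WARNING: Search indexing settings for this case differ from "
        f"SOP configuration ({details})"
    )


# Hypothetical settings, for illustration only
sop = {"index_unallocated_space": True, "character_encodings": ("UTF-8", "UTF-16")}
case_a = {"index_unallocated_space": True, "character_encodings": ("UTF-8", "UTF-16")}
case_b = {"index_unallocated_space": False, "character_encodings": ("UTF-8", "UTF-16")}

print(check_against_sop(case_a, sop))
print(check_against_sop(case_b, sop))
```

Listing the deviating settings alongside the warning would also help a reviewer judge whether data may be missing from a case file processed by someone else.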

Solution suggestion

Screenshots from several tools are included in the comments linked above (https://github.com/casework/CASE/issues/179#issuecomment-3199705336, https://github.com/casework/CASE/issues/179#issuecomment-3204773116).

If this is considered a useful addition, I can try to enumerate and deduplicate those and derive a list of properties to capture settings for keyword searching/indexing.

BulkExtractor may also have useful configuration parameters to feed into this.

chrishargreaves, Aug 26 '25 08:08