Scribe-Data Refine CLI User Experience by Validating Input Languages and Data Types

Terms

[X] I have searched all open bug reports
[X] I agree to follow Scribe-Data's Code of Conduct

Behavior

Summary

When users provide a non-existent language or data type to the total command, the system incorrectly returns a number of lexemes. This leads to confusion and undermines the user experience.

Steps to Reproduce

Run the total command with an invalid language or data type.
Observe that the command returns a number rather than indicating that the language or data type does not exist.

Example

Running scribe-data t -lang Latin initially gave the following output:

lang filter =  ?lexeme dct:language ?language . # added this while trying to debug
Language: Latin # and this...
Total number of lexemes: 1344820

The added print statements show that language_filter was not updated with the language parameter: ?language is left as an unbound variable, so the query returns a wrong total.

Running the same command for French, scribe-data t -lang French, gave:

Lang filter =  ?lexeme dct:language wd:Q150 .
Language: French
Total number of lexemes: 19746

The filter is resolved to a QID for French but not for Latin, which shows the inconsistent behaviour.
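To make the two outputs above easier to compare, here is a minimal sketch of how a filter could end up unbound. It is not the actual Scribe-Data code: build_language_filter and LANGUAGE_QIDS are hypothetical names, and the sketch assumes Latin is simply missing from the metadata while French maps to Q150, as in the output above.

```python
# Hypothetical stand-in for language_metadata.json; the real file's layout may differ.
LANGUAGE_QIDS = {"french": "Q150"}


def build_language_filter(language: str) -> str:
    """Return a SPARQL triple pattern for the language filter (illustrative only)."""
    qid = LANGUAGE_QIDS.get(language.lower())
    if qid is None:
        # No QID found: the pattern keeps an unbound variable, so the query
        # is no longer restricted to a single language.
        return "?lexeme dct:language ?language ."
    return f"?lexeme dct:language wd:{qid} ."


print(build_language_filter("French"))
print(build_language_filter("Latin"))
```

With an unbound ?language, the pattern matches lexemes of any language, which would explain why the Latin total is so much larger than the French one.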

Expected Behavior

The command should validate the provided language and data type. If either does not exist, it should exit gracefully without executing the query and tell the user how to resolve the problem.

Root Cause

The current implementation does not check whether the input language and data type exist in the metadata files. The language_metadata.json file plays a crucial role here: it is the authoritative source for valid languages and their corresponding QIDs. When a user inputs a language or data type that is not present in this file, the CLI does not recognize it as invalid and proceeds with the query. This oversight results in misleading output and a poor user experience.

Proposed Solution

[x] Introduce a validation function that checks for the existence of the provided language and data type.
[x] Modify the CLI code in total.py to include a user-friendly message for non-existent languages.
[x] Ensure that the main script correctly handles the update and setting of metadata, as defined in main.py, to improve user experience.
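A rough sketch of what such a validation step might look like, under a few assumptions: validate_inputs, LANGUAGE_QIDS and DATA_TYPES are hypothetical names, the dictionaries are trimmed-down stand-ins for the metadata files, and the "did you mean" hint via difflib is only one possible way to suggest a fix. The PR may well do this differently.

```python
import difflib

# Hypothetical, trimmed-down stand-ins for the metadata files.
LANGUAGE_QIDS = {"english": "Q1860", "french": "Q150", "german": "Q188"}
DATA_TYPES = {"nouns", "verbs", "adjectives"}


def _suggest(value, candidates):
    """Return a 'did you mean' hint, or an empty string if nothing is close."""
    close = difflib.get_close_matches(value.lower(), sorted(candidates), n=1)
    return f" Did you mean '{close[0]}'?" if close else ""


def validate_inputs(language=None, data_type=None):
    """Return a list of error messages; an empty list means the inputs are valid."""
    errors = []
    if language is not None and language.lower() not in LANGUAGE_QIDS:
        errors.append(
            f"Language '{language}' is not in the language metadata."
            + _suggest(language, LANGUAGE_QIDS)
        )
    if data_type is not None and data_type.lower() not in DATA_TYPES:
        errors.append(
            f"Data type '{data_type}' is not in the data type metadata."
            + _suggest(data_type, DATA_TYPES)
        )
    return errors


# The total command could call this first and return before running any query.
for message in validate_inputs(language="Frnch", data_type="nouns"):
    print(message)
```

Returning a list of messages instead of raising keeps the check easy to call from total.py: if the list is non-empty, print the messages and return without querying.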

Related Issues

This issue is closely related to #295, as it also concerns the CLI.

Contribution

I would love to work and collaborate on implementing this improvement.

Oct 12 '24 00:10 DeleMike

Hi @andrewtavis, I found a bug while trying to get total lexemes and saw that it was connected to the language_metadata.json file. I have worked on an initial fix (a PR), which I will open soon so that you can see my reasoning on how I propose we fix it.

Can you assign this issue to me?

Oct 12 '24 00:10 DeleMike

@catreedle, after our long talk about this issue yesterday, I have created the issue and raised an initial PR here

Oct 12 '24 00:10 DeleMike