ValidationException Not Thrown and Error Messages Not User-Friendly

Open Shreeja-dev opened this issue 1 year ago • 3 comments

Issue with Validation in frictionlessdata/datapackage-java

Context

I am using Java 21 and frictionlessdata/datapackage-java to validate a CSV file against a schema defined in a datapackage.json. Here is the setup:

CSV File

firstname,lastname,gender,age
John,Doe,male,30
Jane,Smith,female,25
Alice,Johnson,female,19
Bob,Williams,male,17

datapackage.json (person.csv)

{
  "name": "csv-validation-using-ig",
  "description": "Validates Person",
  "dialect": {
    "delimiter": ","
  },
  "resources": [
    {
      "name": "person_data",
      "path": "org/csv/person.csv",
      "schema": {
        "fields": [
          {
            "name": "firstname",
            "type": "string",
            "description": "The first name of the person.",
            "constraints": {
              "required": true
            }
          },
          {
            "name": "lastname",
            "type": "string",
            "description": "The last name of the person.",
            "constraints": {
              "required": true
            }
          },
          {
            "name": "gender",
            "type": "string",
            "description": "Gender of the person. Valid values are 'male' or 'female'.",
            "constraints": {
              "enum": ["male", "female"]
            }
          },
          {
            "name": "age",
            "type": "integer",
            "description": "The age of the person. Must be greater than 18.",
            "constraints": {
              "minimum": 19
            }
          }
        ]
      }
    }
  ]
}

JUnit Test
---
package csv;

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

import org.junit.jupiter.api.Test;

import com.fasterxml.jackson.databind.ObjectMapper;

import io.frictionlessdata.datapackage.Package;
import io.frictionlessdata.tableschema.exception.ValidationException;

class PersonDataPackageValidationTest {

    @Test
    void validateDataPackage() throws Exception {
        // Validate the datapackage.json using the new resource paths
        ValidationException exception = assertThrows(ValidationException.class, () -> this.getDataPackageFromFilePath(
                "org/csv/datapackage.json", true));

        // Assert the validation messages
        assertNotNull(exception.getMessages());
        assertFalse(exception.getMessages().isEmpty());
    }

    public static Path getBasePath() {
        try {
            String pathName = "/src/test/resources/org/csv/datapackage.json";
            Path sourceFileAbsPath = Paths.get(PersonDataPackageValidationTest.class.getResource(pathName).toURI());
            return sourceFileAbsPath.getParent();
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    private Package getDataPackageFromFilePath(String datapackageFilePath, boolean strict) throws Exception {
        String jsonString = getFileContents(datapackageFilePath);
        Package dp = new Package(jsonString, getBasePath(), strict);
        return dp;
    }

    public String convertToJson(List<Object> validationMessages) {
        try {
            ObjectMapper objectMapper = new ObjectMapper();
            return objectMapper.writerWithDefaultPrettyPrinter().writeValueAsString(validationMessages);
        } catch (Exception e) {
            throw new RuntimeException("Failed to convert to JSON", e);
        }
    }

    private static String getFileContents(String fileName) {
        try {
            return new String(TestUtil.getResourceContent(fileName));
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}

---

Issues and Questions

Issue 1: Missing ValidationException for Age Rule

  • Description:
    I have defined a rule in the schema that the age must be at least 19. In the CSV, the person Bob Williams has an age of 17.

    • Expected Behavior: A ValidationException should be thrown.
    • Actual Behavior: No exception is thrown.
  • Question:
    Could you guide me on what might be wrong in my configuration or code?

Issue 2: Lack of Specific Error Messages

  • Description:
    Here is an example of the error messages I receive in another scenario:

    [{
      "type" : "required",
      "code" : "1028",
      "path" : "$.fields[0]",
      "schemaPath" : "#/properties/fields/items/anyOf/0/required",
      "arguments" : [ "name" ],
      "details" : null,
      "message" : "$.fields[0].name: is missing but it is required"
    }]
    
  • Problem:
    The error message does not indicate which field or value in the CSV caused the issue.

  • Comparison:
    The Frictionless Python validator provides more detailed error messages, such as:

    {
      "message": "The cell \"\" in row at position \"2\" and field \"firstname\" at position \"1\" does not conform to a constraint: constraint \"required\" is \"True\""
    }
    
    
  • Question:
    How can I achieve similar specific error messages using datapackage-java to make the validation results more user-friendly?


Request for Guidance

  • Questions:
    1. Kindly suggest how to fix the issue where the age rule is not triggering a ValidationException.
    2. Please guide me on how to configure or modify datapackage-java to provide detailed error messages like the Frictionless Python validator.

I appreciate any help or guidance you can provide.

---

Please preserve this line to notify @iSnow (lead of this repository)

Shreeja-dev, Dec 03 '24 04:12

@iSnow @amercader @akariv

Appreciate any help or guidance you can provide on the above issue.

Shreeja-dev, Dec 04 '24 04:12

Hi @Shreeja-dev,

Thank you for your feedback.

Issue 1: Missing ValidationException for Age Rule

The missing exception comes from a misunderstanding of how validation works in the library. There are two kinds of validation:

  • Formal schema validation happens when you create a datapackage with a schema. This step checks the schema against the tableschema spec. No data is validated, so no constraints are checked either.
  • Data validation happens only when you read data from the datapackage. A full data validation at creation time would force the library to process all the data in the package once up front and then again when you read it, so this validation is deferred until the data is read.
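The difference in timing can be illustrated with a small, library-independent sketch. Every name in it (DeferredValidationSketch, ToyResource, ConstraintViolation) is hypothetical and not part of datapackage-java; it only mimics the idea that the constructor checks the schema definition while constraint checks run when the data is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these types exist in datapackage-java.
class DeferredValidationSketch {

    /** Thrown when a data value violates a field constraint. */
    static class ConstraintViolation extends RuntimeException {
        ConstraintViolation(String message) {
            super(message);
        }
    }

    /** A toy resource with one integer field that has a minimum constraint. */
    static class ToyResource {
        private final String fieldName;
        private final int minimum;
        private final List<String> rawValues;

        // Phase 1: only the schema definition is checked; rawValues are not inspected.
        ToyResource(String fieldName, int minimum, List<String> rawValues) {
            if (fieldName == null || fieldName.trim().isEmpty()) {
                throw new IllegalArgumentException("schema error: field name is required");
            }
            this.fieldName = fieldName;
            this.minimum = minimum;
            this.rawValues = rawValues;
        }

        // Phase 2: values are parsed and checked against the constraint while reading.
        List<Integer> read() {
            List<Integer> values = new ArrayList<>();
            for (int i = 0; i < rawValues.size(); i++) {
                int value = Integer.parseInt(rawValues.get(i));
                if (value < minimum) {
                    throw new ConstraintViolation("data row " + (i + 1) + ", field \"" + fieldName
                            + "\": value " + value + " is below minimum " + minimum);
                }
                values.add(value);
            }
            return values;
        }
    }

    public static void main(String[] args) {
        // The ages from the CSV in this issue; 17 violates the minimum of 19.
        ToyResource ages = new ToyResource("age", 19, List.of("30", "25", "19", "17"));
        System.out.println("Construction succeeded; no data has been checked yet.");
        try {
            ages.read();
        } catch (ConstraintViolation e) {
            System.out.println("Read failed: " + e.getMessage());
        }
    }
}
```

In this toy version, constructing the resource with an invalid value succeeds, and the violation only surfaces once read() is called.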

We can rewrite your example so that it works:

void validateDataPackage() throws Exception {
	Package dp = this.getDataPackageFromFilePath(
		"/fixtures/datapackages/constraint-violation/datapackage.json", true);
	Resource resource = dp.getResource("person_data");
	ConstraintsException exception = assertThrows(ConstraintsException.class, () -> resource.getData(false, false, true, false));

	// Assert the validation messages
	Assertions.assertNotNull(exception.getMessage());
	Assertions.assertFalse(exception.getMessage().isEmpty());
}

As you can see, the exception is thrown only during the resource.getData() call.

Issue 2: Lack of Specific Error Messages

This is true, but it is a limitation of the networknt validator library that we use for formal schema validation.

For data validation, I took some steps to make the exceptions more user-friendly.
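To illustrate what a row-level message can carry, here is a minimal sketch in plain Java. It does not reproduce the library's actual exception text, which is not shown in this thread. The class and method names are hypothetical, the wording imitates the Frictionless Python message quoted in the issue, and counting the header as row 1 is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of datapackage-java.
class RowLevelMessageSketch {

    /** Builds a message naming the cell value, the row, the field, and the violated constraint. */
    static String describe(int rowPosition, int fieldPosition, String fieldName,
                           String value, String constraintName, String constraintValue) {
        return String.format(
                "The cell \"%s\" in row at position \"%d\" and field \"%s\" at position \"%d\" "
                        + "does not conform to a constraint: constraint \"%s\" is \"%s\"",
                value, rowPosition, fieldName, fieldPosition, constraintName, constraintValue);
    }

    /** Checks the "age" column (fourth field) of every data row against a minimum. */
    static List<String> checkMinimumAge(List<String[]> rows, int minimum) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String raw = rows.get(i)[3];
            if (Integer.parseInt(raw) < minimum) {
                // Assumption: positions are 1-based and the header line counts as row 1.
                messages.add(describe(i + 2, 4, "age", raw, "minimum", String.valueOf(minimum)));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // The data rows of person.csv from this issue, without the header line.
        List<String[]> rows = List.of(
                new String[] {"John", "Doe", "male", "30"},
                new String[] {"Jane", "Smith", "female", "25"},
                new String[] {"Alice", "Johnson", "female", "19"},
                new String[] {"Bob", "Williams", "male", "17"});
        checkMinimumAge(rows, 19).forEach(System.out::println);
    }
}
```

A caller could wrap whatever exception the library throws and attach this kind of context itself, provided the row and field are known at the point where the value is checked.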

Hope that helps.

iSnow, Mar 26 '25 13:03

I am having no success with validation either. I tried your example, but no exception is thrown.

public static void main(String[] args) throws Exception {
    String jsonString = org.apache.commons.io.FileUtils.readFileToString(
        new File("/tmp/datapackage-java/src/test/resources/fixtures/datapackages/constraint-violation/datapackage.json"),
        Charset.defaultCharset());
    Package dp = new Package(jsonString, Path.of("/tmp/datapackage-java/src/test/resources/fixtures/"), true);
    Resource resource = dp.getResource("person_data");
    resource.getData(false, false, false, false);
}

There is a list of errors: https://github.com/frictionlessdata/datapackage-java/blob/7354d2dd0dd5b24f0a5f4eb3557440b3604e8f47/src/main/java/io/frictionlessdata/datapackage/resource/AbstractResource.java#L71

It is filled in the catch block of the validate method: https://github.com/frictionlessdata/datapackage-java/blob/7354d2dd0dd5b24f0a5f4eb3557440b3604e8f47/src/main/java/io/frictionlessdata/datapackage/resource/AbstractResource.java#L476-L483

However, I could not find a place where the list of errors is read.

jze, Nov 07 '25 17:11