[BUG-python] `FloatMetadataProperty`: value is not a valid `float` when it is a rounded integer
Describe the bug
Calling the to_argilla method from distilabel raises the following error about the metadata produced by text-descriptives.
ValueError: `FeedbackRecord.metadata` {'length-input': 19, 'length-instructions': 106, 'generation-model':
'gpt-3.5-turbo', 'input_n_tokens': 2, 'input_n_unique_tokens': 2, 'input_n_sentences': 1, 'input_perplexity': 1,
'input_entropy': 0, 'input_flesch_reading_ease': -48.99, 'instructions_n_tokens': 18,
'instructions_n_unique_tokens': 18, 'instructions_n_sentences': 1, 'instructions_perplexity': 1.79,
'instructions_entropy': 0.58, 'instructions_flesch_reading_ease': 61.67} does not match the expected schema, with
exception: 2 validation errors for MetadataSchema
input_perplexity
value is not a valid float (type=type_error.float)
input_entropy
value is not a valid float (type=type_error.float)
Expected behavior
I would expect this check to pass, given that any integer can be represented as a float.
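Until the validation is relaxed, one possible workaround is to cast integer values to `float` before building the record. The sketch below is only an illustration: `coerce_float_metadata` and the `float_keys` argument are hypothetical names, not part of Argilla or distilabel, and it assumes you already know which metadata keys are declared as float properties.

```python
from typing import Any, Dict, Iterable


def coerce_float_metadata(metadata: Dict[str, Any], float_keys: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `metadata` with int values under `float_keys` cast to float.

    Hypothetical helper: the caller decides which keys are float properties.
    """
    keys = set(float_keys)
    coerced: Dict[str, Any] = {}
    for key, value in metadata.items():
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if key in keys and isinstance(value, int) and not isinstance(value, bool):
            coerced[key] = float(value)
        else:
            coerced[key] = value
    return coerced


# A subset of the metadata from the error message above.
metadata = {
    "input_n_tokens": 2,
    "input_perplexity": 1,
    "input_entropy": 0,
    "instructions_entropy": 0.58,
}
fixed = coerce_float_metadata(
    metadata, ["input_perplexity", "input_entropy", "instructions_entropy"]
)
```

Keys that are not listed in `float_keys`, such as `input_n_tokens`, are left untouched, so integer properties keep their integer values.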
Environment:
- Argilla Version [e.g. 1.0.0]: 1.23
- ElasticSearch Version [e.g. 7.10.2]:
- Docker Image (optional) [e.g. argilla:v1.0.0]:
🤖
The error you're encountering is due to the input_perplexity and input_entropy values in FeedbackRecord.metadata not being valid floats. This could be due to a few reasons:
- The values are not provided: In the `_NumericMetadataPropertySchema` class, the `min` and `max` values are optional. If these values are not provided, the `_value_in_bounds` method will raise a ValueError if the `provided_value` is not within the bounds of `min` and `max`.
- The values are not of the correct type: The `_check_allowed_value_type` method checks if the provided value is of type `int` or `float`. If it is not, a TypeError is raised.
- The values are NaN: The `_check_nan` method checks if the provided value is NaN (Not a Number). If it is, a ValueError is raised.
To fix this issue, ensure that the input_perplexity and input_entropy values in FeedbackRecord.metadata are valid floats and are within the bounds of min and max if they are provided. Also, make sure that these values are not NaN.
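For reference, the three checks described above (type, NaN, bounds) can be sketched as a single standalone function. This is not Argilla's implementation: `check_numeric_value`, `min_value` and `max_value` are hypothetical names, and the bounds check here is assumed to apply only when a bound is given.

```python
import math
from typing import Optional, Union

Number = Union[int, float]


def check_numeric_value(
    value: object,
    min_value: Optional[Number] = None,
    max_value: Optional[Number] = None,
) -> Number:
    """Hypothetical stand-in for the type, NaN and bounds checks."""
    # Type check: only int or float are accepted (bool is rejected).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected int or float, got {type(value).__name__}")
    # NaN check: only a float can be NaN.
    if isinstance(value, float) and math.isnan(value):
        raise ValueError("NaN is not allowed")
    # Bounds check: applied only for the bounds that were provided.
    if min_value is not None and value < min_value:
        raise ValueError(f"{value} is below the minimum {min_value}")
    if max_value is not None and value > max_value:
        raise ValueError(f"{value} is above the maximum {max_value}")
    return value


# The two values reported in the error message.
perplexity = check_numeric_value(1)
entropy = check_numeric_value(0)
```

Under these assumptions the reported values `1` and `0` pass all three checks, which would suggest the `type_error.float` failure comes from a stricter type check elsewhere rather than from bounds or NaN handling.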
You can find more details in the metadata.py file.
As for the expected data type for input_perplexity and input_entropy in the FeedbackRecord.metadata schema, I wasn't able to find specific information within the repository. It's possible that this information is available elsewhere or I may have missed it. If you have more information about the expected data types or if there's a specific part of the code you're working with that could provide more context, please share it. This will help me provide a more accurate answer.
@frascuchon, this might also be the case for IntegerMetadataProperty when it receives whole-number floats (1.0, 2.0, etc.)
Thanks @sdiazlor and @davidberenstein1957. If this is a client validation error, it should be an error related to the pydantic behaviour. Maybe @alvarobartt can take a look into it. He can tackle this better than me.
May I ask how the metadata_properties for text-descriptives are defined? It seems that the value is a float, which is a Python type, so an integer should convert as float(1) -> 1.0, and I cannot see the issue from the information provided. Could you answer the question above and share any other information that might be useful? For example, does the issue occur only in the to_argilla method of distilabel, or also in the argilla Python package?
AFAIK @davidberenstein1957 worked on that integration, and I'm not really aware of the text-descriptives details. Could you double check, @davidberenstein1957? Thanks
I checked, and @sdiazlor worked on this integration. I think the issue originates in the text-descriptives integration, but it is two-fold.
- The integration, which might be swapping an `IntegerMetadataProperty` with a `FloatMetadataProperty` during the `text-descriptives` config. @sdiazlor, would you be able to take a look here?
- However, from a usability perspective there is also the issue of not being able to pass a rounded float like `1.0` to an `IntegerMetadataProperty`, and perhaps also not an integer like `1` to a `FloatMetadataProperty`. For me this feels buggy, but what do you think @alvarobartt, should we allow for this?
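To make the second point concrete, here is a minimal sketch of what a more lenient policy could look like. The function names `to_float_lenient` and `to_int_lenient` are hypothetical and are not part of Argilla; the sketch only illustrates the proposed behaviour of accepting `1` for a float property and `1.0` for an integer property, while still rejecting values such as `1.5`.

```python
from typing import Union


def to_float_lenient(value: Union[int, float]) -> float:
    """Accept any int or float and return it as a float."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected int or float, got {type(value).__name__}")
    return float(value)


def to_int_lenient(value: Union[int, float]) -> int:
    """Accept an int, or a float with no fractional part, and return an int."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected int or float, got {type(value).__name__}")
    if isinstance(value, int):
        return value
    # float.is_integer() is False for NaN and infinities, so those are rejected too.
    if value.is_integer():
        return int(value)
    raise ValueError(f"{value} has a fractional part and cannot be used as an integer")


as_float = to_float_lenient(1)
as_int = to_int_lenient(2.0)
```

Whether this coercion should happen silently in the client or be left to the caller is the open design question raised above.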