Neeraj Malhotra
Yeah, the implementation is simple. Users need to be careful about using it in production, though. By default it should be disabled and only run when absolutely needed.
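A minimal sketch of the disabled-by-default idea, assuming a hypothetical `PANDERA_DEEP_VALIDATION` environment flag (the variable name and the schema methods are illustrative, not pandera's actual API):

```python
import os


def deep_validation_enabled() -> bool:
    """Hypothetical opt-in gate: the expensive validation path only runs
    when the user explicitly enables it via an environment variable."""
    return os.environ.get("PANDERA_DEEP_VALIDATION", "0").lower() in ("1", "true")


def validate(df, schema):
    """Illustrative wrapper: cheap structural checks always run, while the
    resource-intensive row-level checks run only when explicitly enabled."""
    schema.validate_structure(df)  # cheap: column names / dtypes (hypothetical method)
    if deep_validation_enabled():
        schema.validate_data(df)   # expensive: row-level checks (hypothetical method)
```

The point of the gate is that a forgotten flag fails safe: production workloads pay nothing unless someone deliberately turns the deep checks on.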
Acknowledging the resource-intensive nature of data validations, I concur that caching could be an ideal solution. However, before implementing this within `pandera`, I recommend conducting performance tests on a suitable...
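As a sketch of the caching idea (all names here are hypothetical; a real implementation would also need a cheap, reliable fingerprint of the data to use as the cache key):

```python
from functools import lru_cache

# Call counter so the example can demonstrate that the cache is hit.
CALLS = {"count": 0}


def run_expensive_check(fingerprint: str, check_name: str) -> bool:
    """Stand-in for a resource-intensive validation pass."""
    CALLS["count"] += 1
    return True


@lru_cache(maxsize=128)
def cached_check(fingerprint: str, check_name: str) -> bool:
    """Memoize results keyed on a data fingerprint plus the check identity,
    so re-validating unchanged data skips the expensive pass."""
    return run_expensive_check(fingerprint, check_name)
```

The performance tests mentioned above would then measure whether fingerprinting the data is actually cheaper than just re-running the checks, which is the crux of whether caching pays off.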
Just looking at the code above, I suspect the issue is your import `from pandera.typing.pyspark import DataFrame`, which might be pointing to `pyspark.pandas.DataFrame` rather than the PySpark SQL one. I haven't dug...
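One quick way to confirm where an import actually resolves is to inspect the class's fully qualified name at runtime (a generic stdlib sketch, not pandera-specific):

```python
import importlib


def resolve_class_origin(module_path: str, attr: str) -> str:
    """Return the fully qualified name of the class `attr` exported by
    `module_path`, revealing which underlying class an import points to."""
    cls = getattr(importlib.import_module(module_path), attr)
    return f"{cls.__module__}.{cls.__qualname__}"


# e.g. resolve_class_origin("pandera.typing.pyspark", "DataFrame") would show
# whether that alias wraps a pyspark.pandas or a pyspark.sql DataFrame.
```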
If I recall, we had disabled it to avoid performance issues on large datasets, but sure, it can be added if anyone wants it. Be mindful that it will...
I could reproduce it and could only glance at it for now. I believe for some reason `schema.check.error` is coming back null at [link](https://github.com/unionai-oss/pandera/blob/850dcf8e59632d54bc9a6df47b9ca08afa089a27/pandera/backends/pyspark/error_formatters.py#L4C34-L4C34):

```python
def format_generic_error_message(
    parent_schema,
    check,
) -> ...
```
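A hedged sketch of the kind of defensive fix that would guard against a null `check.error` (the formatter below is illustrative, not pandera's actual implementation):

```python
def format_generic_error_message(parent_schema, check) -> str:
    """Illustrative formatter that falls back to str(check) when the
    check's `error` attribute is None, instead of interpolating 'None'
    into the message shown to the user."""
    error = getattr(check, "error", None)
    detail = error if error is not None else str(check)
    return f"{parent_schema} failed validation: {detail}"
```

Whether the right fix is a fallback here or ensuring `error` is always populated upstream depends on where the null is introduced.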