PileOfLaw

A dataset for pretraining language models aimed at legal tasks.

Results 2 PileOfLaw issues

Hi @Breakend, can I ask whether any deduplication has been done for Pile of Law? Either local deduplication within a single source, or global deduplication across all the data...

The credit card agreements scraping process apparently produced binary data that is stored in text strings like `"b'JEANNE D\xe2\x80\x99ARC CREDIT UNION\n...'"`. Note that this is a `str` that contains the...
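One possible way to recover readable text from such a field is sketched below. It assumes the stored value is the `repr` of a `bytes` object and that the underlying bytes are UTF-8, which the `\xe2\x80\x99` sequence in the excerpt suggests but does not prove. The helper name `recover_text` is hypothetical and not part of the dataset's tooling.

```python
import ast


def recover_text(value: str) -> str:
    """Undo an accidental str(bytes) round trip; return other strings unchanged.

    Assumption: the field holds the repr of a bytes object, e.g. "b'...'",
    and the original bytes were UTF-8 encoded.
    """
    # Only attempt recovery on strings that look like a bytes literal.
    if not value.startswith(("b'", 'b"')):
        return value
    try:
        # literal_eval parses the literal safely without executing code.
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a well-formed literal (for example, truncated): leave as-is.
        return value
    if isinstance(parsed, bytes):
        # errors="replace" keeps going if some bytes are not valid UTF-8.
        return parsed.decode("utf-8", errors="replace")
    return value


# The doubled backslashes make this a str containing literal "\x.." sequences,
# mirroring the shape described in the issue.
sample = "b'JEANNE D\\xe2\\x80\\x99ARC CREDIT UNION\\n'"
print(recover_text(sample))
```

Using `ast.literal_eval` rather than `eval` avoids running arbitrary code from dataset contents, and falling back to the original string keeps rows that do not match the pattern untouched.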