PileOfLaw
A dataset for pretraining language models targeted for legal tasks.
PileOfLaw issues (2 results)
Hi @Breakend, can I ask whether any deduplication has been done for Pile of Law? Either local deduplication within a single source, or global deduplication across all the data...
The credit card agreements scraping process apparently produced binary data that is stored in text strings like `"b'JEANNE D\xe2\x80\x99ARC CREDIT UNION\n...'"`. Note that this is a `str` that contains the...
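A possible workaround, sketched under assumptions: if an affected record really is a `str` holding the printed form of a Python `bytes` object, it can be parsed back with `ast.literal_eval` and then decoded. The helper name `recover_text` is hypothetical, and UTF-8 is an assumption based on the `\xe2\x80\x99` sequence in the example (the UTF-8 encoding of a right single quotation mark); neither comes from the dataset's own tooling.

```python
import ast


def recover_text(value: str) -> str:
    """Turn a str like "b'...'" (the repr of a bytes object) back into text.

    Hypothetical helper: values that do not look like a bytes literal,
    or that fail to parse, are returned unchanged.
    """
    # Only touch values that look like a bytes literal.
    if not value.startswith(("b'", 'b"')):
        return value
    try:
        # literal_eval safely parses Python literals without executing code.
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value
    if not isinstance(parsed, bytes):
        return value
    # Assumption: the underlying bytes are UTF-8; undecodable bytes are replaced.
    return parsed.decode("utf-8", errors="replace")


# Shortened version of the example from the issue (backslashes are literal here).
sample = "b'JEANNE D\\xe2\\x80\\x99ARC CREDIT UNION\\n'"
recovered = recover_text(sample)
```

Here `recovered` holds the name with a proper apostrophe and a real trailing newline instead of escape sequences. Returning unparseable values unchanged is a deliberate choice so the helper can be mapped over a whole column without failing on records that were stored correctly.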