Dylan Freedman
So, essentially:
* by default, always output exactly as many fields as headers
* don't warn at all if there's a mismatch in field amount that entails too few fields...
Sure, thanks for opening the issue. See #46. You're welcome to review it if you have time, but we can also review it on our end.
I'm also encountering these errors. @xloem were you able to modify the code to get it to work?
You can modify where the `transformers` module places cached files by setting the environment variable `TRANSFORMERS_CACHE`. See here for additional information: https://stackoverflow.com/questions/63312859/how-to-change-huggingface-transformers-default-cache-directory
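As a minimal sketch of the suggestion above: the variable can be set from Python, provided that happens before `transformers` is imported. The helper name `set_transformers_cache` and the example path are hypothetical, and newer `transformers` releases may prefer `HF_HOME`, so treat this as an illustration and not the library's recommended setup.

```python
import os
from pathlib import Path


def set_transformers_cache(cache_dir: str) -> Path:
    """Point TRANSFORMERS_CACHE at cache_dir, creating the directory if needed.

    Hypothetical helper for illustration. Call it before importing
    `transformers`, since the library reads the variable at import time.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["TRANSFORMERS_CACHE"] = str(path)
    return path


# Example usage (the path is an assumption):
#   set_transformers_cache("~/models/hf-cache")
#   import transformers  # imported only after the variable is set
```

Setting the variable in the shell before launching the program has the same effect and avoids any import-order concerns.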
+1 to truncating the page. In our case (I'm also at The Washington Post), we have large pages because Next.js injects the initial application state into the HTML. This always...
Following https://stackoverflow.com/a/1386750, what about an approach where you simply set an ignore flag that's cleared in the scroll handler?
I should probably make the cache handling clearer in the docs so folks are reassured. Great point re: error handling. Logging an error message and continuing is the way...
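A rough sketch of the "log an error and continue" pattern, assuming a batch of independent items where one failure should not abort the rest. The names `process_all`, `items`, and `process` are hypothetical and not taken from the project's code.

```python
import logging

logger = logging.getLogger(__name__)


def process_all(items, process):
    """Apply process() to each item, logging failures instead of raising.

    Returns (results, failed), where failed lists the items that raised.
    """
    results = []
    failed = []
    for item in items:
        try:
            results.append(process(item))
        except Exception:
            # logger.exception records the message plus the traceback.
            logger.exception("Failed to process %r; skipping", item)
            failed.append(item)
    return results, failed
```

Returning the failed items alongside the results lets the caller report a summary at the end without losing the successful work.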
I'm interested to hear how the Colab exploration goes. Re: running it in Portuguese, you can use any custom model on Hugging Face. For example, this model looks relevant: https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-similarity....
Closing for now. Feel free to reopen if there's a specific feature request or guide to getting it to run in Colab. Thanks!
Agreed! This seems useful. I'm thinking the behavior that makes sense would be to recursively include `.txt` and `.pdf` files when you specify a directory. Do you also think that...
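One way the proposed directory behavior could look, as a sketch only: a path that is a file is returned as-is, and a directory is walked recursively for `.txt` and `.pdf` files. The function name `collect_documents` and the case-insensitive suffix match are assumptions, not the project's actual implementation.

```python
from pathlib import Path

# Suffixes named in the comment above; matching is case-insensitive (an assumption).
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def collect_documents(path):
    """Return a sorted list of document paths for a file or directory argument."""
    p = Path(path)
    if p.is_file():
        # An explicitly named file is passed through unchanged.
        return [p]
    return sorted(
        f
        for f in p.rglob("*")
        if f.is_file() and f.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Sorting keeps the output order stable across runs, which helps if results are cached per file.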