David Mezzetti

176 comments by David Mezzetti

I've seen this error and it can be disabled by setting the environment variable `TOKENIZERS_PARALLELISM=false`. I'll re-run this notebook in Colab to see if it's something new. While I have...
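As a minimal sketch of the setting mentioned above, the variable can be set from Python at the top of a notebook or script. This assumes it is set before the tokenizers library is first used in the process; where exactly it goes in the notebook under discussion is not stated in the comment.

```python
import os

# Assumption: this runs before any tokenizer is loaded or used,
# e.g. in the first cell of the notebook.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Read the value back to confirm it is visible to this process
parallelism = os.environ["TOKENIZERS_PARALLELISM"]
print(parallelism)
```

The same effect can be had by exporting the variable in the shell before launching Python or Jupyter.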

If you export the notebook to Python and run it through Python, do you get the same errors? I wonder if it's possible that the VS Code Jupyter plugin is forking...

I think this is related to the VS Code Jupyter plugin vs Colab. Even just editing the ipynb file will cause a rewrite/change of the metadata. This error doesn't occur...

I'm going to close this PR as it's already out of date. I'll test this out in Colab and update the notebooks. #370 created to track that.

My guess is that the notebook is making a bad assumption that glob.glob produces a consistent order. Probably need to sort the list of images to keep it consistent or...
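A minimal sketch of the sorting fix suggested above: `glob.glob` returns paths in whatever order the file system yields them, so wrapping the call in `sorted` gives a stable order across machines. The helper name `list_images`, the `*.jpg` pattern, and the temporary files are hypothetical and only there to make the example self-contained; they are not taken from the notebook.

```python
import glob
import os
import tempfile


def list_images(directory, pattern="*.jpg"):
    """Return matching paths in a deterministic (lexicographic) order."""
    # glob.glob makes no ordering guarantee, so sort explicitly
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Hypothetical demo: create a few empty files out of order, then list them
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.jpg", "a.jpg", "c.jpg"]:
        open(os.path.join(tmp, name), "w").close()

    images = [os.path.basename(path) for path in list_images(tmp)]

print(images)
```

Sorting is lexicographic, so names like `img10.jpg` sort before `img2.jpg`; zero-padded file names or a custom sort key avoid that if numeric order matters.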

You should be able to pass `facebook/m2m100_1.2B` as a path parameter to the translate pipeline. There could be a change that forces all translation to run through the model specified...

This pipeline has been updated to have broader model support. There is also a new option, `findmodels`, which can be set to false. When false, the pipeline will always use...

This seems like a good idea that is well suited to a workflow. The workflow could call a translate function hosted via an API. Regarding speed, are you running the translations on a...

It would take some profiling, but the slow performance is probably due more to the model than to pre/post-processing. The tokenizers package is written in Rust, which will run at native speed....

Thank you for the kind words. This is a good idea and something I'll tackle in the next iteration of paperai.