Dirk Groeneveld comments

Docs say you can pass token ids to `.encode()`, but it throws an exception when you do

Using the slow tokenizer is a good workaround. The fact that the documentation does one thing and the code does another is still a bug though. I doubt that people...

Custom truncation logic is really hard

@SaulLu, from Slack

Let's say I have a dataset where the two fields I have for every instance are "question" and "context". I never want to truncate the question. If I truncate the...

> Figure out the culprit pair, and exclude it from the batch

With a large enough dataset (or many datasets in my case) this is not possible. For one thing,...

A specific solution would be a way to say "truncate first, then second" or "second, then first". As you noted, it means you have to give it something like a...
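The "truncate second, then first" idea can be sketched in plain Python on lists of token ids. `truncate_in_order` is a hypothetical helper written for illustration. It is not an option of any tokenizer library, and it ignores special tokens, so the caller is assumed to reserve room for them in `max_length`. Passing `order=("second",)` covers the "never truncate the question" case: if the question alone is too long, the helper raises instead of cutting it.

```python
from typing import List, Sequence, Tuple


def truncate_in_order(
    first: List[int],
    second: List[int],
    max_length: int,
    order: Sequence[str] = ("second", "first"),
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: trim a pair of token-id lists to fit max_length,
    exhausting one sequence before touching the next.

    order=("second", "first") cuts tokens from the end of `second` until it
    is empty, and only then starts cutting from the end of `first`.
    Sequences not named in `order` are never truncated.
    """
    seqs = {"first": list(first), "second": list(second)}  # copies; inputs untouched
    overflow = len(seqs["first"]) + len(seqs["second"]) - max_length
    for name in order:
        if overflow <= 0:
            break
        cut = min(overflow, len(seqs[name]))
        # Slice with an explicit end index so cut == len gives an empty list.
        seqs[name] = seqs[name][: len(seqs[name]) - cut]
        overflow -= cut
    if overflow > 0:
        raise ValueError("pair cannot fit in max_length with the allowed truncation order")
    return seqs["first"], seqs["second"]


question = [1, 2, 3]
context = list(range(10, 20))
# Keep the question intact; only the context is shortened.
print(truncate_in_order(question, context, max_length=8, order=("second",)))
```

Making the order an explicit argument, rather than a fixed strategy, is the point of the sketch: the same function expresses "second, then first", "first, then second", or "only second".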

Running a Beaker Executor job leaves loads of uncommitted datasets in the workspace

I did another run like this. This time only one of them failed. The results table points me to this dataset: https://beaker.org/ds/01GC7PCX5M9B357GX6YJFY7C5R/details. This is clearly incomplete. Its presence will prevent...

Ah! I can search experiments by name, which reveals this error message:

```
2022-09-05T21:18:13.535899802Z {"name": "root", "msg": "[step trained_model_arc_challenge_bert-base-uncased_2147483647] Uncaught exception", "args": {"py/tuple": []}, "levelname": "ERROR", "levelno": 40, "pathname": "/opt/conda/lib/python3.9/site-packages/tango/common/logging.py",...
```
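Log lines like the one above appear to be a timestamp, a single space, and then a JSON record. Under that assumption, a short sketch can pull the failing step's name out of such lines. `parse_log_line` and `failed_step` are hypothetical helpers, and the sample line is an abbreviated copy of the message above with the truncated fields dropped.

```python
import json
import re
from typing import Optional, Tuple

# Matches the "[step <name>]" prefix seen at the start of the "msg" field.
STEP_RE = re.compile(r"^\[step ([^\]]+)\]")


def parse_log_line(line: str) -> Tuple[str, dict]:
    """Split '<timestamp> <json>' into the timestamp and the decoded record.

    Assumes the timestamp contains no spaces and the rest is one JSON object.
    """
    timestamp, _, payload = line.partition(" ")
    return timestamp, json.loads(payload)


def failed_step(record: dict) -> Optional[str]:
    """Return the step name for an ERROR record that names a step, else None."""
    if record.get("levelname") != "ERROR":
        return None
    match = STEP_RE.match(record.get("msg", ""))
    return match.group(1) if match else None


# Abbreviated from the error message quoted above (some fields omitted).
line = (
    '2022-09-05T21:18:13.535899802Z {"name": "root", '
    '"msg": "[step trained_model_arc_challenge_bert-base-uncased_2147483647] Uncaught exception", '
    '"levelname": "ERROR", "levelno": 40}'
)
ts, record = parse_log_line(line)
print(ts, failed_step(record))
```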