Sam Shleifer

Results 45 comments of Sam Shleifer

As a workaround, I made an ordered list of pieces that I want to keep, like ```[piece: "" score: 0.0 type: UNKNOWN, piece: "" score: 0.0 type: CONTROL, piece: ""...

~Does `SetVocabulary` do anything? Do you have an example of how to use it?~ SetVocabulary example: https://github.com/google/sentencepiece/issues/250

I upgraded to 10.15.4 and it didn't work: I get an identical error message. First offending line: ```bash [ 94%] Linking CXX executable ../marian-conv cd /Users/shleifer/marian/build/src && /usr/local/Cellar/cmake/3.16.2/bin/cmake -E cmake_link_script CMakeFiles/marian_conv.dir/link.txt --verbose=1...

That URL creates a dir called `bbc-summary-data` containing files like `bbc-summary-data/{bbcid}.summary`. Which code is meant to be run after that to continue preprocessing? `{bbcid}.summary` files are not mentioned in the...

First file `bbc-summary-data/10000983.summary` looks like this: ![image](https://user-images.githubusercontent.com/6045025/82697229-7f87e180-9c36-11ea-82d3-1bdb5872cfc1.png)
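
For anyone reproducing this, here is a minimal sketch for listing the downloaded files and peeking at one of them. It assumes only the `bbc-summary-data/{bbcid}.summary` layout described above. The helper names `list_summary_ids` and `peek` are hypothetical and not part of any preprocessing script.

```python
from pathlib import Path


def list_summary_ids(data_dir="bbc-summary-data"):
    """Return the bbcid of every `{bbcid}.summary` file in data_dir, sorted as strings."""
    return sorted(p.stem for p in Path(data_dir).glob("*.summary"))


def peek(data_dir, bbcid, n_lines=10):
    """Return up to the first n_lines lines of one .summary file."""
    path = Path(data_dir) / f"{bbcid}.summary"
    with path.open(encoding="utf-8") as f:
        # zip stops once range is exhausted, so at most n_lines lines are read.
        return [line.rstrip("\n") for _, line in zip(range(n_lines), f)]


if __name__ == "__main__":
    ids = list_summary_ids()
    print(len(ids), "summary files found")
    if ids:
        print("\n".join(peek("bbc-summary-data", ids[0])))
```

Printing the head of one file this way is a quick check of which fields it actually contains before deciding what to run next.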

1) Verifying that I don't need to run `prepare_bbc_data.py` after doing the SN --> XSUM replacement, right? 2) Which field is the summary? Or is that in another file? For...

Instructions for converting a Tatoeba-Challenge model (a Marian model) to Hugging Face: https://github.com/sshleifer/transformers_fork/blob/46509d1c19b9e69d75fb95d33d38dbac4f6f8858/scripts/tatoeba/README.md#L30-L30 The `convert` function does the heavy lifting: https://github.com/huggingface/transformers/blob/master/src/transformers/convert_marian_to_pytorch.py#L567

Please send the full command.

I think `num_candidates=1` won't work without significant code modification (removing the multiple choice head). I think I said something different in another thread, which was wrong.

It would require significant code modification. I'd start by
- finding all mentions of multiple choice candidates (and removing them, fixing downstream if they are required)
- updating train.py to...