
[WIP] Refactor madx_run_clm.py

Open haileyschoelkopf opened this issue 3 years ago • 7 comments

Current changes: just removing some unused / commented-out code from madx_run_clm.py. There is more, but I was not sure why certain parts are commented out.

We'll need to refactor the script as well once we add new fine-tuning strategies.

I also wonder whether it would be helpful to turn the language experiments into a single packaged script (train tokenizer + adapt model + possibly run eval), so that it is easier to onboard others and have them run experiments.

haileyschoelkopf (Jun 27 '22 14:06)

I also wonder whether it would be helpful to turn the language experiments into a single packaged script (train tokenizer + adapt model + possibly run eval), so that it is easier to onboard others and have them run experiments.

Maybe in the future, but at least during these sprints, let's keep them separate.

yongzx (Jun 28 '22 05:06)

We might also want to reconfigure the file structure?

My thoughts would be something like:

multilingual-modeling/
- lang-adapt/
    - README.md
    - scripts/
    - finetune/
        - *.py
    - *.py
- evaluation/
    - eval_xnli/
    - eval_exp_sentence_retreival_eval/

lintangsutawika (Jun 29 '22 14:06)

Yeah, the structure is a mess right now. There's too much duplication (e.g., on the eval side, we actually don't need eval_xnli) due to legacy code.

I am working on it right now.

yongzx (Jun 29 '22 14:06)

1fb6504

multilingual-modeling/
- lang-adapt/
    - README.md
    - scripts/
    - *.py
- evaluation/
    - wikiann/  #scripts
    - xnli/  #scripts
    - eval.py
    - README.md
- exp_sentence_retreival_eval/

for now.

@lintangsutawika What do you have in mind in the finetune/ folder?

yongzx (Jun 29 '22 14:06)

This makes sense to me, but I had problems downloading XNLI when there was a folder called "xnli" in the same path. Renaming it to anything else (xnli_scripts, etc.) fixes the problem.
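
A possible explanation, assuming XNLI is fetched with the Hugging Face `datasets` library (the thread does not say how it was downloaded): `load_dataset("xnli")` treats its first argument as a local path when such a path exists, so a folder named `xnli` in the working directory can shadow the hub dataset. The sketch below is a small guard that checks for such a clash before loading. `find_shadowing_dir` is a hypothetical helper, not part of the repository or of any library.

```python
from pathlib import Path
from typing import Optional


def find_shadowing_dir(dataset_name: str, base_dir: str = ".") -> Optional[Path]:
    """Return a local directory named like the dataset, if one exists.

    Hypothetical helper for illustration only. It assumes the dataset is
    later loaded by name from the same working directory.
    """
    candidate = Path(base_dir) / dataset_name
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    clash = find_shadowing_dir("xnli")
    if clash is not None:
        # A folder such as ./xnli may be picked up instead of the hub dataset.
        print(f"Warning: local folder '{clash}' may shadow the 'xnli' dataset name")
```

Under this assumption, a folder named `scripts_xnli` or `xnli_scripts` does not trigger the check, which matches the observation that renaming the folder avoids the problem.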

haileyschoelkopf (Jul 01 '22 12:07)

@haileyschoelkopf Fixed by a8486d4 (using scripts_* instead).

yongzx (Jul 01 '22 13:07)

@yongzx I'm not sure. I think parameter-efficient fine-tuning should be included in lang-adapt/.

lintangsutawika (Jul 01 '22 16:07)