torchscale
Foundation Architecture for (M)LLMs
Hi! torchscale 0.3.0 does not include LongNet. When will a new version with LongNet be released?
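To see whether a given environment already has LongNet, one option is to check the installed package version and probe for the module. The sketch below is a rough check under assumptions: the helper names `installed_version` and `has_module` are hypothetical, and the module path `torchscale.model.LongNet` is a guess based on the repository layout, so it may differ between releases.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name):
    """Return True if module_name can be located.

    For a dotted name, find_spec imports the parent packages first,
    so a missing or broken parent is reported as "not available".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    # "torchscale.model.LongNet" is an assumed module path, not confirmed by this thread.
    print("torchscale version:", installed_version("torchscale"))
    print("LongNet module found:", has_module("torchscale.model.LongNet"))
```

If the module is not found in the released package, installing from a clone of the repository instead of the PyPI release may be worth trying, though the thread does not confirm that this resolves the problem.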
(torchscale) yehuicheng@bdp-gpu04:~/torchscale/examples/fairseq$ torchrun --nproc_per_node=8 --master_port 29501 --nnodes=1 train.py /home/data/dataset/yehuicheng/LongNet_example/DNA_example/longnet_example --num-workers 0 --activation-fn gelu --share-decoder-input-output-embed --validate-interval-updates 1000 --save-interval-updates 1000 --no-epoch-checkpoints --memory-efficient-fp16 --fp16-init-scale 4 --arch transformer --task language_modeling --sample-break-mode none --tokens-per-sample 4096...
I tried the LongNet model script under [torchscale](https://github.com/microsoft/torchscale/tree/main)/[examples](https://github.com/microsoft/torchscale/tree/main/examples)/fairseq/, but ran into an issue:
(torchscale) :~/data/results/fairseq$ torchrun --nproc_per_node=8 --master_port 29501 --nnodes=1 train.py /home/data/dataset/yehuicheng/LongNet_example/DNA_example/longnet_example --num-workers 0 --activation-fn gelu --share-decoder-input-output-embed --validate-interval-updates 1000 --save-interval-updates 1000 --no-epoch-checkpoints --memory-efficient-fp16 --fp16-init-scale...