NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
I am trying to train on ~5.6 GB of data with ~700 MB of validation data using the command below:

```shell
python /workspace/data/NeMo/examples/nlp/language_modeling/bert_pretraining.py \
  --config-name=/workspace/data/NeMo/examples/nlp/language_modeling/conf/bert_pretraining_from_text_config.yaml \
  model.train_ds.data_file="/workspace/data/NeMo/lm/data/public-data/train.txt" \
  model.validation_ds.data_file="/workspace/data/NeMo/lm/data/public-data/val.txt" \
  model.train_ds.batch_size=128 \
  model.optim.lr=5e-5 \
  trainer.max_epochs=1 \
  trainer.gpus=1
```

It then shows this error: GPU available: True,...
# What does this PR do ? This is a work-in-progress for a model and data set that performs multilingual punctuation restoration, true casing, and sentence boundary detection. See `Usage`...
# What does this PR do ? Adds a script to load Wav2Vec2.0 weights from Fair into the NeMo implementation. Also adjusts the NeMo implementation to be similar to Fair's. **Collection**: [ASR]...
The PR adds the new SOTA model we have for intent classification and slot filling for spoken language understanding. - [x] Move transformer modules from `nemo.collections.nlp.modules.common` to `nemo.collections.common.parts`, see also...
**Describe the bug** I'm trying to convert .nemo to .riva, but I'm getting the message `AttributeError: 'EncDecRNNTBPEModel' object has no attribute 'input_example'`. Converting the Conformer-CTC model works, but the Conformer-Transducer...
Can I get more detail on the dataset cleaning process? I am a non-Spanish speaker 😅. For example, the VoxPopuli dataset page says 120 hrs after cleaning, but when I downloaded the dataset it has...
**Describe the bug** Hi, I am trying to convert a BioMegatron model fine-tuned for token classification from NeMo to ONNX format. I am getting the following error: WARNING: The shape...
Not all LR schedulers in PyTorch have a `max_steps` parameter, so we should not add `max_steps` to their `scheduler_args` unconditionally. The previous code tackled the problem in a case-by-case manner, while here...
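The general approach described above can be sketched with `inspect`: filter the argument dict against the scheduler constructor's signature, so unsupported keys such as `max_steps` are dropped for any scheduler class rather than special-cased one by one. This is a minimal illustration using a hypothetical stand-in class, not NeMo's actual code:

```python
import inspect


class StepLRLike:
    # Hypothetical stand-in for a scheduler whose constructor
    # does NOT accept ``max_steps`` (like torch.optim.lr_scheduler.StepLR).
    def __init__(self, optimizer, step_size, gamma=0.1):
        self.step_size = step_size
        self.gamma = gamma


def filter_scheduler_args(scheduler_cls, scheduler_args):
    """Keep only the kwargs the scheduler constructor actually accepts.

    Unsupported keys (e.g. ``max_steps``) are silently dropped, so one
    generic code path works for every scheduler class.
    """
    accepted = inspect.signature(scheduler_cls.__init__).parameters
    return {k: v for k, v in scheduler_args.items() if k in accepted}


args = {"step_size": 10, "gamma": 0.5, "max_steps": 1000}
print(filter_scheduler_args(StepLRLike, args))
# -> {'step_size': 10, 'gamma': 0.5}
```

Schedulers that do declare `max_steps` in their `__init__` would keep the key unchanged, since the filter is driven purely by each class's own signature.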
Signed-off-by: Lily Lee # What does this PR do ? Zero Shot Slot Filling Model **Collection**: NLP # Changelog - Add specific line by line info of high level changes...
**Describe the bug** I am not getting correct timestamps for speech segments, and many speech chunks are removed. I am using the pretrained MarbleNet and speakerdiarization_speakernet models. It removes lots...