
Add ability to give separate datasets for test, train and validation

Open • shanmugamr1992 opened this pull request 3 years ago • 20 comments

What does this PR do?

Gives the user the ability to specify separate train, test, and validation datasets as a dictionary in data_prefix for the GPT model.

Collection: nlp/language_modelling

Changelog

  • Supports a dictionary format for data_prefix in the Megatron GPT config file.
  • Reuses the existing data loader to load each dataset for its own split by changing the split string accordingly.
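To illustrate the split-string approach from the changelog, here is a minimal Python sketch. It assumes the Megatron-style convention in which a comma-separated "train,valid,test" weight string is normalized and turned into document index boundaries. The helpers `split_boundaries` and `document_range` and the `WHOLE_DATASET_SPLIT` mapping (for example "0,0,1" for the test dataset) are hypothetical and do not reproduce the PR's actual code.

```python
from typing import Dict, List

# Hypothetical mapping (not NeMo's API): a split string that sends 100% of a
# dataset's documents to one split, in Megatron's train,valid,test order.
WHOLE_DATASET_SPLIT: Dict[str, str] = {
    "train": "1,0,0",
    "validation": "0,1,0",
    "test": "0,0,1",
}


def split_boundaries(splits_string: str, size: int) -> List[int]:
    """Turn a "train,valid,test" weight string into document index boundaries.

    Simplified re-implementation of Megatron-style split parsing, for
    illustration only. Returns [start, train_end, valid_end, test_end].
    """
    weights = [float(w) for w in splits_string.split(",")]
    weights += [0.0] * (3 - len(weights))  # pad missing entries with zeros
    total = sum(weights)
    if total <= 0:
        raise ValueError(f"split weights must sum to a positive value: {splits_string!r}")
    bounds = [0]
    for weight in weights:
        bounds.append(bounds[-1] + int(round(weight / total * size)))
    # Shift the boundaries so rounding error never drops or duplicates documents.
    diff = bounds[-1] - size
    return [bounds[0]] + [b - diff for b in bounds[1:]]


def document_range(split_name: str, size: int) -> range:
    """Documents that the whole-dataset split string assigns to `split_name`."""
    bounds = split_boundaries(WHOLE_DATASET_SPLIT[split_name], size)
    index = ["train", "validation", "test"].index(split_name)
    return range(bounds[index], bounds[index + 1])


if __name__ == "__main__":
    # A 1,000-document test corpus: "0,0,1" puts every document in the test split.
    print(split_boundaries("0,0,1", 1000))  # [0, 0, 0, 1000]
    print(document_range("test", 1000))     # range(0, 1000)
```

Under this assumption, a dedicated corpus is consumed in full by its own split, so the existing loader can be reused without a separate code path; the split string simply becomes an internal detail instead of a user-facing ratio.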

Usage

  • data_prefix: {train: /path/to/train, test: /path/to/test, validation: /path/to/validation}
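The Usage entry maps each split name to its own dataset prefix. The sketch below shows one way a config loader might normalize such a value while keeping the existing list format working. The function `normalize_data_prefix` and its rule that all three keys must be present are assumptions for illustration, not the behavior of NeMo or of this PR.

```python
from typing import Dict, List, Union

SPLIT_KEYS = ("train", "validation", "test")


def normalize_data_prefix(
    data_prefix: Union[List[str], Dict[str, str]],
) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: map a data_prefix config value to per-split prefixes.

    - A list (the existing format) is shared by all three splits, which are
      then carved out of it by the split string.
    - A dict (the format proposed here) must name exactly train, validation
      and test, each pointing at its own dataset prefix.
    """
    if isinstance(data_prefix, dict):
        keys = set(data_prefix)
        missing = set(SPLIT_KEYS) - keys
        unknown = keys - set(SPLIT_KEYS)
        if missing or unknown:
            raise ValueError(
                f"data_prefix dict needs keys {SPLIT_KEYS}; "
                f"missing={sorted(missing)}, unknown={sorted(unknown)}"
            )
        # Return the prefixes in a fixed train, validation, test order.
        return {key: data_prefix[key] for key in SPLIT_KEYS}
    # Existing behaviour: one (possibly blended) corpus for every split.
    return {key: data_prefix for key in SPLIT_KEYS}


if __name__ == "__main__":
    cfg_value = {
        "train": "/path/to/train",
        "test": "/path/to/test",
        "validation": "/path/to/validation",
    }
    print(normalize_data_prefix(cfg_value)["validation"])  # /path/to/validation
```

Rejecting missing or misspelled keys up front is a design choice of this sketch: a typo such as "valid" would otherwise silently leave a split without data.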

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [x] Did you add or update any necessary documentation?
  • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Anyone in the community is free to review the PR once the checks have passed. The Contributor guidelines list specific people who can review PRs in various areas.

shanmugamr1992 • Aug 23 '22 21:08

This pull request introduces 1 alert and fixes 1 when merging 33908e66ddba431d4892875da34b547f976fe401 into c097fa1f3541d406eeac804a8a674eddbc6fd7ce - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Aug 25 '22 21:08

This pull request introduces 1 alert and fixes 1 when merging 90d52263d1d7e0bf35356614d6010baef42da7be into 0e57b58a849f6275629910cdeebd608e528327bf - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Aug 27 '22 00:08

This pull request introduces 1 alert and fixes 1 when merging 549cbcc479fccc97563cdc7b76962279da65c66f into d19146cd5cad07151f94820ff92392a29c3225f6 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Aug 30 '22 23:08

This pull request introduces 1 alert and fixes 1 when merging 656af9e26c5a8e246f8dd4313bbdd00c5eb7ed74 into e8ba60b648ae0fe04ca46d93a4d9e0f6537b521d - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Aug 31 '22 23:08

This pull request introduces 1 alert and fixes 1 when merging 97fd7152442f2025ae343580cec5de7613adaaf0 into 2ef4f357d8192a576ad47a14a284f646dc109dbd - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 01 '22 03:09

This pull request introduces 1 alert and fixes 1 when merging cbeb64f4bfd932be46a28b51b9b39fbbe5b35c4d into 817f81cbbc6c73f847e78d55f048432f17a786ac - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 01 '22 19:09

This pull request introduces 1 alert and fixes 1 when merging a16b3aa800df9e383b80fd6e2113c42e40bef15f into 817f81cbbc6c73f847e78d55f048432f17a786ac - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 01 '22 20:09

This pull request introduces 1 alert and fixes 1 when merging 98e9d7d47868a9ceddb5ef080e06da6f3e2dea0e into dbe1a589de0624eb1fe815a9fd59f4c0210702d9 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 02 '22 02:09

This pull request introduces 1 alert and fixes 1 when merging 64f6c4053c74e7c98b3408857df5c006da42b473 into ea3c1b5cd8dd711e49cfe5ec8811cb7f2b7313cc - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 02 '22 17:09

This pull request introduces 1 alert and fixes 1 when merging 35b5ed3f295ff7d9c930b0826afee47c09bb56fe into 1c16b966299203392aaba73090d820376a291974 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 02 '22 23:09

This pull request introduces 1 alert and fixes 1 when merging e2f8227b3d11e8b1d8596da95650db4eb64ea55f into 1c16b966299203392aaba73090d820376a291974 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 03 '22 05:09

This pull request introduces 1 alert and fixes 1 when merging acc9cec498b584f6f303194ce1657fc85075b5a9 into 1c16b966299203392aaba73090d820376a291974 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 03 '22 20:09

This pull request introduces 1 alert and fixes 1 when merging d5a1421fb2f57af75d98f343393d023128a23055 into 760d0c813922eac41fdfc62ace1b68a74d25d4c4 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 06 '22 20:09

This pull request introduces 1 alert and fixes 1 when merging 959b7dce56aebae514f8c950dd559acaa6cd90ba into d29a66bc5344415a134fac597be095b1271a4ce7 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 07 '22 05:09

This pull request introduces 1 alert and fixes 1 when merging 67cc7bdc083ec3144eb0afaf2160db5ffd5573b6 into abbe6430e314a0159370e198f16b75dcd75ba3f7 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 07 '22 21:09

This pull request introduces 1 alert and fixes 1 when merging d3765bc59a54488233a7049b8ac4bdd3f3cffc70 into abbe6430e314a0159370e198f16b75dcd75ba3f7 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 07 '22 23:09

This pull request introduces 1 alert and fixes 1 when merging 8cc056f3fb4e528245156549c13969f2efa832d9 into b18f9057e83d711e7bf0b36c8494a1d3be093394 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 08 '22 19:09

This pull request introduces 1 alert and fixes 1 when merging b68566aaee73082243a47b5690b142987da45ff0 into b9cf05cf76496b57867d39308028c60fef7cb1ba - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 09 '22 01:09

This pull request introduces 1 alert and fixes 1 when merging 48f89e4dcd629fc96ad11c38040b1f89e370d5cd into b9cf05cf76496b57867d39308028c60fef7cb1ba - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 09 '22 02:09

This pull request introduces 1 alert and fixes 1 when merging 89d7bec21539aea008e887b1521682d5aeda9e89 into 7357d4b107c821ff9f890ae1b9d9d0b9fe207890 - view on LGTM.com

new alerts:

  • 1 for Unreachable code

fixed alerts:

  • 1 for Unreachable code

lgtm-com[bot] • Sep 09 '22 19:09