
Follow up on TensorFlow's recommendation for investigating the slowdown

dakshvar22 opened this issue on Oct 22, 2021 · 3 comments

We have an ongoing conversation with the TensorFlow team in which we are trying to debug the slowdown in certain TF operations that we have seen after upgrading to TF 2.6. They have come back with a recommendation on what could help us work around the slowdown.

We have followed up on the recommendation by running the model regression tests on GPU with the advised change. The initial findings suggest that there is no significant difference in training and test time for the affected configs.
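The TF team's exact suggestion isn't quoted in this issue, but judging from the branch name `remove_gradient_tape`, one plausible reading is replacing an explicit `tf.GradientTape` in the training step with `optimizer.minimize`, which manages the tape internally. A minimal sketch under that assumption (the model, data, and optimizer below are placeholders, not Rasa code):

```python
import tensorflow as tf

# Hypothetical sketch, not the actual Rasa patch.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(8, activation="relu"), tf.keras.layers.Dense(1)]
)
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))

# Before: the training step manages the tape explicitly.
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

# After: no user-level tape; a loss callable is passed to the optimizer,
# which records gradients internally.
optimizer.minimize(
    lambda: loss_fn(y, model(x, training=True)), model.trainable_variables
)
```

Wrapping the training step in `@tf.function` is another common mitigation for eager-mode overhead, in case that turns out to be the relevant change.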

We wanted to run the same model regression tests on CPU, but the training and test times reported in the model regression test workflow cannot be trusted for various reasons. An initial local run suggests that there could indeed be an improvement when pipelines are run on CPU. I ran Sparse + DIET(seq) + ResponseSelector(t2t) on the Carbon Bot data locally with and without the advised change. Here are the results I got:

| Branch | Config | Dataset | Time |
| --- | --- | --- | --- |
| 2.8.x | Sparse + DIET(seq) + ResponseSelector(t2t) | Carbon Bot | 550s |
| remove_gradient_tape | Sparse + DIET(seq) + ResponseSelector(t2t) | Carbon Bot | 430s |

This suggests that the change probably yields an improvement, but only on CPU. It needs to be validated further by running experiments on GCP instances so that we have trustworthy numbers.
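To make those runs easy to repeat, a minimal timing harness along these lines could be used (the paths and config filename are illustrative, not the actual benchmarking setup):

```python
import subprocess
import time


def time_rasa_training(config_path: str, nlu_data: str) -> float:
    """Run `rasa train nlu` once and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        ["rasa", "train", "nlu", "--config", config_path, "--nlu", nlu_data],
        check=True,
    )
    return time.perf_counter() - start


# Run the same pipeline from two checkouts (e.g. 2.8.x and
# remove_gradient_tape, each in its own virtualenv) and compare.
elapsed = time_rasa_training("configs/sparse_diet_rs.yml", "data/nlu.yml")
print(f"training took {elapsed:.1f}s")
```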

For comparison, this sheet has training and testing time numbers from before and after the TF 2.6 upgrade.

The pipelines that have specifically taken a hit after the upgrade are those that include DIETClassifier with entity extraction, E2E TEDPolicy with entity extraction, or the LanguageModelFeaturizer.
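For illustration, a config containing the affected components might look like the sketch below (the component parameters are placeholders, not the benchmarked settings):

```yaml
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer   # affected component
    model_name: bert
  - name: DIETClassifier            # affected when entity extraction is enabled
    entity_recognition: true
    epochs: 100
policies:
  - name: TEDPolicy                 # affected in E2E mode with entity extraction
```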

Definition of Done:

  • [x] Run experiments to get these numbers with and without the change for all configurations and datasets that we have been benchmarking on (combinations of NLU + config listed here):

    • Training time
    • Testing time
    • F1 scores (for intent classification, entity recognition and response selector as applicable)
  • [ ] Based on the gathered numbers, follow up on the issue in the TF repo to give them feedback on the recommendation.

dakshvar22 · Oct 22 '21


koernerfelicia commented:

Scripts used for the last round of "benchmarking" are here. They may be useful as a starting point. Note, however, that the setup should be considerably simpler this time, because you won't have to bother with GPU/CUDA compatibility, and the two versions to compare even have the same dependencies.

koernerfelicia · Oct 22 '21

@dakshvar22 Do you know if TensorFlow had any insight into why the change resulted in improvements on CPU only? cc: @camattin

b-quachtran · Mar 31 '22

Hey @b-quachtran, apologies for the delay in responding. As far as I can see, we haven't shared the GPU benchmarking numbers with them. Our contractor Markus was leading that investigation, and I'm not sure why those numbers haven't been shared. That said, the TensorFlow folks involved have also been very quiet on this and have not responded to our previous queries in 4-5 months.

dakshvar22 · Apr 19 '22

Maxime Verger commented:

:bulb: Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

:arrow_right: More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.

sync-by-unito[bot] · Dec 19 '22