Compare torchnlp to torchtext

Open PetrochukM opened this issue 6 years ago • 10 comments

Hi There!

To help folks sort through the clutter of libraries, please add a comparison of PyTorch-NLP and pytorch/text.

https://github.com/PetrochukM/PyTorch-NLP

Thanks!

PetrochukM avatar Jun 02 '18 20:06 PetrochukM

Would you rather PyTorch-NLP replace torchtext as the semi-official NLP support library for PyTorch? Or are there things you believe you still lack relative to torchtext?

jekbradbury avatar Jun 02 '18 22:06 jekbradbury

I think I have covered everything, apart from a couple of missing datasets. The codebase has a completely different design.

Happy to help however I can!

PetrochukM avatar Jun 02 '18 22:06 PetrochukM

@jekbradbury Checking in!

PetrochukM avatar Jun 08 '18 21:06 PetrochukM

Again!

PetrochukM avatar Jul 31 '18 16:07 PetrochukM

@PetrochukM just to check in and see which kind of datasets you would propose.

zhangguanheng66 avatar May 30 '19 22:05 zhangguanheng66

Are there plans to consolidate these two libraries?

Also, are there any roadmaps available for releases/plans? I noticed torchvision has a models/ directory and torchtext does not. After searching for a while, it seems like NLP models are scattered across many GitHub repositories with little to no support, apart from a handful of research libraries such as AllenNLP.

As far as my limited understanding goes, torchtext and torchnlp seem to be trying to solve very similar problems.

Are there suggestions for where to go for:

  1. models (hopefully verified to reproduce the published results and some level of community/repository owner maintenance)
  2. idioms around PyTorch + NLP
  3. access to open source datasets with a clear license for companies to evaluate legal exposure
  4. contributing to the above list of useful code/data.

eedeleon avatar Jun 05 '19 20:06 eedeleon

@eedeleon Yes. There will be a release by the end of July (0.4.0). We are now planning to incorporate some common NLP models in torchtext (like torchvision) to support the research community. We are also trying to improve the merge process, including more complete docs, examples, and unit tests. If you have any models or datasets in mind, feel free to propose them here or on Slack (pytorch text).

zhangguanheng66 avatar Jun 07 '19 14:06 zhangguanheng66

@eedeleon Originally, there was a plan to deprecate torchtext and merge torchnlp into the official library; however, the Facebook PyTorch team has lacked the resources and NLP expertise to follow through on this process.

Also, as per Facebook's request, I did not have the time to dedicate a full-time effort to torchnlp.

There was also a plan for pytext and torchnlp to collaborate, because pytext was finding they had to rewrite much of the code in torchnlp. That said, there was a lack of follow-up.


@zhangguanheng66 I am glad that you are anticipating a major release for torchtext by the end of July. However, the lack of progress on torchtext historically gives me pause. Furthermore, there has been a lack of progress in the last month since you made your comment.

The lack of docs, examples, and unit tests is also apparent.


That said, I'll continue to support and improve PyTorch-NLP. Our engineering team uses it for our research supported by the Allen NLP team. We will continue to make the improvements that help our use case.

We would love for more people to actively contribute to PyTorch-NLP in line with their needs!

PetrochukM avatar Jul 11 '19 04:07 PetrochukM

@PetrochukM Thanks for the comments. For the next release, I will try to add a few new supervised learning datasets and a tutorial on constructing a dataset with the new pattern. We still have a lot of tech debt to fix, but we have to prioritize.

zhangguanheng66 avatar Jul 11 '19 15:07 zhangguanheng66

Say @PetrochukM, I was wondering if there are any updates on this post? I have been looking at torchtext lately, and the API has been completely revamped: lots of major breaking changes and even a different design concept, as they try to be more consistent with PyTorch DataLoaders, etc. But I am finding the package really hard to use because of a lack of documentation for the new API. There are also a lot of rough edges; for example, the DataLoaders need to be recreated after each epoch because of StopIteration errors. Are the two projects still mutually exclusive, are they replacements for each other, or are they some kind of hybrid?
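
To illustrate the StopIteration point, here is a minimal sketch in plain Python. It does not use torchtext or any real torchtext API; `make_batches`, `run_epochs`, and `reuse_exhausted` are hypothetical names chosen for illustration. It only shows the general pattern: a one-shot iterator is exhausted after a single pass, so it has to be recreated at the start of each epoch.

```python
def make_batches():
    """Hypothetical stand-in for a one-shot dataset iterator (a generator)."""
    for batch in ([1, 2], [3, 4]):
        yield batch


def run_epochs(num_epochs):
    """Recreate the iterator at the start of every epoch and collect its batches."""
    seen = []
    for _ in range(num_epochs):
        batches = make_batches()  # fresh iterator for this epoch
        seen.append(list(batches))
    return seen


def reuse_exhausted():
    """Return True if calling next() on an exhausted iterator raises StopIteration."""
    batches = make_batches()
    list(batches)  # the first pass consumes every batch
    try:
        next(batches)
    except StopIteration:
        return True
    return False
```

Whether the same workaround is needed in a given torchtext release depends on how its datasets are implemented, so treat this as an analogy and not as a description of the library.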

00krishna avatar Nov 29 '21 23:11 00krishna