w2vv [Feature Request]: Better experience with reproducing results

First of all, thank you for this repository and its well-written documentation. While trying to reproduce your results, I collected some feedback on how it could be even better.

* Setting up an environment is very slow:

  * Downloading the required files from http://lixirong.net/ is very slow and regularly fails due to network errors. For example, it takes a few days to download [word2vec.tar.gz](http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz)

    * Would it be possible to host the files on a well-supported file-sharing service such as [Google Drive](https://www.google.com/drive/) and redirect requests from your site to it? Or to store them directly in git with [Git Large File Storage](https://git-lfs.github.com/)?
    * Ideally, everything would be accessible from an online container with code and data, such as [Kaggle](https://www.kaggle.com/), without the need to download anything at all
  * Since support for Python 2 is [about to end](https://github.com/python/devguide/pull/344), it might be a good idea to migrate this repo to Python 3 as well. I have already done so to get it working in my environment, and will try to attach a diff if everything works properly.

* The README does a great job of explaining the details of the project and how to reproduce it locally. However, the specific content and structure of some data files wasn't clear to me, so I'm currently going through your paper and the source code to figure it out. It would be a great addition to the guide if the dataset and the word2vec files were covered in more detail.

Once again, this is just my subjective feedback on how to make this repository even better, if you have the capacity to maintain it. Thank you again for putting so much work into organizing the code and the README here; it helped a lot!

Thanks, Oleh

Oct 30 '19 13:10 OlehOnyshchak
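
Until the files are mirrored elsewhere, one workaround for the failing downloads is to resume a partial file instead of starting over after each network error. The following is only a minimal sketch using the Python 3 standard library. It assumes the server honours HTTP `Range` requests, which has not been verified for lixirong.net. The function names `range_header` and `download_with_resume` are hypothetical helpers, not part of the w2vv code.

```python
import http.client
import os
import time
import urllib.error
import urllib.request

# URL taken from the issue text; everything else here is illustrative.
URL = "http://lixirong.net/data/w2vv-tmm2018/word2vec.tar.gz"


def range_header(bytes_on_disk):
    """Build the header that asks the server to skip bytes we already have."""
    if bytes_on_disk > 0:
        return {"Range": "bytes=%d-" % bytes_on_disk}
    return {}


def download_with_resume(url, dest, max_attempts=5, chunk_size=1 << 20):
    """Download url to dest, continuing a partial file after network errors."""
    for attempt in range(1, max_attempts + 1):
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url, headers=range_header(have))
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                # 206 means the server honoured the Range header. On a plain
                # 200 it is sending the whole file again, so overwrite.
                mode = "ab" if response.status == 206 else "wb"
                with open(dest, mode) as out:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            return dest
                        out.write(chunk)
        except urllib.error.HTTPError as err:
            # Assumption: 416 (range not satisfiable) most likely means the
            # local file is already complete.
            if err.code == 416:
                return dest
            last_error = err
        except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
            last_error = err
        # Back off before retrying: 2, 4, 8, ... seconds, capped at 60.
        time.sleep(min(2 ** attempt, 60))
    raise RuntimeError("giving up after %d attempts: %s" % (max_attempts, last_error))
```

Calling `download_with_resume(URL, "word2vec.tar.gz")` would then retry up to five times, appending to the partial archive whenever the server answers with status 206.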

Hi Oleh, we are also trying to reproduce the results, but since py2 is no longer supported we are running into issues. We hope you can share your py3 code; that would be a great help for us. Also, which versions of the other libraries did you use?

Thank you, Rakesh

Jan 24 '20 10:01 RakeshRadarapu

Sorry for the late reply. The data can be downloaded from Google Drive and Baidu Pan.

Jan 27 '20 08:01 danieljf24

Hi @RakeshRadarapu. There were too many behaviour changes in the libraries the project depends on when migrating from py2 to py3, so after one day of trying we decided to work with the original version.

However, the following resources might be of interest to you:

* [Wikipedia Image Recommendation](https://github.com/OlehOnyshchak/WikiImageRecommendation) project, where we reused Word2VisualVec
* [Kaggle Data Preprocessing notebook](https://www.kaggle.com/jacksoncrow/dataset-preprocessing), where we transformed our data into Word2VisualVec format
* [Kaggle Word2VisualVec training](https://www.kaggle.com/jacksoncrow/w2vvtraining), where we set up a py2 environment for the Word2VisualVec model. You can just fork this notebook and train the model with your data without any downloads or configuration

Thank you for the alternative links to download the dataset, @danieljf24.

Jan 27 '20 10:01 OlehOnyshchak

Hi Oleh, thanks for your help. Could you tell me which TensorFlow and CUDA versions you used for the py2 model?

Jan 27 '20 18:01 RakeshRadarapu

Hi @RakeshRadarapu. The TensorFlow version is 1.15 and CUDA is V10.0.130. You can also check these and other environment details in an interactive Python prompt in the Kaggle notebook I linked above.
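
For reference, a check along these lines can be typed into a Python prompt. This is a minimal sketch, not code from the notebook. It assumes `nvcc` is on the `PATH` when the CUDA toolkit is installed, and the helper name `environment_report` is made up for illustration. It avoids Python 3 only features so that it should also run in a py2 environment.

```python
import platform
import subprocess


def environment_report():
    """Collect the versions that matter when recreating the training setup."""
    report = {"python": platform.python_version()}

    try:
        import tensorflow as tf  # optional: only reported when installed
        report["tensorflow"] = tf.__version__
    except ImportError:
        report["tensorflow"] = "not installed"

    # nvcc ships with the CUDA toolkit and may be absent even on GPU machines.
    try:
        output = subprocess.check_output(["nvcc", "--version"])
        lines = output.decode("utf-8", "replace").strip().splitlines()
        # Keep the line that names the release, if there is one.
        release = [line for line in lines if "release" in line]
        report["cuda"] = release[0] if release else " ".join(lines)
    except (OSError, subprocess.CalledProcessError):
        report["cuda"] = "nvcc not available"

    return report


for name, version in sorted(environment_report().items()):
    print("%s: %s" % (name, version))
```

Comparing this output with the versions quoted above is a quick way to see whether a local setup matches the Kaggle one.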

Feb 06 '20 18:02 OlehOnyshchak
