John Giorgi
Hi, I was hoping to compare this approach with my [own sentence embedding method](https://github.com/JohnGiorgi/DeCLUTR). Sorry if this is mentioned somewhere (I couldn't find it), but is the "best" pretrained model...
Hi! Would it be possible to upload the [Raw data, bad retrievals removed](https://drive.google.com/open?id=1jwBzXBVv8sfnFrlzPnSUBHEEAbpIUnFq) data as a `.zip` to the Google Drive, similar to `multi-news-original`? [I am trying to point the...
Fixes https://github.com/allenai/allennlp/pull/5505#issuecomment-1007540627

Changes proposed in this pull request:
- Don't cache transformers when `reinit_modules` is provided.
- Remove `reinit_modules` from the transformer spec.
- Always load a new model when...
I am getting different results for the same input text when I use the Streamlit demo vs. when I run the code locally. The text in question: ```python text =...
Currently, the dataset reader shuffles the dataset on every epoch. To do this, it reads the entire dataset into memory, shuffles it, and then yields instances one by one. This...
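For reference, a minimal sketch of the eager behaviour described above (a hypothetical line-per-instance reader, not the project's actual `DatasetReader`):

```python
import random
from typing import Iterator, List

def read_shuffled(path: str) -> Iterator[str]:
    """Eager read: load every instance, shuffle, then yield one at a time."""
    with open(path) as f:
        instances: List[str] = [line.strip() for line in f if line.strip()]
    random.shuffle(instances)  # requires holding the whole dataset in memory
    yield from instances
```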
The pretrained models do not load properly with `allennlp>=1.2.0`. The error reported is:

```
RuntimeError: Error loading state dict for DeCLUTR
	Missing keys: []
	Unexpected keys: ['_text_field_embedder.token_embedder_tokens.transformer_model.roberta.pooler.dense.weight', '_text_field_embedder.token_embedder_tokens.transformer_model.roberta.pooler.dense.bias']
```

For...
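A possible workaround sketch, not an official fix: drop the unexpected pooler keys from the checkpoint before loading it. The `weights.th` path inside the extracted archive and the use of `strict=False` are assumptions here.

```python
import torch

# Sketch of a workaround (assumptions: the archive's weights are a plain state
# dict saved at "weights.th", and the restored model tolerates strict=False).
state_dict = torch.load("weights.th", map_location="cpu")
# Drop the RoBERTa pooler weights that the model no longer expects.
state_dict = {k: v for k, v in state_dict.items() if ".roberta.pooler." not in k}
# model.load_state_dict(state_dict, strict=False)  # `model` is the restored DeCLUTR model
```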
On the latest pre-release of AllenNLP, there is no longer a progress bar indicating the remaining training time. Figure out why!
Hi, I am interested in comparing to your QuickThoughts method by evaluating it on the full SentEval benchmark. To do that, I need to write something like the following: ```python...
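The snippet in the issue is cut off above; what follows is only a generic sketch of a SentEval evaluation loop, assuming the standard `senteval` package with its task data downloaded and a random-vector placeholder where the QuickThoughts encoder would go.

```python
import numpy as np
import senteval

PATH_TO_DATA = "SentEval/data"  # assumption: local clone of SentEval with its data downloaded

def prepare(params, samples):
    # Nothing to precompute for this placeholder encoder.
    return

def batcher(params, batch):
    # Placeholder encoder: swap in a call that embeds each sentence with QuickThoughts.
    # SentEval hands over a list of tokenised sentences and expects a (batch, dim) array.
    return np.random.randn(len(batch), 768)

params = {
    "task_path": PATH_TO_DATA,
    "usepytorch": True,
    "kfold": 10,
    "classifier": {"nhid": 0, "optim": "adam", "batch_size": 64, "tenacity": 5, "epoch_size": 4},
}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(["MR", "CR", "SUBJ", "MPQA", "SST2", "TREC", "MRPC", "SICKRelatedness", "STSBenchmark"])
```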
Hi, I downloaded the pre-filtered URL list from [here](https://mega.nz/#F!EZZD0YwJ!9_PlEQzdMVLaNdKv_ICNVQ), and then tried to extract the text with `download.py` as per the README: ```bash python download.py url_dumps_deduped/RS_2018-07.xz.deduped.txt \ --n_procs 40 \...
Hi, I noticed something weird about the `max_len` attribute of the `tokenizer`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
print(tokenizer.max_len)  # => 1000000000000000019884624838656
```

Whereas I expected it to...
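For context, a guess at what is going on (not confirmed in the issue): that huge number looks like the placeholder `transformers` falls back to when the checkpoint's tokenizer config does not specify a maximum length. A minimal workaround sketch, assuming a `transformers` release that accepts a `model_max_length` override:

```python
from transformers import AutoTokenizer

# SciBERT, like BERT, has 512 position embeddings, so cap the tokenizer
# explicitly instead of relying on the checkpoint's tokenizer config.
tokenizer = AutoTokenizer.from_pretrained(
    "allenai/scibert_scivocab_uncased", model_max_length=512
)
print(tokenizer.model_max_length)  # => 512
```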