Achyudh Ram


Across hedwig, the `--trained-model` arg is used to point to a snapshot. So, to fix this, should we add a flag that allows testing directly on pretrained models?
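
A minimal sketch of what that could look like, assuming an argparse-style CLI; the `--use-pretrained-model` flag name and the two loader helpers are hypothetical, not existing hedwig code:

```python
import argparse

parser = argparse.ArgumentParser()
# Existing behavior: --trained-model points to a saved snapshot on disk
parser.add_argument('--trained-model', type=str, default=None,
                    help='path to a saved model snapshot')
# Hypothetical new flag: evaluate the pretrained weights without a snapshot
parser.add_argument('--use-pretrained-model', action='store_true',
                    help='test directly on the pretrained model')
args = parser.parse_args()

if args.use_pretrained_model:
    model = load_pretrained_model()            # hypothetical helper
elif args.trained_model is not None:
    model = load_snapshot(args.trained_model)  # hypothetical helper
```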

I've seen this happen in cases where there isn't enough system memory. Can you please check if that's the issue by monitoring memory usage?
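
For a quick check, here is a small sketch using `psutil` (assuming it is available) to log system memory during training:

```python
import psutil

def log_memory_usage():
    # System-wide memory: if available memory approaches zero during
    # training, the process is likely being killed by the OOM killer
    vm = psutil.virtual_memory()
    print(f"used: {vm.used / 1e9:.2f} GB / total: {vm.total / 1e9:.2f} GB "
          f"({vm.percent}% used)")

# Call this periodically, e.g. once per epoch or every N batches
log_memory_usage()
```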

> I also removed the hugging-face flag, although it would be nice if the weights were already in `hedwig-data`. Should I rename this PR and make a separate one for...

@xdwang0726 For BERT, we do treat the entire document as a single sentence. For the hierarchical version of BERT (H-BERT), we split the document into its constituent sentences.
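
Roughly, the difference in preprocessing looks like the sketch below; I'm using NLTK's `sent_tokenize` for illustration, so the actual hedwig preprocessing may differ:

```python
# Requires the NLTK 'punkt' tokenizer data: nltk.download('punkt')
from nltk.tokenize import sent_tokenize

document = "First sentence. Second sentence. Third sentence."

# BERT: the entire document is fed in as one input sequence
bert_input = document

# H-BERT: the document is split into its constituent sentences,
# and each sentence is encoded separately before aggregation
hbert_inputs = sent_tokenize(document)
print(hbert_inputs)  # ['First sentence.', 'Second sentence.', 'Third sentence.']
```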

@tralfamadude I am not sure how you would be able to use the pre-trained models for more than a thousand tokens. Since the maximum sequence length of the pre-trained models...
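
For reference, the original pre-trained BERT checkpoints only have position embeddings for 512 positions, so longer inputs are typically truncated; a sketch of what that looks like (the helper below is illustrative, not hedwig's code):

```python
MAX_SEQ_LENGTH = 512  # position-embedding limit of the original BERT checkpoints

def truncate_for_bert(tokens):
    # Reserve two positions for the [CLS] and [SEP] special tokens,
    # then drop anything beyond the position-embedding table
    tokens = tokens[:MAX_SEQ_LENGTH - 2]
    return ['[CLS]'] + tokens + ['[SEP]']
```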

@xdwang0726 Yes, if I understand your question correctly, that is the case for BERT.

Yeah, https://github.com/castorini/hedwig/pull/38 adapts the model from https://arxiv.org/pdf/1607.01759.pdf for document classification, though you might have to dig into the implementation to see if there are differences between our model and Facebook's...
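
For anyone curious, the core of that paper's model is just averaged word embeddings feeding a linear classifier; a minimal PyTorch sketch of the idea (hedwig's version may differ, e.g. in n-gram features):

```python
import torch
import torch.nn as nn

class FastTextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        # EmbeddingBag with mean pooling averages the word embeddings,
        # which is the core of the fastText classification model
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode='mean')
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        pooled = self.embedding(token_ids, offsets)
        return self.fc(pooled)

# Example: two documents packed into one flat tensor of token ids
model = FastTextClassifier(vocab_size=10000, embed_dim=100, num_classes=4)
tokens = torch.tensor([1, 2, 3, 4, 5, 6, 7])  # two documents concatenated
offsets = torch.tensor([0, 3])                # start index of each document
logits = model(tokens, offsets)
```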

I see that the last three elements have values 0, 0, 0. Even though your input is of non-zero length, the length vector might not have been set properly. Could...
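
One way to sanity-check this is to recompute the lengths from the padded batch itself and compare them against the length vector you are passing in; a sketch, assuming zero is the padding index:

```python
import torch

PAD_IDX = 0  # assuming zero is the padding index

def compute_lengths(padded_batch):
    # padded_batch: (batch_size, max_len) tensor of token ids;
    # count the non-padding positions in each row
    return (padded_batch != PAD_IDX).sum(dim=1)

batch = torch.tensor([[5, 9, 2, 0, 0],
                      [7, 0, 0, 0, 0]])
print(compute_lengths(batch))  # tensor([3, 1])
```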

As suggested in the meeting today, let's split Castor and deal with cleaning up the code for the single-text-sequence tasks first. It would be nice if we could have...