
Some questions about the embeddings in the code and in the paper

Open kenneys-bot opened this issue 2 years ago • 1 comment


    # Token, token-type, and 1D position embeddings
    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)
    token_type_embeddings = self.token_type_embeddings(token_type_ids)

    embeddings = inputs_embeds + token_type_embeddings
    if self.position_embedding_type == "absolute":
        position_embeddings = self.position_embeddings(position_ids)
        embeddings += position_embeddings

    # 2D spatial embeddings from each text line's bounding box
    if "line_bbox" in kwargs:
        embeddings += self._cal_spatial_position_embeddings(kwargs["line_bbox"])

    # 1D Seg. Rank Embeddings: the rank (order) of the segment in the document
    if "line_rank_id" in kwargs:
        embeddings += self.line_rank_embeddings(kwargs["line_rank_id"])

    # 1D Seg. BIE Embeddings: the token's begin/inside/end role within its segment
    if "line_rank_inner_id" in kwargs:
        embeddings += self.line_rank_inner_embeddings(kwargs["line_rank_inner_id"])

I could not understand the meaning of the Token Embeddings, 1D Seg. Rank Embeddings, and 1D Seg. BIE Embeddings in the figure, and there is no clear explanation in the paper, so I eventually found the corresponding place in the code to debug. That raised a new question: what exactly are inputs_embeds and token_type_embeddings in the code? Is their sum the Token Embeddings in the diagram? Are the 1D Seg. Rank Embeddings the line_rank_embeddings, and the 1D Seg. BIE Embeddings the line_rank_inner_embeddings? Very much looking forward to a quick reply from the developers!

kenneys-bot avatar Nov 01 '23 03:11 kenneys-bot

Hi, the token type embeddings are the same for all tokens; they are retained as used in BERT/BROS. The 1D Seg. Rank Embeddings and 1D Seg. BIE Embeddings are exactly what you mentioned.

ccx1997 avatar Nov 10 '23 07:11 ccx1997