James Jing Tang


@atulkum One more question: in your code, step_coverage_loss is the sum of the element-wise minimum of attn_dist and coverage. https://github.com/atulkum/pointer_summarizer/blob/5e511697d5f00cc474370fd76ac1da450ffd4d2e/training_ptr_gen/train.py#L99 And coverage is updated as coverage + attn_dist. https://github.com/atulkum/pointer_summarizer/blob/5e511697d5f00cc474370fd76ac1da450ffd4d2e/training_ptr_gen/model.py#L124 So,...
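The two lines being discussed can be sketched together as a single per-step function. This is an illustrative reconstruction of the logic at the linked lines (sum of the element-wise minimum, then accumulate attention into coverage), not the repo's exact code:

```python
import torch

def step_coverage_loss(attn_dist, coverage):
    """Coverage loss for one decoder step (See et al., 2017):
    sum over source positions of min(attention, accumulated coverage),
    then accumulate this step's attention into the coverage vector."""
    loss = torch.sum(torch.min(attn_dist, coverage), dim=1)  # (batch,)
    next_coverage = coverage + attn_dist                     # (batch, src_len)
    return loss, next_coverage

# At the first step coverage is all zeros, so the loss is 0; attending to
# the same positions again at the next step is what gets penalized.
attn = torch.tensor([[0.7, 0.2, 0.1]])
cov = torch.zeros(1, 3)
loss1, cov = step_coverage_loss(attn, cov)
loss2, _ = step_coverage_loss(attn, cov)
```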

@atulkum Have you ever tried setting is_coverage to True? It very easily causes the loss to become NaN, and a smaller learning rate does not help with this issue.
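One common source of this kind of NaN (an assumption about the cause here, not a confirmed diagnosis of this repo) is taking the log of an attention or vocabulary probability that has underflowed to zero. A small clamp before the log keeps the loss finite:

```python
import torch

# Probabilities can underflow to exactly 0 after softmax + masking;
# log(0) = -inf, which turns into NaN once gradients flow through it.
probs = torch.tensor([0.0, 1e-30, 0.5])

bad = torch.log(probs)                    # contains -inf
safe = torch.log(probs.clamp(min=1e-12))  # finite everywhere
```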

@atulkum I think this operation may cause NaN: https://github.com/atulkum/pointer_summarizer/blob/fd8dda35390d058c1745b9495634ea0ddadf71ad/training_ptr_gen/model.py#L95 Computing the attention memory at every decoder step may create many computation-graph branches in the torch backend, but in fact...
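The inefficiency being pointed at can be sketched as follows. Since the encoder states are fixed during decoding, their linear projection can be computed once before the decode loop instead of inside it; names and shapes below are hypothetical, for illustration only:

```python
import torch
import torch.nn as nn

hidden = 8
W_h = nn.Linear(hidden, hidden, bias=False)
encoder_outputs = torch.randn(2, 5, hidden)  # (batch, src_len, hidden)

# Wasteful: re-projecting the fixed encoder states inside the decode loop
# rebuilds the same graph branch at every step.
# for t in range(max_steps):
#     enc_feats = W_h(encoder_outputs)  # recomputed each step
#     ...

# Better: project once before the loop and reuse the cached result.
enc_feats = W_h(encoder_outputs)
for t in range(3):
    # score attention at step t against the cached enc_feats
    pass
```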

@atulkum I set is_coverage to True after 500k steps, but at the beginning of retraining it always gives NaN. I'll continue to test, thank you again.

Thanks for the suggestion. I initialized model_file_path, but after no more than 100 iterations it gets NaN :(

> Does this hierarchical classification support BERT? If so, where should I modify it?

It wasn't directly supported at the time, but multiheadattn is already defined in model/layer.py, so by adding some FFN layers you could define a BERT without much extra development, and then use BERT for hierarchical classification. MRs are welcome.
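The suggestion above (multi-head attention plus a few FFN layers) amounts to stacking Transformer encoder blocks. A minimal sketch, using torch's built-in nn.MultiheadAttention as a stand-in for the repo's own multiheadattn layer; all names and sizes here are illustrative:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One BERT-style encoder block: multi-head self-attention plus a
    position-wise FFN, each followed by residual + LayerNorm. A BERT
    encoder is simply a stack of such blocks."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)       # self-attention over the sequence
        x = self.norm1(x + a)           # residual + norm
        return self.norm2(x + self.ffn(x))

x = torch.randn(2, 10, 64)              # (batch, seq_len, d_model)
y = EncoderBlock()(x)                   # shape is preserved
```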

Also, I ran into some errors when extracting the map. How should I resolve these? ![image](https://user-images.githubusercontent.com/13873223/89280182-d53b2a00-d67a-11ea-908e-1a73d5bb21c7.png)

I've given up on compiling it... there are too many pitfalls. If I just use the release directly, how should I deploy it?