t2vec

Significance of index offset in saveKNearestVocabs

Open: arcrank opened this issue 5 years ago • 1 comment

Hello,

I have been looking through the code and porting parts of it to Python. In saveKNearestVocabs, the for loops over the vocab apply an offset to the index. At first I thought this was just because Julia is 1-indexed and Python is 0-indexed, but now I am not sure.

function saveKNearestVocabs(region::SpatialRegion, datapath::String)
    V = zeros(Int, region.k, region.vocab_size)
    D = zeros(Float64, region.k, region.vocab_size)
    for vocab in 0:region.vocab_start-1
        V[:, vocab+1] .= vocab
        D[:, vocab+1] .= 0.0
    end
    for vocab in region.vocab_start:region.vocab_size-1
        cell = region.vocab2hotcell[vocab]
        kcells, dists = knearestHotcells(region, cell, region.k)
        kvocabs = map(x->region.hotcell2vocab[x], kcells)
        V[:, vocab+1] .= kvocabs
        D[:, vocab+1] .= dists

The resulting file just has an empty first entry in the V and D arrays, since the PAD token is actually at index 1, and the vocab, which starts at 4, is now at index 5. Is there a downstream motivation for this, or is it just how it was first implemented?

arcrank avatar Jun 16 '20 21:06 arcrank

The interface was kept this way when it was implemented, in case we want to change vocab_start in the future. You can ignore it in your implementation.

boathit avatar Jul 04 '20 01:07 boathit
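
For anyone else porting this to Python: in the quoted loops, `vocab` runs over 0-based token ids and `vocab+1` appears to be only the 1-based Julia column for that id. Below is a minimal sketch of a 0-indexed port under that reading, not the repository's own code. The function name `k_nearest_vocab_table`, the `knearest` callable, and `toy_knearest` are hypothetical stand-ins for `knearestHotcells` plus the `hotcell2vocab` lookup, and plain lists with one row per token id are used instead of the `k × vocab_size` matrices.

```python
def k_nearest_vocab_table(vocab_start, vocab_size, k, knearest):
    """Build 0-indexed tables: row `vocab` holds that token's k neighbors."""
    V = [[0] * k for _ in range(vocab_size)]
    D = [[0.0] * k for _ in range(vocab_size)]
    # Special tokens (ids below vocab_start) are their own neighbors at distance 0.
    for vocab in range(vocab_start):
        V[vocab] = [vocab] * k
    # Regular tokens: no "+1" here, because Python lists are 0-indexed.
    for vocab in range(vocab_start, vocab_size):
        kvocabs, dists = knearest(vocab, k)
        V[vocab] = list(kvocabs)
        D[vocab] = list(dists)
    return V, D


def toy_knearest(vocab, k):
    # Hypothetical stand-in: pretend the neighbors are vocab, vocab+1, ...
    return [vocab + i for i in range(k)], [float(i) for i in range(k)]


V, D = k_nearest_vocab_table(vocab_start=4, vocab_size=7, k=2, knearest=toy_knearest)
print(V[0], V[4])
```

With `vocab_start=4`, rows 0 to 3 hold the special tokens and row 4 holds the first regular token, so the token id can be used directly as the row index.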