thinc
thinc copied to clipboard
Add `SparseLinear_v2`, fixing indexing issues
Introduce SparseLinear_v2
to fix indexing issues SparseLinear
does not correctly index the gradient/weight matrix (#752). This change fixes the indexing, so that the full matrix is used.
To retain compatibility with existing models that use SparseLinear
, which works relatively well if there are not too many hash collisions, the fixed version is renamed to SparseLinear_v2
.
While at it, fix another indexing-related bug:
The output of MurMur hashes were mapped to array indices as follows:
idx = hash & (nr_weight-1)
This works well when nr_weight
is a power of two. For instance,
if we have 16 buckets:
idx = hash & 15
idx = hash & 0b1111
However, when the user uses a bucket count that is not a power of two, this breaks down. For instance, if we have 15 buckets:
idx = hash & 14
idx = hash & 0b1110
This would mask out all odd indices. We fix this by using the modulus instead. To preserve compatibility with existing models, this change is only added to SparseLinear_v2
.
Do you really need to do the older version of the hash mapping at all?
Do you really need to do the older version of the hash mapping at all?
I don't know. If someone used
SparseLinear(length=not_a_power_of_2)
their models would break by changing from bit masking to modulo.
But with this as SparseLinear_v2
I'm not sure where that would happen?
But with this as
SparseLinear_v2
I'm not sure where that would happen?
I am not sure I follow? There may be existing uses for SparseLinear
/SparseLinear_v1
out there with lengths that are not a power of 2? So, we can't really remove SparseLinear
without breaking the API. But we can also not retroactively fix SparseLinear
, because the indices would change (when a lengths that is not a power of two is used) and existing models would output garbage.
Ah, I thought more was redefined than actually is. I think it would be clearer to call it something like v1_indexing
.
Ah, I thought more was redefined than actually is. I think it would be clearer to call it something like
v1_indexing
.
Changed the name to v1_indexing
.
You'll need to merge in the latest from master
to avoid the CI failure with the macOS 10.15 environment.
You'll need to merge in the latest from
master
to avoid the CI failure with the macOS 10.15 environment.
Merged.
@explosion-bot please test_slow_gpu
🪁 Successfully triggered build on Buildkite
URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/27
Would the slow GPU tests pass after the changes Madeesh made for tensorflow?
The fix for the failing tests is in this commit. So, performing a new rebase on master
ought to sort out the GPU test failures.
@explosion-bot please test_slow_gpu
🪁 Successfully triggered build on Buildkite
URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/36
@explosion-bot please test_slow_gpu
🪁 Successfully triggered build on Buildkite
URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/37