
Add `SparseLinear_v2`, fixing indexing issues

Open danieldk opened this issue 2 years ago • 10 comments

Introduce SparseLinear_v2 to fix indexing issues

SparseLinear does not correctly index the gradient/weight matrix (#752). This change fixes the indexing so that the full matrix is used.

To retain compatibility with existing models that use SparseLinear (which works relatively well as long as there are not too many hash collisions), the fixed version is introduced as SparseLinear_v2.
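As a minimal usage sketch, assuming SparseLinear_v2 is exposed from thinc.api alongside the existing SparseLinear and shares its constructor signature (nO, length); check the released API for the exact names:

```python
from thinc.api import SparseLinear, SparseLinear_v2

# Existing models keep the old layer (and its old indexing behaviour).
legacy = SparseLinear(nO=64, length=2**18)

# New models should use the fixed layer, which indexes the full
# weight/gradient matrix and handles non-power-of-two lengths correctly.
fixed = SparseLinear_v2(nO=64, length=2**18)
```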


While at it, this change also fixes another indexing-related bug:

The output of the MurmurHash function was mapped to array indices as follows:

idx = hash & (nr_weight-1)

This works well when nr_weight is a power of two. For instance, if we have 16 buckets:

idx = hash & 15
idx = hash & 0b1111

However, this breaks down when the bucket count is not a power of two. For instance, if we have 15 buckets:

idx = hash & 14
idx = hash & 0b1110

This would mask out all odd indices. We fix this by using the modulo operation (hash % nr_weight) instead. To preserve compatibility with existing models, this change is only applied in SparseLinear_v2.
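A minimal sketch of the difference, with made-up hash values and a bucket count of 15 for illustration:

```python
def bucket_mask(hash_value: int, nr_weight: int) -> int:
    # Old scheme: only correct when nr_weight is a power of two.
    return hash_value & (nr_weight - 1)

def bucket_mod(hash_value: int, nr_weight: int) -> int:
    # Fixed scheme (used by SparseLinear_v2): correct for any bucket count.
    return hash_value % nr_weight

nr_weight = 15  # not a power of two
hashes = [3, 7, 10, 21]
print([bucket_mask(h, nr_weight) for h in hashes])  # [2, 6, 10, 4] -- only even buckets are ever used
print([bucket_mod(h, nr_weight) for h in hashes])   # [3, 7, 10, 6] -- all 15 buckets are reachable
```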

danieldk avatar Sep 02 '22 08:09 danieldk

Do you really need to do the older version of the hash mapping at all?

adrianeboyd avatar Sep 02 '22 08:09 adrianeboyd

Do you really need to do the older version of the hash mapping at all?

I don't know. If someone used

SparseLinear(length=not_a_power_of_2)

their models would break if we switched from bit masking to modulo.
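(As a quick illustrative check with hypothetical hash values: for a power-of-two length the mask and the modulo agree, so only non-power-of-two lengths would produce different indices.)

```python
# For a power-of-two length, masking and modulo give identical indices ...
length = 16
assert all(h & (length - 1) == h % length for h in [3, 7, 10, 21, 12345])

# ... but for a non-power-of-two length they diverge, so existing models
# trained with such a length would look up different weights.
length = 15
assert any(h & (length - 1) != h % length for h in [3, 7, 10, 21, 12345])
```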

danieldk avatar Sep 02 '22 08:09 danieldk

But with this as SparseLinear_v2 I'm not sure where that would happen?

adrianeboyd avatar Sep 02 '22 08:09 adrianeboyd

But with this as SparseLinear_v2 I'm not sure where that would happen?

I am not sure I follow? There may be existing uses of SparseLinear/SparseLinear_v1 out there with lengths that are not a power of 2. So we can't really remove SparseLinear without breaking the API. But we also can't retroactively fix SparseLinear, because the indices would change (when a length that is not a power of two is used) and existing models would output garbage.

danieldk avatar Sep 02 '22 08:09 danieldk

Ah, I thought more was redefined than actually is. I think it would be clearer to call it something like v1_indexing.

adrianeboyd avatar Sep 02 '22 09:09 adrianeboyd

Ah, I thought more was redefined than actually is. I think it would be clearer to call it something like v1_indexing.

Changed the name to v1_indexing.

danieldk avatar Sep 02 '22 17:09 danieldk

You'll need to merge in the latest from master to avoid the CI failure with the macOS 10.15 environment.

svlandeg avatar Sep 06 '22 13:09 svlandeg

You'll need to merge in the latest from master to avoid the CI failure with the macOS 10.15 environment.

Merged.

danieldk avatar Sep 07 '22 08:09 danieldk

@explosion-bot please test_slow_gpu

shadeMe avatar Sep 07 '22 08:09 shadeMe

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/27

explosion-bot avatar Sep 07 '22 08:09 explosion-bot

Would the slow GPU tests pass after the changes Madeesh made for TensorFlow?

adrianeboyd avatar Oct 24 '22 08:10 adrianeboyd

The fix for the failing tests is in this commit, so rebasing on master ought to sort out the GPU test failures.

shadeMe avatar Oct 24 '22 09:10 shadeMe

@explosion-bot please test_slow_gpu

danieldk avatar Nov 14 '22 14:11 danieldk

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/36

explosion-bot avatar Nov 14 '22 14:11 explosion-bot

@explosion-bot please test_slow_gpu

danieldk avatar Nov 14 '22 16:11 danieldk

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/thinc-slow-gpu-tests/builds/37

explosion-bot avatar Nov 14 '22 16:11 explosion-bot