RFC: Sparse Domain Isolation for Supporting Large-Scale Sparse Weights Training
Sparse Domain Isolation for supporting large-scale Recommender Systems.
Status | Draft
---|---
Author(s) | Haidong Rong ([email protected]), Yafei Zhang ([email protected]), Jiandong Wang ([email protected]), Chuan Cheng ([email protected])
Reviewer(s) | Alexandre Passos ([email protected]), Bairen Yi ([email protected])
Sponsor | Yuefeng Zhou ([email protected]), Zhenyu Tan ([email protected])
Updated | 2020-09-16
@yuefengz @byronyi
Hi,
This is the RFC for Sparse Domain Isolation for supporting large-scale recommender systems.
It's still a draft; we will update it with the latest content as soon as possible and improve it on this basis. To move things forward quickly, I submitted it here first, but the owners are everyone who participated in the past discussions, and we will complete the list later.
@byronyi If we are going to contribute to Addons first, do we need an RFC here?
Since this RFC targets SIG Addons, adding SIG Addons leads @facaiy @seanpmorgan and TF sponsor @karmel as reviewers.
@byronyi If we are going to contribute to Addons first, do we need an RFC here?
I guess the design was originally targeted to TF core.
As @alextp said, if part of it still requires changes to TF core, then we still need a (probably smaller) RFC here.
It requires changes to core that we should discuss now. From my point of view, the most important thing TF core can offer here is letting experimentation and development on this type of problem (for which there is very high demand, at least in industry) happen without needing to involve TF core.
Separately, I think the design of the actual components here has many interesting parts, and something fairly close to what is proposed should eventually be in core, but right now it is more important to make core properly extensible than to debate the details of this component.
That's a very interesting proposal.
From a high level view (and I'm probably wrong) it looks like it proposes a new type of variable and a new type of optimizer which can update that variable. Given that this is the case I think we can implement this in addons or some other SIG package as long as there are APIs in core TF to ensure that this variable can declare itself checkpointable, be tracked by something like tf.Module / keras.Model (so you can do model.trainable_sparse_variables), and maybe be automatically watched via the gradient tape.
Can you expand the document to clarify the details of these changes to existing parts of TF as opposed to most of the content which is on the new types?
Thanks!
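(Editor's note: to make the kind of core hooks discussed above concrete, here is a minimal sketch that uses only pieces that already exist in TF — tf.Module as a Trackable container, lookup.MutableHashTable, and GradientTape.watch. The class and method names are hypothetical illustrations, not the RFC's API.)

```python
import tensorflow as tf

class SparseWeights(tf.Module):
    """Hypothetical hash-table-backed weights with a dense, tape-watchable view."""

    def __init__(self, dim, name=None):
        super().__init__(name=name)
        self.dim = dim
        # tf.Module is Trackable, so the table is saved/restored by tf.train.Checkpoint.
        self.table = tf.lookup.experimental.MutableHashTable(
            key_dtype=tf.int64, value_dtype=tf.float32,
            default_value=tf.zeros([dim], tf.float32))

    def gather(self, ids, tape=None):
        # Dense per-batch view of the table; watching it lets the tape produce
        # gradients w.r.t. the gathered rows (the table lookup op itself has no gradient).
        values = self.table.lookup(ids)
        if tape is not None:
            tape.watch(values)
        return values

    def scatter_sub(self, ids, delta):
        # Write updated rows back into the table; this is what a sparse-aware
        # optimizer would call after computing its update.
        self.table.insert(ids, self.table.lookup(ids) - delta)


weights = SparseWeights(dim=4)
ids = tf.constant([1, 7, 42], dtype=tf.int64)
weights.table.insert(ids, tf.random.normal([3, 4]))      # seed some rows

with tf.GradientTape() as tape:
    emb = weights.gather(ids, tape=tape)
    loss = tf.reduce_sum(tf.square(emb))
grad = tape.gradient(loss, emb)              # gradient of the gathered [3, 4] slice
weights.scatter_sub(ids, 0.1 * grad)         # manual SGD-style write-back
ckpt = tf.train.Checkpoint(weights=weights)  # the table participates in checkpointing
```

Gradients never flow into the table itself here; they are taken with respect to the watched dense slice and written back explicitly, which is roughly the gap the optimizer-related parts of the RFC aim to close.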
Thank you. In fact, my initial idea was to encapsulate some kind of ResourceVariable-backed hashtable, since, as we know, TF is not good at training anything that is not a tf.Variable. I reuse lookup.MutableHashTable because I would rather not write a new hash library in TF; in particular, the lookup.* classes already support checkpointing and can be deployed on tf.distribute.Server. Here is a comparison against v1.15.2 that shows the range of core affected by the RFC: https://github.com/tensorflow/tensorflow/compare/v1.15.2...rhdong:rfc?expand=1
The main changes:
- supporting a random initializer on lookup.MutableHashTable.Find (a sketch follows after this comment)
- adaptation of four stateful optimizers (Adagrad, Adam, FTRL, Momentum), which may be cancelled in the new scheme
Thanks!
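(Editor's note: as a reading aid for the first item above, here is a hedged, Python-level sketch of the "random initializer on Find" idea: look up a batch of ids, detect which ids are still missing, and lazily insert randomly initialized rows for them. The RFC's actual change puts this behaviour inside lookup.MutableHashTable.Find; the NaN sentinel below is only an artifact of sketching it outside the kernel.)

```python
import tensorflow as tf

dim = 8
table = tf.lookup.experimental.MutableHashTable(
    key_dtype=tf.int64, value_dtype=tf.float32,
    default_value=tf.fill([dim], float("nan")))   # NaN marks "not yet inserted"

def find_with_random_init(ids, stddev=0.1):
    """Lookup that lazily inserts randomly initialized rows for unseen ids."""
    values = table.lookup(ids)
    missing_ids = tf.boolean_mask(ids, tf.math.is_nan(values[:, 0]))
    new_rows = tf.random.normal([tf.size(missing_ids), dim], stddev=stddev)
    table.insert(missing_ids, new_rows)
    return table.lookup(ids)

emb = find_with_random_init(tf.constant([3, 5, 3], dtype=tf.int64))   # shape [3, 8]
```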
The change to the existing SparseApply* kernels which removes Ref(T) from the signature is backwards incompatible and can't be done.
Adding new kernels for the hash apply is fine, though.
I do wonder if we need the Optimizer method _apply_dense_hash or whether we can use a separate optimizer-like class which knows about the hash application. This has the advantage that it naturally covers the use cases where people want different optimizers for the really sparse embedding layers (which I think is relatively common).
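(Editor's note: to make the "separate optimizer-like class" alternative concrete, here is a sketch of a thin wrapper — a hypothetical class, not a TF API — that hands dense variables to any ordinary optimizer and applies a plain write-back for hash-table-backed embedding rows.)

```python
import tensorflow as tf

class HashAwareOptimizer:
    """Illustrative wrapper: routes dense gradients to an inner optimizer and
    applies a simple SGD write-back for hash-table-backed sparse rows."""

    def __init__(self, dense_optimizer, sparse_lr=0.1):
        self.dense_opt = dense_optimizer
        self.sparse_lr = sparse_lr

    def apply_gradients(self, grads_and_vars, sparse_updates=()):
        # grads_and_vars: the usual (gradient, tf.Variable) pairs.
        # sparse_updates: (gradient, table, ids) triples for hash-backed weights,
        # where `gradient` is w.r.t. the rows previously looked up for `ids`.
        self.dense_opt.apply_gradients(grads_and_vars)
        for grad, table, ids in sparse_updates:
            rows = table.lookup(ids)
            table.insert(ids, rows - self.sparse_lr * grad)  # plain SGD on those rows
```

A real implementation would also keep per-key optimizer slots (e.g. Adagrad accumulators) in additional tables, which is what the "stateful optimizers adaptation" item above refers to.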
Yes, you're right, this is only a temporary version; I have changed the name to _apply_dense_unstateful, since XX_hash was a bad name. As for a separate optimizer class, I'm not sure which option would be better. I prefer to use the same optimizer to provide a consistent experience for algorithm engineers, because a deep-learning RecSys model may contain dense weights and sparse weights at the same time.
I think TensorFlow can provide a way to extend optimizers so that you can extend existing optimizers to handle your sparse weights.
+1 to Yuefeng's suggestion.
Can this proposal be enhanced with a section discussing such extension?
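(Editor's note: one possible shape for such an extension, sketched with hypothetical hooks — `is_sparse_wrapper`, `var.ids`, and `var.table` stand in for whatever the RFC's TrainableWrapper ends up exposing — is to subclass an existing Keras optimizer and push the updated rows back into the hash table after the normal dense update.)

```python
import tensorflow as tf

class SparseFriendlyAdagrad(tf.keras.optimizers.Adagrad):
    """Sketch of 'extend an existing optimizer' for hash-backed sparse weights."""

    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        grads_and_vars = list(grads_and_vars)
        result = super().apply_gradients(grads_and_vars, name=name, **kwargs)
        for _, var in grads_and_vars:
            if getattr(var, "is_sparse_wrapper", False):       # hypothetical marker
                var.table.insert(var.ids, var.read_value())    # write updated rows back
        return result
```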
I think TensorFlow can provide a way to extend optimizers so that you can extend existing optimizers to handle your sparse weights.
cc @omalleyt12, who proposed the new customizable optimizer in #234. Would you mind shedding some light on this?
@yuefengz @byronyi @alextp @smilingday @facaiy @seanpmorgan @omalleyt12 Hi all, I just committed an important update for the optimizer-reusing scheme based on ResourceVariable and came up with a detailed API design. I will provide a runnable demo on docker.io as soon as possible. Thank you.
I think this version of the scheme is simple and natural enough for core.
Thanks for pinging me, @yuefengz @smilingday. The proposal is very interesting. I'm wondering if we can introduce a new kind of Variable class and reuse all existing optimizers (in tf-core or tf-addons).
I'm afraid the proposal goes beyond the scope of tf-addons, so I suggest putting it in a separate repo first. @seanpmorgan Sean, what do you think?
Sean discussed this at the SIG Addons meetings and replied in separate email threads that tf-addons might not be a good fit. We are still exploring the right place for these contributions.
I will provide the source code with unit test cases soon.
Is this RFC related to the recently proposed paper "DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications" by Google? https://arxiv.org/pdf/2004.08366.pdf
No, this is a different scheme, proposed in the earlier paper "Distributed Equivalent Substitution Training for Large-Scale Recommender Systems" (accepted at SIGIR 2020).
@yuefengz @tanzhenyu @byronyi @alextp Hi, I just updated this RFC. The update contains some key features, including a scheme that is compatible with all tf.initializers without hacking too much on MutableHashTableOfTensors::Find. We have also provided our patch to core: https://github.com/tensorflow/tensorflow/pull/41371. Please help us improve it, thank you!
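(Editor's note: the "compatible with all tf.initializers" point can be pictured as a small generalization of the earlier lookup sketch: instead of hard-coding tf.random.normal, an arbitrary tf.keras initializer produces the rows for ids that are not yet in the table. Eager-mode sketch only; the linked PR implements the real mechanism.)

```python
import tensorflow as tf

dim = 8
table = tf.lookup.experimental.MutableHashTable(
    key_dtype=tf.int64, value_dtype=tf.float32,
    default_value=tf.fill([dim], float("nan")))   # NaN marks "not yet inserted"

def find_with_initializer(ids, initializer=tf.keras.initializers.GlorotUniform()):
    """Rows for unseen ids come from an arbitrary tf.keras initializer."""
    values = table.lookup(ids)
    missing_ids = tf.boolean_mask(ids, tf.math.is_nan(values[:, 0]))
    num_new = int(tf.size(missing_ids))   # static count, so fan-based initializers also work
    if num_new:
        table.insert(missing_ids, initializer([num_new, dim], dtype=tf.float32))
    return table.lookup(ids)

emb = find_with_initializer(tf.constant([3, 5, 11], dtype=tf.int64))   # shape [3, 8]
```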
Is it compatible with TensorFlow Serving? @rhdong
Yes
Hi @rhdong, I fixed some bugs (the shape of TrainableWrapper) and built TF 2.4.0 based on your code. It seems the dynamic_embedding is not updated during training.
Code as follows:
```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Lambda
from tensorflow import dynamic_embedding as de
import numpy as np

idx = np.random.randint(0, 10, 100)
label = np.array([1.0 if a % 2 == 0 else 0.0 for a in idx], dtype=np.float32)

class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.w = de.get_variable(name="dynamic_embeddings", dim=8, initializer=np.random.random(8))
        self.d0 = Lambda(lambda x: de.embedding_lookup(params=self.w, ids=x, name="wide-sparse-weights"))
        self.d1 = Dense(10, activation='relu')
        self.d2 = Dense(1, activation='sigmoid')
        self.x0 = None

    def call(self, x):
        self.x0 = self.d0(x)
        x1 = self.d1(self.x0)
        return self.d2(x1)

model = MyModel()
loss_func = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adagrad(learning_rate=.5)
train_loss = tf.keras.metrics.Mean(name='train_loss')

def train_step(x, label, print_loss=False):
    with tf.GradientTape() as tape:
        logits = model(x)
        loss = loss_func(logits, label)
    trainable_weights = model.trainable_variables
    # trainable_weights.append(model.x0)
    grads = tape.gradient(loss, trainable_weights)
    optimizer.apply_gradients(zip(grads, trainable_weights))
    if print_loss:
        print("loss:{}".format(train_loss(loss).numpy()))

def emb_sum():
    a = de.embedding_lookup(params=model.w, ids=np.array([2, 3]), name="wide-sparse-weights")
    return a.numpy().sum()

def kernel_sum():
    return model.d1.kernel.numpy().sum()

print("emb sum:{}".format(emb_sum()))

for i in range(20):
    train_step(idx.reshape(100, 1), label.reshape(100, 1))
print("emb sum:{}".format(emb_sum()))
print("kernel sum:{}".format(kernel_sum()))

# train more
for i in range(10):
    train_step(idx.reshape(100, 1), label.reshape(100, 1), print_loss=True)
print("emb sum:{}".format(emb_sum()))
print("kernel sum:{}".format(kernel_sum()))

# print trainable weights
print([v.name for v in model.trainable_weights])
```
console:
emb sum:**-0.031497083604335785**
emb sum:**-0.031497083604335785**
kernel sum:0.6821714043617249
loss:7.522636
loss:7.52227
loss:7.5219383
loss:7.521633
loss:7.521351
loss:7.521089
loss:7.520846
loss:7.5206184
loss:7.5204053
loss:7.5202055
emb sum:**-0.031497083604335785**
kernel sum:0.6808109283447266
['my_model/dense/kernel:0', 'my_model/dense/bias:0', 'my_model/dense_1/kernel:0',
'my_model/dense_1/bias:0', 'my_model/lambda/TrainableWrapper:0']
@shenbaise Thank you for the feedback, I will check and fix it as soon as possible.
Hi @shenbaise, the reason is that the commit is not compatible with Keras, especially optimizer v2. I need two days to fix it and add the UT cases; please wait a moment. Thank you!
Hi @shenbaise, I fixed the issue and the commit is here.
FYI, @kttian wrote a prototype for a differentiable hash map, roughly the equivalent of TensorList, as part of her internship project. Here's a colab that demonstrates direct gradient updates: https://colab.sandbox.google.com/drive/1hyFmriuq4Bz61_rxg2bfdE_jXHVfX8Rr?usp=sharing#scrollTo=8HDUUBEFAesC
There may be an opportunity to join efforts on a core implementation.
@alextp @saxenasaurabh @dynamicwebpaige
This is a good job, but I think it is difficult to make the hash map trainable.
It already is trainable (at least in the sense of trainable that I believe you're referring to).
@yuefengz Is this still in draft mode? What are the plans with this RFC?