unif issues

question on wide and deep

1

Hi, I've noticed that you have implemented the wide and deep structure, which is different from the classical "youtube wide and deep". Here is my question: 1) what is the...

kiminh

Hello! I recently read your adversarial training code and found it very interesting, but I have a question about the SMART algorithm part: # runs at the start of each epoch self.init_tilda_op = tilda.assign(param) # runs at the end of each epoch self.update_tilda_op = tilda.assign( (1 - tilda_beta) * param + tilda_beta...

LSX-Sneakerprogrammer

Single machine, multiple GPUs

3

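Hello~ While using unif I got a bit confused about the function below; please take a look when you have time. When this function averages gradients and grad is an IndexedSlices, it averages the values but takes the indices from the first grad only. I would expect each grad to have different indices: with four GPUs, for instance, a batch is split into four parts with different data, so each part should touch different rows of the embedding_table matrix. If so, using only the first grad's indices seems to drop the gradients of some embedding_table parameters. Averaging the values directly also means averaging gradients of different rows from different sub-batches, that is, gradients of different parameters, whereas intuitively the average should be taken over gradients of the same parameter, so this looks odd to me. Some single-machine multi-GPU implementations I have seen online compute the average with tf.divide(tf.add_n(split_grads), len(split_grads)) regardless of whether the gradients are IndexedSlices; I'm not sure whether that would resolve my concern.

https://github.com/geyingli/unif/blob/master/uf/utils.py#L748

```
def average_n_grads(split_grads):
    split_grads = [grad for grad in split_grads if grad is not None]

    # Dealing with IndexedSlices for large-dimensional embedding
    #...
```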

yupeijei1997
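The concern in the average_n_grads question above can be illustrated with a small, framework-free sketch. It is not the unif implementation: each sparse gradient is modelled as a hypothetical `(indices, values)` pair standing in for a `tf.IndexedSlices`, `naive_average` mimics the behaviour as the issue describes it, and `average_sparse_grads` is one possible row-aligned alternative that treats a row a device did not touch as a zero gradient (the same convention a dense sum divided by the number of devices would follow). Both function names are made up for this example.

```python
from collections import defaultdict


def naive_average(split_grads):
    """Average values position by position and reuse the first indices.

    Mimics the behaviour described in the issue (an assumption, not
    verified against the repository code).
    """
    n = len(split_grads)
    first_indices = list(split_grads[0][0])
    per_device_values = [values for _, values in split_grads]
    averaged = [
        [sum(components) / n for components in zip(*rows)]
        for rows in zip(*per_device_values)
    ]
    return first_indices, averaged


def average_sparse_grads(split_grads):
    """Sum gradients per embedding row, then divide by the device count.

    Rows a device did not touch contribute zero, as in dense averaging.
    """
    n = len(split_grads)
    dim = len(split_grads[0][1][0])
    summed = defaultdict(lambda: [0.0] * dim)
    for indices, values in split_grads:
        for row, vec in zip(indices, values):
            acc = summed[row]
            for j, v in enumerate(vec):
                acc[j] += v
    rows = sorted(summed)
    return rows, [[v / n for v in summed[r]] for r in rows]


if __name__ == "__main__":
    # Two devices that touched different rows of the embedding table.
    grads = [
        ([0, 2], [[1.0, 1.0], [2.0, 2.0]]),  # device 0: rows 0 and 2
        ([1, 2], [[4.0, 4.0], [6.0, 6.0]]),  # device 1: rows 1 and 2
    ]
    print(naive_average(grads))
    print(average_sparse_grads(grads))
```

With these inputs the naive version returns indices `[0, 2]`, so row 1 receives no update and row 0 is assigned a mix of the gradients for rows 0 and 1. The row-aligned version returns rows `[0, 1, 2]`, with each row averaged only against gradients for that same row.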

Adversarial training

28

Hello, with tf.control_dependencies([init_op]): # fix perturbation # Scale randomly initialized permutation, to make sure norm # of r is smaller than epsilon. shape = tf.cast(np.prod(init_r.shape.as_list()), tf.float32) r = tf.divide(init_r, tf.sqrt(shape))...

yupeijei1997

Adversarial training in TF2

5

```
class FreeAT(tf.keras.Model):
    def train_step(self, data):
        x, y = data
        last_r = 0.0
        last_r_slice = 0.0
        K = 3
        ep = 1e-3
        for t in range(K):
            with tf.GradientTape() as tape:
                ...
```

luoda888