unif

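A deep learning framework for natural language processing, built on TensorFlow and modeled after scikit-learn. It supports more than 40 model classes covering language modeling, text classification, NER, MRC, knowledge distillation, and other areas.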

Results 8 unif issues

Hi, I've noticed that you have implemented a wide and deep structure that is different from the classical "youtube wide and deep". Here is my question: 1) what is the...

Hello! I recently read the adversarial training code you wrote and found it very interesting, but I have a question about the SMART algorithm part: # runs at the start of each epoch self.init_tilda_op = tilda.assign(param) # runs at the end of each epoch self.update_tilda_op = tilda.assign( (1 - tilda_beta) * param + tilda_beta...

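Hello~ While using unif I got a bit confused by the function below; please take a look when you have time. When this function averages gradients, if grad is of type IndexedSlices, it averages the values but takes the indices from the first grad only. I would expect each grad's indices to differ: with four GPUs, for example, one batch is split into four parts with different data, so each part should touch different rows of the embedding_table matrix. Taking only the first grad's indices therefore seems to drop the gradients of some embedding_table parameters. Averaging the values directly also means averaging the gradients of different rows of embedding_table from different sub-batches, that is, gradients of different parameters, whereas intuitively the average should be taken over gradients of the same parameter, so this looks odd to me. Some single-machine multi-GPU gradient averaging implementations I have seen online simply use tf.divide(tf.add_n(split_grads), len(split_grads)) regardless of whether the grads are IndexedSlices; I am not sure whether that would resolve my concern? https://github.com/geyingli/unif/blob/master/uf/utils.py#L748 ``` def average_n_grads(split_grads): split_grads = [grad for grad in split_grads if grad is not None] # Dealing with IndexedSlices for large-dimensional embedding #...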
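The concern raised in this issue can be illustrated with a small pure-Python sketch. It is not the code from `uf/utils.py` and does not use TensorFlow: each sparse gradient is modeled as a hypothetical `(indices, values)` pair standing in for an `IndexedSlices`, and the function names `naive_average` and `average_sparse_grads` are made up for this illustration. `naive_average` mimics the behavior as the issue describes it (first replica's indices, element-wise mean of values), while `average_sparse_grads` accumulates per row before dividing, which matches what a dense sum divided by the replica count would give for the touched rows.

```python
def naive_average(split_grads):
    """Behavior as described in the issue (an assumption, not the repo's code):
    keep the first replica's indices and average the values position by position."""
    n_replicas = len(split_grads)
    first_indices = split_grads[0][0]
    value_lists = [values for _, values in split_grads]
    mean_values = [
        [sum(col) / n_replicas for col in zip(*rows)]
        for rows in zip(*value_lists)
    ]
    return first_indices, mean_values


def average_sparse_grads(split_grads):
    """Sum gradients per embedding row across replicas, then divide by the
    number of replicas, so only gradients of the same row are combined."""
    n_replicas = len(split_grads)
    row_sums = {}
    for indices, values in split_grads:
        for row, vec in zip(indices, values):
            acc = row_sums.setdefault(row, [0.0] * len(vec))
            for j, v in enumerate(vec):
                acc[j] += v
    rows = sorted(row_sums)
    return rows, [[v / n_replicas for v in row_sums[r]] for r in rows]


# Two replicas touching different rows of a hypothetical embedding table.
grads = [
    ([0, 2], [[1.0, 1.0], [2.0, 2.0]]),  # replica 0: rows 0 and 2
    ([1, 2], [[4.0, 4.0], [6.0, 6.0]]),  # replica 1: rows 1 and 2
]

print(naive_average(grads))         # row 1 is missing; row 0 mixes in row 1's gradient
print(average_sparse_grads(grads))  # rows 0, 1 and 2 each get their own average
```

In this toy input the naive version returns indices `[0, 2]` only, so the gradient for row 1 is lost and row 0 receives a value blended with row 1's gradient, which is the effect the issue suspects. Whether the actual function in the repository behaves this way depends on the truncated code and on how indices are aligned across replicas there.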

Hello, with tf.control_dependencies([init_op]): # fix perturbation # Scale randomly initialized permutation, to make sure norm # of r is smaller than epsilon. shape = tf.cast(np.prod(init_r.shape.as_list()), tf.float32) r = tf.divide(init_r, tf.sqrt(shape))...

``` class FreeAT(tf.keras.Model): def train_step(self, data): x, y = data last_r = 0.0 last_r_slice = 0.0 K = 3 ep = 1e-3 for t in range(K): with tf.GradientTape() as tape:...

Is it possible to call something like Keras callbacks during training to run validation on the model?

You are welcome to leave problems here. Any questions will be answered ASAP.

Nice job! Are there any plans for a UniLM model?