arxiv-updates
New submissions for Wed, 10 Jan 24
Keyword: sgd
Private Fine-tuning of Large Language Models with Zeroth-order Optimization
- Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2401.04343
- Pdf link: https://arxiv.org/pdf/2401.04343
- Abstract Fine-tuning large pretrained models on private datasets may run the risk of violating privacy. Differential privacy is a framework for mitigating privacy risks by enforcing algorithmic stability. DP-SGD enables training models with private data in a privacy-preserving manner, but raises new obstacles in the form of performance loss and significant engineering challenges. We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization. A key insight into the design of our method is that the direction of the gradient in SPSA, the zeroth-order algorithm we use, is always random, and the only information that depends on private data is the step size, i.e., a scalar. Therefore, we only need to privatize the scalar step size, which is memory-efficient. DP-ZO, which can be instantiated with either Laplace or Gaussian noise, provides a strong privacy-utility trade-off across different tasks and model sizes under conservative privacy budgets. One noteworthy result is that DP-ZO exhibits just $1.86\%$ performance degradation due to privacy at $(1,10^{-5})$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
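The key point above, that only a per-example scalar depends on private data, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-point SPSA estimate along a Gaussian direction, per-example clipping of that scalar to `[-clip, clip]`, and Gaussian noise scaled by a hypothetical `noise_multiplier`. The names `dp_zo_step` and `sq_loss` are made up for illustration, and no privacy accounting (mapping the noise level to an $(\varepsilon, \delta)$ budget) is performed.

```python
import random

def dp_zo_step(theta, per_example_loss, batch, lr=0.05, eps=1e-3,
               clip=1.0, noise_multiplier=1.0, rng=random):
    """One hypothetical DP-ZO-style update (Gaussian-noise variant)."""
    # SPSA direction: sampled independently of the private data.
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    total = 0.0
    for example in batch:
        # Per-example scalar "step size": the only quantity that touches private data.
        g = (per_example_loss(plus, example) - per_example_loss(minus, example)) / (2 * eps)
        total += max(-clip, min(clip, g))  # clip each scalar to [-clip, clip]
    # Privatize the summed scalar with Gaussian noise scaled to the clipping bound.
    noisy = (total + rng.gauss(0.0, noise_multiplier * clip)) / len(batch)
    # Move along the random direction by the privatized scalar.
    return [t - lr * noisy * zi for t, zi in zip(theta, z)]


def sq_loss(theta, x):
    """Toy per-example loss: squared distance of every coordinate to x."""
    return sum((t - x) ** 2 for t in theta)


# Toy usage with noise switched off, just to show the update descends.
rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = dp_zo_step(theta, sq_loss, batch=[1.0, 1.0], noise_multiplier=0.0, rng=rng)
```

Note that the noise is added to a single number per step, regardless of the model size, which is the memory argument made in the abstract.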
Keyword: optimization
Recent developments of selective laser processes for wearable devices
- Authors: Youngchan Kim, Eunseung Hwang, Chang Kai, Kaichen Xu, Heng Pan, Sukjoon Hong
- Subjects: Human-Computer Interaction (cs.HC)
- Arxiv link: https://arxiv.org/abs/2401.04109
- Pdf link: https://arxiv.org/pdf/2401.04109
- Abstract Recently, the growing interest in wearable technology for personal healthcare and smart VR/AR applications has created a need for facile fabrication methods. In this regard, lasers have long offered original answers to such challenging technological demands thanks to their remote, sterile, rapid, and site-selective processing of arbitrary materials. In this review, recent developments in relevant laser processes are summarized in two separate categories. First, transformative approaches represented by laser-induced graphene (LIG) are introduced. Apart from design optimization and alteration of the native substrate, the latest advancements in the transformative approach now enable not only more complex material compositions but also multilayer device configurations, through simultaneous transformation of heterogeneous precursors or sequential addition of functional layers coupled with other electronic elements. Second, more conventional laser techniques such as ablation, sintering, and synthesis remain useful for enhancing the functionality of the entire system by expanding the range of applicable materials and adopting new mechanisms. Various wearable device components developed through the corresponding laser processes are then organized with emphasis on chemical/physical sensors and energy devices. Special attention is also given to applications utilizing multiple laser sources or multiple laser processes, which pave the way toward all-laser fabrication of wearable devices.
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning
- Authors: Wenhan Xia, Chengwei Qin, Elad Hazan
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2401.04151
- Pdf link: https://arxiv.org/pdf/2401.04151
- Abstract Fine-tuning is the primary methodology for tailoring pre-trained large language models to specific tasks. As model scale and task diversity expand, parameter-efficient fine-tuning methods are of paramount importance. One of the most widely used families of methods is low-rank adaptation (LoRA) and its variants. LoRA encodes the weight update as the product of two low-rank matrices. Despite its advantages, LoRA falls short of full-parameter fine-tuning in terms of generalization error for certain tasks. We introduce Chain of LoRA (COLA), an iterative optimization framework inspired by the Frank-Wolfe algorithm, to bridge the gap between LoRA and full-parameter fine-tuning without incurring additional computational costs or memory overheads. COLA employs a residual learning procedure in which it merges learned LoRA modules into the pre-trained language model parameters and re-initializes optimization for newly created LoRA modules. We provide theoretical convergence guarantees as well as empirical results to validate the effectiveness of our algorithm. Across various models (OPT and Llama-2) and seven benchmarking tasks, we demonstrate that COLA can consistently outperform LoRA without additional computational or memory costs.
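The merge-and-restart loop described above can be illustrated on a toy problem. This is a hedged sketch, not COLA itself: "fine-tuning" is replaced by plain gradient descent that fits a low-rank product `B @ A` to the residual `target - W` of a small matrix, and all helper names (`fit_lora`, `chain_of_lora`, `combine`, `frob_err`) are hypothetical. It shows why chaining can help: a single rank-1 module cannot fit a rank-2 target, but merging it into `W` and training a fresh rank-1 module on what remains can.

```python
import random

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def combine(X, Y, scale=1.0):
    """Elementwise X + scale * Y."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frob_err(W, T):
    return sum((w - t) ** 2 for rw, rt in zip(W, T) for w, t in zip(rw, rt)) ** 0.5

def fit_lora(W, target, rank, rng, steps=500, lr=0.05):
    """Toy stand-in for fine-tuning: fit B @ A to (target - W) by gradient descent."""
    d, k = len(W), len(W[0])
    B = [[0.0] * rank for _ in range(d)]                     # LoRA-style: B starts at zero
    A = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(rank)]
    for _ in range(steps):
        R = combine(combine(W, matmul(B, A)), target, -1.0)  # residual W + BA - T
        gB = matmul(R, transpose(A))                         # d/dB of 0.5*||R||_F^2
        gA = matmul(transpose(B), R)                         # d/dA of 0.5*||R||_F^2
        B = combine(B, gB, -lr)
        A = combine(A, gA, -lr)
    return B, A

def chain_of_lora(W, target, rank, links, seed=0):
    rng = random.Random(seed)
    for _ in range(links):
        B, A = fit_lora(W, target, rank, rng)
        W = combine(W, matmul(B, A))  # merge the learned module into the base weights
        # the next link starts from a freshly initialized (B, A) inside fit_lora
    return W

# Toy usage: a rank-2 target approached with rank-1 modules.
T = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
W0 = [[0.0] * 3 for _ in range(3)]
err1 = frob_err(chain_of_lora(W0, T, rank=1, links=1), T)
err2 = frob_err(chain_of_lora(W0, T, rank=1, links=2), T)
```

With one link the error cannot drop below 1 (the best rank-1 approximation leaves the second singular value), while the second link removes most of what is left.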
A Fast Graph Search Algorithm with Dynamic Optimization and Reduced Histogram for Discrimination of Binary Classification Problem
- Authors: Qinwu Xu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2401.04282
- Pdf link: https://arxiv.org/pdf/2401.04282
- Abstract This study develops a graph search algorithm to find the optimal discrimination path for the binary classification problem. The objective function is defined as the difference of variations between true positives (TP) and false positives (FP). The algorithm uses depth-first search (DFS) to find top-down discrimination paths. It proposes a dynamic optimization procedure that optimizes TP at the upper levels and then reduces FP at the lower levels. To accelerate computation while improving accuracy, it proposes a reduced histogram algorithm with variable bin size, instead of looping over all data points, to find the feature threshold for discrimination. The algorithm is applied on top of a Support Vector Machine (SVM) model for a binary classification problem on whether a person is fit or unfit. It significantly improves TP and reduces FP of the SVM results (e.g., FP is reduced by 90% with a loss of only 5% TP). The graph search auto-generates 39 ranked discrimination paths within 9 seconds on an input of 328,464 objects in total, using a dual-core laptop with a 2.59 GHz processor.
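The reduced-histogram idea, finding a feature threshold from binned counts instead of re-scanning every data point for each candidate threshold, can be sketched as follows. Assumptions, not taken from the paper: bins are quantile-based (so their widths vary with the data), candidate thresholds are the bin edges, a sample is predicted positive when its feature value is at least the threshold, and the score is TP rate minus FP rate. The paper's actual objective and binning rule may differ, and `best_threshold` is a hypothetical name.

```python
import bisect

def best_threshold(values, labels, n_bins=16):
    """Return (threshold, score) maximizing TP rate - FP rate over bin edges.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    s = sorted(values)
    n = len(s)
    # Variable-width (quantile) bin edges: each bin holds roughly n / n_bins points.
    edges = sorted({s[(i * n) // n_bins] for i in range(1, n_bins)})
    pos_hist = [0] * (len(edges) + 1)
    neg_hist = [0] * (len(edges) + 1)
    # Single pass over the data to fill the histograms.
    for v, y in zip(values, labels):
        b = bisect.bisect_right(edges, v)  # bin b holds edges[b-1] <= v < edges[b]
        if y:
            pos_hist[b] += 1
        else:
            neg_hist[b] += 1
    P, N = sum(pos_hist), sum(neg_hist)
    best_score, best_t = float("-inf"), None
    tp = fp = 0
    # Sweep edges from high to low; counts of v >= edges[i] are suffix sums of bins.
    for i in range(len(edges) - 1, -1, -1):
        tp += pos_hist[i + 1]
        fp += neg_hist[i + 1]
        score = tp / P - fp / N
        if score > best_score:
            best_score, best_t = score, edges[i]
    return best_t, best_score
```

For example, with `values = list(range(100))`, labels equal to 1 for values of at least 60, and `n_bins=10`, the sweep returns `(60, 1.0)`. After the single histogram pass, the cost of the sweep depends on the number of bins rather than on the number of data points.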
G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems
- Authors: Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, Zhaoxin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, Jun Zhou
- Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2401.04338
- Pdf link: https://arxiv.org/pdf/2401.04338
- Abstract Recently, a new paradigm, meta learning, has been widely applied to Deep Learning Recommendation Models (DLRM) and significantly improves statistical performance, especially in cold-start scenarios. However, existing systems are not tailored for meta-learning-based DLRM models and have critical efficiency problems in distributed training on GPU clusters. This is because the conventional deep learning pipeline is not optimized for the two task-specific datasets and two update loops in meta learning. This paper provides a high-performance framework for large-scale training of Optimization-based Meta DLRM models over the GPU cluster, namely G-Meta. First, G-Meta utilizes both data parallelism and model parallelism with careful orchestration of computation and communication to enable high-speed distributed training. Second, it proposes a Meta-IO pipeline for efficient data ingestion to alleviate the I/O bottleneck. Various experimental results show that G-Meta achieves notable training speed without loss of statistical performance. Since early 2022, G-Meta has been deployed in Alipay's core advertising and recommender system, shortening the continuous delivery of models by a factor of four. It also obtains a 6.48% improvement in Conversion Rate (CVR) and a 1.06% increase in CPM (Cost Per Mille) in Alipay's homepage display advertising, with the benefit of larger training samples and tasks.
Private Fine-tuning of Large Language Models with Zeroth-order Optimization
- Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2401.04343
- Pdf link: https://arxiv.org/pdf/2401.04343
- Abstract Fine-tuning large pretrained models on private datasets may run the risk of violating privacy. Differential privacy is a framework for mitigating privacy risks by enforcing algorithmic stability. DP-SGD enables training models with private data in a privacy-preserving manner, but raises new obstacles in the form of performance loss and significant engineering challenges. We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization. A key insight into the design of our method is that the direction of the gradient in SPSA, the zeroth-order algorithm we use, is always random, and the only information that depends on private data is the step size, i.e., a scalar. Therefore, we only need to privatize the scalar step size, which is memory-efficient. DP-ZO, which can be instantiated with either Laplace or Gaussian noise, provides a strong privacy-utility trade-off across different tasks and model sizes under conservative privacy budgets. One noteworthy result is that DP-ZO exhibits just $1.86\%$ performance degradation due to privacy at $(1,10^{-5})$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations
- Authors: Heewon Kim, Hyun Sung Chang, Kiho Cho, Jaeyun Lee, Bohyung Han
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2401.04390
- Pdf link: https://arxiv.org/pdf/2401.04390
- Abstract Labor-intensive labeling becomes a bottleneck in developing computer vision algorithms based on deep learning. For this reason, dealing with imperfect labels has increasingly gained attention and has become an active field of study. We address the learning with noisy labels (LNL) problem, which is formalized as a task of finding a structured manifold in the midst of noisy data. In this framework, we provide a proper objective function and an optimization algorithm based on two expectation-maximization (EM) cycles. The separate networks associated with the two EM cycles collaborate to optimize the objective function: one model distinguishes clean labels from corrupted ones, while the other refurbishes the corrupted labels. This approach results in a non-collapsing LNL-flywheel model in the end. Experiments show that our algorithm achieves state-of-the-art performance on multiple standard benchmarks with substantial margins under various types of label noise.
Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems
- Authors: Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei, Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu, Xuehai Qian
- Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2401.04408
- Pdf link: https://arxiv.org/pdf/2401.04408
- Abstract Huge embedding tables in modern Deep Learning Recommender Models (DLRM) require prohibitively large memory during training and inference. Aiming to reduce the memory footprint of training, this paper proposes FIne-grained In-Training Embedding Dimension optimization (FIITED). Given the observation that embedding vectors are not equally important, FIITED adjusts the dimension of each individual embedding vector continuously during training, assigning longer dimensions to more important embeddings while adapting to dynamic changes in data. A novel embedding storage system based on virtually-hashed physically-indexed hash tables is designed to efficiently implement the embedding dimension adjustment and effectively enable memory saving. Experiments on two industry models show that FIITED is able to reduce the size of embeddings by more than 65% while maintaining the trained model's quality, saving significantly more memory than a state-of-the-art in-training embedding pruning method. On public click-through rate prediction datasets, FIITED is able to prune up to 93.75%-99.75% of embeddings without significant accuracy loss.
Meta-forests: Domain generalization on random forests with meta-learning
- Authors: Yuyang Sun, Panagiotis Kosmas
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2401.04425
- Pdf link: https://arxiv.org/pdf/2401.04425
- Abstract Domain generalization is a popular machine learning technique that enables models to perform well on an unseen target domain by learning from multiple source domains. Domain generalization is useful in cases where data is limited, difficult, or expensive to collect, such as in object recognition and biomedicine. In this paper, we propose a novel domain generalization algorithm called "meta-forests", which builds upon the basic random forests model by incorporating the meta-learning strategy and the maximum mean discrepancy measure. The aim of meta-forests is to enhance the generalization ability of classifiers by reducing the correlation among trees and increasing their strength. More specifically, meta-forests conducts meta-learning optimization during each meta-task, while also utilizing the maximum mean discrepancy as a regularization term to penalize poor generalization performance in the meta-test process. To evaluate the effectiveness of our algorithm, we test it on two publicly available object recognition datasets and a glucose monitoring dataset that we used in a previous study. Our results show that meta-forests outperforms state-of-the-art approaches in terms of generalization performance on both object recognition and glucose monitoring datasets.
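The maximum mean discrepancy used as a regularizer measures how far apart two samples are in a kernel feature space. Below is a generic biased estimate of squared MMD with a Gaussian (RBF) kernel; the kernel choice, the `gamma` bandwidth, and how the penalty enters meta-forests' training are assumptions here, not details taken from the paper.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y (lists of vectors)."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

Identical samples give 0, and the value grows as the two samples drift apart, which is what lets it act as a penalty when features from a meta-train split and a meta-test split disagree.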
Online convex optimization for robust control of constrained dynamical systems
- Authors: Marko Nonhoff, Emiliano Dall'Anese, Matthias A. Müller
- Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2401.04487
- Pdf link: https://arxiv.org/pdf/2401.04487
- Abstract This article investigates the problem of controlling linear time-invariant systems subject to time-varying and a priori unknown cost functions, state and input constraints, and exogenous disturbances. We combine the online convex optimization framework with tools from robust model predictive control to propose an algorithm that is able to guarantee robust constraint satisfaction. The performance of the closed loop emerging from application of our framework is studied in terms of its dynamic regret, which is proven to be bounded linearly by the variation of the cost functions and the magnitude of the disturbances. We corroborate our theoretical findings and illustrate implementational aspects of the proposed algorithm by a numerical case study of a tracking control problem of an autonomous vehicle.
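As a point of reference for the online convex optimization ingredient only, the sketch below runs projected online gradient descent on cost functions that are revealed one step at a time, after each action is played. The box projection is a hypothetical stand-in for input constraints; the paper's robust MPC machinery (system dynamics, state constraints, disturbance handling) and its dynamic-regret analysis are not reproduced, and the function names are illustrative.

```python
def project_box(u, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (stand-in for input constraints)."""
    return [min(hi, max(lo, x)) for x in u]

def online_gradient_descent(grad_oracles, u0, step, lo, hi):
    """Play u_t, then observe the gradient of the time-t cost at u_t and update."""
    u = list(u0)
    played = []
    for grad_t in grad_oracles:
        played.append(u)
        g = grad_t(u)
        u = project_box([x - step * gi for x, gi in zip(u, g)], lo, hi)
    return played

# Toy time-varying tracking cost (u - r_t)^2: the reference jumps from an
# unreachable value (5.0, outside the box) to a reachable one (0.5).
refs = [5.0] * 30 + [0.5] * 50
oracles = [lambda u, r=r: [2.0 * (u[0] - r)] for r in refs]
played = online_gradient_descent(oracles, [0.0], step=0.25, lo=-1.0, hi=1.0)
```

While the reference is unreachable the iterate sits on the constraint boundary, and after the jump it tracks the new reference; dynamic regret measures exactly this kind of lag relative to the per-step minimizers.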
UBfuzz: Finding Bugs in Sanitizer Implementations
- Authors: Shaohua Li, Zhendong Su
- Subjects: Cryptography and Security (cs.CR); Programming Languages (cs.PL); Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2401.04538
- Pdf link: https://arxiv.org/pdf/2401.04538
- Abstract In this paper, we propose a testing framework for validating sanitizer implementations in compilers. Our core components are (1) a program generator specifically designed for producing programs containing undefined behavior (UB), and (2) a novel test oracle for sanitizer testing. The program generator employs Shadow Statement Insertion, a general and effective approach for introducing UB into a valid seed program. The generated UB programs are subsequently utilized for differential testing of multiple sanitizer implementations. Nevertheless, discrepant sanitizer reports may stem from either compiler optimization or sanitizer bugs. To accurately determine if a discrepancy is caused by sanitizer bugs, we introduce a new test oracle called crash-site mapping. We have incorporated our techniques into UBfuzz, a practical tool for testing sanitizers. Over a five-month testing period, UBfuzz successfully found 31 bugs in both GCC and LLVM sanitizers. These bugs reveal serious false-negative problems in sanitizers, where certain UBs in programs went unreported. This research paves the way for further investigation in this crucial area of study.
A Discrete Particle Swarm Optimizer for the Design of Cryptographic Boolean Functions
- Authors: Luca Mariot, Alberto Leporati, Luca Manzoni
- Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2401.04567
- Pdf link: https://arxiv.org/pdf/2401.04567
- Abstract A Particle Swarm Optimizer for searching balanced Boolean functions with good cryptographic properties is proposed in this paper. The algorithm is a modified version of the permutation PSO by Hu, Eberhart and Shi which preserves the Hamming weight of the particles' positions, coupled with the Hill Climbing method devised by Millan, Clark and Dawson to improve the nonlinearity and deviation from correlation immunity of Boolean functions. The parameters for the PSO velocity equation are tuned by means of two meta-optimization techniques, namely Local Unimodal Sampling (LUS) and Continuous Genetic Algorithms (CGA), finding that CGA produces better results. Using the CGA-evolved parameters, the PSO algorithm is then run on the spaces of Boolean functions from $n=7$ to $n=12$ variables. The results of the experiments are reported, observing that this new PSO algorithm generates Boolean functions featuring similar or better combinations of nonlinearity, correlation immunity and propagation criterion with respect to the ones obtained by other optimization methods.
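The properties the PSO optimizes are computed from a function's truth table. As an illustrative sketch using standard definitions (not the authors' code), balancedness is a Hamming-weight check, and nonlinearity follows from the Walsh-Hadamard spectrum as $N_f = 2^{n-1} - \tfrac{1}{2}\max_a |W_f(a)|$, computable with a fast transform in $O(n 2^n)$ operations. The `swap_move` helper is a hypothetical example of a weight-preserving move: swapping two truth-table entries never changes the Hamming weight, so a balanced function stays balanced.

```python
def walsh_spectrum(truth_table):
    """Fast Walsh-Hadamard transform of (-1)^f(x); len(truth_table) must be 2**n."""
    v = [1 - 2 * bit for bit in truth_table]  # map 0 -> +1, 1 -> -1
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def nonlinearity(truth_table):
    """Distance to the nearest affine function: 2^(n-1) - max|W_f| / 2."""
    return len(truth_table) // 2 - max(abs(w) for w in walsh_spectrum(truth_table)) // 2

def is_balanced(truth_table):
    """Balanced means Hamming weight exactly 2^(n-1)."""
    return sum(truth_table) == len(truth_table) // 2

def swap_move(truth_table, i, j):
    """Hypothetical weight-preserving move: swap truth-table entries i and j."""
    t = list(truth_table)
    t[i], t[j] = t[j], t[i]
    return t
```

For instance, the 2-variable AND function `[0, 0, 0, 1]` has nonlinearity 1, while the linear XOR function `[0, 1, 1, 0]` has nonlinearity 0.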
Hypercomplex neural network in time series forecasting of stock data
- Authors: Radosław Kycia, Agnieszka Niemczynowicz
- Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2401.04632
- Pdf link: https://arxiv.org/pdf/2401.04632
- Abstract Three classes of architectures for time series prediction were tested. They differ in their input layers, which contain either convolutional, LSTM, or dense hypercomplex layers for 4D algebras. The input was four related stock market time series, and the goal was to predict one of them. Hyperparameters were optimized within each class of architectures in order to compare the best neural networks in each class. The results show that in most cases, the architecture with a hypercomplex dense layer provides MAE accuracy similar to other architectures, but with considerably fewer trainable parameters. Thanks to this, hypercomplex neural networks can be trained and can process data faster than the other tested architectures. Moreover, the order of the input time series has an impact on effectiveness.
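The parameter savings come from weight sharing in the hypercomplex product. A minimal sketch, assuming the quaternion algebra as one example of a 4D algebra (the paper may use others): a dense layer whose weights are quaternions combines inputs with the Hamilton product, so mapping `4m` real inputs to `4k` real outputs needs `4mk` real weights instead of `16mk`. Function names are hypothetical and biases are omitted.

```python
def hamilton(p, q):
    """Hamilton product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def quaternion_dense(weights, inputs):
    """weights: k rows of m quaternions; inputs: m quaternions -> k output quaternions."""
    out = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            acc = tuple(s + t for s, t in zip(acc, hamilton(w, x)))
        out.append(acc)
    return out

def param_counts(in_features, out_features):
    """Real dense vs quaternion dense weight counts (features divisible by 4, no bias)."""
    real = in_features * out_features
    quat = 4 * (in_features // 4) * (out_features // 4)
    return real, quat
```

Each quaternion weight is reused across all four real components of its input, which is why a 16-to-8 layer needs 32 real weights instead of 128. Note that the product is non-commutative: `i * j = k` but `j * i = -k`.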
Modified Levenberg-Marquardt Algorithm For Tensor CP Decomposition in Image Compression
- Authors: Ramin Goudarzi Karim, Dipak Dulal, Carmeliza Navasca
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2401.04670
- Pdf link: https://arxiv.org/pdf/2401.04670
- Abstract This paper explores a new version of the Levenberg-Marquardt algorithm used for Tensor Canonical Polyadic (CP) decomposition with an emphasis on image compression and reconstruction. Tensor computation, especially CP decomposition, has significant applications in data compression and analysis. In this study, we formulate CP as a nonlinear least squares optimization problem. Then, we present an iterative Levenberg-Marquardt (LM) based algorithm for computing the CP decomposition. Finally, we test the algorithm on various datasets, including randomly generated tensors and RGB images. The proposed method proves to be both efficient and effective, offering a reduced computational burden compared to the traditional Levenberg-Marquardt technique.
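For reference, the standard formulation this abstract builds on can be written as follows. This is the textbook CP least-squares objective and the classical Levenberg-Marquardt step, not the paper's modified update. For a third-order tensor $\mathcal{X}$ of size $I \times J \times K$, rank $R$, and factor matrices $A$, $B$, $C$:

```latex
\hat{x}_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr},
\qquad
\min_{\theta}\; f(\theta) = \tfrac{1}{2}\,\lVert \mathbf{r}(\theta) \rVert_2^2,
\qquad
\mathbf{r}(\theta) = \operatorname{vec}\bigl(\hat{\mathcal{X}} - \mathcal{X}\bigr),
\qquad
\theta = \bigl(\operatorname{vec}(A), \operatorname{vec}(B), \operatorname{vec}(C)\bigr)

\bigl(J^{\top} J + \lambda I\bigr)\,\Delta\theta = -J^{\top} \mathbf{r},
\qquad
J = \frac{\partial \mathbf{r}}{\partial \theta},
\qquad
\theta \leftarrow \theta + \Delta\theta
```

The damping $\lambda > 0$ interpolates between a Gauss-Newton step ($\lambda \to 0$) and a short gradient-descent step (large $\lambda$); it is typically decreased after a step that lowers $f$ and increased otherwise. Each classical step solves a linear system in $R(I+J+K)$ unknowns, which is what makes plain LM expensive for large tensors.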
Keyword: adam
There are no results
Keyword: gradient
Private Fine-tuning of Large Language Models with Zeroth-order Optimization
- Authors: Xinyu Tang, Ashwinee Panda, Milad Nasr, Saeed Mahloujifar, Prateek Mittal
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2401.04343
- Pdf link: https://arxiv.org/pdf/2401.04343
- Abstract Fine-tuning large pretrained models on private datasets may run the risk of violating privacy. Differential privacy is a framework for mitigating privacy risks by enforcing algorithmic stability. DP-SGD enables training models with private data in a privacy-preserving manner, but raises new obstacles in the form of performance loss and significant engineering challenges. We introduce DP-ZO, a new method for fine-tuning large language models that preserves the privacy of training data by privatizing zeroth-order optimization. A key insight into the design of our method is that the direction of the gradient in SPSA, the zeroth-order algorithm we use, is always random, and the only information that depends on private data is the step size, i.e., a scalar. Therefore, we only need to privatize the scalar step size, which is memory-efficient. DP-ZO, which can be instantiated with either Laplace or Gaussian noise, provides a strong privacy-utility trade-off across different tasks and model sizes under conservative privacy budgets. One noteworthy result is that DP-ZO exhibits just $1.86\%$ performance degradation due to privacy at $(1,10^{-5})$-DP when fine-tuning OPT-66B on 1000 training samples from SQuAD.
Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks
- Authors: Yufei Guo, Yuanpei Chen
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2401.04486
- Pdf link: https://arxiv.org/pdf/2401.04486
- Abstract The Spiking Neural Network (SNN) is a biologically inspired neural network infrastructure that has recently garnered significant attention. It uses binary spike activations to transmit information, thereby replacing multiplications with additions and achieving high energy efficiency. However, training an SNN directly is challenging because the gradient of the spike-firing process is undefined. Although prior works have employed various surrogate gradient training methods that use an alternative function to replace the firing process during back-propagation, these approaches ignore an intrinsic problem: gradient vanishing. To address this issue, we propose a shortcut back-propagation method that transmits the gradient directly from the loss to the shallow layers, thereby significantly mitigating the gradient vanishing problem. Additionally, this method does not introduce any burden during the inference phase. To strike a balance between final accuracy and ease of training, we also propose an evolutionary training framework and implement it by introducing a balance coefficient that changes dynamically with the training epoch, which further improves the network's performance. Extensive experiments conducted on static and dynamic datasets using several popular network structures show that our method consistently outperforms state-of-the-art methods.
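Two ingredients mentioned above can be sketched as scalar functions: a surrogate gradient for the non-differentiable spike, and a training-time weighting between the main loss and the losses of shortcut branches attached to shallow layers. The rectangular surrogate is one common choice from prior surrogate-gradient work, not necessarily the one used in this paper, and `combined_loss` with its linear schedule is a hypothetical illustration of a balance coefficient that changes with the epoch; the paper's actual schedule may differ. Because the shortcut branches only contribute training losses, they can be dropped at inference.

```python
def spike(u, threshold=1.0):
    """Forward pass: Heaviside step; its true derivative is 0 almost everywhere."""
    return 1.0 if u >= threshold else 0.0

def surrogate_grad(u, threshold=1.0, width=1.0):
    """Backward pass: rectangular surrogate for d(spike)/du, nonzero only near threshold."""
    return 1.0 / width if abs(u - threshold) < width / 2 else 0.0

def combined_loss(main_loss, shortcut_losses, epoch, total_epochs):
    """Hypothetical schedule: start on the shortcut losses, end on the main loss."""
    lam = 1.0 - epoch / total_epochs  # balance coefficient, decays linearly to 0
    shortcut = sum(shortcut_losses) / len(shortcut_losses)
    return (1.0 - lam) * main_loss + lam * shortcut
```

Early in training the shallow layers receive gradient directly through the shortcut losses; by the last epoch only the main loss remains, which corresponds to the accuracy end of the trade-off described above.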
Keyword: super-resolution
There are no results