arxiv-updates
arxiv-updates copied to clipboard
New submissions for Wed, 8 Nov 23
Keyword: sgd
User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates
- Authors: Authors: Hilal Asi, Daogao Liu
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.03797
- Pdf link: https://arxiv.org/pdf/2311.03797
- Abstract We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy, where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al. (2023)] or requires the number of users to grow polynomially with the dimensionality of the problem with additional strict assumptions [Bassily et al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain optimal rates for both convex and strongly convex functions in polynomial time and require the number of users to grow only logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These algorithms are based on multiple-pass DP-SGD, combined with a novel private mean estimation procedure for concentrated data, which applies an outlier removal step before estimating the mean of the gradients.
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
- Authors: Authors: Elan Rosenfeld, Andrej Risteski
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.04163
- Pdf link: https://arxiv.org/pdf/2311.04163
- Abstract We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics. In particular, it implies a conceptually new cause for progressive sharpening and the edge of stability; we also highlight connections to other concepts in optimization and generalization including grokking, simplicity bias, and Sharpness-Aware Minimization. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong opposing signals: consistent, large magnitude features which dominate the network output throughout training and provide gradients which point in opposite directions. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high on one group and then the other, until the overall loss spikes. We describe how to identify these groups, explore what sets them apart, and carefully study their effect on the network's optimization and behavior. We complement these experiments with a mechanistic explanation on a toy example of opposing signals and a theoretical analysis of a two-layer linear network on a simple model. Our finding enables new qualitative predictions of training behavior which we confirm experimentally. It also provides a new lens through which to study and improve modern training practices for stochastic optimization, which we highlight via a case study of Adam versus SGD.
Keyword: optimization
HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis
- Authors: Authors: Hanchen Ye, Hyegang Jun, Deming Chen
- Subjects: Hardware Architecture (cs.AR); Programming Languages (cs.PL)
- Arxiv link: https://arxiv.org/abs/2311.03379
- Pdf link: https://arxiv.org/pdf/2311.03379
- Abstract Dataflow architectures are growing in popularity due to their potential to mitigate the challenges posed by the memory wall inherent to the Von Neumann architecture. At the same time, high-level synthesis (HLS) has demonstrated its efficacy as a design methodology for generating efficient dataflow architectures within a short development cycle. However, existing HLS tools rely on developers to explore the vast dataflow design space, ultimately leading to suboptimal designs. This phenomenon is especially concerning as the size of the HLS design grows. To tackle these challenges, we introduce HIDA, a new scalable and hierarchical HLS framework that can systematically convert an algorithmic description into a dataflow implementation on hardware. We first propose a collection of efficient and versatile dataflow representations for modeling the hierarchical dataflow structure. Capitalizing on these representations, we develop an automated optimizer that decomposes the dataflow optimization problem into multiple levels based on the inherent dataflow hierarchy. Using FPGAs as an evaluation platform, working with a set of neural networks modeled in PyTorch, HIDA achieves up to 8.54$\times$ higher throughput compared to the state-of-the-art (SOTA) HLS optimization tool. Furthermore, despite being fully automated and able to handle various applications, HIDA achieves 1.29$\times$ higher throughput over the SOTA RTL-based neural network accelerators on an FPGA.
Training Multi-layer Neural Networks on Ising Machine
- Authors: Authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
- Arxiv link: https://arxiv.org/abs/2311.03408
- Pdf link: https://arxiv.org/pdf/2311.03408
- Abstract As a dedicated quantum device, Ising machines could solve large-scale binary optimization problems in milliseconds. There is emerging interest in utilizing Ising machines to train feedforward neural networks due to the prosperity of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This paper proposes an Ising learning algorithm to train quantized neural network (QNN), by incorporating two essential techinques, namely binary representation of topological network and order reduction of loss function. As far as we know, this is the first algorithm to train multi-layer feedforward networks on Ising machines, providing an alternative to gradient-based backpropagation. Firstly, training QNN is formulated as a quadratic constrained binary optimization (QCBO) problem by representing neuron connection and activation function as equality constraints. All quantized variables are encoded by binary bits based on binary encoding protocol. Secondly, QCBO is converted to a quadratic unconstrained binary optimization (QUBO) problem, that can be efficiently solved on Ising machines. The conversion leverages both penalty function and Rosenberg order reduction, who together eliminate equality constraints and reduce high-order loss function into a quadratic one. With some assumptions, theoretical analysis shows the space complexity of our algorithm is $\mathcal{O}(H^2L + HLN\log H)$, quantifying the required number of Ising spins. Finally, the algorithm effectiveness is validated with a simulated Ising machine on MNIST dataset. After annealing 700 ms, the classification accuracy achieves 98.3%. Among 100 runs, the success probability of finding the optimal solution is 72%. Along with the increasing number of spins on Ising machine, our algorithm has the potential to train deeper neural networks.
An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets
- Authors: Authors: Faris F. Gulamali, Ashwin S. Sawant, Lora Liharska, Carol R. Horowitz, Lili Chan, Patricia H. Kovatch, Ira Hofer, Karandeep Singh, Lynne D. Richardson, Emmanuel Mensah, Alexander W Charney, David L. Reich, Jianying Hu, Girish N. Nadkarni
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.03425
- Pdf link: https://arxiv.org/pdf/2311.03425
- Abstract The adoption of diagnosis and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success. Here, we generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare - Chest X-rays diagnosis with deep convolutional neural networks and healthcare utilization prediction with multivariate logistic regression. AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
Orion: A Fully Homomorphic Encryption Compiler for Private Deep Neural Network Inference
- Authors: Authors: Austin Ebel, Karthik Garimella, Brandon Reagen
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2311.03470
- Pdf link: https://arxiv.org/pdf/2311.03470
- Abstract Fully Homomorphic Encryption (FHE) has the potential to substantially improve privacy and security by enabling computation on encrypted data. This is especially true with deep learning, as today many popular user services are powered by neural networks. One of the major challenges facing wide-scale deployment of FHE-secured neural inference is effectively mapping them to the FHE domain. FHE poses many programming challenges including packing large vectors, handling expensive rotations, and correctly implementing complex strided convolutions. This makes programming FHE inferences prone to poor performance and errors. In this paper we overcome these challenges with Orion, an automated optimizing FHE compiler for neural inference. Orion automatically maps PyTorch-specified networks to FHE, handling common layer types and arbitrary tensor shapes and strides. Moreover, we develop novel optimizations that balance dense FHE vector packing, efficient rotations, and minimize operations to improve performance. We have implemented Orion, which will be open sourced, and evaluated it on common benchmarks used by the FHE deep learning community. We compare Orion to multiple state-of-the-art solutions and report iso-accuracy speedups ranging from 2.7$\times$ to 20.5$\times$.
Type II Hamiltonian Lie Group Variational Integrators with Applications to Geometric Adjoint Sensitivity Analysis
- Authors: Authors: Brian K. Tran, Melvin Leok
- Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.03527
- Pdf link: https://arxiv.org/pdf/2311.03527
- Abstract Variational integrators for Euler--Lagrange equations and Hamilton's equations are a class of structure-preserving numerical methods that respect the conservative properties of such systems. Lie group variational integrators are a particular class of these integrators that apply to systems which evolve over the tangent bundle and cotangent bundle of Lie groups. Traditionally, these are constructed from a variational principle which assumes fixed position endpoints. In this paper, we instead construct Lie group variational integrators with a novel Type II variational principle on the cotangent bundle of a Lie group which allows for Type II boundary conditions, i.e., fixed initial position and final momenta; these boundary conditions are particularly important for adjoint sensitivity analysis, which is the motivating application in our paper. In general, such Type II variational principles are only globally defined on vector spaces or locally defined on general manifolds; however, by left translation, we are able to define this variational principle globally on cotangent bundles of Lie groups. By developing the continuous and discrete Type II variational principles over Lie groups, we construct a structure-preserving Lie group variational integrator that is both symplectic and momentum-preserving. Subsequently, we introduce adjoint systems on Lie groups, and show how these adjoint systems can be used to perform geometric adjoint sensitivity analysis for optimization problems on Lie groups. Finally, we conclude with two numerical examples to show how adjoint sensitivity analysis can be used to solve initial-value optimization problems and optimal control problems on Lie groups.
Time-optimal Design and Control of Electric Race Cars Equipped with Multi-speed Transmissions
- Authors: Authors: Camiel Cartignij, Mauro Salazar
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.03545
- Pdf link: https://arxiv.org/pdf/2311.03545
- Abstract This paper presents a framework to jointly optimize the design and control of an electric race car equipped with a multiple-gear transmission, specifically accounting for the discrete gearshift dynamics. We formulate the problem as a mixed-integer optimal control problem, and deal with its complexity by combining convex optimization and Pontryagin's Minimum Principle in a computationally efficient iterative algorithm satisfying necessary conditions for optimality upon convergence. Finally, we leverage our framework to compute the achievable lap time of a race car equipped with a fixed-gear transmission, a continuously variable transmission and a multiple-gear transmission with 2 to 4 speeds, revealing that a multiple-gear transmission can strike the best trade-off in terms of electric motor control, and transmission weight and efficiency, ultimately yielding the overall best lap time.
iDb-A*: Iterative Search and Optimization for Optimal Kinodynamic Motion Planning
- Authors: Authors: Joaquim Ortiz-Haro, Wolfgang Hoenig, Valentin N. Hartmann, Marc Toussaint
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2311.03553
- Pdf link: https://arxiv.org/pdf/2311.03553
- Abstract Motion planning for robotic systems with complex dynamics is a challenging problem. While recent sampling-based algorithms achieve asymptotic optimality by propagating random control inputs, their empirical convergence rate is often poor, especially in high-dimensional systems such as multirotors. An alternative approach is to first plan with a simplified geometric model and then use trajectory optimization to follow the reference path while accounting for the true dynamics. However, this approach may fail to produce a valid trajectory if the initial guess is not close to a dynamically feasible trajectory. In this paper, we present Iterative Discontinuity Bounded A* (iDb-A*), a novel kinodynamic motion planner that combines search and optimization iteratively. The search step utilizes a finite set of short trajectories (motion primitives) that are interconnected while allowing for a bounded discontinuity between them. The optimization step locally repairs the discontinuities with trajectory optimization. By progressively reducing the allowed discontinuity and incorporating more motion primitives, our algorithm achieves asymptotic optimality with excellent any-time performance. We provide a benchmark of 43 problems across eight different dynamical systems, including different versions of unicycles and multirotors. Compared to state-of-the-art methods, iDb-A* consistently solves more problem instances and finds lower-cost solutions more rapidly.
Stochastic convergence of regularized solutions for backward heat conduction problems
- Authors: Authors: Zhongjian Wang, Wenlong Zhang, Zhiwen Zhang
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2311.03623
- Pdf link: https://arxiv.org/pdf/2311.03623
- Abstract In this paper, we study the stochastic convergence of regularized solutions for backward heat conduction problems. These problems are recognized as ill-posed due to the exponential decay of eigenvalues associated with the forward problems. We derive an error estimate for the least-squares regularized minimization problem within the framework of stochastic convergence. Our analysis reveals that the optimal error of the Tikhonov-type least-squares optimization problem depends on the noise level, the number of sensors, and the underlying ground truth. Moreover, we propose a self-adaptive algorithm to identify the optimal regularization parameter for the optimization problem without requiring knowledge of the noise level or any other prior information, which will be very practical in applications. We present numerical examples to demonstrate the accuracy and efficiency of our proposed method. These numerical results show that our method is efficient in solving backward heat conduction problems.
PUMA: Fully Decentralized Uncertainty-aware Multiagent Trajectory Planner with Real-time Image Segmentation-based Frame Alignment
- Authors: Authors: Kota Kondo, Claudius T. Tewari, Mason B. Peterson, Annika Thomas, Jouko Kinnari, Andrea Tagliabue, Jonathan P. How
- Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
- Arxiv link: https://arxiv.org/abs/2311.03655
- Pdf link: https://arxiv.org/pdf/2311.03655
- Abstract Fully decentralized, multiagent trajectory planners enable complex tasks like search and rescue or package delivery by ensuring safe navigation in unknown environments. However, deconflicting trajectories with other agents and ensuring collision-free paths in a fully decentralized setting is complicated by dynamic elements and localization uncertainty. To this end, this paper presents (1) an uncertainty-aware multiagent trajectory planner and (2) an image segmentation-based frame alignment pipeline. The uncertainty-aware planner propagates uncertainty associated with the future motion of detected obstacles, and by incorporating this propagated uncertainty into optimization constraints, the planner effectively navigates around obstacles. Unlike conventional methods that emphasize explicit obstacle tracking, our approach integrates implicit tracking. Sharing trajectories between agents can cause potential collisions due to frame misalignment. Addressing this, we introduce a novel frame alignment pipeline that rectifies inter-agent frame misalignment. This method leverages a zero-shot image segmentation model for detecting objects in the environment and a data association framework based on geometric consistency for map alignment. Our approach accurately aligns frames with only 0.18 m and 2.7 deg of mean frame alignment error in our most challenging simulation scenario. In addition, we conducted hardware experiments and successfully achieved 0.29 m and 2.59 deg of frame alignment error. Together with the alignment framework, our planner ensures safe navigation in unknown environments and collision avoidance in decentralized settings.
GNN-Based Beamforming for Sum-Rate Maximization in MU-MISO Networks
- Authors: Authors: Yuhang Li, Yang Lu, Bo Ai, Octavia A. Dobre, Zhiguo Ding, Dusit Niyato
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.03659
- Pdf link: https://arxiv.org/pdf/2311.03659
- Abstract The advantages of graph neural networks (GNNs) in leveraging the graph topology of wireless networks have drawn increasing attentions. This paper studies the GNN-based learning approach for the sum-rate maximization in multiple-user multiple-input single-output (MU-MISO) networks subject to the users' individual data rate requirements and the power budget of the base station. By modeling the MU-MISO network as a graph, a GNN-based architecture named CRGAT is proposed to directly map the channel state information to the beamforming vectors. The attention-enabled aggregation and the residual-assisted combination are adopted to enhance the learning capability and avoid the oversmoothing issue. Furthermore, a novel activation function is proposed for the constraint due to the limited power budget at the base station. The CRGAT is trained in an unsupervised learning manner with two proposed loss functions. An evaluation method is proposed for the learning-based approach, based on which the effectiveness of the proposed CRGAT is validated in comparison with several convex optimization and learning based approaches. Numerical results are provided to reveal the advantages of the CRGAT including the millisecond-level response with limited optimality performance loss, the scalability to different number of users and power budgets, and the adaptability to different system settings.
Deep Bayesian Reinforcement Learning for Spacecraft Proximity Maneuvers and Docking
- Authors: Authors: Desong Du, Naiming Qi, Yanfang Liu, Wei Pan
- Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.03680
- Pdf link: https://arxiv.org/pdf/2311.03680
- Abstract In the pursuit of autonomous spacecraft proximity maneuvers and docking(PMD), we introduce a novel Bayesian actor-critic reinforcement learning algorithm to learn a control policy with the stability guarantee. The PMD task is formulated as a Markov decision process that reflects the relative dynamic model, the docking cone and the cost function. Drawing from the principles of Lyapunov theory, we frame the temporal difference learning as a constrained Gaussian process regression problem. This innovative approach allows the state-value function to be expressed as a Lyapunov function, leveraging the Gaussian process and deep kernel learning. We develop a novel Bayesian quadrature policy optimization procedure to analytically compute the policy gradient while integrating Lyapunov-based stability constraints. This integration is pivotal in satisfying the rigorous safety demands of spaceflight missions. The proposed algorithm has been experimentally evaluated on a spacecraft air-bearing testbed and shows impressive and promising performance.
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
- Authors: Authors: Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu
- Subjects: Performance (cs.PF); Computation and Language (cs.CL); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.03687
- Pdf link: https://arxiv.org/pdf/2311.03687
- Abstract Large Language Models (LLMs) have seen great advance in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive as it requires considerable computing resources and memory, hence many efficient approaches have been developed for improving system pipelines as well as operators. However, the runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we aim to benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs in different sizes , i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B) on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help better understand different optimization techniques, training and inference frameworks, together with hardware platforms in choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses discover potential opportunities for future work to further optimize the runtime performance of LLMs.
Loss Balancing for Fair Supervised Learning
- Authors: Authors: Mohammad Mahdi Khalili, Xueru Zhang, Mahed Abroshan
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.03714
- Pdf link: https://arxiv.org/pdf/2311.03714
- Abstract Supervised learning models have been used in various domains such as lending, college admission, face recognition, natural language processing, etc. However, they may inherit pre-existing biases from training data and exhibit discrimination against protected social groups. Various fairness notions have been proposed to address unfairness issues. In this work, we focus on Equalized Loss (EL), a fairness notion that requires the expected loss to be (approximately) equalized across different groups. Imposing EL on the learning process leads to a non-convex optimization problem even if the loss function is convex, and the existing fair learning algorithms cannot properly be adopted to find the fair predictor under the EL constraint. This paper introduces an algorithm that can leverage off-the-shelf convex programming tools (e.g., CVXPY) to efficiently find the global optimum of this non-convex optimization. In particular, we propose the ELminimizer algorithm, which finds the optimal fair predictor under EL by reducing the non-convex optimization to a sequence of convex optimization problems. We theoretically prove that our algorithm finds the global optimal solution under certain conditions. Then, we support our theoretical results through several empirical studies.
Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds
- Authors: Authors: Shion Takeno, Yu Inatsu, Masayuki Karasuyama, Ichiro Takeuchi
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.03760
- Pdf link: https://arxiv.org/pdf/2311.03760
- Abstract Among various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, it has been shown that a randomized variant of GP-UCB achieves a tighter BCR bound compared with GP-UCB, which we call the tighter BCR bound for brevity. Inspired by this study, this paper first shows that TS achieves the tighter BCR bound. On the other hand, GP-UCB and TS often practically suffer from manual hyperparameter tuning and over-exploration issues, respectively. To overcome these difficulties, we propose yet another AF called a probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and avoids the hyperparameter tuning, unlike GP-UCB. Furthermore, we demonstrate a wide range of experiments, focusing on the effectiveness of PIMS that mitigates the practical issues of GP-UCB and TS.
Multi-Beam Forming with Movable-Antenna Array
- Authors: Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.03775
- Pdf link: https://arxiv.org/pdf/2311.03775
- Abstract Conventional multi-beam forming with fixed-position antenna (FPA) arrays needs to trade-off between maximizing the beamforming gain over desired directions and minimizing the interference power over undesired directions. In this letter, we study the enhanced multi-beam forming with a linear movable-antenna (MA) array by exploiting the new degrees of freedom (DoFs) via antennas' position optimization. Specifically, we jointly optimize the antenna position vector (APV) and antenna weight vector (AWV) to maximize the minimum beamforming gain over multiple desired directions, subject to a given constraint on the maximum interference power over undesired directions. We propose an efficient alternating optimization algorithm to find a suboptimal solution by iteratively optimizing one of the APV and AWV with the other being fixed. Numerical results show that the proposed multi-beam forming design with MA arrays can significantly outperform that with the traditional FPA arrays and other benchmark schemes in terms of both beamforming gain and interference suppression.
User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates
- Authors: Authors: Hilal Asi, Daogao Liu
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.03797
- Pdf link: https://arxiv.org/pdf/2311.03797
- Abstract We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy, where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al. (2023)] or requires the number of users to grow polynomially with the dimensionality of the problem with additional strict assumptions [Bassily et al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain optimal rates for both convex and strongly convex functions in polynomial time and require the number of users to grow only logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These algorithms are based on multiple-pass DP-SGD, combined with a novel private mean estimation procedure for concentrated data, which applies an outlier removal step before estimating the mean of the gradients.
On Deep Reinforcement Learning for Traffic Steering Intelligent ORAN
- Authors: Authors: Fatemeh Kavehmadavani, Van-Dinh Nguyen, Thang X. Vu, Symeon Chatzinotas
- Subjects: Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2311.03853
- Pdf link: https://arxiv.org/pdf/2311.03853
- Abstract This paper aims to develop the intelligent traffic steering (TS) framework, which has recently been considered as one of the key developments of 3GPP for advanced 5G. Since achieving key performance indicators (KPIs) for heterogeneous services may not be possible in the monolithic architecture, a novel deep reinforcement learning (DRL)-based TS algorithm is proposed at the non-real-time (non-RT) RAN intelligent controller (RIC) within the open radio access network (ORAN) architecture. To enable ORAN's intelligence, we distribute traffic load onto appropriate paths, which helps efficiently allocate resources to end users in a downlink multi-service scenario. Our proposed approach employs a three-step hierarchical process that involves heuristics, machine learning, and convex optimization to steer traffic flows. Through system-level simulations, we show the superior performance of the proposed intelligent TS scheme, surpassing established benchmark systems by 45.50%.
An Explainable Framework for Machine learning-Based Reactive Power Optimization of Distribution Network
- Authors: Authors: Wenlong Liao, Benjamin Schäfer, Dalin Qin, Gonghao Zhang, Zhixian Wang, Zhe Yang
- Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.03863
- Pdf link: https://arxiv.org/pdf/2311.03863
- Abstract To reduce the heavy computational burden of reactive power optimization of distribution networks, machine learning models are receiving increasing attention. However, most machine learning models (e.g., neural networks) are usually considered as black boxes, making it challenging for power system operators to identify and comprehend potential biases or errors in the decision-making process of machine learning models. To address this issue, an explainable machine-learning framework is proposed to optimize the reactive power in distribution networks. Firstly, a Shapley additive explanation framework is presented to measure the contribution of each input feature to the solution of reactive power optimizations generated from machine learning models. Secondly, a model-agnostic approximation method is developed to estimate Shapley values, so as to avoid the heavy computational burden associated with direct calculations of Shapley values. The simulation results show that the proposed explainable framework can accurately explain the solution of the machine learning model-based reactive power optimization by using visual analytics, from both global and instance perspectives. Moreover, the proposed explainable framework is model-agnostic, and thus applicable to various models (e.g., neural networks).
Improved Deterministic Streaming Algorithms for Non-monotone Submodular Maximization
- Authors: Authors: Xiaoming Sun, Jialin Zhang, Shuo Zhang
- Subjects: Data Structures and Algorithms (cs.DS)
- Arxiv link: https://arxiv.org/abs/2311.03895
- Pdf link: https://arxiv.org/pdf/2311.03895
- Abstract Submodular maximization is one of the central topics in combinatorial optimization. It has found numerous applications in the real world. Streaming algorithms for submodule maximization have gained attention in recent years, allowing for real-time processing of large data sets by looking at each piece of data only once. However, most of the state-of-the-art algorithms are subject to monotone cardinality constraint. There remain non-negligible gaps with respect to approximation ratios between cardinality and other constraints like $d$-knapsack in non-monotone submodular maximization. In this paper, we propose deterministic algorithms with improved approximation ratios for non-monotone submodular maximization. Specifically, for the cardinality constraint, we provide a deterministic $1/6-\epsilon$ approximation algorithm with $O(\frac{k\log k}{\epsilon})$ memory and sublinear query time, while the previous best algorithm is randomized with a $0.1921$ approximation ratio. To the best of our knowledge, this is the first deterministic streaming algorithm for the cardinality constraint. For the $d$-knapsack constraint, we provide a deterministic $\frac{1}{4(d+1)}-\epsilon$ approximation algorithm with $O(\frac{b\log b}{\epsilon})$ memory and $O(\frac{\log b}{\epsilon})$ query time per element. To the best of our knowledge, there is currently no streaming algorithm for this constraint.
Primal Separation and Approximation for the ${0, 1/2}$-closure
- Authors: Authors: Lukas Brandl, Andreas S. Schulz
- Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.03909
- Pdf link: https://arxiv.org/pdf/2311.03909
- Abstract We advance the theoretical study of ${0, 1/2}$-cuts for integer programming problems $\max{c^T x \colon A x \leq b, x \text{ integer}}$. Such cuts are Gomory-Chv'atal cuts that only need multipliers of value $0$ or $1/2$ in their derivation. The intersection of all ${0, 1/2}$-cuts derived from $Ax \le b$ is denoted by $P_{1/2}$ and called the ${0,1/2}$-closure of $P = {x : Ax \le b}$. The primal separation problem for ${0, 1/2}$-cuts is: Given a vertex $\hat x$ of the integer hull of $P$ and some fractional point $x^* \in P$, does there exist a ${0,1/2}$-cut that is tight at $\hat x$ and violated by $x^*$? Primal separation is the key ingredient of primal cutting-plane approaches to integer programming. In general, primal separation for ${0,1/2}$-cuts is NP-hard. We present two cases for which primal separation is solvable in polynomial time. As an interesting side product, we obtain a(nother) simple proof that matching can be solved in polynomial time. Furthermore, since optimization over the Gomory-Chv'atal closure is also NP-hard, there has been recent research on solving the optimization problem over the Gomory-Chv'atal closure approximately. In a similar spirit, we show that the optimization problem over the ${0,1/2}$-closure can be solved in polynomial time up to a factor $(1 + \varepsilon)$, for any fixed $\varepsilon > 0$.
Analysis of NaN Divergence in Training Monocular Depth Estimation Model
- Authors: Authors: Bum Jun Kim, Hyeonah Jang, Sang Woo Kim
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.03938
- Pdf link: https://arxiv.org/pdf/2311.03938
- Abstract The latest advances in deep learning have facilitated the development of highly accurate monocular depth estimation models. However, when training a monocular depth estimation network, practitioners and researchers have observed not a number (NaN) loss, which disrupts gradient descent optimization. Although several practitioners have reported the stochastic and mysterious occurrence of NaN loss that bothers training, its root cause is not discussed in the literature. This study conducted an in-depth analysis of NaN loss during training a monocular depth estimation network and identified three types of vulnerabilities that cause NaN loss: 1) the use of square root loss, which leads to an unstable gradient; 2) the log-sigmoid function, which exhibits numerical stability issues; and 3) certain variance implementations, which yield incorrect computations. Furthermore, for each vulnerability, the occurrence of NaN loss was demonstrated and practical guidelines to prevent NaN loss were presented. Experiments showed that both optimization stability and performance on monocular depth estimation could be improved by following our guidelines.
Federated Learning via Active RIS Assisted Over-the-Air Computation
- Authors: Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund, H. Vincent Poor
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.03982
- Pdf link: https://arxiv.org/pdf/2311.03982
- Abstract In this paper, we propose leveraging the active reconfigurable intelligence surface (RIS) to support reliable gradient aggregation for over-the-air computation (AirComp) enabled federated learning (FL) systems. An analysis of the FL convergence property reveals that minimizing gradient aggregation errors in each training round is crucial for narrowing the convergence gap. As such, we formulate an optimization problem, aiming to minimize these errors by jointly optimizing the transceiver design and RIS configuration. To handle the formulated highly non-convex problem, we devise a two-layer alternative optimization framework to decompose it into several convex subproblems, each solvable optimally. Simulation results demonstrate the superiority of the active RIS in reducing gradient aggregation errors compared to its passive counterpart.
Impact of HPO on AutoML Forecasting Ensembles
- Authors: Authors: David Hoffmann
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.04034
- Pdf link: https://arxiv.org/pdf/2311.04034
- Abstract A forecasting ensemble consisting of a diverse range of estimators for both local and global univariate forecasting, in particular MQ-CNN,DeepAR, Prophet, NPTS, ARIMA and ETS, can be used to make forecasts for a variety of problems. This paper delves into the aspect of adding different hyperparameter optimization strategies to the deep learning models in such a setup (DeepAR and MQ-CNN), exploring the trade-off between added training cost and the increase in accuracy for different configurations. It shows that in such a setup, adding hyperparameter optimization can lead to performance improvements, with the final setup having a 9.9 % percent accuracy improvement with respect to the avg-wQL over the baseline ensemble without HPO, accompanied by a 65.8 % increase in end-to-end ensemble latency. This improvement is based on an empirical analysis of combining the ensemble pipeline with different tuning strategies, namely Bayesian Optimisation and Hyperband and different configurations of those strategies. In the final configuration, the proposed combination of ensemble learning and HPO outperforms the state of the art commercial AutoML forecasting solution, Amazon Forecast, with a 3.5 % lower error and 16.0 % lower end-to-end ensemble latency.
Data exploitation: multi-task learning of object detection and semantic segmentation on partially annotated data
- Authors: Authors: Hoàng-Ân Lê, Minh-Tan Pham
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.04040
- Pdf link: https://arxiv.org/pdf/2311.04040
- Abstract Multi-task partially annotated data where each data point is annotated for only a single task are potentially helpful for data scarcity if a network can leverage the inter-task relationship. In this paper, we study the joint learning of object detection and semantic segmentation, the two most popular vision problems, from multi-task data with partial annotations. Extensive experiments are performed to evaluate each task performance and explore their complementarity when a multi-task network cannot optimize both tasks simultaneously. We propose employing knowledge distillation to leverage joint-task optimization. The experimental results show favorable results for multi-task learning and knowledge distillation over single-task learning and even full supervision scenario. All code and data splits are available at https://github.com/lhoangan/multas
Implementation and Comparison of Methods to Extract Reliability KPIs out of Textual Wind Turbine Maintenance Work Orders
- Authors: Authors: Marc-Alexander Lutz, Bastian Schäfermeier, Rachael Sexton, Michael Sharp, Alden Dima, Stefan Faulstich, Jagan Mohini Aluri
- Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.04064
- Pdf link: https://arxiv.org/pdf/2311.04064
- Abstract Maintenance work orders are commonly used to document information about wind turbine operation and maintenance. This includes details about proactive and reactive wind turbine downtimes, such as preventative and corrective maintenance. However, the information contained in maintenance work orders is often unstructured and difficult to analyze, making it challenging for decision-makers to use this information for optimizing operation and maintenance. To address this issue, this work presents three different approaches to calculate reliability key performance indicators from maintenance work orders. The first approach involves manual labeling of the maintenance work orders by domain experts, using the schema defined in an industrial guideline to assign the label accordingly. The second approach involves the development of a model that automatically labels the maintenance work orders using text classification methods. The third technique uses an AI-assisted tagging tool to tag and structure the raw maintenance information contained in the maintenance work orders. The resulting calculated reliability key performance indicator of the first approach are used as a benchmark for comparison with the results of the second and third approaches. The quality and time spent are considered as criteria for evaluation. Overall, these three methods make extracting maintenance information from maintenance work orders more efficient, enable the assessment of reliability key performance indicators and therefore support the optimization of wind turbine operation and maintenance.
Formulating and Heuristic Solving of Contact Problems in Hybrid Data-Driven Computational Mechanics
- Authors: Authors: Cristian Guillermo Gebhardt, Senta Lange, Marc Christian Steinbach
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2311.04083
- Pdf link: https://arxiv.org/pdf/2311.04083
- Abstract In this work we consider the hybrid Data-Driven Computational Mechanics (DDCM) approach, in which a smooth constitutive manifold is reconstructed to obtain a well-behaved nonlinear optimization problem (NLP) rather than the much harder discrete-continous NLP (DCNLP) of the direct DDCM approach. The key focus is on the addition of geometric inequality constraints to the hybrid DDCM formulation. Therein, the required constraint force leads to a contact problem in the form of a mathematical program with complementarity constraints (MPCC), a problem class that is still less complex than the DCNLP. For this MPCC we propose a heuristic quick-shot solution approach, which can produce verifiable solutions by solving up to four NLPs. We perform various numerical experiments on three different contact problems of increasing difficulty to demonstrate the potential and limitations of this approach.
From Diagram to Deployment: Translating BPMN Collaborations into X-Klaim for Efficient Multi-Robot System Programming
- Authors: Authors: Khalid Bourr, Francesco Tiezzi
- Subjects: Robotics (cs.RO); Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2311.04126
- Pdf link: https://arxiv.org/pdf/2311.04126
- Abstract This paper introduces a novel method for translating Business Process Model and Notation (BPMN) diagrams into executable X-Klaim code for Multi-Robot Systems (MRSs). Merging the clarity of BPMN with the operational strength of X-Klaim, we enable the design and execution of complex robotic interactions without requiring in-depth knowledge of the underlying programming language from the users. Our approach maintains the BPMN model's core design principles and logic in the translation to X-Klaim, thus enhancing the readability and maintainability of MRS applications. We offer a series of translated examples, address optimization strategies, and introduce the B2XKLAIM tool, which automates the conversion process. This method aims to streamline MRS programming and improve collaboration between roboticists and domain experts throughout the design and implementation stages.
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
- Authors: Authors: Jiale Cheng, Xiao Liu, Kehan Zheng, Pei Ke, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2311.04155
- Pdf link: https://arxiv.org/pdf/2311.04155
- Abstract Large language models (LLMs) have shown impressive success in various applications. However, these models are often not well aligned with human intents, which calls for additional treatments on them, that is, the alignment problem. To make LLMs better follow user instructions, existing alignment methods mostly focus on further training them. However, the extra training of LLMs are usually expensive in terms of GPU compute; worse still, LLMs of interest are oftentimes not accessible for user-demanded training, such as GPTs. In this work, we take a different perspective -- Black-Box Prompt Optimization (BPO) -- to perform alignments. The idea is to optimize user prompts to suit LLMs' input understanding, so as to best realize users' intents without updating LLMs' parameters. BPO is model-agnostic and the empirical results demonstrate that the BPO-aligned ChatGPT yields a 22% increase in the win rate against its original version, and 10% for GPT-4. Importantly, the \model-aligned LLMs can outperform the same models aligned by PPO and DPO, and it also brings additional performance gains when combining \model with PPO or DPO. Code and datasets are released at https://github.com/thu-coai/BPO.
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
- Authors: Authors: Elan Rosenfeld, Andrej Risteski
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.04163
- Pdf link: https://arxiv.org/pdf/2311.04163
- Abstract We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics. In particular, it implies a conceptually new cause for progressive sharpening and the edge of stability; we also highlight connections to other concepts in optimization and generalization including grokking, simplicity bias, and Sharpness-Aware Minimization. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong opposing signals: consistent, large magnitude features which dominate the network output throughout training and provide gradients which point in opposite directions. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high on one group and then the other, until the overall loss spikes. We describe how to identify these groups, explore what sets them apart, and carefully study their effect on the network's optimization and behavior. We complement these experiments with a mechanistic explanation on a toy example of opposing signals and a theoretical analysis of a two-layer linear network on a simple model. Our finding enables new qualitative predictions of training behavior which we confirm experimentally. It also provides a new lens through which to study and improve modern training practices for stochastic optimization, which we highlight via a case study of Adam versus SGD.
Measure transport via polynomial density surrogates
- Authors: Authors: Josephine Westermann, Jakob Zech
- Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)
- Arxiv link: https://arxiv.org/abs/2311.04172
- Pdf link: https://arxiv.org/pdf/2311.04172
- Abstract We discuss an algorithm to compute transport maps that couple the uniform measure on $[0,1]^d$ with a specified target distribution $\pi$ on $[0,1]^d$. The primary objectives are either to sample from or to compute expectations w.r.t. $\pi$. The method is based on leveraging a polynomial surrogate of the target density, which is obtained by a least-squares or interpolation approximation. We discuss the design and construction of suitable sparse approximation spaces, and provide a complete error and cost analysis for target densities belonging to certain smoothness classes. Further, we explore the relation between our proposed algorithm and related approaches that aim to find suitable transports via optimization over a class of parametrized transports. Finally, we discuss the efficient implementation of our algorithm and report on numerical experiments which confirm our theory.
Keyword: adam
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
- Authors: Authors: Elan Rosenfeld, Andrej Risteski
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.04163
- Pdf link: https://arxiv.org/pdf/2311.04163
- Abstract We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics. In particular, it implies a conceptually new cause for progressive sharpening and the edge of stability; we also highlight connections to other concepts in optimization and generalization including grokking, simplicity bias, and Sharpness-Aware Minimization. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong opposing signals: consistent, large magnitude features which dominate the network output throughout training and provide gradients which point in opposite directions. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high on one group and then the other, until the overall loss spikes. We describe how to identify these groups, explore what sets them apart, and carefully study their effect on the network's optimization and behavior. We complement these experiments with a mechanistic explanation on a toy example of opposing signals and a theoretical analysis of a two-layer linear network on a simple model. Our finding enables new qualitative predictions of training behavior which we confirm experimentally. It also provides a new lens through which to study and improve modern training practices for stochastic optimization, which we highlight via a case study of Adam versus SGD.
Keyword: gradient
Leveraging Generative AI: Improving Software Metadata Classification with Generated Code-Comment Pairs
- Authors: Authors: Samah Syed, Angel Deborah S
- Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.03365
- Pdf link: https://arxiv.org/pdf/2311.03365
- Abstract In software development, code comments play a crucial role in enhancing code comprehension and collaboration. This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful." We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process. We address this task by incorporating generated code and comment pairs. The initial dataset comprised 9048 pairs of code and comments written in C, labeled as either Useful or Not Useful. To augment this dataset, we sourced an additional 739 lines of code-comment pairs and generated labels using a Large Language Model Architecture, specifically BERT. The primary objective was to build classification models that can effectively differentiate between useful and not useful code comments. Various machine learning algorithms were employed, including Logistic Regression, Decision Tree, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Gradient Boosting, Random Forest, and a Neural Network. Each algorithm was evaluated using precision, recall, and F1-score metrics, both with the original seed dataset and the augmented dataset. This study showcases the potential of generative AI for enhancing binary code comment quality classification models, providing valuable insights for software developers and researchers in the field of natural language processing and software engineering.
Unscrambling the Rectification of Adversarial Attacks Transferability across Computer Networks
- Authors: Authors: Ehsan Nowroozi, Samaneh Ghelichkhani, Imran Haider, Ali Dehghantanha
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.03373
- Pdf link: https://arxiv.org/pdf/2311.03373
- Abstract Convolutional neural networks (CNNs) models play a vital role in achieving state-of-the-art performances in various technological fields. CNNs are not limited to Natural Language Processing (NLP) or Computer Vision (CV) but also have substantial applications in other technological domains, particularly in cybersecurity. The reliability of CNN's models can be compromised because of their susceptibility to adversarial attacks, which can be generated effortlessly, easily applied, and transferred in real-world scenarios. In this paper, we present a novel and comprehensive method to improve the strength of attacks and assess the transferability of adversarial examples in CNNs when such strength changes, as well as whether the transferability property issue exists in computer network applications. In the context of our study, we initially examined six distinct modes of attack: the Carlini and Wagner (C&W), Fast Gradient Sign Method (FGSM), Iterative Fast Gradient Sign Method (I-FGSM), Jacobian-based Saliency Map (JSMA), Limited-memory Broyden fletcher Goldfarb Shanno (L-BFGS), and Projected Gradient Descent (PGD) attack. We applied these attack techniques on two popular datasets: the CIC and UNSW datasets. The outcomes of our experiment demonstrate that an improvement in transferability occurs in the targeted scenarios for FGSM, JSMA, LBFGS, and other attacks. Our findings further indicate that the threats to security posed by adversarial examples, even in computer network applications, necessitate the development of novel defense mechanisms to enhance the security of DL-based techniques.
Communication Efficient and Privacy-Preserving Federated Learning Based on Evolution Strategies
- Authors: Authors: Guangchen Lan
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.03405
- Pdf link: https://arxiv.org/pdf/2311.03405
- Abstract Federated learning (FL) is an emerging paradigm for training deep neural networks (DNNs) in distributed manners. Current FL approaches all suffer from high communication overhead and information leakage. In this work, we present a federated learning algorithm based on evolution strategies (FedES), a zeroth-order training method. Instead of transmitting model parameters, FedES only communicates loss values, and thus has very low communication overhead. Moreover, a third party is unable to estimate gradients without knowing the pre-shared seed, which protects data privacy. Experimental results demonstrate FedES can achieve the above benefits while keeping convergence performance the same as that with back propagation methods.
Training Multi-layer Neural Networks on Ising Machine
- Authors: Authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
- Arxiv link: https://arxiv.org/abs/2311.03408
- Pdf link: https://arxiv.org/pdf/2311.03408
- Abstract As a dedicated quantum device, Ising machines could solve large-scale binary optimization problems in milliseconds. There is emerging interest in utilizing Ising machines to train feedforward neural networks due to the prosperity of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This paper proposes an Ising learning algorithm to train quantized neural network (QNN), by incorporating two essential techinques, namely binary representation of topological network and order reduction of loss function. As far as we know, this is the first algorithm to train multi-layer feedforward networks on Ising machines, providing an alternative to gradient-based backpropagation. Firstly, training QNN is formulated as a quadratic constrained binary optimization (QCBO) problem by representing neuron connection and activation function as equality constraints. All quantized variables are encoded by binary bits based on binary encoding protocol. Secondly, QCBO is converted to a quadratic unconstrained binary optimization (QUBO) problem, that can be efficiently solved on Ising machines. The conversion leverages both penalty function and Rosenberg order reduction, who together eliminate equality constraints and reduce high-order loss function into a quadratic one. With some assumptions, theoretical analysis shows the space complexity of our algorithm is $\mathcal{O}(H^2L + HLN\log H)$, quantifying the required number of Ising spins. Finally, the algorithm effectiveness is validated with a simulated Ising machine on MNIST dataset. After annealing 700 ms, the classification accuracy achieves 98.3%. Among 100 runs, the success probability of finding the optimal solution is 72%. Along with the increasing number of spins on Ising machine, our algorithm has the potential to train deeper neural networks.
Stable Modular Control via Contraction Theory for Reinforcement Learning
- Authors: Authors: Bing Song, Jean-Jacques Slotine, Quang-Cuong Pham
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2311.03669
- Pdf link: https://arxiv.org/pdf/2311.03669
- Abstract We propose a novel way to integrate control techniques with reinforcement learning (RL) for stability, robustness, and generalization: leveraging contraction theory to realize modularity in neural control, which ensures that combining stable subsystems can automatically preserve the stability. We realize such modularity via signal composition and dynamic decomposition. Signal composition creates the latent space, within which RL applies to maximizing rewards. Dynamic decomposition is realized by coordinate transformation that creates an auxiliary space, within which the latent signals are coupled in the way that their combination can preserve stability provided each signal, that is, each subsystem, has stable self-feedbacks. Leveraging modularity, the nonlinear stability problem is deconstructed into algebraically solvable ones, the stability of the subsystems in the auxiliary space, yielding linear constraints on the input gradients of control networks that can be as simple as switching the signs of network weights. This minimally invasive method for stability allows arguably easy integration into the modular neural architectures in machine learning, like hierarchical RL, and improves their performance. We demonstrate in simulation the necessity and the effectiveness of our method: the necessity for robustness and generalization, and the effectiveness in improving hierarchical RL for manipulation learning.
Deep Bayesian Reinforcement Learning for Spacecraft Proximity Maneuvers and Docking
- Authors: Authors: Desong Du, Naiming Qi, Yanfang Liu, Wei Pan
- Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.03680
- Pdf link: https://arxiv.org/pdf/2311.03680
- Abstract In the pursuit of autonomous spacecraft proximity maneuvers and docking(PMD), we introduce a novel Bayesian actor-critic reinforcement learning algorithm to learn a control policy with the stability guarantee. The PMD task is formulated as a Markov decision process that reflects the relative dynamic model, the docking cone and the cost function. Drawing from the principles of Lyapunov theory, we frame the temporal difference learning as a constrained Gaussian process regression problem. This innovative approach allows the state-value function to be expressed as a Lyapunov function, leveraging the Gaussian process and deep kernel learning. We develop a novel Bayesian quadrature policy optimization procedure to analytically compute the policy gradient while integrating Lyapunov-based stability constraints. This integration is pivotal in satisfying the rigorous safety demands of spaceflight missions. The proposed algorithm has been experimentally evaluated on a spacecraft air-bearing testbed and shows impressive and promising performance.
User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates
- Authors: Authors: Hilal Asi, Daogao Liu
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2311.03797
- Pdf link: https://arxiv.org/pdf/2311.03797
- Abstract We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy, where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al. (2023)] or requires the number of users to grow polynomially with the dimensionality of the problem with additional strict assumptions [Bassily et al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain optimal rates for both convex and strongly convex functions in polynomial time and require the number of users to grow only logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These algorithms are based on multiple-pass DP-SGD, combined with a novel private mean estimation procedure for concentrated data, which applies an outlier removal step before estimating the mean of the gradients.
Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models
- Authors: Authors: Shengzhe Zhou, Zejian Lee, Shengyuan Zhang, Lefan Hou, Changyuan Yang, Guang Yang, Lingyun Sun
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2311.03830
- Pdf link: https://arxiv.org/pdf/2311.03830
- Abstract Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model. Accordingly, we propose $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models.
FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer
- Authors: Authors: Chi-Chih Chang, Yuan-Yao Sung, Shixing Yu, Ning-Chi Huang, Diana Marculescu, Kai-Chiang Wu
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.03912
- Pdf link: https://arxiv.org/pdf/2311.03912
- Abstract Vision Transformers (ViT) have recently demonstrated success across a myriad of computer vision tasks. However, their elevated computational demands pose significant challenges for real-world deployment. While low-rank approximation stands out as a renowned method to reduce computational loads, efficiently automating the target rank selection in ViT remains a challenge. Drawing from the notable similarity and alignment between the processes of rank selection and One-Shot NAS, we introduce FLORA, an end-to-end automatic framework based on NAS. To overcome the design challenge of supernet posed by vast search space, FLORA employs a low-rank aware candidate filtering strategy. This method adeptly identifies and eliminates underperforming candidates, effectively alleviating potential undertraining and interference among subnetworks. To further enhance the quality of low-rank supernets, we design a low-rank specific training paradigm. First, we propose weight inheritance to construct supernet and enable gradient sharing among low-rank modules. Secondly, we adopt low-rank aware sampling to strategically allocate training resources, taking into account inherited information from pre-trained models. Empirical results underscore FLORA's efficacy. With our method, a more fine-grained rank configuration can be generated automatically and yield up to 33% extra FLOPs reduction compared to a simple uniform configuration. More specific, FLORA-DeiT-B/FLORA-Swin-B can save up to 55%/42% FLOPs almost without performance degradtion. Importantly, FLORA boasts both versatility and orthogonality, offering an extra 21%-26% FLOPs reduction when integrated with leading compression techniques or compact hybrid structures. Our code is publicly available at https://github.com/shadowpa0327/FLORA.
Analysis of NaN Divergence in Training Monocular Depth Estimation Model
- Authors: Authors: Bum Jun Kim, Hyeonah Jang, Sang Woo Kim
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2311.03938
- Pdf link: https://arxiv.org/pdf/2311.03938
- Abstract The latest advances in deep learning have facilitated the development of highly accurate monocular depth estimation models. However, when training a monocular depth estimation network, practitioners and researchers have observed not a number (NaN) loss, which disrupts gradient descent optimization. Although several practitioners have reported the stochastic and mysterious occurrence of NaN loss that bothers training, its root cause is not discussed in the literature. This study conducted an in-depth analysis of NaN loss during training a monocular depth estimation network and identified three types of vulnerabilities that cause NaN loss: 1) the use of square root loss, which leads to an unstable gradient; 2) the log-sigmoid function, which exhibits numerical stability issues; and 3) certain variance implementations, which yield incorrect computations. Furthermore, for each vulnerability, the occurrence of NaN loss was demonstrated and practical guidelines to prevent NaN loss were presented. Experiments showed that both optimization stability and performance on monocular depth estimation could be improved by following our guidelines.
Federated Learning via Active RIS Assisted Over-the-Air Computation
- Authors: Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund, H. Vincent Poor
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2311.03982
- Pdf link: https://arxiv.org/pdf/2311.03982
- Abstract In this paper, we propose leveraging the active reconfigurable intelligence surface (RIS) to support reliable gradient aggregation for over-the-air computation (AirComp) enabled federated learning (FL) systems. An analysis of the FL convergence property reveals that minimizing gradient aggregation errors in each training round is crucial for narrowing the convergence gap. As such, we formulate an optimization problem, aiming to minimize these errors by jointly optimizing the transceiver design and RIS configuration. To handle the formulated highly non-convex problem, we devise a two-layer alternative optimization framework to decompose it into several convex subproblems, each solvable optimally. Simulation results demonstrate the superiority of the active RIS in reducing gradient aggregation errors compared to its passive counterpart.
An Initialization Schema for Neuronal Networks on Tabular Data
- Authors: Authors: Wolfgang Fuhl
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.03996
- Pdf link: https://arxiv.org/pdf/2311.03996
- Abstract Nowadays, many modern applications require heterogeneous tabular data, which is still a challenging task in terms of regression and classification. Many approaches have been proposed to adapt neural networks for this task, but still, boosting and bagging of decision trees are the best-performing methods for this task. In this paper, we show that a binomial initialized neural network can be used effectively on tabular data. The proposed approach shows a simple but effective approach for initializing the first hidden layer in neural networks. We also show that this initializing schema can be used to jointly train ensembles by adding gradient masking to batch entries and using the binomial initialization for the last layer in a neural network. For this purpose, we modified the hinge binary loss and the soft max loss to make them applicable for joint ensemble training. We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches. In addition, we discuss the limitations and possible further research of our approach for improving the applicability of neural networks to tabular data. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FInitializationNeuronalNetworksTabularData&mode=list
Over-the-Air Computation Empowered Federated Learning: A Joint Uplink-Downlink Design
- Authors: Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund
- Subjects: Information Theory (cs.IT)
- Arxiv link: https://arxiv.org/abs/2311.04059
- Pdf link: https://arxiv.org/pdf/2311.04059
- Abstract In this paper, we investigate the communication designs of over-the-air computation (AirComp) empowered federated learning (FL) systems considering uplink model aggregation and downlink model dissemination jointly. We first derive an upper bound on the expected difference between the training loss and the optimal loss, which reveals that optimizing the FL performance is equivalent to minimizing the distortion in the received global gradient vector at each edge node. As such, we jointly optimize each edge node transmit and receive equalization coefficients along with the edge server forwarding matrix to minimize the maximum gradient distortion across all edge nodes. We further utilize the MNIST dataset to evaluate the performance of the considered FL system in the context of the handwritten digit recognition task. Experiment results show that deploying multiple antennas at the edge server significantly reduces the distortion in the received global gradient vector, leading to a notable improvement in recognition accuracy compared to the single antenna case.
Time-Efficient Reinforcement Learning with Stochastic Stateful Policies
- Authors: Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2311.04082
- Pdf link: https://arxiv.org/pdf/2311.04082
- Abstract Stateful policies play an important role in reinforcement learning, such as handling partially observable environments, enhancing robustness, or imposing an inductive bias directly into the policy structure. The conventional method for training stateful policies is Backpropagation Through Time (BPTT), which comes with significant drawbacks, such as slow training due to sequential gradient propagation and the occurrence of vanishing or exploding gradients. The gradient is often truncated to address these issues, resulting in a biased policy update. We present a novel approach for training stateful policies by decomposing the latter into a stochastic internal state kernel and a stateless policy, jointly optimized by following the stateful policy gradient. We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning and imitation learning algorithms. Furthermore, we provide a theoretical analysis of our new gradient estimator and compare it with BPTT. We evaluate our approach on complex continuous control tasks, e.g., humanoid locomotion, and demonstrate that our gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT.
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
- Authors: Authors: Elan Rosenfeld, Andrej Risteski
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2311.04163
- Pdf link: https://arxiv.org/pdf/2311.04163
- Abstract We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics. In particular, it implies a conceptually new cause for progressive sharpening and the edge of stability; we also highlight connections to other concepts in optimization and generalization including grokking, simplicity bias, and Sharpness-Aware Minimization. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong opposing signals: consistent, large magnitude features which dominate the network output throughout training and provide gradients which point in opposite directions. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high on one group and then the other, until the overall loss spikes. We describe how to identify these groups, explore what sets them apart, and carefully study their effect on the network's optimization and behavior. We complement these experiments with a mechanistic explanation on a toy example of opposing signals and a theoretical analysis of a two-layer linear network on a simple model. Our finding enables new qualitative predictions of training behavior which we confirm experimentally. It also provides a new lens through which to study and improve modern training practices for stochastic optimization, which we highlight via a case study of Adam versus SGD.
Models towards Risk Behavior Prediction and Analysis: A Netherlands Case study
- Authors: Authors: Onaopepo Adekunle, Arno Riedl, Michel Dumontier
- Subjects: Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2311.04164
- Pdf link: https://arxiv.org/pdf/2311.04164
- Abstract In many countries financial service providers have to elicit their customers risk preferences, when offering products and services. For instance, in the Netherlands pension funds will be legally obliged to factor in their clients risk preferences when devising their investment strategies. Therefore, assessing and measuring the risk preferences of individuals is critical for the analysis of individuals' behavior and policy prescriptions. In the psychology and economics, a number of methods to elicit risk preferences have been developed using hypothetical scenarios and economic experiments. These methods of eliciting individual risk preferences are usually applied to small samples because they are expensive and the implementation can be complex and not suitable when large cohorts need to be measured. A large number of supervised learning models ranging from linear regression to support vector machines are used to predict risk preference measures using socio-economic register data such as age, gender, migration background and other demographic variables in combination with data on income, wealth, pension fund contributions, and other financial data. The employed machine learning models cover a range of assumptions and properties as well as a diverse set of regression metrics. The optimum model is selected using the metrics and interpretability of the model. The optimal models are lasso regression and gradient boosting machines with mean average percentage error of about 30%. This is important as it helps to estimate risk attitudes without actually measuring them. It should be noted that with the current accuracy the tested models are not ready for deployment for applications that require high accuracy. However, the results do indicate which models should be used in situations that do not require the most accurate predictions such as augmentation data for pensions' recommendation.
Deep Hashing via Householder Quantization
- Authors: Authors: Lucas R. Schwengber, Lucas Resende, Paulo Orenstein, Roberto I. Oliveira
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2311.04207
- Pdf link: https://arxiv.org/pdf/2311.04207
- Abstract Hashing is at the heart of large-scale image similarity search, and recent methods have been substantially improved through deep learning techniques. Such algorithms typically learn continuous embeddings of the data. To avoid a subsequent costly binarization step, a common solution is to employ loss functions that combine a similarity learning term (to ensure similar images are grouped to nearby embeddings) and a quantization penalty term (to ensure that the embedding entries are close to binarized entries, e.g., -1 or 1). Still, the interaction between these two terms can make learning harder and the embeddings worse. We propose an alternative quantization strategy that decomposes the learning problem in two stages: first, perform similarity learning over the embedding space with no quantization; second, find an optimal orthogonal transformation of the embeddings so each coordinate of the embedding is close to its sign, and then quantize the transformed embedding through the sign function. In the second step, we parametrize orthogonal transformations using Householder matrices to efficiently leverage stochastic gradient descent. Since similarity measures are usually invariant under orthogonal transformations, this quantization strategy comes at no cost in terms of performance. The resulting algorithm is unsupervised, fast, hyperparameter-free and can be run on top of any existing deep hashing or metric learning algorithm. We provide extensive experimental results showing that this approach leads to state-of-the-art performance on widely used image datasets, and, unlike other quantization strategies, brings consistent improvements in performance to existing deep hashing algorithms.
Keyword: super-resolution
Predictive Sampling for Efficient Pairwise Subjective Image Quality Assessment
- Authors: Authors: Shima Mohammadi, João Ascenso
- Subjects: Multimedia (cs.MM)
- Arxiv link: https://arxiv.org/abs/2311.03850
- Pdf link: https://arxiv.org/pdf/2311.03850
- Abstract Subjective image quality assessment studies are used in many scenarios, such as the evaluation of compression, super-resolution, and denoising solutions. Among the available subjective test methodologies, pair comparison is attracting popularity due to its simplicity, reliability, and robustness to changes in the test conditions, e.g. display resolutions. The main problem that impairs its wide acceptance is that the number of pairs to compare by subjects grows quadratically with the number of stimuli that must be considered. Usually, the paired comparison data obtained is fed into an aggregation model to obtain a final score for each degraded image and thus, not every comparison contributes equally to the final quality score. In the past years, several solutions that sample pairs (from all possible combinations) have been proposed, from random sampling to active sampling based on the past subjects' decisions. This paper introduces a novel sampling solution called \textbf{P}redictive \textbf{S}ampling for \textbf{P}airwise \textbf{C}omparison (PS-PC) which exploits the characteristics of the input data to make a prediction of which pairs should be evaluated by subjects. The proposed solution exploits popular machine learning techniques to select the most informative pairs for subjects to evaluate, while for the other remaining pairs, it predicts the subjects' preferences. The experimental results show that PS-PC is the best choice among the available sampling algorithms with higher performance for the same number of pairs. Moreover, since the choice of the pairs is done \emph{a priori} before the subjective test starts, the algorithm is not required to run during the test and thus much more simple to deploy in online crowdsourcing subjective tests.