arxiv-updates
arxiv-updates copied to clipboard
New submissions for Tue, 17 Oct 23
Keyword: sgd
KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training
- Authors: Authors: Truong Thao Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega, François Trahay, Mohamed Wahib
- Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10102
- Pdf link: https://arxiv.org/pdf/2310.10102
- Abstract This paper proposes a method for hiding the least-important samples during the training of deep neural networks to increase efficiency, i.e., to reduce the cost of training. Using information about the loss and prediction confidence during training, we adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process, without significantly degrading accuracy. We explore the converge properties when accounting for the reduction in the number of SGD updates. Empirical results on various large-scale datasets and models used directly in image classification and segmentation show that while the with-replacement importance sampling algorithm performs poorly on large datasets, our method can reduce total training time by up to 22% impacting accuracy only by 0.4% compared to the baseline. Code available at https://github.com/TruongThaoNguyen/kakurenbo
A Comprehensive Study of Privacy Risks in Curriculum Learning
- Authors: Authors: Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li
- Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10124
- Pdf link: https://arxiv.org/pdf/2310.10124
- Abstract Training a machine learning model with data following a meaningful order, i.e., from easy to hard, has been proven to be effective in accelerating the training process and achieving better model performance. The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification. Yet, how CL affects the privacy of machine learning is unclear. Given that CL changes the way a model memorizes the training data, its influence on data privacy needs to be thoroughly evaluated. To fill this knowledge gap, we perform the first study and leverage membership inference attack (MIA) and attribute inference attack (AIA) as two vectors to quantify the privacy leakage caused by CL. Our evaluation of nine real-world datasets with attack methods (NN-based, metric-based, label-only MIA, and NN-based AIA) revealed new insights about CL. First, MIA becomes slightly more effective when CL is applied, but the impact is much more prominent to a subset of training samples ranked as difficult. Second, a model trained under CL is less vulnerable under AIA, compared to MIA. Third, the existing defense techniques like DP-SGD, MemGuard, and MixupMMD are still effective under CL, though DP-SGD has a significant impact on target model accuracy. Finally, based on our insights into CL, we propose a new MIA, termed Diff-Cali, which exploits the difficulty scores for result calibration and is demonstrated to be effective against all CL methods and the normal training method. With this study, we hope to draw the community's attention to the unintended privacy risks of emerging machine-learning techniques and develop new attack benchmarks and defense solutions.
Block-missing data in linear systems: An unbiased stochastic gradient descent approach
- Authors: Authors: Chelsea Huynh, Anna Ma, Michael Strand
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2310.10147
- Pdf link: https://arxiv.org/pdf/2310.10147
- Abstract Achieving accurate approximations to solutions of large linear systems is crucial, especially when those systems utilize real-world data. A consequence of using real-world data is that there will inevitably be missingness. Current approaches for dealing with missing data, such as deletion and imputation, can introduce bias. Recent studies proposed an adaptation of stochastic gradient descent (SGD) in specific missing-data models. In this work, we propose a new algorithm, $\ell$-tuple mSGD, for the setting in which data is missing in a block-wise, tuple pattern. We prove that our proposed method uses unbiased estimates of the gradient of the least squares objective in the presence of tuple missing data. We also draw connections between $\ell$-tuple mSGD and previously established SGD-type methods for missing data. Furthermore, we prove our algorithm converges when using updating step sizes and empirically demonstrate the convergence of $\ell$-tuple mSGD on synthetic data. Lastly, we evaluate $\ell$-tuple mSGD applied to real-world continuous glucose monitoring (CGM) device data.
Contextual Data Augmentation for Task-Oriented Dialog Systems
- Authors: Authors: Dustin Axman, Avik Ray, Shubham Garg, Jing Huang
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.10380
- Pdf link: https://arxiv.org/pdf/2310.10380
- Abstract Collection of annotated dialogs for training task-oriented dialog systems have been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident if similar generative models can be used to generate a large variety of, and often unexpected, user inputs that real dialog systems encounter in practice. Existing data augmentation techniques such as paraphrase generation do not take the dialog context into consideration. In this paper, we develop a novel dialog augmentation model that generates a user turn, conditioning on full dialog context. Additionally, with a new prompt design for language model, and output re-ranking, the dialogs generated from our model can be directly used to train downstream dialog systems. On common benchmark datasets MultiWoZ and SGD, we show that our dialog augmentation model generates high quality dialogs and improves dialog success rate by as much as $8%$ over baseline.
Keyword: optimization
Digital Twin Assisted Deep Reinforcement Learning for Online Optimization of Network Slicing Admission Control
- Authors: Authors: Zhenyu Tao, Wei Xu, Xiaohu You
- Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2310.09299
- Pdf link: https://arxiv.org/pdf/2310.09299
- Abstract The proliferation of diverse network services in 5G and beyond networks has led to the emergence of network slicing technologies. Among these, admission control plays a crucial role in achieving specific optimization goals through the selective acceptance of service requests. Although Deep Reinforcement Learning (DRL) forms the foundation in many admission control approaches for its effectiveness and flexibility, the initial instability of DRL models hinders their practical deployment in real-world networks. In this work, we propose a digital twin (DT) assisted DRL solution to address this issue. Specifically, we first formulate the admission decision-making process as a semi-Markov decision process, which is subsequently simplified into an equivalent discrete-time Markov decision process to facilitate the implementation of DRL methods. The DT is established through supervised learning and employed to assist the training phase of the DRL model. Extensive simulations show that the DT-assisted DRL model increased resource utilization by over 40% compared to the directly trained state-of-the-art Dueling-DQN and over 20% compared to our directly trained DRL model during initial training. This improvement is achieved while preserving the model's capacity to optimize the long-term rewards.
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
- Authors: Authors: Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09336
- Pdf link: https://arxiv.org/pdf/2310.09336
- Abstract Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution. Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden ``emergence'' due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.
Medial Skeletal Diagram: A Generalized Medial Axis Approach for Compact 3D Shape Representation
- Authors: Authors: Minghao Guo, Bohan Wang, Wojciech Matusik
- Subjects: Graphics (cs.GR)
- Arxiv link: https://arxiv.org/abs/2310.09395
- Pdf link: https://arxiv.org/pdf/2310.09395
- Abstract We propose the Medial Skeletal Diagram, a novel skeletal representation that tackles the prevailing issues around compactness and reconstruction accuracy in existing skeletal representations. Our approach augments the continuous elements in the medial axis representation to effectively shift the complexity away from discrete elements. To that end, we introduce generalized enveloping primitives, an enhancement of the standard primitives in medial axis, which ensures efficient coverage of intricate local features of the input shape and substantially reduces the number of discrete elements required. Moreover, we present a computational framework that constructs a medial skeletal diagram from an arbitrary closed manifold mesh. Our optimization pipeline ensures that the resulting medial skeletal diagram comprehensively covers the input shape with the fewest primitives. Additionally, each optimized primitive undergoes a post-refinement process to guarantee an accurate match with the source mesh in both geometry and tessellation. We validate our approach on a comprehensive benchmark of 100 shapes, demonstrating its compactness of the discrete elements and superior reconstruction accuracy across a variety of cases. Furthermore, we exemplify the versatility of our representation in downstream applications such as shape optimization, shape generation, mesh decomposition, mesh alignment, mesh compression, and user-interactive design.
Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks
- Authors: Authors: Harsh Patel, Yuan Zhou, Alexander P Lamb, Shu Wang, Jieliang Luo
- Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09412
- Pdf link: https://arxiv.org/pdf/2310.09412
- Abstract This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs). Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs. Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees. Conversely, reinforcement learning (RL) stands out for its adaptability to uncertainties and reduced inference time, enabling real-time responsiveness. However, the effective implementation of RL is contingent on building accurate simulation models for WDNs, and prior applications have been limited by errors in simulation training data. These errors can potentially cause the RL agent to learn misleading patterns and actions and recommend suboptimal operational strategies. To overcome these challenges, we present an improved "hybrid RL" methodology. This method integrates the benefits of RL while anchoring it in historical data, which serves as a baseline to incrementally introduce optimal control recommendations. By leveraging operational data as a foundation for the agent's actions, we enhance the explainability of the agent's actions, foster more robust recommendations, and minimize error. Our findings demonstrate that the hybrid RL agent can significantly improve sustainability, operational efficiency, and dynamically adapt to emerging scenarios in real-world WDNs.
Target Variable Engineering
- Authors: Authors: Jessica Clark
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09440
- Pdf link: https://arxiv.org/pdf/2310.09440
- Abstract How does the formulation of a target variable affect performance within the ML pipeline? The experiments in this study examine numeric targets that have been binarized by comparing against a threshold. We compare the predictive performance of regression models trained to predict the numeric targets vs. classifiers trained to predict their binarized counterparts. Specifically, we make this comparison at every point of a randomized hyperparameter optimization search to understand the effect of computational resource budget on the tradeoff between the two. We find that regression requires significantly more computational effort to converge upon the optimal performance, and is more sensitive to both randomness and heuristic choices in the training process. Although classification can and does benefit from systematic hyperparameter tuning and model selection, the improvements are much less than for regression. This work comprises the first systematic comparison of regression and classification within the framework of computational resource requirements. Our findings contribute to calls for greater replicability and efficiency within the ML pipeline for the sake of building more sustainable and robust AI systems.
Tackling Heterogeneity in Medical Federated learning via Vision Transformers
- Authors: Authors: Erfan Darzi, Yiqing Shen, Nanna M. Sijtsema, P.M.A van Ooijen
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09444
- Pdf link: https://arxiv.org/pdf/2310.09444
- Abstract Optimization-based regularization methods have been effective in addressing the challenges posed by data heterogeneity in medical federated learning, particularly in improving the performance of underrepresented clients. However, these methods often lead to lower overall model accuracy and slower convergence rates. In this paper, we demonstrate that using Vision Transformers can substantially improve the performance of underrepresented clients without a significant trade-off in overall accuracy. This improvement is attributed to the Vision transformer's ability to capture long-range dependencies within the input data.
JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues
- Authors: Authors: Jiayi Ji, Haowei Wang, Changli Wu, Yiwei Ma, Xiaoshuai Sun, Rongrong Ji
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09503
- Pdf link: https://arxiv.org/pdf/2310.09503
- Abstract The rising importance of 3D representation learning, pivotal in computer vision, autonomous driving, and robotics, is evident. However, a prevailing trend, which straightforwardly resorted to transferring 2D alignment strategies to the 3D domain, encounters three distinct challenges: (1) Information Degradation: This arises from the alignment of 3D data with mere single-view 2D images and generic texts, neglecting the need for multi-view images and detailed subcategory texts. (2) Insufficient Synergy: These strategies align 3D representations to image and text features individually, hampering the overall optimization for 3D models. (3) Underutilization: The fine-grained information inherent in the learned representations is often not fully exploited, indicating a potential loss in detail. To address these issues, we introduce JM3D, a comprehensive approach integrating point cloud, text, and image. Key contributions include the Structured Multimodal Organizer (SMO), enriching vision-language representation with multiple views and hierarchical text, and the Joint Multi-modal Alignment (JMA), combining language understanding with visual representation. Our advanced model, JM3D-LLM, marries 3D representation with large language models via efficient fine-tuning. Evaluations on ModelNet40 and ScanObjectNN establish JM3D's superiority. The superior performance of JM3D-LLM further underscores the effectiveness of our representation transfer approach. Our code and models are available at https://github.com/Mr-Neko/JM3D.
Advancing Test-Time Adaptation for Acoustic Foundation Models in Open-World Shifts
- Authors: Authors: Hongfu Liu, Hengguan Huang, Ye Wang
- Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- Arxiv link: https://arxiv.org/abs/2310.09505
- Pdf link: https://arxiv.org/pdf/2310.09505
- Abstract Test-Time Adaptation (TTA) is a critical paradigm for tackling distribution shifts during inference, especially in visual recognition tasks. However, while acoustic models face similar challenges due to distribution shifts in test-time speech, TTA techniques specifically designed for acoustic modeling in the context of open-world data shifts remain scarce. This gap is further exacerbated when considering the unique characteristics of acoustic foundation models: 1) they are primarily built on transformer architectures with layer normalization and 2) they deal with test-time speech data of varying lengths in a non-stationary manner. These aspects make the direct application of vision-focused TTA methods, which are mostly reliant on batch normalization and assume independent samples, infeasible. In this paper, we delve into TTA for pre-trained acoustic models facing open-world data shifts. We find that noisy, high-entropy speech frames, often non-silent, carry key semantic content. Traditional TTA methods might inadvertently filter out this information using potentially flawed heuristics. In response, we introduce a heuristic-free, learning-based adaptation enriched by confidence enhancement. Noting that speech signals' short-term consistency, we also apply consistency regularization during test-time optimization. Our experiments on synthetic and real-world datasets affirm our method's superiority over existing baselines.
Online Parameter Identification of Generalized Non-cooperative Game
- Authors: Authors: Jianguo Chen, Jinlong Lei, Hongsheng Qi, Yiguang Hong
- Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2310.09511
- Pdf link: https://arxiv.org/pdf/2310.09511
- Abstract This work studies the parameter identification problem of a generalized non-cooperative game, where each player's cost function is influenced by an observable signal and some unknown parameters. We consider the scenario where equilibrium of the game at some observable signals can be observed with noises, whereas our goal is to identify the unknown parameters with the observed data. Assuming that the observable signals and the corresponding noise-corrupted equilibriums are acquired sequentially, we construct this parameter identification problem as online optimization and introduce a novel online parameter identification algorithm. To be specific, we construct a regularized loss function that balances conservativeness and correctiveness, where the conservativeness term ensures that the new estimates do not deviate significantly from the current estimates, while the correctiveness term is captured by the Karush-Kuhn-Tucker conditions. We then prove that when the players' cost functions are linear with respect to the unknown parameters and the learning rate of the online parameter identification algorithm satisfies \mu_k \propto 1/\sqrt{k}, along with other assumptions, the regret bound of the proposed algorithm is O(\sqrt{K}). Finally, we conduct numerical simulations on a Nash-Cournot problem to demonstrate that the performance of the online identification algorithm is comparable to that of the offline setting.
Hypernetwork-based Meta-Learning for Low-Rank Physics-Informed Neural Networks
- Authors: Authors: Woojin Cho, Kookjin Lee, Donsub Rim, Noseong Park
- Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
- Arxiv link: https://arxiv.org/abs/2310.09528
- Pdf link: https://arxiv.org/pdf/2310.09528
- Abstract In various engineering and applied science applications, repetitive numerical simulations of partial differential equations (PDEs) for varying input parameters are often required (e.g., aircraft shape optimization over many design parameters) and solvers are required to perform rapid execution. In this study, we suggest a path that potentially opens up a possibility for physics-informed neural networks (PINNs), emerging deep-learning-based solvers, to be considered as one such solver. Although PINNs have pioneered a proper integration of deep-learning and scientific computing, they require repetitive time-consuming training of neural networks, which is not suitable for many-query scenarios. To address this issue, we propose a lightweight low-rank PINNs containing only hundreds of model parameters and an associated hypernetwork-based meta-learning algorithm, which allows efficient approximation of solutions of PDEs for varying ranges of PDE input parameters. Moreover, we show that the proposed method is effective in overcoming a challenging issue, known as "failure modes" of PINNs.
Automatic alignment in cone-beam tomography via fan-beam symmetry and variable projection
- Authors: Authors: Patricio Guerrero, Simon Bellens, Ricardo Santander, Wim Dewulf
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2310.09567
- Pdf link: https://arxiv.org/pdf/2310.09567
- Abstract This work is concerned with cone-beam computed tomography with circular source trajectory, where the reconstruction inverse problem requires an accurate knowledge of source, detector and rotational axis relative positions and orientations. We address this problem as a preceding step of the reconstruction process directly from the acquired projections. The method estimates both the detector shift (orthogonal to focal and rotational axes) and the in-plane detector rotation, relative to source and rotational axis. The obtained algorithm is based on a fan-beam symmetry condition and the variable projection optimization approach with a low computational cost. Therefore, the alignment problem for fan-beam tomography is addressed as well. The methods are validated with simulated and real industrial tomographic data with code examples available for both fan- and cone-beam geometries.
Wafer-scale Computing: Advancements, Challenges, and Future Perspectives
- Authors: Authors: Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin
- Subjects: Hardware Architecture (cs.AR)
- Arxiv link: https://arxiv.org/abs/2310.09568
- Pdf link: https://arxiv.org/pdf/2310.09568
- Abstract Nowadays, artificial intelligence (AI) technology with large models plays an increasingly important role in both academia and industry. It also brings a rapidly increasing demand for the computing power of the hardware. As the computing demand for AI continues to grow, the growth of hardware computing power has failed to keep up. This has become a significant factor restricting the development of AI. The augmentation of hardware computing power is mainly propelled by the escalation of transistor density and chip area. However, the former is impeded by the termination of the Moore's Law and Dennard scaling, and the latter is significantly restricted by the challenge of disrupting the legacy fabrication equipment and process. In recent years, advanced packaging technologies that have gradually matured are increasingly used to implement bigger chips that integrate multiple chiplets, while still providing interconnections with chip-level density and bandwidth. Compared to conventional high-performance computing paradigms such as multi-accelerator and datacenter-scale computing, Wafer-scale Computing shows remarkable advantages in communication bandwidth, integration density, and programmability potential. Not surprisingly, disruptive Wafer-scale Computing also brings unprecedented design challenges for hardware architecture, design-system-technology co-optimization, power and cooling systems, and compiler tool chain. At present, there are no comprehensive surveys summarizing the current state and design insights of Wafer-scale Computing. This paper aims to take the first step to help academia and industry review existing wafer-scale chips and essential technologies in a one-stop manner. So that people can conveniently grasp the basic knowledge and key points, understand the achievements and shortcomings of existing research, and contribute to this promising research direction.
Reduced Policy Optimization for Continuous Control with Hard Constraints
- Authors: Authors: Shutong Ding, Jingya Wang, Yali Du, Ye Shi
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09574
- Pdf link: https://arxiv.org/pdf/2310.09574
- Abstract Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly in those situations with non-convex hard constraints. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints. RPO partitions actions into basic actions and nonbasic actions following the GRG method and outputs the basic actions via a policy network. Subsequently, RPO calculates the nonbasic actions by solving equations based on equality constraints using the obtained basic actions. The policy network is then updated by implicitly differentiating nonbasic actions with respect to basic actions. Additionally, we introduce an action projection procedure based on the reduced gradient and apply a modified Lagrangian relaxation technique to ensure inequality constraints are satisfied. To the best of our knowledge, RPO is the first attempt that introduces GRG to RL as a way of efficiently handling both equality and inequality hard constraints. It is worth noting that there is currently a lack of RL environments with complex hard constraints, which motivates us to develop three new benchmarks: two robotics manipulation tasks and a smart grid operation control task. With these benchmarks, RPO achieves better performance than previous constrained RL algorithms in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints.
Where to Decide? Centralized vs. Distributed Vehicle Assignment for Platoon Formation
- Authors: Authors: Julian Heinovski, Falko Dressler
- Subjects: Multiagent Systems (cs.MA)
- Arxiv link: https://arxiv.org/abs/2310.09580
- Pdf link: https://arxiv.org/pdf/2310.09580
- Abstract Platooning is a promising cooperative driving application for future intelligent transportation systems. In order to assign vehicles to platoons, some algorithm for platoon formation is required. Such vehicle-to-platoon assignments have to be computed on-demand, e.g., when vehicles join or leave the freeways. In order to get best results from platooning, individual properties of involved vehicles have to be considered during the assignment computation. In this paper, we explore the computation of vehicle-to-platoon assignments as an optimization problem based on similarity between vehicles. We define the similarity and, vice versa, the deviation among vehicles based on the desired driving speed of vehicles and their position on the road. We create three approaches to solve this assignment problem: centralized solver, centralized greedy, and distributed greedy, using a Mixed Integer Programming solver and greedy heuristics, respectively. Conceptually, the approaches differ in both knowledge about vehicles as well as methodology. We perform a large-scale simulation study using PlaFoSim to compare all approaches. While the distributed greedy approach seems to have disadvantages due to the limited local knowledge, it performs as good as the centralized solver approach across most metrics. Both outperform the centralized greedy approach, which suffers from synchronization and greedy selection effects.Since the centralized solver approach assumes global knowledge and requires a complex Mixed Integer Programming solver to compute vehicle-to-platoon assignments, we consider the distributed greedy approach to have the best performance among all presented approaches.
DPZero: Dimension-Independent and Differentially Private Zeroth-Order Optimization
- Authors: Authors: Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2310.09639
- Pdf link: https://arxiv.org/pdf/2310.09639
- Abstract The widespread practice of fine-tuning pretrained large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continue to grow, encompassing billions of parameters, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize and disclose sensitive training data, the privacy of fine-tuning data must be respected. To this end, we explore the potential of zeroth-order methods in differentially private optimization for fine-tuning LLMs. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differential privacy mechanism poses dimension-dependent complexity. To bridge the gap, we introduce DPZero, a novel differentially private zeroth-order algorithm with nearly dimension-independent rates. Our theoretical analysis reveals that its complexity hinges primarily on the problem's intrinsic dimension and exhibits only a logarithmic dependence on the ambient dimension. This renders DPZero a highly practical option for real-world LLMs deployments.
Enhancing Column Generation by Reinforcement Learning-Based Hyper-Heuristic for Vehicle Routing and Scheduling Problems
- Authors: Authors: Kuan Xu, Li Shen, Lindong Liu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09686
- Pdf link: https://arxiv.org/pdf/2310.09686
- Abstract Column generation (CG) is a vital method to solve large-scale problems by dynamically generating variables. It has extensive applications in common combinatorial optimization, such as vehicle routing and scheduling problems, where each iteration step requires solving an NP-hard constrained shortest path problem. Although some heuristic methods for acceleration already exist, they are not versatile enough to solve different problems. In this work, we propose a reinforcement learning-based hyper-heuristic framework, dubbed RLHH, to enhance the performance of CG. RLHH is a selection module embedded in CG to accelerate convergence and get better integer solutions. In each CG iteration, the RL agent selects a low-level heuristic to construct a reduced network only containing the edges with a greater chance of being part of the optimal solution. In addition, we specify RLHH to solve two typical combinatorial optimization problems: Vehicle Routing Problem with Time Windows (VRPTW) and Bus Driver Scheduling Problem (BDSP). The total cost can be reduced by up to 27.9% in VRPTW and 15.4% in BDSP compared to the best lower-level heuristic in our tested scenarios, within equivalent or even less computational time. The proposed RLHH is the first RL-based CG method that outperforms traditional approaches in terms of solution quality, which can promote the application of CG in combinatorial optimization.
Solving Max-Min Fair Resource Allocations Quickly on Large Graphs
- Authors: Authors: Pooria Namyar, Behnaz Arzani, Srikanth Kandula, Santiago Segarra, Daniel Crankshaw, Umesh Krishnaswamy, Ramesh Govindan, Himanshu Raj
- Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2310.09699
- Pdf link: https://arxiv.org/pdf/2310.09699
- Abstract We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure's WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly $3\times$ speedup.
A generalization of the achievable rate of a MISO system using Bode-Fano wideband matching theory
- Authors: Authors: Nitish Deshpande, Miguel R. Castellanos, Saeed R. Khosravirad, Jinfeng Du, Harish Viswanathan, Robert W. Heath Jr
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.09723
- Pdf link: https://arxiv.org/pdf/2310.09723
- Abstract Impedance-matching networks affect power transfer from the radio frequency (RF) chains to the antennas. Their design impacts the signal to noise ratio (SNR) and the achievable rate. In this paper, we maximize the information-theoretic achievable rate of a multiple-input-single-output (MISO) system with wideband matching constraints. Using a multiport circuit theory approach with frequency-selective scattering parameters, we propose a general framework for optimizing the MISO achievable rate that incorporates Bode-Fano wideband matching theory. We express the solution to the achievable rate optimization problem in terms of the optimized transmission coefficient and the Lagrangian parameters corresponding to the Bode-Fano inequality constraints. We apply this framework to a single electric Chu's antenna and an array of two electric Chu's antennas. We compare the optimized achievable rate obtained numerically with other benchmarks like the ideal achievable rate computed by disregarding matching constraints and the achievable rate obtained by using sub-optimal matching strategies like conjugate matching and frequency-flat transmission. We also propose a practical methodology to approximate the achievable rate bound by using the optimal transmission coefficient to derive a physically realizable matching network through the ADS software.
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
- Authors: Authors: Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.09762
- Pdf link: https://arxiv.org/pdf/2310.09762
- Abstract The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost. Even in the era of large-scale language models (LLMs), MoE continues to play a crucial role, as some researchers have indicated that GPT-4 adopts the MoE structure to ensure diverse inference results. However, MoE is susceptible to performance degeneracy, particularly evident in the issues of imbalance and homogeneous representation among experts. While previous studies have extensively addressed the problem of imbalance, the challenge of homogeneous representation remains unresolved. In this study, we shed light on the homogeneous representation problem, wherein experts in the MoE fail to specialize and lack diversity, leading to frustratingly high similarities in their representations (up to 99% in a well-performed MoE model). This problem restricts the expressive power of the MoE and, we argue, contradicts its original intention. To tackle this issue, we propose a straightforward yet highly effective solution: OMoE, an orthogonal expert optimizer. Additionally, we introduce an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by other experts. Our algorithm facilitates MoE training in two key ways: firstly, it explicitly enhances representation diversity, and secondly, it implicitly fosters interaction between experts during orthogonal weights computation. Through extensive experiments, we demonstrate that our proposed optimization algorithm significantly improves the performance of fine-tuning the MoE model on the GLUE benchmark, SuperGLUE benchmark, question-answering task, and name entity recognition tasks.
Worst-Case Analysis is Maximum-A-Posteriori Estimation
- Authors: Authors: Hongjun Wu, Di Wang
- Subjects: Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.09774
- Pdf link: https://arxiv.org/pdf/2310.09774
- Abstract The worst-case resource usage of a program can provide useful information for many software-engineering tasks, such as performance optimization and algorithmic-complexity-vulnerability discovery. This paper presents a generic, adaptive, and sound fuzzing framework, called DSE-SMC, for estimating worst-case resource usage. DSE-SMC is generic because it is black-box as long as the user provides an interface for retrieving resource-usage information on a given input; adaptive because it automatically balances between exploration and exploitation of candidate inputs; and sound because it is guaranteed to converge to the true resource-usage distribution of the analyzed program. DSE-SMC is built upon a key observation: resource accumulation in a program is isomorphic to the soft-conditioning mechanism in Bayesian probabilistic programming; thus, worst-case resource analysis is isomorphic to the maximum-a-posteriori-estimation problem of Bayesian statistics. DSE-SMC incorporates sequential Monte Carlo (SMC) -- a generic framework for Bayesian inference -- with adaptive evolutionary fuzzing algorithms, in a sound manner, i.e., DSE-SMC asymptotically converges to the posterior distribution induced by resource-usage behavior of the analyzed program. Experimental evaluation on Java applications demonstrates that DSE-SMC is significantly more effective than existing black-box fuzzing methods for worst-case analysis.
CBARF: Cascaded Bundle-Adjusting Neural Radiance Fields from Imperfect Camera Poses
- Authors: Authors: Hongyu Fu, Xin Yu, Lincheng Li, Li Zhang
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09776
- Pdf link: https://arxiv.org/pdf/2310.09776
- Abstract Existing volumetric neural rendering techniques, such as Neural Radiance Fields (NeRF), face limitations in synthesizing high-quality novel views when the camera poses of input images are imperfect. To address this issue, we propose a novel 3D reconstruction framework that enables simultaneous optimization of camera poses, dubbed CBARF (Cascaded Bundle-Adjusting NeRF).In a nutshell, our framework optimizes camera poses in a coarse-to-fine manner and then reconstructs scenes based on the rectified poses. It is observed that the initialization of camera poses has a significant impact on the performance of bundle-adjustment (BA). Therefore, we cascade multiple BA modules at different scales to progressively improve the camera poses. Meanwhile, we develop a neighbor-replacement strategy to further optimize the results of BA in each stage. In this step, we introduce a novel criterion to effectively identify poorly estimated camera poses. Then we replace them with the poses of neighboring cameras, thus further eliminating the impact of inaccurate camera poses. Once camera poses have been optimized, we employ a density voxel grid to generate high-quality 3D reconstructed scenes and images in novel views. Experimental results demonstrate that our CBARF model achieves state-of-the-art performance in both pose optimization and novel view synthesis, especially in the existence of large camera pose noise.
Auto-LfD: Towards Closing the Loop for Learning from Demonstrations
- Authors: Authors: Shaokang Wu, Yijin Wang, Yanlong Huang
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2310.09791
- Pdf link: https://arxiv.org/pdf/2310.09791
- Abstract Over the past few years, there have been numerous works towards advancing the generalization capability of robots, among which learning from demonstrations (LfD) has drawn much attention by virtue of its user-friendly and data-efficient nature. While many LfD solutions have been reported, a key question has not been properly addressed: how can we evaluate the generalization performance of LfD? For instance, when a robot draws a letter that needs to pass through new desired points, how does it ensure the new trajectory maintains a similar shape to the demonstration? This question becomes more relevant when a new task is significantly far from the demonstrated region. To tackle this issue, a user often resorts to manual tuning of the hyperparameters of an LfD approach until a satisfactory trajectory is attained. In this paper, we aim to provide closed-loop evaluative feedback for LfD and optimize LfD in an automatic fashion. Specifically, we consider dynamical movement primitives (DMP) and kernelized movement primitives (KMP) as examples and develop a generic optimization framework capable of measuring the generalization performance of DMP and KMP and auto-optimizing their hyperparameters without any human inputs. Evaluations including a peg-in-hole task and a pushing task on a real robot evidence the applicability of our framework.
Model Inversion Attacks on Homogeneous and Heterogeneous Graph Neural Networks
- Authors: Authors: Renyang Liu, Wei Zhou, Jinhong Zhang, Xiaoyuan Liu, Peiyuan Si, Haoran Li
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09800
- Pdf link: https://arxiv.org/pdf/2310.09800
- Abstract Recently, Graph Neural Networks (GNNs), including Homogeneous Graph Neural Networks (HomoGNNs) and Heterogeneous Graph Neural Networks (HeteGNNs), have made remarkable progress in many physical scenarios, especially in communication applications. Despite achieving great success, the privacy issue of such models has also received considerable attention. Previous studies have shown that given a well-fitted target GNN, the attacker can reconstruct the sensitive training graph of this model via model inversion attacks, leading to significant privacy worries for the AI service provider. We advocate that the vulnerability comes from the target GNN itself and the prior knowledge about the shared properties in real-world graphs. Inspired by this, we propose a novel model inversion attack method on HomoGNNs and HeteGNNs, namely HomoGMI and HeteGMI. Specifically, HomoGMI and HeteGMI are gradient-descent-based optimization methods that aim to maximize the cross-entropy loss on the target GNN and the $1^{st}$ and $2^{nd}$-order proximities on the reconstructed graph. Notably, to the best of our knowledge, HeteGMI is the first attempt to perform model inversion attacks on HeteGNNs. Extensive experiments on multiple benchmarks demonstrate that the proposed method can achieve better performance than the competitors.
Optimizing K-means for Big Data: A Comparative Study
- Authors: Authors: Ravil Mussabayev, Rustam Mussabayev
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2310.09819
- Pdf link: https://arxiv.org/pdf/2310.09819
- Abstract This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with large datasets. The paper explores different approaches to overcome these issues, including parallelization, approximation, and sampling methods. The authors evaluate the performance of these techniques on various benchmark datasets and compare them in terms of speed, quality of clustering, and scalability according to the LIMA dominance criterion. The results show that different techniques are more suitable for different types of datasets and provide insights into the trade-offs between speed and accuracy in K-means clustering for big data. Overall, the paper offers a comprehensive guide for practitioners and researchers on how to optimize K-means for big data applications.
MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization
- Authors: Authors: Simin Li, Ruixiao Xu, Jun Guo, Pu Feng, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu, Weifeng Lv
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.09833
- Pdf link: https://arxiv.org/pdf/2310.09833
- Abstract Robust multi-agent reinforcement learning (MARL) necessitates resilience to uncertain or worst-case actions by unknown allies. Existing max-min optimization techniques in robust MARL seek to enhance resilience by training agents against worst-case adversaries, but this becomes intractable as the number of agents grows, leading to exponentially increasing worst-case scenarios. Attempts to simplify this complexity often yield overly pessimistic policies, inadequate robustness across scenarios and high computational demands. Unlike these approaches, humans naturally learn adaptive and resilient behaviors without the necessity of preparing for every conceivable worst-case scenario. Motivated by this, we propose MIR2, which trains policy in routine scenarios and minimize Mutual Information as Robust Regularization. Theoretically, we frame robustness as an inference problem and prove that minimizing mutual information between histories and actions implicitly maximizes a lower bound on robustness under certain assumptions. Further analysis reveals that our proposed approach prevents agents from overreacting to others through an information bottleneck and aligns the policy with a robust action prior. Empirically, our MIR2 displays even greater resilience against worst-case adversaries than max-min optimization in StarCraft II, Multi-agent Mujoco and rendezvous. Our superiority is consistent when deployed in challenging real-world robot swarm control scenario. See code and demo videos in Supplementary Materials.
XRMDN: A Recurrent Mixture Density Networks-based Architecture for Short-Term Probabilistic Demand Forecasting in Mobility-on-Demand Systems with High Volatility
- Authors: Authors: Xiaoming Li, Hubert Normandin-Taillon, Chun Wang, Xiao Huang
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09847
- Pdf link: https://arxiv.org/pdf/2310.09847
- Abstract In real Mobility-on-Demand (MoD) systems, demand is subject to high and dynamic volatility, which is difficult to predict by conventional time-series forecasting approaches. Most existing forecasting approaches yield the point value as the prediction result, which ignores the uncertainty that exists in the forecasting result. This will lead to the forecasting result severely deviating from the true demand value due to the high volatility existing in demand. To fill the gap, we propose an extended recurrent mixture density network (XRMDN), which extends the weight and mean neural networks to recurrent neural networks. The recurrent neurons for mean and variance can capture the trend of the historical data-series data, which enables a better forecasting result in dynamic and high volatility. We conduct comprehensive experiments on one taxi trip record and one bike-sharing real MoD data set to validate the performance of XRMDN. Specifically, we compare our model to three types of benchmark models, including statistical, machine learning, and deep learning models on three evaluation metrics. The validation results show that XRMDN outperforms the three groups of benchmark models in terms of the evaluation metrics. Most importantly, XRMDN substantially improves the forecasting accuracy with the demands in strong volatility. Last but not least, this probabilistic demand forecasting model contributes not only to the demand prediction in MoD systems but also to other optimization application problems, especially optimization under uncertainty, in MoD applications.
Federated Reinforcement Learning for Resource Allocation in V2X Networks
- Authors: Authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.09858
- Pdf link: https://arxiv.org/pdf/2310.09858
- Abstract Resource allocation significantly impacts the performance of vehicle-to-everything (V2X) networks. Most existing algorithms for resource allocation are based on optimization or machine learning (e.g., reinforcement learning). In this paper, we explore resource allocation in a V2X network under the framework of federated reinforcement learning (FRL). On one hand, the usage of RL overcomes many challenges from the model-based optimization schemes. On the other hand, federated learning (FL) enables agents to deal with a number of practical issues, such as privacy, communication overhead, and exploration efficiency. The framework of FRL is then implemented by the inexact alternative direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size calculated from their second moments. The developed algorithm, PASM, is proven to be convergent under mild conditions and has a nice numerical performance compared with some baseline methods for solving the resource allocation problem in a V2X network.
Federated Multi-Objective Learning
- Authors: Authors: Haibo Yang, Zhuqing Liu, Jia Liu, Chaosheng Dong, Michinari Momma
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2310.09866
- Pdf link: https://arxiv.org/pdf/2310.09866
- Abstract In recent years, multi-objective optimization (MOO) emerges as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow local updates to significantly reduce communication costs, while achieving the {\em same} convergence rates as those of the their algorithmic counterparts in the single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.
Leveraging Large Language Models (LLMs) to Empower Training-Free Dataset Condensation for Content-Based Recommendation
- Authors: Authors: Jiahao Wu, Qijiong Liu, Hengchang Hu, Wenqi Fan, Shengcai Liu, Qing Li, Xiao-Ming Wu, Ke Tang
- Subjects: Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2310.09874
- Pdf link: https://arxiv.org/pdf/2310.09874
- Abstract Modern techniques in Content-based Recommendation (CBR) leverage item content information to provide personalized services to users, but suffer from resource-intensive training on large datasets. To address this issue, we explore the dataset condensation for textual CBR in this paper. The goal of dataset condensation is to synthesize a small yet informative dataset, upon which models can achieve performance comparable to those trained on large datasets. While existing condensation approaches are tailored to classification tasks for continuous data like images or embeddings, direct application of them to CBR has limitations. To bridge this gap, we investigate efficient dataset condensation for content-based recommendation. Inspired by the remarkable abilities of large language models (LLMs) in text comprehension and generation, we leverage LLMs to empower the generation of textual content during condensation. To handle the interaction data involving both users and items, we devise a dual-level condensation method: content-level and user-level. At content-level, we utilize LLMs to condense all contents of an item into a new informative title. At user-level, we design a clustering-based synthesis module, where we first utilize LLMs to extract user interests. Then, the user interests and user embeddings are incorporated to condense users and generate interactions for condensed users. Notably, the condensation paradigm of this method is forward and free from iterative optimization on the synthesized dataset. Extensive empirical findings from our study, conducted on three authentic datasets, substantiate the efficacy of the proposed method. Particularly, we are able to approximate up to 97% of the original performance while reducing the dataset size by 95% (i.e., on dataset MIND).
Score-Based Methods for Discrete Optimization in Deep Learning
- Authors: Authors: Eric Lei, Arman Adibi, Hamed Hassani
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09890
- Pdf link: https://arxiv.org/pdf/2310.09890
- Abstract Discrete optimization problems often arise in deep learning tasks, despite the fact that neural networks typically operate on continuous data. One class of these problems involve objective functions which depend on neural networks, but optimization variables which are discrete. Although the discrete optimization literature provides efficient algorithms, they are still impractical in these settings due to the high cost of an objective function evaluation, which involves a neural network forward-pass. In particular, they require $O(n)$ complexity per iteration, but real data such as point clouds have values of $n$ in thousands or more. In this paper, we investigate a score-based approximation framework to solve such problems. This framework uses a score function as a proxy for the marginal gain of the objective, leveraging embeddings of the discrete variables and speed of auto-differentiation frameworks to compute backward-passes in parallel. We experimentally demonstrate, in adversarial set classification tasks, that our method achieves a superior trade-off in terms of speed and solution quality compared to heuristic methods.
Towards Deep Learning Models Resistant to Transfer-based Adversarial Attacks via Data-centric Robust Learning
- Authors: Authors: Yulong Yang, Chenhao Lin, Xiang Ji, Qiwei Tian, Qian Li, Hongshan Yang, Zhibo Wang, Chao Shen
- Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09891
- Pdf link: https://arxiv.org/pdf/2310.09891
- Abstract Transfer-based adversarial attacks raise a severe threat to real-world deep learning systems since they do not require access to target models. Adversarial training (AT), which is recognized as the strongest defense against white-box attacks, has also guaranteed high robustness to (black-box) transfer-based attacks. However, AT suffers from heavy computational overhead since it optimizes the adversarial examples during the whole training process. In this paper, we demonstrate that such heavy optimization is not necessary for AT against transfer-based attacks. Instead, a one-shot adversarial augmentation prior to training is sufficient, and we name this new defense paradigm Data-centric Robust Learning (DRL). Our experimental results show that DRL outperforms widely-used AT techniques (e.g., PGD-AT, TRADES, EAT, and FAT) in terms of black-box robustness and even surpasses the top-1 defense on RobustBench when combined with diverse data augmentations and loss regularizations. We also identify other benefits of DRL, for instance, the model generalization capability and robust fairness.
BONES: Near-Optimal Neural-Enhanced Video Streaming
- Authors: Authors: Lingdong Wang, Simran Singh, Jacob Chakareski, Mohammad Hajiesmaili, Ramesh K. Sitaraman
- Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2310.09920
- Pdf link: https://arxiv.org/pdf/2310.09920
- Abstract Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without violating the playback of the video stream. We introduce BONES, an NES control algorithm that jointly manages the network and computational resources to maximize the quality of experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization problem and solves it in an online manner with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Our comprehensive experimental results indicate that BONES increases QoE by 4% to 13% over state-of-the-art algorithms, demonstrating its potential to enhance the video streaming experience for users. Our code and data will be released to the public.
ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context
- Authors: Authors: Binglun Wang, Niladri Shekhar Dutt, Niloy J. Mitra
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- Arxiv link: https://arxiv.org/abs/2310.09965
- Pdf link: https://arxiv.org/pdf/2310.09965
- Abstract Neural Radiance Fields (NeRFs) have recently emerged as a popular option for photo-realistic object capture due to their ability to faithfully capture high-fidelity volumetric content even from handheld video input. Although much research has been devoted to efficient optimization leading to real-time training and rendering, options for interactive editing NeRFs remain limited. We present a very simple but effective neural network architecture that is fast and efficient while maintaining a low memory footprint. This architecture can be incrementally guided through user-friendly image-based edits. Our representation allows straightforward object selection via semantic feature distillation at the training stage. More importantly, we propose a local 3D-aware image context to facilitate view-consistent image editing that can then be distilled into fine-tuned NeRFs, via geometric and appearance adjustments. We evaluate our setup on a variety of examples to demonstrate appearance and geometric edits and report 10-30x speedup over concurrent work focusing on text-guided NeRF editing. Video results can be seen on our project webpage at https://proteusnerf.github.io.
Algorithmic Contract Design for Crowdsourced Ranking
- Authors: Authors: Kiriaki Frangias, Andrew Lin, Ellen Vitercik, Manolis Zampetakis
- Subjects: Computer Science and Game Theory (cs.GT)
- Arxiv link: https://arxiv.org/abs/2310.09974
- Pdf link: https://arxiv.org/pdf/2310.09974
- Abstract Ranking is fundamental to many areas, such as search engine and recommender system optimization, as well as peer grading. Crowdsourcing, which is often used for these tasks, requires proper incentivization to ensure accurate inputs. In this work, we draw on the field of contract theory from Economics to propose a novel mechanism that enables a principal to accurately rank a set of items by incentivizing agents to provide pairwise comparisons of the items. Our mechanism implements these incentives by verifying a subset of each agent's comparisons with a ground-truth ordering, a task we assume to be costly. The agent is compensated (for example, monetarily or with class credit) based on the accuracy of these comparisons. Our mechanism achieves the following guarantees: (1) it only requires the principal to verify $O(\log s)$ comparisons, where $s$ is the total number of agents, and (2) it provably achieves higher total utility for principal compared to ranking the items herself with no crowdsourcing.
Deep Unfolding Network for Image Compressed Sensing by Content-adaptive Gradient Updating and Deformation-invariant Non-local Modeling
- Authors: Authors: Wenxue Cui, Xiaopeng Fan, Jian Zhang, Debin Zhao
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- Arxiv link: https://arxiv.org/abs/2310.10033
- Pdf link: https://arxiv.org/pdf/2310.10033
- Abstract Inspired by certain optimization solvers, the deep unfolding network (DUN) has attracted much attention in recent years for image compressed sensing (CS). However, there still exist the following two issues: 1) In existing DUNs, most hyperparameters are usually content independent, which greatly limits their adaptability for different input contents. 2) In each iteration, a plain convolutional neural network is usually adopted, which weakens the perception of wider context prior and therefore depresses the expressive ability. In this paper, inspired by the traditional Proximal Gradient Descent (PGD) algorithm, a novel DUN for image compressed sensing (dubbed DUN-CSNet) is proposed to solve the above two issues. Specifically, for the first issue, a novel content adaptive gradient descent network is proposed, in which a well-designed step size generation sub-network is developed to dynamically allocate the corresponding step sizes for different textures of input image by generating a content-aware step size map, realizing a content-adaptive gradient updating. For the second issue, considering the fact that many similar patches exist in an image but have undergone a deformation, a novel deformation-invariant non-local proximal mapping network is developed, which can adaptively build the long-range dependencies between the nonlocal patches by deformation-invariant non-local modeling, leading to a wider perception on context priors. Extensive experiments manifest that the proposed DUN-CSNet outperforms existing state-of-the-art CS methods by large margins.
Empirical Study of Zero-Shot NER with ChatGPT
- Authors: Authors: Tingyu Xie, Qi Li, Jian Zhang, Yan Zhang, Zuozhu Liu, Hongwei Wang
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.10035
- Pdf link: https://arxiv.org/pdf/2310.10035
- Abstract Large language models (LLMs) exhibited powerful capability in various natural language processing tasks. This work focuses on exploring LLM performance on zero-shot information extraction, with a focus on the ChatGPT and named entity recognition (NER) task. Inspired by the remarkable reasoning capability of LLM on symbolic and arithmetic reasoning, we adapt the prevalent reasoning methods to NER and propose reasoning strategies tailored for NER. First, we explore a decomposed question-answering paradigm by breaking down the NER task into simpler subproblems by labels. Second, we propose syntactic augmentation to stimulate the model's intermediate thinking in two ways: syntactic prompting, which encourages the model to analyze the syntactic structure itself, and tool augmentation, which provides the model with the syntactic information generated by a parsing tool. Besides, we adapt self-consistency to NER by proposing a two-stage majority voting strategy, which first votes for the most consistent mentions, then the most consistent types. The proposed methods achieve remarkable improvements for zero-shot NER across seven benchmarks, including Chinese and English datasets, and on both domain-specific and general-domain scenarios. In addition, we present a comprehensive analysis of the error types with suggestions for optimization directions. We also verify the effectiveness of the proposed methods on the few-shot setting and other LLMs.
TpopT: Efficient Trainable Template Optimization on Low-Dimensional Manifolds
- Authors: Authors: Jingkai Yan, Shiyu Wang, Xinyu Rain Wei, Jimmy Wang, Zsuzsanna Márka, Szabolcs Márka, John Wright
- Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.10039
- Pdf link: https://arxiv.org/pdf/2310.10039
- Abstract In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction
- Authors: Authors: Han Qi, Xinyang Geng, Stefano Rando, Iku Ohama, Aviral Kumar, Sergey Levine
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10056
- Pdf link: https://arxiv.org/pdf/2310.10056
- Abstract In computational chemistry, crystal structure prediction (CSP) is an optimization problem that involves discovering the lowest energy stable crystal structure for a given chemical formula. This problem is challenging as it requires discovering globally optimal designs with the lowest energies on complex manifolds. One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation, but these simulators are painfully slow. In this paper, we study present and study an alternate, data-driven approach to crystal structure prediction: instead of directly searching for the most stable structures in simulation, we train a surrogate model of the crystal formation energy from a database of existing crystal structures, and then optimize this model with respect to the parameters of the crystal structure. This surrogate model is trained to be conservative so as to prevent exploitation of its errors by the optimizer. To handle optimization in the non-Euclidean space of crystal structures, we first utilize a state-of-the-art graph diffusion auto-encoder (CD-VAE) to convert a crystal structure into a vector-based search space and then optimize a conservative surrogate model of the crystal energy, trained on top of this vector representation. We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction, while also drastically reducing computational cost.
Expression Domain Translation Network for Cross-domain Head Reenactment
- Authors: Authors: Taewoong Kang, Jeongsik Oh, Jaeseong Lee, Sunghyun Park, Jaegul Choo
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.10073
- Pdf link: https://arxiv.org/pdf/2310.10073
- Abstract Despite the remarkable advancements in head reenactment, the existing methods face challenges in cross-domain head reenactment, which aims to transfer human motions to domains outside the human, including cartoon characters. It is still difficult to extract motion from out-of-domain images due to the distinct appearances, such as large eyes. Recently, previous work introduced a large-scale anime dataset called AnimeCeleb and a cross-domain head reenactment model, including an optimization-based mapping function to translate the human domain's expressions to the anime domain. However, we found that the mapping function, which relies on a subset of expressions, imposes limitations on the mapping of various expressions. To solve this challenge, we introduce a novel expression domain translation network that transforms human expressions into anime expressions. Specifically, to maintain the geometric consistency of expressions between the input and output of the expression domain translation network, we employ a 3D geometric-aware loss function that reduces the distances between the vertices in the 3D mesh of the human and anime. By doing so, it forces high-fidelity and one-to-one mapping with respect to two cross-expression domains. Our method outperforms existing methods in both qualitative and quantitative analysis, marking a significant advancement in the field of cross-domain head reenactment.
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning
- Authors: Authors: Qianli Ma, Haotian Zhou, Tingkai Liu, Jianbo Yuan, Pengfei Liu, Yang You, Hongxia Yang
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.10080
- Pdf link: https://arxiv.org/pdf/2310.10080
- Abstract Recent years have seen considerable advancements in multi-step reasoning with Large Language Models (LLMs). The previous studies have elucidated the merits of integrating feedback or search mechanisms during model inference to improve the reasoning accuracy. The Process-Supervised Reward Model (PRM), typically furnishes LLMs with step-by-step feedback during the training phase, akin to Proximal Policy Optimization (PPO) or reject sampling. Our objective is to examine the efficacy of PRM in the inference phase to help discern the optimal solution paths for multi-step tasks such as mathematical reasoning and code generation. To this end, we propose a heuristic greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs. This tailored PRM demonstrated enhanced results compared to the Chain of Thought (CoT) on mathematical benchmarks like GSM8K and MATH. Additionally, to explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks. Thus highlighting the robust nature of our reward-model-based approach to inference for reasoning tasks.
Solution to Advanced Manufacturing Process Problems using Cohort Intelligence Algorithm with Improved Constraint Handling Approaches
- Authors: Authors: Aniket Nargundkar, Madhav Rawal, Aryaman Patel, Anand J Kulkarni, Apoorva S Shastri
- Subjects: Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2310.10085
- Pdf link: https://arxiv.org/pdf/2310.10085
- Abstract Recently, various Artificial Intelligence (AI) based optimization metaheuristics are proposed and applied for a variety of problems. Cohort Intelligence (CI) algorithm is a socio inspired optimization technique which is successfully applied for solving several unconstrained & constrained real-world problems from the domains such as design, manufacturing, supply chain, healthcare, etc. Generally, real-world problems are constrained in nature. Even though most of the Evolutionary Algorithms (EAs) can efficiently solve unconstrained problems, their performance degenerates when the constraints are involved. In this paper, two novel constraint handling approaches based on modulus and hyperbolic tangent probability distributions are proposed. Constrained CI algorithm with constraint handling approaches based on triangular, modulus and hyperbolic tangent is presented and applied for optimizing advanced manufacturing processes such as Water Jet Machining (WJM), Abrasive Jet Machining (AJM), Ultrasonic Machining (USM) and Grinding process. The solutions obtained using proposed CI algorithm are compared with contemporary algorithms such as Genetic Algorithm, Simulated Annealing, Teaching Learning Based Optimization, etc. The proposed approaches achieved 2%-127% maximization of material removal rate satisfying hard constraints. As compared to the GA, CI with Hyperbolic tangent probability distribution achieved 15%, 2%, 2%, 127%, and 4% improvement in MRR for AJMB, AJMD, WJM, USM, and Grinding processes, respectively contributing to the productivity improvement. The contributions in this paper have opened several avenues for further applicability of the proposed constraint handling approaches for solving complex constrained problems.
Empowering SMPC: Bridging the Gap Between Scalability, Memory Efficiency and Privacy in Neural Network Inference
- Authors: Authors: Ramya Burra, Anshoo Tandon, Srishti Mittal
- Subjects: Cryptography and Security (cs.CR); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2310.10133
- Pdf link: https://arxiv.org/pdf/2310.10133
- Abstract This paper aims to develop an efficient open-source Secure Multi-Party Computation (SMPC) repository, that addresses the issue of practical and scalable implementation of SMPC protocol on machines with moderate computational resources, while aiming to reduce the execution time. We implement the ABY2.0 protocol for SMPC, providing developers with effective tools for building applications on the ABY 2.0 protocol. This article addresses the limitations of the C++ based MOTION2NX framework for secure neural network inference, including memory constraints and operation compatibility issues. Our enhancements include optimizing the memory usage, reducing execution time using a third-party Helper node, and enhancing efficiency while still preserving data privacy. These optimizations enable MNIST dataset inference in just 32 seconds with only 0.2 GB of RAM for a 5-layer neural network. In contrast, the previous baseline implementation required 8.03 GB of RAM and 200 seconds of execution time.
DaPPA: A Data-Parallel Framework for Processing-in-Memory Architectures
- Authors: Authors: Geraldo F. Oliveira, Alain Kohli, David Novo, Juan Gómez-Luna, Onur Mutlu
- Subjects: Hardware Architecture (cs.AR)
- Arxiv link: https://arxiv.org/abs/2310.10168
- Pdf link: https://arxiv.org/pdf/2310.10168
- Abstract To ease the programmability of PIM architectures, we propose DaPPA(data-parallel processing-in-memory architecture), a framework that can, for a given application, automatically distribute input and gather output data, handle memory management, and parallelize work across the DPUs. The key idea behind DaPPA is to remove the responsibility of managing hardware resources from the programmer by providing an intuitive data-parallel pattern-based programming interface that abstracts the hardware components of the UPMEM system. Using this key idea, DaPPA transforms a data-parallel pattern-based application code into the appropriate UPMEM-target code, including the required APIs for data management and code partition, which can then be compiled into a UPMEM-based binary transparently from the programmer. While generating UPMEM-target code, DaPPA implements several code optimizations to improve end-to-end performance.
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
- Authors: Authors: Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.10195
- Pdf link: https://arxiv.org/pdf/2310.10195
- Abstract Large language models have achieved remarkable success, but their extensive parameter size necessitates substantial memory for training, thereby setting a high threshold. While the recently proposed low-memory optimization (LOMO) reduces memory footprint, its optimization technique, akin to stochastic gradient descent, is sensitive to hyper-parameters and exhibits suboptimal convergence, failing to match the performance of the prevailing optimizer for large language models, AdamW. Through empirical analysis of the Adam optimizer, we found that, compared to momentum, the adaptive learning rate is more critical for bridging the gap. Building on this insight, we introduce the low-memory optimization with adaptive learning rate (AdaLomo), which offers an adaptive learning rate for each parameter. To maintain memory efficiency, we employ non-negative matrix factorization for the second-order moment estimation in the optimizer state. Additionally, we suggest the use of a grouped update normalization to stabilize convergence. Our experiments with instruction-tuning and further pre-training demonstrate that AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation
- Authors: Authors: Jhe-Yu Liou, Stephanie Forrest, Carole-Jean Wu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10211
- Pdf link: https://arxiv.org/pdf/2310.10211
- Abstract Parallel accelerators, such as GPUs, are key enablers for large-scale Machine Learning (ML) applications. However, ML model developers often lack detailed knowledge of the underlying system architectures, while system programmers usually do not have a high-level understanding of the ML model that runs on the specific system. To mitigate this gap between two relevant aspects of domain knowledge, this paper proposes GEVO-ML, a tool for automatically discovering optimization opportunities and tuning the performance of ML kernels, where the model and training/prediction processes are uniformly represented in a single intermediate language, the Multiple-Layer Intermediate Representation (MLIR). GEVO-ML uses multi-objective evolutionary search to find edits (mutations) to MLIR code that ultimately runs on GPUs, improving performance on desired criteria while retaining required functionality. We demonstrate GEVO-ML on two different ML workloads for both model training and prediction. GEVO-ML finds significant Pareto improvements for these models, achieving 90.43% performance improvement when model accuracy is relaxed by 2%, from 91.2% to 89.3%. For the training workloads, GEVO-ML finds a 4.88% improvement in model accuracy, from 91% to 96%, without sacrificing training or testing speed. Our analysis of key GEVO-ML mutations reveals diverse code modifications, while might be foreign to human developers, achieving similar effects with how human developers improve model design, for example, by changing learning rates or pruning non-essential layer parameters.
Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model
- Authors: Authors: Chao Tao, Aoran Hu, Rong Xiao, Haifeng Li, Yuze Wang
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.10219
- Pdf link: https://arxiv.org/pdf/2310.10219
- Abstract Data-driven deep learning methods have shown great potential in cropland mapping. However, due to multiple factors such as attributes of cropland (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands under different scenes demonstrate a great domain gap. This makes it difficult for models trained in the specific scenes to directly generalize to other scenes. A common way to handle this problem is through the "Pretrain+Fine-tuning" paradigm. Unfortunately, considering the variety of features of cropland that are affected by multiple factors, it is hardly to handle the complex domain gap between pre-trained data and target data using only sparse fine-tuned samples as general constraints. Moreover, as the number of model parameters grows, fine-tuning is no longer an easy and low-cost task. With the emergence of prompt learning via visual foundation models, the "Pretrain+Prompting" paradigm redesigns the optimization target by introducing individual prompts for each single sample. This simplifies the domain adaption from generic to specific scenes during model reasoning processes. Therefore, we introduce the "Pretrain+Prompting" paradigm to interpreting cropland scenes and design the auto-prompting (APT) method based on freely available global land cover product. It can achieve a fine-grained adaptation process from generic scenes to specialized cropland scenes without introducing additional label costs. To our best knowledge, this work pioneers the exploration of the domain adaption problems for cropland mapping under prompt learning perspectives. Our experiments using two sub-meter cropland datasets from southern and northern China demonstrated that the proposed method via visual foundation models outperforms traditional supervised learning and fine-tuning approaches in the field of remote sensing.
Moving Object Localization based on the Fusion of Ultra-WideBand and LiDAR with a Mobile Robot
- Authors: Authors: Muhammad Shalihan, Zhiqiang Cao, Khattiya Pongsirijinda, Lin Guo, Billy Pik Lik Lau, Ran Liu, Chau Yuen, U-Xuan Tan
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2310.10289
- Pdf link: https://arxiv.org/pdf/2310.10289
- Abstract Localization of objects is vital for robot-object interaction. Light Detection and Ranging (LiDAR) application in robotics is an emerging and widely used object localization technique due to its accurate distance measurement, long-range, wide field of view, and robustness in different conditions. However, LiDAR is unable to identify the objects when they are obstructed by obstacles, resulting in inaccuracy and noise in localization. To address this issue, we present an approach incorporating LiDAR and Ultra-Wideband (UWB) ranging for object localization. The UWB is popular in sensor fusion localization algorithms due to its low weight and low power consumption. In addition, the UWB is able to return ranging measurements even when the object is not within line-of-sight. Our approach provides an efficient solution to combine an anonymous optical sensor (LiDAR) with an identity-based radio sensor (UWB) to improve the localization accuracy of the object. Our approach consists of three modules. The first module is an object-identification algorithm that compares successive scans from the LiDAR to detect a moving object in the environment and returns the position with the closest range to UWB ranging. The second module estimates the moving object's moving direction using the previous and current estimated position from our object-identification module. It removes the suspicious estimations through an outlier rejection criterion. Lastly, we fuse the LiDAR, UWB ranging, and odometry measurements in pose graph optimization (PGO) to recover the entire trajectory of the robot and object. Extensive experiments were performed to evaluate the performance of the proposed approach.
Multi-Body Neural Scene Flow
- Authors: Authors: Kavisha Vidanapathirana, Shin-Fang Chng, Xueqian Li, Simon Lucey
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2310.10301
- Pdf link: https://arxiv.org/pdf/2310.10301
- Abstract The test-time optimization of scene flow - using a coordinate network as a neural prior - has gained popularity due to its simplicity, lack of dataset bias, and state-of-the-art performance. We observe, however, that although coordinate networks capture general motions by implicitly regularizing the scene flow predictions to be spatially smooth, the neural prior by itself is unable to identify the underlying multi-body rigid motions present in real-world data. To address this, we show that multi-body rigidity can be achieved without the cumbersome and brittle strategy of constraining the $SE(3)$ parameters of each rigid body as done in previous works. This is achieved by regularizing the scene flow optimization to encourage isometry in flow predictions for rigid bodies. This strategy enables multi-body rigidity in scene flow while maintaining a continuous flow field, hence allowing dense long-term scene flow integration across a sequence of point clouds. We conduct extensive experiments on real-world datasets and demonstrate that our approach outperforms the state-of-the-art in 3D scene flow and long-term point-wise 4D trajectory prediction. The code is available at: \href{https://github.com/kavisha725/MBNSF}{https://github.com/kavisha725/MBNSF}.
Adaptive Particle Swarm Optimization for through-foliage target detection with drone swarms
- Authors: Authors: Julia Pöschl
- Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE)
- Arxiv link: https://arxiv.org/abs/2310.10320
- Pdf link: https://arxiv.org/pdf/2310.10320
- Abstract This work contributes to efforts on autonomously detecting a vegetation-occluded target by airborne observers. It investigates and enhances previous work on a Particle Swarm Optimization (PSO) strategy for Airborne Optical Sectioning (AOS) drone swarms. First, it identifies two issues with that method and proposes to resolve them by a leader stabilization for its scattering and projection-based line positions for its default scanning pattern. Second, it connects this method to other PSO variants and presents a new adaptive PSO strategy for AOS drone swarms that draws on the ideas of Adaptive PSO (APSO).
Optimizing Layerwise Polynomial Approximation for Efficient Private Inference on Fully Homomorphic Encryption: A Dynamic Programming Approach
- Authors: Authors: Junghyun Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Joon-Woo Lee, Yongjune Kim, Jong-Seon No
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.10349
- Pdf link: https://arxiv.org/pdf/2310.10349
- Abstract Recent research has explored the implementation of privacy-preserving deep neural networks solely using fully homomorphic encryption. However, its practicality has been limited because of prolonged inference times. When using a pre-trained model without retraining, a major factor contributing to these prolonged inference times is the high-degree polynomial approximation of activation functions such as the ReLU function. The high-degree approximation consumes a substantial amount of homomorphic computational resources, resulting in slower inference. Unlike the previous works approximating activation functions uniformly and conservatively, this paper presents a \emph{layerwise} degree optimization of activation functions to aggressively reduce the inference time while maintaining classification accuracy by taking into account the characteristics of each layer. Instead of the minimax approximation commonly used in state-of-the-art private inference models, we employ the weighted least squares approximation method with the input distributions of activation functions. Then, we obtain the layerwise optimized degrees for activation functions through the \emph{dynamic programming} algorithm, considering how each layer's approximation error affects the classification accuracy of the deep neural network. Furthermore, we propose modulating the ciphertext moduli-chain layerwise to reduce the inference time. By these proposed layerwise optimization methods, we can reduce inference times for the ResNet-20 model and the ResNet-32 model by 3.44 times and 3.16 times, respectively, in comparison to the prior implementations employing uniform degree polynomials and a consistent ciphertext modulus.
Topology optimization of fluidic pressure-driven multi-material compliant mechanisms
- Authors: Authors: Prabhat Kumar, Josh Pinskier, David Howard, Matthijs Langelaar
- Subjects: Computational Engineering, Finance, and Science (cs.CE)
- Arxiv link: https://arxiv.org/abs/2310.10355
- Pdf link: https://arxiv.org/pdf/2310.10355
- Abstract Compliant mechanisms actuated by pneumatic loads are receiving increasing attention due to their direct applicability as soft robots that perform tasks using their flexible bodies. Using multiple materials to build them can further improve their performance and efficiency. Due to developments in additive manufacturing, the fabrication of multi-material soft robots is becoming a real possibility. To exploit this opportunity, there is a need for a dedicated design approach. This paper offers a systematic approach to developing such mechanisms using topology optimization. The extended SIMP scheme is employed for multi-material modeling. The design-dependent nature of the pressure load is modeled using the Darcy law with a volumetric drainage term. Flow coefficient of each element is interpolated using a smoothed Heaviside function. The obtained pressure field is converted to consistent nodal loads. The adjoint-variable approach is employed to determine the sensitivities. A robust formulation is employed, wherein a min-max optimization problem is formulated using the output displacements of the eroded and blueprint designs. Volume constraints are applied to the blueprint design, whereas the strain energy constraint is formulated with respect to the eroded design. The efficacy and success of the approach are demonstrated by designing pneumatically actuated multi-material gripper and contractor mechanisms. A numerical study confirms that multiple-material mechanisms perform relatively better than their single-material counterparts.
BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning
- Authors: Authors: Pengqin Wang, Meixin Zhu, Hongliang Lu, Hui Zhong, Xianda Chen, Shaojie Shen, Xuesong Wang, Yinhai Wang
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2310.10357
- Pdf link: https://arxiv.org/pdf/2310.10357
- Abstract Prediction, decision-making, and motion planning are essential for autonomous driving. In most contemporary works, they are considered as individual modules or combined into a multi-task learning paradigm with a shared backbone but separate task heads. However, we argue that they should be integrated into a comprehensive framework. Although several recent approaches follow this scheme, they suffer from complicated input representations and redundant framework designs. More importantly, they can not make long-term predictions about future driving scenarios. To address these issues, we rethink the necessity of each module in an autonomous driving task and incorporate only the required modules into a minimalist autonomous driving framework. We propose BEVGPT, a generative pre-trained large model that integrates driving scenario prediction, decision-making, and motion planning. The model takes the bird's-eye-view (BEV) images as the only input source and makes driving decisions based on surrounding traffic scenarios. To ensure driving trajectory feasibility and smoothness, we develop an optimization-based motion planning method. We instantiate BEVGPT on Lyft Level 5 Dataset and use Woven Planet L5Kit for realistic driving simulation. The effectiveness and robustness of the proposed framework are verified by the fact that it outperforms previous methods in 100% decision-making metrics and 66% motion planning metrics. Furthermore, the ability of our framework to accurately generate BEV images over the long term is demonstrated through the task of driving scenario prediction. To the best of our knowledge, this is the first generative pre-trained large model for autonomous driving prediction, decision-making, and motion planning with only BEV images as input.
Continuously Adapting Random Sampling (CARS) for Power Electronics Parameter Design
- Authors: Authors: Dominik Happel, Philipp Brendel, Andreas Rosskopf, Stefan Ditze
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10425
- Pdf link: https://arxiv.org/pdf/2310.10425
- Abstract To date, power electronics parameter design tasks are usually tackled using detailed optimization approaches with detailed simulations or using brute force grid search grid search with very fast simulations. A new method, named "Continuously Adapting Random Sampling" (CARS) is proposed, which provides a continuous method in between. This allows for very fast, and / or large amounts of simulations, but increasingly focuses on the most promising parameter ranges. Inspirations are drawn from multi-armed bandit research and lead to prioritized sampling of sub-domains in one high-dimensional parameter tensor. Performance has been evaluated on three exemplary power electronic use-cases, where resulting designs appear competitive to genetic algorithms, but additionally allow for highly parallelizable simulation, as well as continuous progression between explorative and exploitative settings.
Object Detection in Aerial Images in Scarce Data Regimes
- Authors: Authors: Pierre Le Jeune
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10433
- Pdf link: https://arxiv.org/pdf/2310.10433
- Abstract Most contributions on Few-Shot Object Detection (FSOD) evaluate their methods on natural images only, yet the transferability of the announced performance is not guaranteed for applications on other kinds of images. We demonstrate this with an in-depth analysis of existing FSOD methods on aerial images and observed a large performance gap compared to natural images. Small objects, more numerous in aerial images, are the cause for the apparent performance gap between natural and aerial images. As a consequence, we improve FSOD performance on small objects with a carefully designed attention mechanism. In addition, we also propose a scale-adaptive box similarity criterion, that improves the training and evaluation of FSOD methods, particularly for small objects. We also contribute to generic FSOD with two distinct approaches based on metric learning and fine-tuning. Impressive results are achieved with the fine-tuning method, which encourages tackling more complex scenarios such as Cross-Domain FSOD. We conduct preliminary experiments in this direction and obtain promising results. Finally, we address the deployment of the detection models inside COSE's systems. Detection must be done in real-time in extremely large images (more than 100 megapixels), with limited computation power. Leveraging existing optimization tools such as TensorRT, we successfully tackle this engineering challenge.
Flag Sequence Set Design for Low-Complexity Delay-Doppler Estimation
- Authors: Authors: Lingsheng Meng, Yong Liang Guan, Yao Ge, Zilong Liu
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.10457
- Pdf link: https://arxiv.org/pdf/2310.10457
- Abstract This paper studies Flag sequences for lowcomplexity delay-Doppler estimation by exploiting their distinctive peak-curtain ambiguity functions (AFs). Unlike the existing Flag sequence designs that are limited to prime lengths and periodic auto-AFs, we aim to design Flag sequence sets of arbitrary lengths and with low (nontrivial) periodic/aperiodic auto- and cross-AFs. Since every Flag sequence consists of a Curtain sequence and a Peak sequence, we first investigate the algebraic design of zone-based Curtain sequence sets of arbitrary lengths. Our proposed design gives rise to novel Curtain sequence sets with ideal curtain auto-AFs and low/zero cross-AFs within the delay-Doppler zone of interest. Leveraging these Curtain sequence sets, two optimization problems are formulated to minimize the summed customized weighted integrated sidelobe level (SCWISL) of the Flag sequence set. Accelerated Parallel Partially Majorization-Minimization Algorithms are proposed to jointly optimize the transmit Flag sequences and matched/mismatched reference sequences stored in the receiver. Simulations demonstrate that our proposed Flag sequences lead to improved SCWISL and customized peak-to-max-sidelobe ratio compared with the existing Flag sequences. Additionally, our Flag sequences under Flag method exhibit Mean Squared Errors that approach the Cramer-Rao Lower Bound and the Sampling Bound at high signal-to-noise power ratios.
Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems
- Authors: Authors: Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, Kun Gai
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10462
- Pdf link: https://arxiv.org/pdf/2310.10462
- Abstract Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking systems. Previous works on learning-to-rank usually focus on letting the model learn the complete order or pay more attention to the order of top materials, and adopt the corresponding rank metrics as optimization targets. However, these optimization targets can not adapt to various cascade ranking scenarios with varying data complexities and model capabilities; and the existing metric-driven methods such as the Lambda framework can only optimize a rough upper bound of the metric, potentially resulting in performance misalignment. To address these issues, we first propose a novel perspective on optimizing cascade ranking systems by highlighting the adaptability of optimization targets to data complexities and model capabilities. Concretely, we employ multi-task learning framework to adaptively combine the optimization of relaxed and full targets, which refers to metrics Recall@m@k and OAP respectively. Then we introduce a permutation matrix to represent the rank metrics and employ differentiable sorting techniques to obtain a relaxed permutation matrix with controllable approximate error bound. This enables us to optimize both the relaxed and full targets directly and more appropriately using the proposed surrogate losses within the deep learning framework. We named this method as Adaptive Neural Ranking Framework. We use the NeuralSort method to obtain the relaxed permutation matrix and draw on the uncertainty weight method in multi-task learning to optimize the proposed losses jointly. Experiments on a total of 4 public and industrial benchmarks show the effectiveness and generalization of our method, and online experiment shows that our method has significant application value.
Temporally Robust Multi-Agent STL Motion Planning in Continuous Time
- Authors: Authors: Joris Verhagen, Lars Lindemann, Jana Tumova
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2310.10585
- Pdf link: https://arxiv.org/pdf/2310.10585
- Abstract Signal Temporal Logic (STL) is a formal language over continuous-time signals (such as trajectories of a multi-agent system) that allows for the specification of complex spatial and temporal system requirements (such as staying sufficiently close to each other within certain time intervals). To promote robustness in multi-agent motion planning with such complex requirements, we consider motion planning with the goal of maximizing the temporal robustness of their joint STL specification, i.e. maximizing the permissible time shifts of each agent's trajectory while still satisfying the STL specification. Previous methods presented temporally robust motion planning and control in a discrete-time Mixed Integer Linear Programming (MILP) optimization scheme. In contrast, we parameterize the trajectory by continuous B'ezier curves, where the curvature and the time-traversal of the trajectory are parameterized individually. We show an algorithm generating continuous-time temporally robust trajectories and prove soundness of our approach. Moreover, we empirically show that our parametrization realizes this with a considerable speed-up compared to state-of-the-art methods based on constant interval time discretization.
A Tri-Level Optimization Model for Interdependent Infrastructure Network Resilience Against Compound Hazard Events
- Authors: Authors: Matthew R. Oster, Ilya Amburg, Samrat Chatterjee, Daniel A. Eisenberg, Dennis G. Thomas, Feng Pan, Auroop R. Ganguly
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2310.10587
- Pdf link: https://arxiv.org/pdf/2310.10587
- Abstract Resilient operation of interdependent infrastructures against compound hazard events is essential for maintaining societal well-being. To address consequence assessment challenges in this problem space, we propose a novel tri-level optimization model applied to a proof-of-concept case study with fuel distribution and transportation networks -- encompassing one realistic network; one fictitious, yet realistic network; as well as networks drawn from three synthetic distributions. Mathematically, our approach takes the form of a defender-attacker-defender (DAD) model -- a multi-agent tri-level optimization, comprised of a defender, attacker, and an operator acting in sequence. Here, our notional operator may choose proxy actions to operate an interdependent system comprised of fuel terminals and gas stations (functioning as supplies) and a transportation network with traffic flow (functioning as demand) to minimize unmet demand at gas stations. A notional attacker aims to hypothetically disrupt normal operations by reducing supply at the supply terminals, and the notional defender aims to identify best proxy defense policy options which include hardening supply terminals or allowing alternative distribution methods such as trucking reserve supplies. We solve our DAD formulation at a metropolitan scale and present practical defense policy insights against hypothetical compound hazards. We demonstrate the generalizability of our framework by presenting results for a realistic network; a fictitious, yet realistic network; as well as for three networks drawn from synthetic distributions. Additionally, we demonstrate the scalability of the framework by investigating runtime performance as a function of the network size. Steps for future research are also discussed.
Exploring the Power of Graph Neural Networks in Solving Linear Optimization Problems
- Authors: Authors: Chendi Qian, Didier Chételat, Christopher Morris
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2310.10603
- Pdf link: https://arxiv.org/pdf/2310.10603
- Abstract Recently, machine learning, particularly message-passing graph neural networks (MPNNs), has gained traction in enhancing exact optimization algorithms. For example, MPNNs speed up solving mixed-integer optimization problems by imitating computational intensive heuristics like strong branching, which entails solving multiple linear optimization problems (LPs). Despite the empirical success, the reasons behind MPNNs' effectiveness in emulating linear optimization remain largely unclear. Here, we show that MPNNs can simulate standard interior-point methods for LPs, explaining their practical success. Furthermore, we highlight how MPNNs can serve as a lightweight proxy for solving LPs, adapting to a given problem instance distribution. Empirically, we show that MPNNs solve LP relaxations of standard combinatorial optimization problems close to optimality, often surpassing conventional solvers and competing approaches in solving time.
IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation
- Authors: Authors: Taejong Joo, Diego Klabjan
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2310.10611
- Pdf link: https://arxiv.org/pdf/2310.10611
- Abstract Reasoning about a model's accuracy on a test sample from its confidence is a central problem in machine learning, being connected to important applications such as uncertainty representation, model selection, and exploration. While these connections have been well-studied in the i.i.d. settings, distribution shifts pose significant challenges to the traditional methods. Therefore, model calibration and model selection remain challenging in the unsupervised domain adaptation problem--a scenario where the goal is to perform well in a distribution shifted domain without labels. In this work, we tackle difficulties coming from distribution shifts by developing a novel importance weighted group accuracy estimator. Specifically, we formulate an optimization problem for finding an importance weight that leads to an accurate group accuracy estimation in the distribution shifted domain with theoretical analyses. Extensive experiments show the effectiveness of group accuracy estimation on model calibration and model selection. Our results emphasize the significance of group accuracy estimation for addressing challenges in unsupervised domain adaptation, as an orthogonal improvement direction with improving transferability of accuracy.
Keyword: adam
Farzi Data: Autoregressive Data Distillation
- Authors: Authors: Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
- Arxiv link: https://arxiv.org/abs/2310.09983
- Pdf link: https://arxiv.org/pdf/2310.09983
- Abstract We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences -- Farzi Data -- which are optimized to maintain (if not improve) model performance compared to training on the full dataset. Under the hood, Farzi conducts memory-efficient data distillation by (i) deriving efficient reverse-mode differentiation of the Adam optimizer by leveraging Hessian-Vector Products; and (ii) factorizing the high-dimensional discrete event-space into a latent-space which provably promotes implicit regularization. Empirically, for sequential recommendation and language modeling tasks, we are able to achieve 98-120% of downstream full-data performance when training state-of-the-art models on Farzi Data of size as little as 0.1% of the original dataset. Notably, being able to train better models with significantly less data sheds light on the design of future large auto-regressive models, and opens up new opportunities to further scale up model and data sizes.
Convolution quadratures based on block generalized Adams methods
- Authors: Authors: Ling Liu, Junjie Ma
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2310.10041
- Pdf link: https://arxiv.org/pdf/2310.10041
- Abstract This paper studies an extension of the classical convolution quadrature, a well-known numerical method for calculation of convolution integrals. In contrast to the existing counterpart, which uses the linear multistep formula or Runge-Kutta method, we employ the block generalized Adams method to discretize the underlying initial value problem. Similar to the convolution quadrature method based on the linear multistep formula, the proposed method can also be implemented on an equispaced grid. In addition, the proposed high-order method is as stable as the convolution quadrature based on the Runge-Kutta method, which indicates that it can accurately solve a wide range of problems without becoming unstable. We provide a detailed convergence analysis for the proposed convolution quadrature method and numerically illustrate our theoretical findings for convolution integrals with smooth and weakly singular kernels.
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
- Authors: Authors: Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.10195
- Pdf link: https://arxiv.org/pdf/2310.10195
- Abstract Large language models have achieved remarkable success, but their extensive parameter size necessitates substantial memory for training, thereby setting a high threshold. While the recently proposed low-memory optimization (LOMO) reduces memory footprint, its optimization technique, akin to stochastic gradient descent, is sensitive to hyper-parameters and exhibits suboptimal convergence, failing to match the performance of the prevailing optimizer for large language models, AdamW. Through empirical analysis of the Adam optimizer, we found that, compared to momentum, the adaptive learning rate is more critical for bridging the gap. Building on this insight, we introduce the low-memory optimization with adaptive learning rate (AdaLomo), which offers an adaptive learning rate for each parameter. To maintain memory efficiency, we employ non-negative matrix factorization for the second-order moment estimation in the optimizer state. Additionally, we suggest the use of a grouped update normalization to stabilize convergence. Our experiments with instruction-tuning and further pre-training demonstrate that AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
Time integration schemes based on neural networks for solving partial differential equations on coarse grids
- Authors: Authors: Xinxin Yan, Zhideng Zhou, Xiaohan Cheng, Xiaolei Yang
- Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10308
- Pdf link: https://arxiv.org/pdf/2310.10308
- Abstract The accuracy of solving partial differential equations (PDEs) on coarse grids is greatly affected by the choice of discretization schemes. In this work, we propose to learn time integration schemes based on neural networks which satisfy three distinct sets of mathematical constraints, i.e., unconstrained, semi-constrained with the root condition, and fully-constrained with both root and consistency conditions. We focus on the learning of 3-step linear multistep methods, which we subsequently applied to solve three model PDEs, i.e., the one-dimensional heat equation, the one-dimensional wave equation, and the one-dimensional Burgers' equation. The results show that the prediction error of the learned fully-constrained scheme is close to that of the Runge-Kutta method and Adams-Bashforth method. Compared to the traditional methods, the learned unconstrained and semi-constrained schemes significantly reduce the prediction error on coarse grids. On a grid that is 4 times coarser than the reference grid, the mean square error shows a reduction of up to an order of magnitude for some of the heat equation cases, and a substantial improvement in phase prediction for the wave equation. On a 32 times coarser grid, the mean square error for the Burgers' equation can be reduced by up to 35% to 40%.
Keyword: gradient
PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via Denoised Score Distillation
- Authors: Authors: Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne Wu
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09458
- Pdf link: https://arxiv.org/pdf/2310.09458
- Abstract Recent advances in zero-shot text-to-3D human generation, which employ the human model prior (eg, SMPL) or Score Distillation Sampling (SDS) with pre-trained text-to-image diffusion models, have been groundbreaking. However, SDS may provide inaccurate gradient directions under the weak diffusion guidance, as it tends to produce over-smoothed results and generate body textures that are inconsistent with the detailed mesh geometry. Therefore, directly leverage existing strategies for high-fidelity text-to-3D human texturing is challenging. In this work, we propose a model called PaintHuman to addresses the challenges from two aspects. We first propose a novel score function, Denoised Score Distillation (DSD), which directly modifies the SDS by introducing negative gradient components to iteratively correct the gradient direction and generate high-quality textures. In addition, we use the depth map as a geometric guidance to ensure the texture is semantically aligned to human mesh surfaces. To guarantee the quality of rendered results, we employ geometry-aware networks to predict surface materials and render realistic human textures. Extensive experiments, benchmarked against state-of-the-art methods, validate the efficacy of our approach.
Extremum seeking in the presence of large delays via time-delay approach to averaging
- Authors: Authors: Xuefei Yang, Emilia Fridman
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2310.09474
- Pdf link: https://arxiv.org/pdf/2310.09474
- Abstract In this paper, we study gradient-based classical extremum seeking (ES) for uncertain n-dimensional (nD) static quadratic maps in the presence of known large constant distinct input delays and large output constant delay with a small time-varying uncertainty. This uncertainty may appear due to network-based measurements. We present a quantitative analysis via a time-delay approach to averaging. We assume that the Hessian has a nominal known part and norm-bounded uncertainty, the extremum point belongs to a known box, whereas the extremum value to a known interval. By using the orthogonal transformation, we first transform the original static quadratic map into a new one with the Hessian containing a nominal diagonal part. We apply further a time-delay transformation to the resulting ES system and arrive at a time-delay system, which is a perturbation of a linear time-delay system with constant coefficients. Given large delays, we choose appropriate gains to guarantee stability of this linear system. To find a lower bound on the dither frequency for practical stability, we employ variation of constants formula and exploit the delay-dependent positivity of the fundamental solutions of the linear system with their tight exponential bounds. Sampled-data ES in the presence of large distinct input delays is also presented. Explicit conditions in terms of simple scalar inequalities depending on tuning parameters and delay bounds are established to guarantee the practical stability of the ES control systems. We show that given any large delays and initial box, by choosing appropriate gains we can achieve practical stability for fast enough dithers and small enough uncertainties.
Mirage: Model-Agnostic Graph Distillation for Graph Classification
- Authors: Authors: Mridul Gupta, Sahil Manchanda, Sayan Ranu, Hariprasad Kodamana
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.09486
- Pdf link: https://arxiv.org/pdf/2310.09486
- Abstract GNNs, like other deep learning models, are data and computation hungry. There is a pressing need to scale training of GNNs on large datasets to enable their usage on low-resource environments. Graph distillation is an effort in that direction with the aim to construct a smaller synthetic training set from the original training data without significantly compromising model performance. While initial efforts are promising, this work is motivated by two key observations: (1) Existing graph distillation algorithms themselves rely on training with the full dataset, which undermines the very premise of graph distillation. (2) The distillation process is specific to the target GNN architecture and hyper-parameters and thus not robust to changes in the modeling pipeline. We circumvent these limitations by designing a distillation algorithm called Mirage for graph classification. Mirage is built on the insight that a message-passing GNN decomposes the input graph into a multiset of computation trees. Furthermore, the frequency distribution of computation trees is often skewed in nature, enabling us to condense this data into a concise distilled summary. By compressing the computation data itself, as opposed to emulating gradient flows on the original training set-a prevalent approach to date-Mirage transforms into an unsupervised and architecture-agnostic distillation algorithm. Extensive benchmarking on real-world datasets underscores Mirage's superiority, showcasing enhanced generalization accuracy, data compression, and distillation efficiency when compared to state-of-the-art baselines.
Reduced Policy Optimization for Continuous Control with Hard Constraints
- Authors: Authors: Shutong Ding, Jingya Wang, Yali Du, Ye Shi
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09574
- Pdf link: https://arxiv.org/pdf/2310.09574
- Abstract Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly in those situations with non-convex hard constraints. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints. RPO partitions actions into basic actions and nonbasic actions following the GRG method and outputs the basic actions via a policy network. Subsequently, RPO calculates the nonbasic actions by solving equations based on equality constraints using the obtained basic actions. The policy network is then updated by implicitly differentiating nonbasic actions with respect to basic actions. Additionally, we introduce an action projection procedure based on the reduced gradient and apply a modified Lagrangian relaxation technique to ensure inequality constraints are satisfied. To the best of our knowledge, RPO is the first attempt that introduces GRG to RL as a way of efficiently handling both equality and inequality hard constraints. It is worth noting that there is currently a lack of RL environments with complex hard constraints, which motivates us to develop three new benchmarks: two robotics manipulation tasks and a smart grid operation control task. With these benchmarks, RPO achieves better performance than previous constrained RL algorithms in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints.
DPZero: Dimension-Independent and Differentially Private Zeroth-Order Optimization
- Authors: Authors: Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
- Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2310.09639
- Pdf link: https://arxiv.org/pdf/2310.09639
- Abstract The widespread practice of fine-tuning pretrained large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continue to grow, encompassing billions of parameters, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize and disclose sensitive training data, the privacy of fine-tuning data must be respected. To this end, we explore the potential of zeroth-order methods in differentially private optimization for fine-tuning LLMs. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differential privacy mechanism poses dimension-dependent complexity. To bridge the gap, we introduce DPZero, a novel differentially private zeroth-order algorithm with nearly dimension-independent rates. Our theoretical analysis reveals that its complexity hinges primarily on the problem's intrinsic dimension and exhibits only a logarithmic dependence on the ambient dimension. This renders DPZero a highly practical option for real-world LLMs deployments.
Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning
- Authors: Authors: Hien Bui, Michael Posa
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2310.09714
- Pdf link: https://arxiv.org/pdf/2310.09714
- Abstract In contact-rich tasks, the hybrid, multi-modal nature of contact dynamics poses great challenges in model representation, planning, and control. Recent efforts have attempted to address these challenges via data-driven methods, learning dynamical models in combination with model predictive control. Those methods, while effective, rely solely on minimizing forward prediction errors to hope for better task performance with MPC controllers. This weak correlation can result in data inefficiency as well as limitations to overall performance. In response, we propose a novel strategy: using a policy gradient algorithm to find a simplified dynamics model that explicitly maximizes task performance. Specifically, we parameterize the stochastic policy as the perturbed output of the MPC controller, thus, the learned model representation can directly associate with the policy or task performance. We apply the proposed method to contact-rich tasks where a three-fingered robotic hand manipulates previously unknown objects. Our method significantly enhances task success rate by up to 15% in manipulating diverse objects compared to the existing method while sustaining data efficiency. Our method can solve some tasks with success rates of 70% or higher using under 30 minutes of data. All videos and codes are available at https://sites.google.com/view/lcs-rl.
Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
- Authors: Authors: Youbang Sun, Tao Liu, Ruida Zhou, P. R. Kumar, Shahin Shahrampour
- Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- Arxiv link: https://arxiv.org/abs/2310.09727
- Pdf link: https://arxiv.org/pdf/2310.09727
- Abstract This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the suboptimality gap, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an $\epsilon$-Nash Equilibrium (NE) within $\mathcal{O}(1/\epsilon)$ iterations. This improves upon the previous best result of $\mathcal{O}(1/\epsilon^2)$ iterations and is of the same order, $\mathcal{O}(1/\epsilon)$, that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.
Model Inversion Attacks on Homogeneous and Heterogeneous Graph Neural Networks
- Authors: Authors: Renyang Liu, Wei Zhou, Jinhong Zhang, Xiaoyuan Liu, Peiyuan Si, Haoran Li
- Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09800
- Pdf link: https://arxiv.org/pdf/2310.09800
- Abstract Recently, Graph Neural Networks (GNNs), including Homogeneous Graph Neural Networks (HomoGNNs) and Heterogeneous Graph Neural Networks (HeteGNNs), have made remarkable progress in many physical scenarios, especially in communication applications. Despite achieving great success, the privacy issue of such models has also received considerable attention. Previous studies have shown that given a well-fitted target GNN, the attacker can reconstruct the sensitive training graph of this model via model inversion attacks, leading to significant privacy worries for the AI service provider. We advocate that the vulnerability comes from the target GNN itself and the prior knowledge about the shared properties in real-world graphs. Inspired by this, we propose a novel model inversion attack method on HomoGNNs and HeteGNNs, namely HomoGMI and HeteGMI. Specifically, HomoGMI and HeteGMI are gradient-descent-based optimization methods that aim to maximize the cross-entropy loss on the target GNN and the $1^{st}$ and $2^{nd}$-order proximities on the reconstructed graph. Notably, to the best of our knowledge, HeteGMI is the first attempt to perform model inversion attacks on HeteGNNs. Extensive experiments on multiple benchmarks demonstrate that the proposed method can achieve better performance than the competitors.
Federated Reinforcement Learning for Resource Allocation in V2X Networks
- Authors: Authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.09858
- Pdf link: https://arxiv.org/pdf/2310.09858
- Abstract Resource allocation significantly impacts the performance of vehicle-to-everything (V2X) networks. Most existing algorithms for resource allocation are based on optimization or machine learning (e.g., reinforcement learning). In this paper, we explore resource allocation in a V2X network under the framework of federated reinforcement learning (FRL). On one hand, the usage of RL overcomes many challenges from the model-based optimization schemes. On the other hand, federated learning (FL) enables agents to deal with a number of practical issues, such as privacy, communication overhead, and exploration efficiency. The framework of FRL is then implemented by the inexact alternative direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size calculated from their second moments. The developed algorithm, PASM, is proven to be convergent under mild conditions and has a nice numerical performance compared with some baseline methods for solving the resource allocation problem in a V2X network.
Stacked Intelligent Metasurface Performs a 2D DFT in the Wave Domain for DOA Estimation
- Authors: Authors: Jiancheng An, Chau Yuen, Marco Di Renzo, Merouane Debbah, H. Vincent Poor, Lajos Hanzo
- Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.09861
- Pdf link: https://arxiv.org/pdf/2310.09861
- Abstract Staked intelligent metasurface (SIM) based techniques are developed to perform two-dimensional (2D) direction-of-arrival (DOA) estimation. In contrast to the conventional designs, an advanced SIM in front of the receiving array automatically performs the 2D discrete Fourier transform (DFT) as the incident waves propagate through it. To arrange for the SIM to carry out this task, we design a gradient descent algorithm for iteratively updating the phase shift of each meta-atom in the SIM to minimize the fitting error between the SIM's response and the 2D DFT matrix. To further improve the DOA estimation accuracy, we configure the phase shifts in the input layer of SIM to generate a set of 2D DFT matrices having orthogonal spatial frequency bins. Extensive numerical simulations verify the capability of a well-trained SIM to perform 2D DFT. Specifically, it is demonstrated that the SIM having an optical computational speed achieves an MSE of $10^{-4}$ in 2D DOA estimation.
Federated Multi-Objective Learning
- Authors: Authors: Haibo Yang, Zhuqing Liu, Jia Liu, Chaosheng Dong, Michinari Momma
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
- Arxiv link: https://arxiv.org/abs/2310.09866
- Pdf link: https://arxiv.org/pdf/2310.09866
- Abstract In recent years, multi-objective optimization (MOO) emerges as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow local updates to significantly reduce communication costs, while achieving the {\em same} convergence rates as those of the their algorithmic counterparts in the single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.
Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation
- Authors: Authors: Chengwei Qin, Shafiq Joty, Chen Chen
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.09886
- Pdf link: https://arxiv.org/pdf/2310.09886
- Abstract Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.
Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia
- Authors: Authors: Dimitris Gkoumas, Matthew Purver, Maria Liakata
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.09897
- Pdf link: https://arxiv.org/pdf/2310.09897
- Abstract Dementia is associated with language disorders which impede communication. Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. Our experiments show that NLP tasks that encapsulate contextual information and enhance the gradient signal with linguistic patterns benefit performance. We then use the probability estimates from the best model to construct digital linguistic markers measuring the overall quality in communication and the intensity of a variety of language disorders. We investigate how the digital markers characterize dementia speech from a longitudinal perspective. We find that our proposed communication marker is able to robustly and reliably characterize the language of people with dementia, outperforming existing linguistic approaches; and shows external validity via significant correlation with clinical markers of behaviour. Finally, our proposed linguistic disorder markers provide useful insights into gradual language impairment associated with disease progression.
Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models
- Authors: Authors: Zijian Zhang, Luping Liu. Zhijie Lin, Yichen Zhu, Zhou Zhao
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.09912
- Pdf link: https://arxiv.org/pdf/2310.09912
- Abstract We propose the first unsupervised and learning-based method to identify interpretable directions in the h-space of pre-trained diffusion models. Our method is derived from an existing technique that operates on the GAN latent space. In a nutshell, we employ a shift control module for pre-trained diffusion models to manipulate a sample into a shifted version of itself, followed by a reconstructor to reproduce both the type and the strength of the manipulation. By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions. To prevent the discovery of meaningless and destructive directions, we employ a discriminator to maintain the fidelity of shifted sample. Due to the iterative generative process of diffusion models, our training requires a substantial amount of GPU VRAM to store numerous intermediate tensors for back-propagating gradient. To address this issue, we first propose a general VRAM-efficient training algorithm based on gradient checkpointing technique to back-propagate any gradient through the whole generative process, with acceptable occupancy of VRAM and sacrifice of training efficiency. Compared with existing related works on diffusion models, our method inherently identifies global and scalable directions, without necessitating any other complicated procedures. Extensive experiments on various datasets demonstrate the effectiveness of our method.
Theoretical Evaluation of Asymmetric Shapley Values for Root-Cause Analysis
- Authors: Authors: Domokos M. Kelen, Mihály Petreczky, Péter Kersch, András A. Benczúr
- Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
- Arxiv link: https://arxiv.org/abs/2310.09961
- Pdf link: https://arxiv.org/pdf/2310.09961
- Abstract In this work, we examine Asymmetric Shapley Values (ASV), a variant of the popular SHAP additive local explanation method. ASV proposes a way to improve model explanations incorporating known causal relations between variables, and is also considered as a way to test for unfair discrimination in model predictions. Unexplored in previous literature, relaxing symmetry in Shapley values can have counter-intuitive consequences for model explanation. To better understand the method, we first show how local contributions correspond to global contributions of variance reduction. Using variance, we demonstrate multiple cases where ASV yields counter-intuitive attributions, arguably producing incorrect results for root-cause analysis. Second, we identify generalized additive models (GAM) as a restricted class for which ASV exhibits desirable properties. We support our arguments by proving multiple theoretical results about the method. Finally, we demonstrate the use of asymmetric attributions on multiple real-world datasets, comparing the results with and without restricted model families using gradient boosting and deep learning models.
Deep Unfolding Network for Image Compressed Sensing by Content-adaptive Gradient Updating and Deformation-invariant Non-local Modeling
- Authors: Authors: Wenxue Cui, Xiaopeng Fan, Jian Zhang, Debin Zhao
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- Arxiv link: https://arxiv.org/abs/2310.10033
- Pdf link: https://arxiv.org/pdf/2310.10033
- Abstract Inspired by certain optimization solvers, the deep unfolding network (DUN) has attracted much attention in recent years for image compressed sensing (CS). However, there still exist the following two issues: 1) In existing DUNs, most hyperparameters are usually content independent, which greatly limits their adaptability for different input contents. 2) In each iteration, a plain convolutional neural network is usually adopted, which weakens the perception of wider context prior and therefore depresses the expressive ability. In this paper, inspired by the traditional Proximal Gradient Descent (PGD) algorithm, a novel DUN for image compressed sensing (dubbed DUN-CSNet) is proposed to solve the above two issues. Specifically, for the first issue, a novel content adaptive gradient descent network is proposed, in which a well-designed step size generation sub-network is developed to dynamically allocate the corresponding step sizes for different textures of input image by generating a content-aware step size map, realizing a content-adaptive gradient updating. For the second issue, considering the fact that many similar patches exist in an image but have undergone a deformation, a novel deformation-invariant non-local proximal mapping network is developed, which can adaptively build the long-range dependencies between the nonlocal patches by deformation-invariant non-local modeling, leading to a wider perception on context priors. Extensive experiments manifest that the proposed DUN-CSNet outperforms existing state-of-the-art CS methods by large margins.
TpopT: Efficient Trainable Template Optimization on Low-Dimensional Manifolds
- Authors: Authors: Jingkai Yan, Shiyu Wang, Xinyu Rain Wei, Jimmy Wang, Zsuzsanna Márka, Szabolcs Márka, John Wright
- Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.10039
- Pdf link: https://arxiv.org/pdf/2310.10039
- Abstract In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
SoTTA: Robust Test-Time Adaptation on Noisy Data Streams
- Authors: Authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10074
- Pdf link: https://arxiv.org/pdf/2310.10074
- Abstract Test-time adaptation (TTA) aims to address distributional shifts between training and testing data using only unlabeled test data streams for continual model adaptation. However, most TTA methods assume benign test streams, while test samples could be unexpectedly diverse in the wild. For instance, an unseen object or noise could appear in autonomous driving. This leads to a new threat to existing TTA algorithms; we found that prior TTA algorithms suffer from those noisy test samples as they blindly adapt to incoming samples. To address this problem, we present Screening-out Test-Time Adaptation (SoTTA), a novel TTA algorithm that is robust to noisy samples. The key enabler of SoTTA is two-fold: (i) input-wise robustness via high-confidence uniform-class sampling that effectively filters out the impact of noisy samples and (ii) parameter-wise robustness via entropy-sharpness minimization that improves the robustness of model parameters against large gradients from noisy samples. Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples. The source code is available at https://github.com/taeckyung/SoTTA .
Over-the-Air Federated Learning and Optimization
- Authors: Authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief
- Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)
- Arxiv link: https://arxiv.org/abs/2310.10089
- Pdf link: https://arxiv.org/pdf/2310.10089
- Abstract Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to model aggregation error arising from channel fading and noise. We first provide a comprehensive study on the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both strongly convex and non-convex settings with constant and diminishing learning rates in the presence of data heterogeneity. Through convergence and asymptotic analysis, we characterize the impact of aggregation error on the convergence bound and provide insights for system design with convergence guarantees. Then we derive convergence rates for AirFedAvg algorithms for strongly convex and non-convex objectives. For different types of local updates that can be transmitted by edge devices (i.e., local model, gradient, and model difference), we reveal that transmitting local model in AirFedAvg may cause divergence in the training procedure. In addition, we consider more practical signal processing schemes to improve the communication efficiency and further extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes. Extensive simulation results under different settings of objective functions, transmitted local information, and communication schemes verify the theoretical conclusions.
Block-missing data in linear systems: An unbiased stochastic gradient descent approach
- Authors: Authors: Chelsea Huynh, Anna Ma, Michael Strand
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2310.10147
- Pdf link: https://arxiv.org/pdf/2310.10147
- Abstract Achieving accurate approximations to solutions of large linear systems is crucial, especially when those systems utilize real-world data. A consequence of using real-world data is that there will inevitably be missingness. Current approaches for dealing with missing data, such as deletion and imputation, can introduce bias. Recent studies proposed an adaptation of stochastic gradient descent (SGD) in specific missing-data models. In this work, we propose a new algorithm, $\ell$-tuple mSGD, for the setting in which data is missing in a block-wise, tuple pattern. We prove that our proposed method uses unbiased estimates of the gradient of the least squares objective in the presence of tuple missing data. We also draw connections between $\ell$-tuple mSGD and previously established SGD-type methods for missing data. Furthermore, we prove our algorithm converges when using updating step sizes and empirically demonstrate the convergence of $\ell$-tuple mSGD on synthetic data. Lastly, we evaluate $\ell$-tuple mSGD applied to real-world continuous glucose monitoring (CGM) device data.
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
- Authors: Authors: Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu
- Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2310.10195
- Pdf link: https://arxiv.org/pdf/2310.10195
- Abstract Large language models have achieved remarkable success, but their extensive parameter size necessitates substantial memory for training, thereby setting a high threshold. While the recently proposed low-memory optimization (LOMO) reduces memory footprint, its optimization technique, akin to stochastic gradient descent, is sensitive to hyper-parameters and exhibits suboptimal convergence, failing to match the performance of the prevailing optimizer for large language models, AdamW. Through empirical analysis of the Adam optimizer, we found that, compared to momentum, the adaptive learning rate is more critical for bridging the gap. Building on this insight, we introduce the low-memory optimization with adaptive learning rate (AdaLomo), which offers an adaptive learning rate for each parameter. To maintain memory efficiency, we employ non-negative matrix factorization for the second-order moment estimation in the optimizer state. Additionally, we suggest the use of a grouped update normalization to stabilize convergence. Our experiments with instruction-tuning and further pre-training demonstrate that AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification
- Authors: Authors: Junjie Dong, Mudi Jiang, Lianyu Hu, Zengyou He
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2310.10321
- Pdf link: https://arxiv.org/pdf/2310.10321
- Abstract Sequence classification has numerous applications in various fields. Despite extensive studies in the last decades, many challenges still exist, particularly in pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during the mining process, leading to the result of missing some combinations of features with discriminative power. Furthermore, it is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to ensure consistency in the feature mining and classification procedure. Our method involves training an interpretable CNN encoder for sequential data and performing a gradient-based search for discriminative k-mer combinations. Experiments show that the Hamming Encoder method proposed in this paper outperforms existing state-of-the-art methods in terms of classification accuracy.
An efficient numerical method for the anisotropic phase field dendritic crystal growth model
- Authors: Authors: Yayu Guo, Mejdi Azaiez, Chuanju Xu
- Subjects: Numerical Analysis (math.NA)
- Arxiv link: https://arxiv.org/abs/2310.10506
- Pdf link: https://arxiv.org/pdf/2310.10506
- Abstract In this paper, we propose and analyze an efficient numerical method for the anisotropic phase field dendritic crystal growth model, which is challenging because we are facing the nonlinear coupling and anisotropic coefficient in the model. The proposed method is a two-step scheme. In the first step, an intermediate solution is computed by using BDF schemes of order up to three for both the phase-field and heat equations. In the second step the intermediate solution is stabilized by multiplying an auxiliary variable. The key of the second step is to stabilize the overall scheme while maintaining the convergence order of the stabilized solution. In order to overcome the difficulty caused by the gradient-dependent anisotropic coefficient and the nonlinear terms, some stabilization terms are added to the BDF schemes in the first step. The second step makes use of a generalized auxiliary variable approach with relaxation. The Fourier spectral method is applied for the spatial discretization. Our analysis shows that the proposed scheme is unconditionally stable and has accuracy in time up to third order. We also provide a sophisticated implementation showing that the computational complexity of our schemes is equivalent to solving two linear equations and some algebraic equations. To the best of our knowledge, this is the cheapest unconditionally stable schemes reported in the literature. Some numerical examples are given to verify the efficiency of the proposed method.
A second-order SO(3)-preserving and energy-stable scheme for orthonormal frame gradient flow model of biaxial nematic liquid crystals
- Authors: Authors: Hanbin Wang, Jie Xu, Zhiguo Yang
- Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
- Arxiv link: https://arxiv.org/abs/2310.10524
- Pdf link: https://arxiv.org/pdf/2310.10524
- Abstract In this paper, we present a novel second-order generalised rotational discrete gradient scheme for numerically approximating the orthonormal frame gradient flow of biaxial nematic liquid crystals. This scheme relies on reformulating the original gradient flow system into an equivalent generalised "rotational" form. A second-order discrete gradient approximation of the energy variation is then devised such that it satisfies an energy difference relation. The proposed numerical scheme has two remarkable properties: (i) it strictly obeys the orthonormal property of the tensor field and (ii) it satisfies the energy dissipation law at the discrete level, regardless of the time step sizes. We provide ample numerical results to validate the accuracy, efficiency, unconditional stability and SO(3)-preserving property of this scheme. In addition, comparisons of the simulation results between the biaxial orthonormal frame gradient flow model and uniaxial Oseen-Frank gradient flow are made to demonstrate the ability of the former to characterize non-axisymmetric local anisotropy.
Learning optimal integration of spatial and temporal information in noisy chemotaxis
- Authors: Authors: Albert Alonso, Julius B. Kirkegaard
- Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Biological Physics (physics.bio-ph)
- Arxiv link: https://arxiv.org/abs/2310.10531
- Pdf link: https://arxiv.org/pdf/2310.10531
- Abstract We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether there is a discontinuous switch of optimal strategies or a continuous transition exists. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming in the transition region both the constrained variants as well as models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a non-trivial combination of spatially and temporally derived gradient information in a ratio that varies dynamically during the chemotactic trajectories.
Microscaling Data Formats for Deep Learning
- Authors: Authors: Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, Michael Siu, Pradeep Dubey, Paulius Micikevicius, Maxim Naumov, Colin Verilli, Ralph Wittig, Eric Chung
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.10537
- Pdf link: https://arxiv.org/pdf/2310.10537
- Abstract Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements.MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
Efficient Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories
- Authors: Authors: Jiyuan Shen, Wenzhuo Yang, Kwok-Yan Lam
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2310.10541
- Pdf link: https://arxiv.org/pdf/2310.10541
- Abstract Training a large and state-of-the-art machine learning model typically necessitates the use of large-scale datasets, which, in turn, makes the training and parameter-tuning process expensive and time-consuming. Some researchers opt to distil information from real-world datasets into tiny and compact synthetic datasets while maintaining their ability to train a well-performing model, hence proposing a data-efficient method known as Dataset Distillation (DD). Despite recent progress in this field, existing methods still underperform and cannot effectively replace large datasets. In this paper, unlike previous methods that focus solely on improving the efficacy of student distillation, we are the first to recognize the important interplay between expert and student. We argue the significant impact of expert smoothness when employing more potent expert trajectories in subsequent dataset distillation. Based on this, we introduce the integration of clipping loss and gradient penalty to regulate the rate of parameter changes in expert trajectories. Furthermore, in response to the sensitivity exhibited towards randomly initialized variables during distillation, we propose representative initialization for synthetic dataset and balanced inner-loop loss. Finally, we present two enhancement strategies, namely intermediate matching loss and weight perturbation, to mitigate the potential occurrence of cumulative errors. We conduct extensive experiments on datasets of different scales, sizes, and resolutions. The results demonstrate that the proposed method significantly outperforms prior methods.
Certainty In, Certainty Out: REVQCs for Quantum Machine Learning
- Authors: Authors: Hannah Helgesen, Michael Felsberg, Jan-Åke Larsson
- Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph)
- Arxiv link: https://arxiv.org/abs/2310.10629
- Pdf link: https://arxiv.org/pdf/2310.10629
- Abstract The field of Quantum Machine Learning (QML) has emerged recently in the hopes of finding new machine learning protocols or exponential speedups for classical ones. Apart from problems with vanishing gradients and efficient encoding methods, these speedups are hard to find because the sampling nature of quantum computers promotes either simulating computations classically or running them many times on quantum computers in order to use approximate expectation values in gradient calculations. In this paper, we make a case for setting high single-sample accuracy as a primary goal. We discuss the statistical theory which enables highly accurate and precise sample inference, and propose a method of reversed training towards this end. We show the effectiveness of this training method by assessing several effective variational quantum circuits (VQCs), trained in both the standard and reversed directions, on random binary subsets of the MNIST and MNIST Fashion datasets, on which our method provides an increase of $10-15%$ in single-sample inference accuracy.
Keyword: super-resolution
FuseSR: Super Resolution for Real-time Rendering through Efficient Multi-resolution Fusion
- Authors: Authors: Zhihua Zhong, Jingsen Zhu, Yuxin Dai, Chuankun Zheng, Yuchi Huo, Guanlin Chen, Hujun Bao, Rui Wang
- Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.09726
- Pdf link: https://arxiv.org/pdf/2310.09726
- Abstract The workload of real-time rendering is steeply increasing as the demand for high resolution, high refresh rates, and high realism rises, overwhelming most graphics cards. To mitigate this problem, one of the most popular solutions is to render images at a low resolution to reduce rendering overhead, and then manage to accurately upsample the low-resolution rendered image to the target resolution, a.k.a. super-resolution techniques. Most existing methods focus on exploiting information from low-resolution inputs, such as historical frames. The absence of high frequency details in those LR inputs makes them hard to recover fine details in their high-resolution predictions. In this paper, we propose an efficient and effective super-resolution method that predicts high-quality upsampled reconstructions utilizing low-cost high-resolution auxiliary G-Buffers as additional input. With LR images and HR G-buffers as input, the network requires to align and fuse features at multi resolution levels. We introduce an efficient and effective H-Net architecture to solve this problem and significantly reduce rendering overhead without noticeable quality deterioration. Experiments show that our method is able to produce temporally consistent reconstructions in $4 \times 4$ and even challenging $8 \times 8$ upsampling cases at 4K resolution with real-time performance, with substantially improved quality and significant performance boost compared to existing works.
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition
- Authors: Authors: Ling Zhou, Mingpei Wang, Xiaohua Huang, Wenming Zheng, Qirong Mao, Guoying Zhao
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.10022
- Pdf link: https://arxiv.org/pdf/2310.10022
- Abstract Micro-expression recognition (MER) in low-resolution (LR) scenarios presents an important and complex challenge, particularly for practical applications such as group MER in crowded environments. Despite considerable advancements in super-resolution techniques for enhancing the quality of LR images and videos, few study has focused on investigate super-resolution for improving LR MER. The scarcity of investigation can be attributed to the inherent difficulty in capturing the subtle motions of micro-expressions, even in original-resolution MER samples, which becomes even more challenging in LR samples due to the loss of distinctive features. Furthermore, a lack of systematic benchmarking and thorough analysis of super-resolution-assisted MER methods has been noted. This paper tackles these issues by conducting a series of benchmark experiments that integrate both super-resolution (SR) and MER methods, guided by an in-depth literature survey. Specifically, we employ seven cutting-edge state-of-the-art (SOTA) MER techniques and evaluate their performance on samples generated from 13 SOTA SR techniques, thereby addressing the problem of super-resolution in MER. Through our empirical study, we uncover the primary challenges associated with SR-assisted MER and identify avenues to tackle these challenges by leveraging recent advancements in both SR and MER methodologies. Our analysis provides insights for progressing toward more efficient SR-assisted MER.
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
- Authors: Authors: Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2310.10624
- Pdf link: https://arxiv.org/pdf/2310.10624
- Abstract Despite remarkable research advances in diffusion-based video editing, existing methods are limited to short-length videos due to the contradiction between long-range consistency and frame-wise editing. Recent approaches attempt to tackle this challenge by introducing video-2D representations to degrade video editing to image editing. However, they encounter significant difficulties in handling large-scale motion- and view-change videos especially for human-centric videos. This motivates us to introduce the dynamic Neural Radiance Fields (NeRF) as the human-centric video representation to ease the video editing problem to a 3D space editing task. As such, editing can be performed in the 3D spaces and propagated to the entire video via the deformation field. To provide finer and direct controllable editing, we propose the image-based 3D space editing pipeline with a set of effective designs. These include multi-view multi-pose Score Distillation Sampling (SDS) from both 2D personalized diffusion priors and 3D diffusion priors, reconstruction losses on the reference image, text-guided local parts super-resolution, and style transfer for 3D background space. Extensive experiments demonstrate that our method, dubbed as DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50% ~ 95% in terms of human preference. Compelling video comparisons are provided in the project page https://showlab.github.io/DynVideo-E/. Our code and data will be released to the community.