arxiv-updates icon indicating copy to clipboard operation
arxiv-updates copied to clipboard

New submissions for Wed, 27 Dec 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization

  • Authors: Authors: Shuoran Jiang, Qingcai Chen, Youchen Pan, Yang Xiang, Yukang Lin, Xiangping Wu, Chuanyi Liu, Xiaobao Song
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15184
  • Pdf link: https://arxiv.org/pdf/2312.15184
  • Abstract Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes the large language models (LLMs) by just forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimate in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLMs fine-tuning across various NLP tasks than MeZO and its momentum variants.

On the Trajectories of SGD Without Replacement

  • Authors: Authors: Pierfrancesco Beneventano
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.16143
  • Pdf link: https://arxiv.org/pdf/2312.16143
  • Abstract This article examines the implicit regularization effect of Stochastic Gradient Descent (SGD). We consider the case of SGD without replacement, the variant typically used to optimize large-scale neural networks. We analyze this algorithm in a more realistic regime than typically considered in theoretical works on SGD, as, e.g., we allow the product of the learning rate and Hessian to be $O(1)$. Our core theoretical result is that optimizing with SGD without replacement is locally equivalent to making an additional step on a novel regularizer. This implies that the trajectory of SGD without replacement diverges from both noise-injected GD and SGD with replacement (in which batches are sampled i.i.d.). Indeed, the two SGDs travel flat regions of the loss landscape in distinct directions and at different speeds. In expectation, SGD without replacement may escape saddles significantly faster and present a smaller variance. Moreover, we find that SGD implicitly regularizes the trace of the noise covariance in the eigendirections of small and negative Hessian eigenvalues. This coincides with penalizing a weighted trace of the Fisher Matrix and the Hessian on several vision tasks, thus encouraging sparsity in the spectrum of the Hessian of the loss in line with empirical observations from prior work. We also propose an explanation for why SGD does not train at the edge of stability (as opposed to GD).

Keyword: optimization

Multi-Criteria Client Selection and Scheduling with Fairness Guarantee for Federated Learning Service

  • Authors: Authors: Meiying Zhang, Huan Zhao, Sheldon Ebron, Ruitao Xie, Kan Yang
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.14941
  • Pdf link: https://arxiv.org/pdf/2312.14941
  • Abstract Federated Learning (FL) enables multiple clients to train machine learning models collaboratively without sharing the raw training data. However, for a given FL task, how to select a group of appropriate clients fairly becomes a challenging problem due to budget restrictions and client heterogeneity. In this paper, we propose a multi-criteria client selection and scheduling scheme with a fairness guarantee, comprising two stages: 1) preliminary client pool selection, and 2) per-round client scheduling. Specifically, we first define a client selection metric informed by several criteria, such as client resources, data quality, and client behaviors. Then, we formulate the initial client pool selection problem into an optimization problem that aims to maximize the overall scores of selected clients within a given budget and propose a greedy algorithm to solve it. To guarantee fairness, we further formulate the per-round client scheduling problem and propose a heuristic algorithm to divide the client pool into several subsets such that every client is selected at least once while guaranteeing that the `integrated' dataset in a subset is close to an independent and identical distribution (iid). Our experimental results show that our scheme can improve the model quality especially when data are non-iid.

LLM Interactive Optimization of Open Source Python Libraries -- Case Studies and Generalization

  • Authors: Authors: Andreas Florath, Franz Kiraly
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2312.14949
  • Pdf link: https://arxiv.org/pdf/2312.14949
  • Abstract With the advent of large language models (LLMs) like GPT-3, a natural question is the extent to which these models can be utilized for source code optimization. This paper presents methodologically stringent case studies applied to well-known open source python libraries pillow and numpy. We find that contemporary LLM ChatGPT-4 (state September and October 2023) is surprisingly adept at optimizing energy and compute efficiency. However, this is only the case in interactive use, with a human expert in the loop. Aware of experimenter bias, we document our qualitative approach in detail, and provide transcript and source code. We start by providing a detailed description of our approach in conversing with the LLM to optimize the _getextrema function in the pillow library, and a quantitative evaluation of the performance improvement. To demonstrate qualitative replicability, we report further attempts on another locus in the pillow library, and one code locus in the numpy library, to demonstrate generalization within and beyond a library. In all attempts, the performance improvement is significant (factor up to 38). We have also not omitted reporting of failed attempts (there were none). We conclude that LLMs are a promising tool for code optimization in open source libraries, but that the human expert in the loop is essential for success. Nonetheless, we were surprised by how few iterations were required to achieve substantial performance improvements that were not obvious to the expert in the loop. We would like bring attention to the qualitative nature of this study, more robust quantitative studies would need to introduce a layer of selecting experts in a representative sample -- we invite the community to collaborate.

Unraveling the Temporal Dynamics of the Unet in Diffusion Models

  • Authors: Authors: Vidya Prasad, Chen Zhu-Tian, Anna Vilanova, Hanspeter Pfister, Nicola Pezzotti, Hendrik Strobelt
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.14965
  • Pdf link: https://arxiv.org/pdf/2312.14965
  • Abstract Diffusion models have garnered significant attention since they can effectively learn complex multivariate Gaussian distributions, resulting in diverse, high-quality outcomes. They introduce Gaussian noise into training data and reconstruct the original data iteratively. Central to this iterative process is a single Unet, adapting across time steps to facilitate generation. Recent work revealed the presence of composition and denoising phases in this generation process, raising questions about the Unets' varying roles. Our study dives into the dynamic behavior of Unets within denoising diffusion probabilistic models (DDPM), focusing on (de)convolutional blocks and skip connections across time steps. We propose an analytical method to systematically assess the impact of time steps and core Unet components on the final output. This method eliminates components to study causal relations and investigate their influence on output changes. The main purpose is to understand the temporal dynamics and identify potential shortcuts during inference. Our findings provide valuable insights into the various generation phases during inference and shed light on the Unets' usage patterns across these phases. Leveraging these insights, we identify redundancies in GLIDE (an improved DDPM) and improve inference time by ~27% with minimal degradation in output quality. Our ultimate goal is to guide more informed optimization strategies for inference and influence new model designs.

Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images

  • Authors: Authors: Snehal Singh Tomar, A.N. Rajagopalan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15037
  • Pdf link: https://arxiv.org/pdf/2312.15037
  • Abstract With the metaverse slowly becoming a reality and given the rapid pace of developments toward the creation of digital humans, the need for a principled style editing pipeline for human faces is bound to increase manifold. We cater to this need by introducing the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates highly localized editing of style attributes of several Regions of Interest (ROIs) in face images. The L2SAE learns separate latent representations for encoded images' structure and style information. Thus, allowing for structure-preserving style editing of the chosen ROIs. The encoded structure representation is a multichannel 2D tensor with reduced spatial dimensions, which captures both local and global structure properties. The style representation is a 1D tensor that captures global style attributes. In our framework, we slice the structure representation to build strong and disentangled correspondences with different ROIs. Consequentially, style editing of the chosen ROIs amounts to a simple combination of (a) the ROI-mask generated from the sliced structure representation and (b) the decoded image with global style changes, generated from the manipulated (using Gaussian noise) global style and unchanged structure tensor. Style editing sans additional human supervision is a significant win over SOTA style editing pipelines because most existing works require additional human effort (supervision) post-training for attributing semantic meaning to style edits. We also do away with iterative-optimization-based inversion or determining controllable latent directions post-training, which requires additional computationally expensive operations. We provide qualitative and quantitative results for the same over multiple applications, such as selective style editing and swapping using test images sampled from several datasets.

Fix-Con: Automatic Fault Localization and Repair of Deep Learning Model Conversions

  • Authors: Authors: Nikolaos Louloudakis, Perry Gibson, José Cano, Ajitha Rajan
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15101
  • Pdf link: https://arxiv.org/pdf/2312.15101
  • Abstract Converting deep learning models between frameworks is a common step to maximize model compatibility across devices and leverage optimization features that may be exclusively provided in one deep learning framework. However, this conversion process may be riddled with bugs, making the converted models either undeployable or problematic, considerably degrading their prediction correctness. We propose an automated approach for fault localization and repair, Fix-Con, during model conversion between deep learning frameworks. Fix-Con is capable of detecting and fixing faults introduced in model input, parameters, hyperparameters, and the model graph during conversion. Fix-Con uses a set of fault types mined from surveying conversion issues raised to localize potential conversion faults in the converted target model, and then repairs them appropriately, e.g. replacing the parameters of the target model with those from the source model. This is done iteratively for every image in the dataset with output label differences between the source model and the converted target model until all differences are resolved. We evaluate the effectiveness of Fix-Con in fixing model conversion bugs of three widely used image recognition models converted across four different deep learning frameworks. Overall, Fix-Con was able to either completely repair, or significantly improve the performance of 14 out of the 15 erroneous conversion cases.

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation

  • Authors: Authors: Chengming Hu, Haolun Wu, Xuan Li, Chen Ma, Xi Chen, Jun Yan, Boyu Wang, Xue Liu
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15112
  • Pdf link: https://arxiv.org/pdf/2312.15112
  • Abstract Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction ($S$), teacher prediction ($T$), and ground truth ($G$). To counterbalance the impact of outliers, we further extend to the inter-sample relations, incorporating the teacher's global average prediction $\bar{T}$ for samples within the same class. A simple neural network then learns the implicit mapping from the intra- and inter-sample relations to an adaptive, sample-wise knowledge fusion ratio in a bilevel-optimization manner. Our approach provides a simple, practical, and adaptable solution for knowledge distillation that can be employed across various architectures and model sizes. Extensive experiments demonstrate consistent improvements over other loss re-weighting methods on image classification, attack detection, and click-through rate prediction.

Gradient Shaping for Multi-Constraint Safe Reinforcement Learning

  • Authors: Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15127
  • Pdf link: https://arxiv.org/pdf/2312.15127
  • Abstract Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints via interacting with the environments. In this paper, our focus lies in addressing the complex challenges associated with solving multi-constraint (MC) safe RL problems. We approach the safe RL problem from the perspective of Multi-Objective Optimization (MOO) and propose a unified framework designed for MC safe RL algorithms. This framework highlights the manipulation of gradients derived from constraints. Leveraging insights from this framework and recognizing the significance of \textit{redundant} and \textit{conflicting} constraint conditions, we introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction. Our extensive experimentation demonstrates the effectiveness of our proposed method in encouraging exploration and learning a policy that improves both safety and reward performance across various challenging MC safe RL tasks as well as good scalability to the number of constraints.

Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery

  • Authors: Authors: Jialu Zhang, Xiaoying Yang, Wentao He, Jianfeng Ren, Qian Zhang, Titian Zhao, Ruibin Bai, Xiangjian He, Jiang Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15219
  • Pdf link: https://arxiv.org/pdf/2312.15219
  • Abstract Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. Specifically, a set of patches potentially containing objects are first generated. A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches are designed in the agent to guide the scale optimization. The proposed scale-consistency reward ensures similar scales for neighboring objects of the same category. Furthermore, a spatial-semantic attention mechanism is designed to exploit the spatial semantic relations between patches. The agent employs the proximal policy optimization strategy in conjunction with the evolutionary strategy, effectively utilizing both the current patch status and historical experience embedded in the agent. The proposed model is compared with state-of-the-art methods on two benchmark datasets for object detection on drone imagery. It significantly outperforms all the compared methods.

A Survey on Large Language Models for Software Engineering

  • Authors: Authors: Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, Zhenyu Chen
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2312.15223
  • Pdf link: https://arxiv.org/pdf/2312.15223
  • Abstract Software Engineering (SE) is the systematic design, development, and maintenance of software applications, underpinning the digital infrastructure of our modern mainworld. Very recently, the SE community has seen a rapidly increasing number of techniques employing Large Language Models (LLMs) to automate a broad range of SE tasks. Nevertheless, existing information of the applications, effects, and possible limitations of LLMs within SE is still not well-studied. In this paper, we provide a systematic survey to summarize the current state-of-the-art research in the LLM-based SE community. We summarize 30 representative LLMs of Source Code across three model architectures, 15 pre-training objectives across four categories, and 16 downstream tasks across five categories. We then present a detailed summarization of the recent SE studies for which LLMs are commonly utilized, including 155 studies for 43 specific code-related tasks across four crucial phases within the SE workflow. Besides, we summarize existing attempts to empirically evaluate LLMs in SE, such as benchmarks, empirical studies, and exploration of SE education. We also discuss several critical aspects of optimization and applications of LLMs in SE, such as security attacks, model tuning, and model compression. Finally, we highlight several challenges and potential opportunities on applying LLMs for future SE studies, such as exploring domain LLMs and constructing clean evaluation datasets. Overall, our work can help researchers gain a comprehensive understanding about the achievements of the existing LLM-based SE studies and promote the practical application of these techniques. Our artifacts are publicly available and will continuously updated at the living repository: \url{https://github.com/iSEngLab/AwesomeLLM4SE}.

Regularized PolyKervNets: Optimizing Expressiveness and Efficiency for Private Inference in Deep Neural Networks

  • Authors: Authors: Toluwani Aremu
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15229
  • Pdf link: https://arxiv.org/pdf/2312.15229
  • Abstract Private computation of nonlinear functions, such as Rectified Linear Units (ReLUs) and max-pooling operations, in deep neural networks (DNNs) poses significant challenges in terms of storage, bandwidth, and time consumption. To address these challenges, there has been a growing interest in utilizing privacy-preserving techniques that leverage polynomial activation functions and kernelized convolutions as alternatives to traditional ReLUs. However, these alternative approaches often suffer from a trade-off between achieving faster private inference (PI) and sacrificing model accuracy. In particular, when applied to much deeper networks, these methods encounter training instabilities, leading to issues like exploding gradients (resulting in NaNs) or suboptimal approximations. In this study, we focus on PolyKervNets, a technique known for offering improved dynamic approximations in smaller networks but still facing instabilities in larger and more complex networks. Our primary objective is to empirically explore optimization-based training recipes to enhance the performance of PolyKervNets in larger networks. By doing so, we aim to potentially eliminate the need for traditional nonlinear activation functions, thereby advancing the state-of-the-art in privacy-preserving deep neural network architectures. Code can be found on GitHub at: \url{https://github.com/tolusophy/PolyKervNets/}

Sensing-Enhanced Secure Communication: Joint Time Allocation and Beamforming Design

  • Authors: Authors: Dongfang Xu, Yiming Xu, Zhiqiang Wei, Shenghui Song, Derrick Wing Kwan Ng
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2312.15231
  • Pdf link: https://arxiv.org/pdf/2312.15231
  • Abstract The integration of sensing and communication enables wireless communication systems to serve environment-aware applications. In this paper, we propose to leverage sensing to enhance physical layer security (PLS) in multiuser communication systems in the presence of a suspicious target. To this end, we develop a two-phase framework to first estimate the location of the potential eavesdropper by sensing and then utilize the estimated information to enhance PLS for communication. In particular, in the first phase, a dual-functional radar and communication (DFRC) base station (BS) exploits a sensing signal to mitigate the sensing information uncertainty of the potential eavesdropper. Then, in the second phase, to facilitate joint sensing and secure communication, the DFRC BS employs beamforming and artificial noise to enhance secure communication. The design objective is to maximize the system sum rate while alleviating the information leakage by jointly optimizing the time allocation and beamforming policy. Capitalizing on monotonic optimization theory, we develop a two-layer globally optimal algorithm to reveal the performance upper bound of the considered system. Simulation results show that the proposed scheme achieves a significant sum rate gain over two baseline schemes that adopt existing techniques. Moreover, our results unveil that ISAC is a promising paradigm for enhancing secure communication in wireless networks.

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

  • Authors: Authors: Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2312.15234
  • Pdf link: https://arxiv.org/pdf/2312.15234
  • Abstract In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.

Fluid Antenna Array Enhanced Over-the-Air Computation

  • Authors: Authors: Deyou Zhang, Sicong Ye, Ming Xiao, Kezhi Wang, Marco Di Renzo, Mikael Skoglund
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.15244
  • Pdf link: https://arxiv.org/pdf/2312.15244
  • Abstract Over-the-air computation (AirComp) has emerged as a promising technology for fast wireless data aggregation by harnessing the superposition property of wireless multiple-access channels. This paper investigates a fluid antenna (FA) array-enhanced AirComp system, employing the new degrees of freedom achieved by antenna movements. Specifically, we jointly optimize the transceiver design and antenna position vector (APV) to minimize the mean squared error (MSE) between target and estimated function values. To tackle the resulting highly non-convex problem, we adopt an alternating optimization technique to decompose it into three subproblems. These subproblems are then iteratively solved until convergence, leading to a locally optimal solution. Numerical results show that FA arrays with the proposed transceiver and APV design significantly outperform the traditional fixed-position antenna arrays in terms of MSE.

Efficient Deformable Tissue Reconstruction via Orthogonal Neural Plane

  • Authors: Authors: Chen Yang, Kailing Wang, Yuehao Wang, Qi Dou, Xiaokang Yang, Wei Shen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.15253
  • Pdf link: https://arxiv.org/pdf/2312.15253
  • Abstract Intraoperative imaging techniques for reconstructing deformable tissues in vivo are pivotal for advanced surgical systems. Existing methods either compromise on rendering quality or are excessively computationally intensive, often demanding dozens of hours to perform, which significantly hinders their practical application. In this paper, we introduce Fast Orthogonal Plane (Forplane), a novel, efficient framework based on neural radiance fields (NeRF) for the reconstruction of deformable tissues. We conceptualize surgical procedures as 4D volumes, and break them down into static and dynamic fields comprised of orthogonal neural planes. This factorization iscretizes the four-dimensional space, leading to a decreased memory usage and faster optimization. A spatiotemporal importance sampling scheme is introduced to improve performance in regions with tool occlusion as well as large motions and accelerate training. An efficient ray marching method is applied to skip sampling among empty regions, significantly improving inference speed. Forplane accommodates both binocular and monocular endoscopy videos, demonstrating its extensive applicability and flexibility. Our experiments, carried out on two in vivo datasets, the EndoNeRF and Hamlyn datasets, demonstrate the effectiveness of our framework. In all cases, Forplane substantially accelerates both the optimization process (by over 100 times) and the inference process (by over 15 times) while maintaining or even improving the quality across a variety of non-rigid deformations. This significant performance improvement promises to be a valuable asset for future intraoperative surgical applications. The code of our project is now available at https://github.com/Loping151/ForPlane.

Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency

  • Authors: Authors: Yu Cai, Tianyu Shen, Shi-Sheng Huang, Hua Huang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15263
  • Pdf link: https://arxiv.org/pdf/2312.15263
  • Abstract Depth completion, aiming to predict dense depth maps from sparse depth measurements, plays a crucial role in many computer vision related applications. Deep learning approaches have demonstrated overwhelming success in this task. However, high-precision depth completion without relying on the ground-truth data, which are usually costly, still remains challenging. The reason lies on the ignorance of 3D structural information in most previous unsupervised solutions, causing inaccurate spatial propagation and mixed-depth problems. To alleviate the above challenges, this paper explores the utilization of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method. Firstly, a 3D perceptual spatial propagation algorithm is constructed with a point cloud representation and an attention weighting mechanism to capture more reasonable and favorable neighboring features during the iterative depth propagation process. Secondly, the multi-view geometric constraints between adjacent views are explicitly incorporated to guide the optimization of the whole depth completion model in a self-supervised manner. Extensive experiments on benchmark datasets of NYU-Depthv2 and VOID demonstrate that the proposed model achieves the state-of-the-art depth completion performance compared with other unsupervised methods, and competitive performance compared with previous supervised methods.

Ultra Reliable Low Latency Routing in LEO Satellite Constellations: A Stochastic Geometry Approach

  • Authors: Authors: Ruibo Wang, Mustafa A. Kishk, Mohamed-Slim Alouini
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2312.15281
  • Pdf link: https://arxiv.org/pdf/2312.15281
  • Abstract In recent years, LEO satellite constellations have become envisioned as a core component of the next-generation wireless communication networks. The successive establishment of mega satellite constellations has triggered further demands for satellite communication advanced features: high reliability and low latency. In this article, we first establish a multi-objective optimization problem that simultaneously maximizes reliability and minimizes latency, then we solve it by two methods. According to the optimal solution, ideal upper bounds for reliability and latency performance of LEO satellite routing can be derived. Next, we design an algorithm for relay satellite subset selection, which can approach the ideal upper bounds in terms of performance. Furthermore, we derive analytical expressions for satellite availability, coverage probability, and latency under the stochastic geometry (SG) framework, and the accuracy is verified by Monte Carlo simulation. In the numerical results, we study the routing performance of three existing mega constellations and the impact of different constellation parameter configurations on performance. By comparing with existing routing strategies, we demonstrate the advantages of our proposed routing strategy and extend the scope of our research.

Optimal Decision Tree with Noisy Outcomes

  • Authors: Authors: Su Jia, Fatemeh Navidi, Viswanath Nagarajan, R. Ravi
  • Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.15357
  • Pdf link: https://arxiv.org/pdf/2312.15357
  • Abstract In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.

Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment

  • Authors: Authors: Yiwei Tang, Hualong Huang, Wenhan Zhan, Geyong Min, Zhekai Duan, Yuchuan Lei
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.15380
  • Pdf link: https://arxiv.org/pdf/2312.15380
  • Abstract Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of significant meaning. This paper studies a multi-user post-disaster MEC environment with unstable 5G communication, where device-to-device (D2D) link communication and dynamic voltage and frequency scaling (DVFS) are adopted to balance each user's requirement for task delay and energy consumption. A battery degradation evaluation approach to prolong battery lifetime is also presented. The distributed optimization problem is formulated into a mixed cooperative-competitive (MCC) multi-agent Markov decision process (MAMDP) and is tackled with recurrent multi-agent Proximal Policy Optimization (rMAPPO). Extensive simulations and comprehensive comparisons with other representative algorithms clearly demonstrate the effectiveness of the proposed rMAPPO-based offloading scheme.

Enhancing Computation Pushdown for Cloud OLAP Databases

  • Authors: Authors: Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2312.15405
  • Pdf link: https://arxiv.org/pdf/2312.15405
  • Abstract Network is a major bottleneck in modern cloud databases that adopt a storage-disaggregation architecture. Computation pushdown is a promising solution to tackle this issue, which offloads some computation tasks to the storage layer to reduce network traffic. Existing cloud OLAP systems statically decide whether to push down computation during the query optimization phase and do not consider the storage layer's computational capacity and load. Besides, there is a lack of a general principle that determines which operators are amenable for pushdown. Existing systems design and implement pushdown features empirically, which ends up picking a limited set of pushdown operators respectively. In this paper, we first design Adaptive pushdown as a new mechanism to avoid throttling the storage-layer computation during pushdown, which pushes the request back to the computation layer at runtime if the storage-layer computational resource is insufficient. Moreover, we derive a general principle to identify pushdown-amenable computational tasks, by summarizing common patterns of pushdown capabilities in existing systems. We propose two new pushdown operators, namely, selection bitmap and distributed data shuffle. Evaluation results on TPC-H show that Adaptive pushdown can achieve up to 1.9x speedup over both No pushdown and Eager pushdown baselines, and the new pushdown operators can further accelerate query execution by up to 3.0x.

CARSS: Cooperative Attention-guided Reinforcement Subpath Synthesis for Solving Traveling Salesman Problem

  • Authors: Authors: Yuchen Shi, Congying Han, Tiande Guo
  • Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2312.15412
  • Pdf link: https://arxiv.org/pdf/2312.15412
  • Abstract This paper introduces CARSS (Cooperative Attention-guided Reinforcement Subpath Synthesis), a novel approach to address the Traveling Salesman Problem (TSP) by leveraging cooperative Multi-Agent Reinforcement Learning (MARL). CARSS decomposes the TSP solving process into two distinct yet synergistic steps: "subpath generation" and "subpath merging." In the former, a cooperative MARL framework is employed to iteratively generate subpaths using multiple agents. In the latter, these subpaths are progressively merged to form a complete cycle. The algorithm's primary objective is to enhance efficiency in terms of training memory consumption, testing time, and scalability, through the adoption of a multi-agent divide and conquer paradigm. Notably, attention mechanisms play a pivotal role in feature embedding and parameterization strategies within CARSS. The training of the model is facilitated by the independent REINFORCE algorithm. Empirical experiments reveal CARSS's superiority compared to single-agent alternatives: it demonstrates reduced GPU memory utilization, accommodates training graphs nearly 2.5 times larger, and exhibits the potential for scaling to even more extensive problem sizes. Furthermore, CARSS substantially reduces testing time and optimization gaps by approximately 50% for TSP instances of up to 1000 vertices, when compared to standard decoding methods.

Generalizing SDP-Based Barrier Certificate Synthesis to Unbounded Domains by Dropping Archimedean Condition

  • Authors: Authors: Hao Wu, Shenghua Feng, Ting Gan, Jie Wang, Bican Xia, Naijun Zhan
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2312.15416
  • Pdf link: https://arxiv.org/pdf/2312.15416
  • Abstract Barrier certificates, which serve as differential invariants that witness system safety, play a crucial role in the verification of cyber-physical systems (CPS). Prevailing computational methods for synthesizing barrier certificates are based on semidefinite programming (SDP) by exploiting Putinar Positivstellensatz. Consequently, these approaches are limited by Archimedean condition, which requires all variables to be bounded, i.e., systems are defined over bounded domains. For the unbounded case, unfortunately, these methods become conservative and even fail to identify potential barrier certificates. In this paper, we address this limitation by presenting a new computational method. The main technique we use is the homogenization approach, which was proposed in optimization community recently, to transform an unbounded optimization problem to a bounded one. Our method can be applied to various definitions of barrier certificates, thus expanding the scope of barrier certificate synthesis in the general sense. Experimental results demonstrate that our approach is more effective while maintaining a comparable level of efficiency.

Semi-Bandit Learning for Monotone Stochastic Optimization

  • Authors: Authors: Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2312.15427
  • Pdf link: https://arxiv.org/pdf/2312.15427
  • Abstract Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the r.v.s that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.

Residual Learning for Image Point Descriptors

  • Authors: Authors: Rashik Shrestha, Ajad Chhatkuli, Menelaos Kanakis, Luc Van Gool
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2312.15471
  • Pdf link: https://arxiv.org/pdf/2312.15471
  • Abstract Local image feature descriptors have had a tremendous impact on the development and application of computer vision methods. It is therefore unsurprising that significant efforts are being made for learning-based image point descriptors. However, the advantage of learned methods over handcrafted methods in real applications is subtle and more nuanced than expected. Moreover, handcrafted descriptors such as SIFT and SURF still perform better point localization in Structure-from-Motion (SfM) compared to many learned counterparts. In this paper, we propose a very simple and effective approach to learning local image descriptors by using a hand-crafted detector and descriptor. Specifically, we choose to learn only the descriptors, supported by handcrafted descriptors while discarding the point localization head. We optimize the final descriptor by leveraging the knowledge already present in the handcrafted descriptor. Such an approach of optimization allows us to discard learning knowledge already present in non-differentiable functions such as the hand-crafted descriptors and only learn the residual knowledge in the main network branch. This offers 50X convergence speed compared to the standard baseline architecture of SuperPoint while at inference the combined descriptor provides superior performance over the learned and hand-crafted descriptors. This is done with minor increase in the computations over the baseline learned descriptor. Our approach has potential applications in ensemble learning and learning with non-differentiable functions. We perform experiments in matching, camera localization and Structure-from-Motion in order to showcase the advantages of our approach.

Guess What Quantum Computing Can Do for Test Case Optimization

  • Authors: Authors: Xinyi Wang, Shaukat Ali, Tao Yue, Paolo Arcaini
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2312.15547
  • Pdf link: https://arxiv.org/pdf/2312.15547
  • Abstract In the near term, quantum approximate optimization algorithms (QAOAs) hold great potential to solve combinatorial optimization problems. These are hybrid algorithms, i.e., a combination of quantum and classical algorithms. Several proof-of-concept applications of QAOAs for solving combinatorial problems, such as portfolio optimization, energy optimization in power systems, and job scheduling, have been demonstrated. However, whether QAOAs can efficiently solve optimization problems from classical software engineering, such as test optimization, remains unstudied. To this end, we present the first effort to formulate a software test case optimization problem as a QAOA problem and solve it on quantum computer simulators. To solve bigger test optimization problems that require many qubits, which are unavailable these days, we integrate a problem decomposition strategy with the QAOA. We performed an empirical evaluation with five test case optimization problems and four industrial datasets from ABB, Google, and Orona to compare various configurations of our approach, assess its decomposition strategy of handling large datasets, and compare its performance with classical algorithms (i.e., Genetic Algorithm (GA) and Random Search). Based on the evaluation results, we recommend the best configuration of our approach for test case optimization problems. Also, we demonstrate that our strategy can reach the same effectiveness as GA and outperform GA in two out of five test case optimization problems we conducted.

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

  • Authors: Authors: Wei Liu, Weihao Zeng, Keqing He, Yong Jiang, Junxian He
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15685
  • Pdf link: https://arxiv.org/pdf/2312.15685
  • Abstract Instruction tuning is a standard technique employed to align large language models to end tasks and user preferences after the initial pretraining phase. Recent research indicates the critical role of data engineering in instruction tuning -- when appropriately selected, only limited data is necessary to achieve superior performance. However, we still lack a principled understanding of what makes good instruction tuning data for alignment, and how we should select data automatically and effectively. In this work, we delve deeply into automatic data selection strategies for alignment. We start with controlled studies to measure data across three dimensions: complexity, quality, and diversity, along which we examine existing methods and introduce novel techniques for enhanced data measurement. Subsequently, we propose a simple strategy to select data samples based on the measurement. We present deita (short for Data-Efficient Instruction Tuning for Alignment), a series of models fine-tuned from LLaMA and Mistral models using data samples automatically selected with our proposed approach. Empirically, deita performs better or on par with the state-of-the-art open-source alignment models with only 6K SFT training data samples -- over 10x less than the data used in the baselines. When further trained with direct preference optimization (DPO), deita-Mistral-7B + DPO trained with 6K SFT and 10K DPO samples achieve 7.55 MT-Bench and 90.06% AlpacaEval scores. We anticipate this work to provide tools on automatic data selection, facilitating data-efficient alignment. We release our models as well as the selected datasets for future researches to effectively align models more efficiently.

TimesURL: Self-supervised Contrastive Learning for Universal Time Series Representation Learning

  • Authors: Authors: Jiexi Liu, Songcan Chen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15709
  • Pdf link: https://arxiv.org/pdf/2312.15709
  • Abstract Learning universal time series representations applicable to various types of downstream tasks is challenging but valuable in real applications. Recently, researchers have attempted to leverage the success of self-supervised contrastive learning (SSCL) in Computer Vision(CV) and Natural Language Processing(NLP) to tackle time series representation. Nevertheless, due to the special temporal characteristics, relying solely on empirical guidance from other domains may be ineffective for time series and difficult to adapt to multiple downstream tasks. To this end, we review three parts involved in SSCL including 1) designing augmentation methods for positive pairs, 2) constructing (hard) negative pairs, and 3) designing SSCL loss. For 1) and 2), we find that unsuitable positive and negative pair construction may introduce inappropriate inductive biases, which neither preserve temporal properties nor provide sufficient discriminative features. For 3), just exploring segment- or instance-level semantics information is not enough for learning universal representation. To remedy the above issues, we propose a novel self-supervised framework named TimesURL. Specifically, we first introduce a frequency-temporal-based augmentation to keep the temporal property unchanged. And then, we construct double Universums as a special kind of hard negative to guide better contrastive learning. Additionally, we introduce time reconstruction as a joint optimization objective with contrastive learning to capture both segment-level and instance-level information. As a result, TimesURL can learn high-quality universal representations and achieve state-of-the-art performance in 6 different downstream tasks, including short- and long-term forecasting, imputation, classification, anomaly detection and transfer learning.

Frequency-Domain Identification of Discrete-Time Systems using Sum-of-Rational Optimization

  • Authors: Authors: Mohamed Abdalmoaty, Jared Miller, Mingzhou Yin, Roy S. Smith
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2312.15722
  • Pdf link: https://arxiv.org/pdf/2312.15722
  • Abstract We propose a computationally tractable method for the identification of stable canonical discrete-time rational transfer function models, using frequency domain data. The problem is formulated as a global non-convex optimization problem whose objective function is the sum of weighted squared residuals at each observed frequency datapoint. Stability is enforced using a polynomial matrix inequality constraint. The problem is solved globally by a moment-sum-of-squares hierarchy of semidefinite programs through a framework for sum-of-rational-functions optimization. Convergence of the moment-sum-of-squares program is guaranteed as the bound on the degree of the sum-of-squares polynomials approaches infinity. The performance of the proposed method is demonstrated using numerical simulation examples.

Improving the Accuracy and Interpretability of Neural Networks for Wind Power Forecasting

  • Authors: Authors: Wenlong Liao, Fernando Porte-Agel, Jiannong Fang, Birgitte Bak-Jensen, Zhe Yang, Gonghao Zhang
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15741
  • Pdf link: https://arxiv.org/pdf/2312.15741
  • Abstract Deep neural networks (DNNs) are receiving increasing attention in wind power forecasting due to their ability to effectively capture complex patterns in wind data. However, their forecasted errors are severely limited by the local optimal weight issue in optimization algorithms, and their forecasted behavior also lacks interpretability. To address these two challenges, this paper firstly proposes simple but effective triple optimization strategies (TriOpts) to accelerate the training process and improve the model performance of DNNs in wind power forecasting. Then, permutation feature importance (PFI) and local interpretable model-agnostic explanation (LIME) techniques are innovatively presented to interpret forecasted behaviors of DNNs, from global and instance perspectives. Simulation results show that the proposed TriOpts not only drastically improve the model generalization of DNNs for both the deterministic and probabilistic wind power forecasting, but also accelerate the training process. Besides, the proposed PFI and LIME techniques can accurately estimate the contribution of each feature to wind power forecasting, which helps to construct feature engineering and understand how to obtain forecasted values for a given sample.

Dynamic MIMO Architecture Design for Near-Field Communications

  • Authors: Authors: Zheng Zhang, Yuanwei Liu, Zhaolin Wang, Jian Chen, Tony Q.S. Quek
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.15757
  • Pdf link: https://arxiv.org/pdf/2312.15757
  • Abstract A novel dynamic hybrid beamforming architecture is proposed to achieve the spatial multiplexing-power consumption tradeoff for near-field multiple-input multiple-output (MIMO) networks, where each radio frequency (RF) chain is connected to each antenna using a couple of independent phase shifters to reduce the number of required RF chains. Based on this architecture, an optimization problem is formulated that maximizes the sum of achievable rates while minimizing the hardware power consumption. Both continuous and discrete phase shifters are considered. 1) For continuous phase shifters, a weighted minimum mean-square error-based two-stage (WMMSE-TS) algorithm is proposed, where the same performance as the optimal fully-digital beamformer can be achieved by the proposed hybrid beamformer even if the number of RF chains equals the number of data streams. 2) For discrete phase shifters, a penalty-based layered iterative (PLI) algorithm is proposed. The closed-form analog and baseband digital beamformers are derived in each iteration. Simulation results demonstrate that: 1) the proposed dynamic beamforming architecture outperforms the conventional fixed hybrid beamforming architecture in terms of spatial multiplexing-power consumption tradeoff, and 2) the proposed algorithms achieve better performance than the other baseline schemes.

Simultaneous Optimal System and Controller Design for Multibody Systems with Joint Friction using Direct Sensitivities

  • Authors: Authors: Adwait Verulkar, Corina Sandu, Adrian Sandu, Daniel Dopico
  • Subjects: Systems and Control (eess.SY); Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2312.15771
  • Pdf link: https://arxiv.org/pdf/2312.15771
  • Abstract Real-world multibody systems are often subject to phenomena like friction, joint clearances, and external events. These phenomena can significantly impact the optimal design of the system and its controller. This work addresses the gradient-based optimization methodology for multibody dynamic systems with joint friction using a direct sensitivity approach for gradient computation. After a thorough review of various friction models developed over the years, the Brown McPhee model has been found to be the most suitable for the study due to its accuracy for dynamic simulation and its compatibility with sensitivity analysis. The methodology supports co-design of the system and its controller, which is especially relevant for applications like robotics and servo-mechanical systems where the actuation and the design are highly dependent on each other. Numerical results are obtained using a new implementation of the MBSVT (Multi-Body Systems at Virginia Tech) software package; MBSVT 2.0 is reprogrammed in Julia for ease of implementation while maintaining high computational efficiency. Three case studies are provided to demonstrate the attractive properties of simultaneous optimal design and control approach for certain applications.

Quantum-Assisted Online Task Offloading and Resource Allocation in MEC-Enabled Satellite-Aerial-Terrestrial Integrated Networks

  • Authors: Authors: Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2312.15808
  • Pdf link: https://arxiv.org/pdf/2312.15808
  • Abstract In the era of Internet of Things (IoT), multi-access edge computing (MEC)-enabled satellite-aerial-terrestrial integrated network (SATIN) has emerged as a promising technology to provide massive IoT devices with seamless and reliable communication and computation services. This paper investigates the cooperation of low Earth orbit (LEO) satellites, high altitude platforms (HAPs), and terrestrial base stations (BSs) to provide relaying and computation services for vastly distributed IoT devices. Considering the uncertainty in dynamic SATIN systems, we formulate a stochastic optimization problem to minimize the time-average expected service delay by jointly optimizing resource allocation and task offloading while satisfying the energy constraints. To solve the formulated problem, we first develop a Lyapunov-based online control algorithm to decompose it into multiple one-slot problems. Since each one-slot problem is a large-scale mixed-integer nonlinear program (MINLP) that is intractable for classical computers, we further propose novel hybrid quantum-classical generalized Benders' decomposition (HQCGBD) algorithms to solve the problem efficiently by leveraging quantum advantages in parallel computing. Numerical results validate the effectiveness of the proposed MEC-enabled SATIN schemes.

BalMCTS: Balancing Objective Function and Search Nodes in MCTS for Constraint Optimization Problems

  • Authors: Authors: Yingkai Xiao, Jingjin Liu, Hankz Hankui Zhuo
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.15864
  • Pdf link: https://arxiv.org/pdf/2312.15864
  • Abstract Constraint Optimization Problems (COP) pose intricate challenges in combinatorial problems usually addressed through Branch and Bound (B&B) methods, which involve maintaining priority queues and iteratively selecting branches to search for solutions. However, conventional approaches take a considerable amount of time to find optimal solutions, and it is also crucial to quickly identify a near-optimal feasible solution in a shorter time. In this paper, we aim to investigate the effectiveness of employing a depth-first search algorithm for solving COP, specifically focusing on identifying optimal or near-optimal solutions within top $n$ solutions. Hence, we propose a novel heuristic neural network algorithm based on MCTS, which, by simultaneously conducting search and training, enables the neural network to effectively serve as a heuristic during Backtracking. Furthermore, our approach incorporates encoding COP problems and utilizing graph neural networks to aggregate information about variables and constraints, offering more appropriate variables for assignments. Experimental results on stochastic COP instances demonstrate that our method identifies feasible solutions with a gap of less than 17.63% within the initial 5 feasible solutions. Moreover, when applied to attendant Constraint Satisfaction Problem (CSP) instances, our method exhibits a remarkable reduction of less than 5% in searching nodes compared to state-of-the-art approaches.

Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation

  • Authors: Authors: Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15901
  • Pdf link: https://arxiv.org/pdf/2312.15901
  • Abstract Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios. Typically, they learn a very small scale of parameters for pre-trained models in a white-box formulation, which assumes model architectures to be known and parameters to be accessible. However, large models are often not open-source due to considerations of preventing abuse or commercial factors, hence posing a barrier to the deployment of white-box PEFT methods. To alleviate the dependence on model accessibility, we introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models. Specifically, considering that the backpropagation gradients are blocked, we approximate the gradients of textual prompts by analyzing the predictions with perturbed prompts. Secondly, a lightweight adapter is deployed over the output feature of the inaccessible model, further facilitating the model adaptation process. Empowered with these designs, our CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods. Code is released at https://github.com/guozix/cbbt.

Cross Initialization for Personalized Text-to-Image Generation

  • Authors: Authors: Lianyu Pang, Jian Yin, Haoran Xie, Qiping Wang, Qing Li, Xudong Mao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15905
  • Pdf link: https://arxiv.org/pdf/2312.15905
  • Abstract Recently, there has been a surge in face personalization techniques, benefiting from the advanced capabilities of pretrained text-to-image diffusion models. Among these, a notable method is Textual Inversion, which generates personalized images by inverting given images into textual embeddings. However, methods based on Textual Inversion still struggle with balancing the trade-off between reconstruction quality and editability. In this study, we examine this issue through the lens of initialization. Upon closely examining traditional initialization methods, we identified a significant disparity between the initial and learned embeddings in terms of both scale and orientation. The scale of the learned embedding can be up to 100 times greater than that of the initial embedding. Such a significant change in the embedding could increase the risk of overfitting, thereby compromising the editability. Driven by this observation, we introduce a novel initialization method, termed Cross Initialization, that significantly narrows the gap between the initial and learned embeddings. This method not only improves both reconstruction and editability but also reduces the optimization steps from 5000 to 320. Furthermore, we apply a regularization term to keep the learned embedding close to the initial embedding. We show that when combined with Cross Initialization, this regularization term can effectively improve editability. We provide comprehensive empirical evidence to demonstrate the superior performance of our method compared to the baseline methods. Notably, in our experiments, Cross Initialization is the only method that successfully edits an individual's facial expression. Additionally, a fast version of our method allows for capturing an input image in roughly 26 seconds, while surpassing the baseline methods in terms of both reconstruction and editability. Code will be made publicly available.

Align on the Fly: Adapting Chatbot Behavior to Established Norms

  • Authors: Authors: Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2312.15907
  • Pdf link: https://arxiv.org/pdf/2312.15907
  • Abstract In this paper, we aim to align large language models with the ever-changing, complex, and diverse human values (e.g., social norms) across time and locations. This presents a challenge to existing alignment techniques, such as supervised fine-tuning, which internalize values within model parameters. To overcome this, we propose an On-the-fly Preference Optimization (OPO) method, which is a real-time alignment that works in a streaming way. It employs an external memory to store established rules for alignment, which can constrain LLMs' behaviors without further training, allowing for convenient updates and customization of human values. We also introduce a scalable evaluation to assess the proposed method more effectively. Experimental results on both human-annotated and auto-generated questions from legal and moral domains indicate the effectiveness of the proposed OPO method. Our code and data are released at https://github.com/GAIR-NLP/OPO.

Hybrid Precoder Design for Angle-of-Departure Estimation with Limited-Resolution Phase Shifters

  • Authors: Authors: Huiping Huang, Musa Furkan Keskin, Henk Wymeersch, Xuesong Cai, Linlong Wu, Johan Thunberg, Fredrik Tufvesson
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.15921
  • Pdf link: https://arxiv.org/pdf/2312.15921
  • Abstract Hybrid analog-digital beamforming stands out as a key enabler for future communication systems with a massive number of antennas. In this paper, we investigate the hybrid precoder design problem for angle-of-departure (AoD) estimation, where we take into account the practical constraint on the limited resolution of phase shifters. Our goal is to design a radio-frequency (RF) precoder and a base-band (BB) precoder to estimate AoD of the user with a high accuracy. To this end, we propose a two-step strategy where we first obtain the fully digital precoder that minimizes the angle error bound, and then the resulting digital precoder is decomposed into an RF precoder and a BB precoder, based on the alternating optimization and the alternating direction method of multipliers. Besides, we derive the quantization error upper bound and analyse the convergence behavior of the proposed algorithm. Numerical results demonstrate the superior performance of the proposed method over state-of-the-art baselines.

ECHO: Efficient Dataset Condensation by Higher-Order Distribution Alignment

  • Authors: Authors: Hansong Zhang, Shikun Li, Pengju Wang, Dan Zeng, Shiming Ge
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15927
  • Pdf link: https://arxiv.org/pdf/2312.15927
  • Abstract In the era of deep learning, training deep neural networks often requires extensive data, leading to substantial costs. Dataset condensation addresses this by learning a small synthetic set that preserves essential information from the original large-scale dataset. Nowadays, optimization-oriented methods dominate dataset condensation for state-of-the-art (SOTA) results, but their computationally intensive bi-level optimization hinders practicality with large datasets. To enhance efficiency, as alternative solutions, Distribution-Matching (DM)-based methods reduce costs by aligning the representation distributions of real and synthetic examples. However, current DM-based methods still yield less comparable results to SOTA optimization-oriented methods. In this paper, we argue that existing DM-based methods overlook the higher-order alignment of the distributions, which may lead to sub-optimal matching results. Inspired by this, we propose a new DM-based method named as Efficient Dataset Condensation by Higher-Order Distribution Alignment (ECHO). Specifically, rather than only aligning the first-order moment of the representation distributions as previous methods, we learn synthetic examples via further aligning the higher-order moments of the representation distributions of real and synthetic examples based on the classical theory of reproducing kernel Hilbert space. Experiments demonstrate the proposed method achieves a significant performance boost while maintaining efficiency across various scenarios.

Optimistic and Pessimistic Actor in RL:Decoupling Exploration and Utilization

  • Authors: Authors: Jingpu Yang, Qirui Zhao, Helin Wang, Yuxiao Huang, Zirui Song, Miao Fang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15965
  • Pdf link: https://arxiv.org/pdf/2312.15965
  • Abstract Deep neural network(DNN) generalization is limited by the over-reliance of current offline reinforcement learning techniques on conservative processing of existing datasets. This method frequently results in algorithms that settle for suboptimal solutions that only adjust to a certain dataset. Similarly, in online reinforcement learning, the previously imposed punitive pessimism also deprives the model of its exploratory potential. Our research proposes a novel framework, Optimistic and Pessimistic Actor Reinforcement Learning (OPARL). OPARL employs a unique dual-actor approach: an optimistic actor dedicated to exploration and a pessimistic actor focused on utilization, thereby effectively differentiating between exploration and utilization strategies. This unique combination in reinforcement learning methods fosters a more balanced and efficient approach. It enables the optimization of policies that focus on actions yielding high rewards through pessimistic utilization strategies, while also ensuring extensive state coverage via optimistic exploration. Experiments and theoretical study demonstrates OPARL improves agents' capacities for application and exploration. In the most tasks of DMControl benchmark and Mujoco environment, OPARL performed better than state-of-the-art methods. Our code has released on https://github.com/yydsok/OPARL

Assigning Stationary Distributions to Sparse Stochastic Matrices

  • Authors: Authors: Nicolas Gillis, Paul Van Dooren
  • Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC); Probability (math.PR); Computation (stat.CO)
  • Arxiv link: https://arxiv.org/abs/2312.16011
  • Pdf link: https://arxiv.org/pdf/2312.16011
  • Abstract The target stationary distribution problem (TSDP) is the following: given an irreducible stochastic matrix $G$ and a target stationary distribution $\hat \mu$, construct a minimum norm perturbation, $\Delta$, such that $\hat G = G+\Delta$ is also stochastic and has the prescribed target stationary distribution, $\hat \mu$. In this paper, we revisit the TSDP under a constraint on the support of $\Delta$, that is, on the set of non-zero entries of $\Delta$. This is particularly meaningful in practice since one cannot typically modify all entries of $G$. We first show how to construct a feasible solution $\hat G$ that has essentially the same support as the matrix $G$. Then we show how to compute globally optimal and sparse solutions using the component-wise $\ell_1$ norm and linear optimization. We propose an efficient implementation that relies on a column-generation approach which allows us to solve sparse problems of size up to $10^5 \times 10^5$ in a few minutes. We illustrate the proposed algorithms with several numerical experiments.

Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks

  • Authors: Authors: Juyoung Yun
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16020
  • Pdf link: https://arxiv.org/pdf/2312.16020
  • Abstract In this study, we explore an innovative approach for neural network optimization, focusing on the application of gradient sampling techniques, similar to those in StochGradAdam, during the pruning process. Our primary objective is to maintain high accuracy levels in pruned models, a critical challenge in resource-limited scenarios. Our extensive experiments reveal that models optimized with gradient sampling techniques are more effective at preserving accuracy during pruning compared to those using traditional optimization methods. This finding underscores the significance of gradient sampling in facilitating robust learning and enabling networks to retain crucial information even after substantial reduction in their complexity. We validate our approach across various datasets and neural architectures, demonstrating its broad applicability and effectiveness. The paper also delves into the theoretical aspects, explaining how gradient sampling techniques contribute to the robustness of models during pruning. Our results suggest a promising direction for creating efficient neural networks that do not compromise on accuracy, even in environments with constrained computational resources.

On the Hardness of Minimum Embedded Order Dependency Validation

  • Authors: Authors: Alejandro Ramos, Takuya Uemura, Daichi Amagata, Ryo Shirai, Takahiro Hara
  • Subjects: Databases (cs.DB)
  • Arxiv link: https://arxiv.org/abs/2312.16033
  • Pdf link: https://arxiv.org/pdf/2312.16033
  • Abstract Order Dependencies (ODs) have many applications, such as query optimization, data integration, and data cleaning. Although many works addressed the problem of discovering OD (and its variants), they do not consider datasets with missing values, a standard observation in real-world datasets. This paper introduces the novel notion of Embedded ODs (eODs) to deal with missing values. The intuition of eODs is to confirm ODs only on tuples with no missing values on a given embedding (a set of attributes). In this paper, we address the problem of validating a given eOD. If the eOD holds, we return true. Otherwise, we search for an updated embedding such that the updated eOD holds. If such embedding does not exist, we return false. A trivial requirement is to consider an embedding such that the number of ignored tuples is minimized. We show that it is NP-complete to compute such embedding. We therefore propose an efficient heuristic algorithm for validating embedded ODs. We conduct experiments on real-world datasets, and the results confirm the efficiency of our algorithm.

An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification

  • Authors: Authors: Hyenkyun Woo
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.16043
  • Pdf link: https://arxiv.org/pdf/2312.16043
  • Abstract This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model called SIGTRON-imbalanced classification (SIC) model that employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model does not have an external $\pi$-weight on the loss function but has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, when the given training dataset is close to the well-balanced condition, we show that the proposed SIC model is more adaptive to variations of the dataset, such as the inconsistency of the scale-class-imbalance ratio between the training and test datasets. This adaptation is achieved by creating a skewed hyperplane equation. Additionally, we present a quasi-Newton optimization(L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. Empirically, we have observed that the proposed approach outperforms $\pi$-weighted convex focal loss and balanced classifier LIBLINEAR(logistic regression, SVM, and L2SVM) in terms of test classification accuracy with $51$ two-class and $67$ multi-class datasets. In binary classification problems, where the scale-class-imbalance ratio of the training dataset is not significant but the inconsistency exists, a group of SIC models with the best test accuracy for each dataset (TOP$1$) outperforms LIBSVM(C-SVC with RBF kernel), a well-known kernel-based classifier.

Algebraic Positional Encodings

  • Authors: Authors: Konstantinos Kogkalidis, Jean-Philippe Bernardy, Vikas Garg
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16045
  • Pdf link: https://arxiv.org/pdf/2312.16045
  • Abstract We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible mapping from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds the desired structural properties. Our scheme can accommodate various structures, including sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyperparameter optimizations or ``task search'' of any kind. Code will be made available at \url{github.com/konstantinosKokos/UnitaryPE}.

Quantum-Hybrid Stereo Matching With Nonlinear Regularization and Spatial Pyramids

  • Authors: Authors: Cameron Braunstein (1 and 2), Eddy Ilg (1), Vladislav Golyanik (2) ((1) Saarland University, SIC, (2) MPI for Informatics, SIC)
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16118
  • Pdf link: https://arxiv.org/pdf/2312.16118
  • Abstract Quantum visual computing is advancing rapidly. This paper presents a new formulation for stereo matching with nonlinear regularizers and spatial pyramids on quantum annealers as a maximum a posteriori inference problem that minimizes the energy of a Markov Random Field. Our approach is hybrid (i.e., quantum-classical) and is compatible with modern D-Wave quantum annealers, i.e., it includes a quadratic unconstrained binary optimization (QUBO) objective. Previous quantum annealing techniques for stereo matching are limited to using linear regularizers, and thus, they do not exploit the fundamental advantages of the quantum computing paradigm in solving combinatorial optimization problems. In contrast, our method utilizes the full potential of quantum annealing for stereo matching, as nonlinear regularizers create optimization problems which are NP-hard. On the Middlebury benchmark, we achieve an improved root mean squared accuracy over the previous state of the art in quantum stereo matching of 2% and 22.5% when using different solvers.

A bi-objective $ε$-constrained framework for quality-cost optimization in language model ensembles

  • Authors: Authors: Aditi Singla, Aditya Singh, Kanishk Kukreja
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2312.16119
  • Pdf link: https://arxiv.org/pdf/2312.16119
  • Abstract We propose an ensembling framework that uses diverse open-sourced Large Language Models (LLMs) to achieve high response quality while maintaining cost efficiency. We formulate a bi-objective optimization problem to represent the quality-cost tradeoff and then introduce an additional budget constraint that reduces the problem to a straightforward 0/1 knapsack problem. We empirically demonstrate that our framework outperforms the existing ensembling approaches in response quality while significantly reducing costs.

Large Language Model Situational Awareness Based Planning

  • Authors: Authors: Liman Wang, Hanyang Zhong
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16127
  • Pdf link: https://arxiv.org/pdf/2312.16127
  • Abstract This work pioneers evaluating emergent planning capabilities based on situational awareness in large language models. We contribute (i) novel benchmarks and metrics for standardized assessment; (ii) a unique dataset to spur progress; and (iii) demonstrations that prompting and multi-agent schemes significantly enhance planning performance in context-sensitive planning tasks. Positioning this within a situated agent and automated planning research, we highlight inherent reliability challenges--efficiently mapping world states to actions without environmental guidance remains open despite simulated domain advances. Although out-of-scope, limitations around validation methodology and data availability indicate exciting directions, including fine-tuning on expanded planning corpora and optimizations for triggering fast latent planning. By conclusively demonstrating current methods' promise and limitations via rigorous comparison, we catalyze investigating reliable goal-directed reasoning for situated agents.

Keyword: adam

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization

  • Authors: Authors: Shuoran Jiang, Qingcai Chen, Youchen Pan, Yang Xiang, Yukang Lin, Xiangping Wu, Chuanyi Liu, Xiaobao Song
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15184
  • Pdf link: https://arxiv.org/pdf/2312.15184
  • Abstract Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes the large language models (LLMs) by just forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimate in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLMs fine-tuning across various NLP tasks than MeZO and its momentum variants.

Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks

  • Authors: Authors: Juyoung Yun
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16020
  • Pdf link: https://arxiv.org/pdf/2312.16020
  • Abstract In this study, we explore an innovative approach for neural network optimization, focusing on the application of gradient sampling techniques, similar to those in StochGradAdam, during the pruning process. Our primary objective is to maintain high accuracy levels in pruned models, a critical challenge in resource-limited scenarios. Our extensive experiments reveal that models optimized with gradient sampling techniques are more effective at preserving accuracy during pruning compared to those using traditional optimization methods. This finding underscores the significance of gradient sampling in facilitating robust learning and enabling networks to retain crucial information even after substantial reduction in their complexity. We validate our approach across various datasets and neural architectures, demonstrating its broad applicability and effectiveness. The paper also delves into the theoretical aspects, explaining how gradient sampling techniques contribute to the robustness of models during pruning. Our results suggest a promising direction for creating efficient neural networks that do not compromise on accuracy, even in environments with constrained computational resources.

Keyword: gradient

Gradient Shaping for Multi-Constraint Safe Reinforcement Learning

  • Authors: Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15127
  • Pdf link: https://arxiv.org/pdf/2312.15127
  • Abstract Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints via interacting with the environments. In this paper, our focus lies in addressing the complex challenges associated with solving multi-constraint (MC) safe RL problems. We approach the safe RL problem from the perspective of Multi-Objective Optimization (MOO) and propose a unified framework designed for MC safe RL algorithms. This framework highlights the manipulation of gradients derived from constraints. Leveraging insights from this framework and recognizing the significance of \textit{redundant} and \textit{conflicting} constraint conditions, we introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction. Our extensive experimentation demonstrates the effectiveness of our proposed method in encouraging exploration and learning a policy that improves both safety and reward performance across various challenging MC safe RL tasks as well as good scalability to the number of constraints.

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization

  • Authors: Authors: Shuoran Jiang, Qingcai Chen, Youchen Pan, Yang Xiang, Yukang Lin, Xiangping Wu, Chuanyi Liu, Xiaobao Song
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.15184
  • Pdf link: https://arxiv.org/pdf/2312.15184
  • Abstract Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes the large language models (LLMs) by just forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimate in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLMs fine-tuning across various NLP tasks than MeZO and its momentum variants.

Look-ahead Search on Top of Policy Networks in Imperfect Information Games

  • Authors: Authors: Ondrej Kubicek, Neil Burch, Viliam Lisy
  • Subjects: Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2312.15220
  • Pdf link: https://arxiv.org/pdf/2312.15220
  • Abstract Conducting additional search during test time is often used to improve the performance of reinforcement learning algorithms. Performing search in adversarial games with imperfect information is notoriously difficult and often requires a complicated training process. We present an algorithm that uses an arbitrary policy-gradient algorithm that learns from sampled trajectories in the setting of fully adversarial two-player games with imperfect information. Alongside the training of the policy network, the algorithm trains an additional critic network, which provides multiple expected values if both players follow one of a fixed set of transformations of the policy given by the policy network. These values are then used for depth-limited search. We show how the values from this critic can create a value function for imperfect information games. Moreover, they can be used to compute the summary statistics necessary to start the search from an arbitrary decision point in the game. The presented algorithm is scalable to very large games since it does not require any search in the training time. Furthermore, given sufficient computational resources, our algorithm may choose whether to use search or play the strategy according to the trained policy network anywhere in the game. We evaluate the algorithm's performance when trained alongside Regularized Nash Dynamics, and we compare the performance of using the search against the policy network in the standard benchmark game of Leduc hold'em, multiple variants of imperfect information Goofspiel, and in a game of Battleships.

Regularized PolyKervNets: Optimizing Expressiveness and Efficiency for Private Inference in Deep Neural Networks

  • Authors: Authors: Toluwani Aremu
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15229
  • Pdf link: https://arxiv.org/pdf/2312.15229
  • Abstract Private computation of nonlinear functions, such as Rectified Linear Units (ReLUs) and max-pooling operations, in deep neural networks (DNNs) poses significant challenges in terms of storage, bandwidth, and time consumption. To address these challenges, there has been a growing interest in utilizing privacy-preserving techniques that leverage polynomial activation functions and kernelized convolutions as alternatives to traditional ReLUs. However, these alternative approaches often suffer from a trade-off between achieving faster private inference (PI) and sacrificing model accuracy. In particular, when applied to much deeper networks, these methods encounter training instabilities, leading to issues like exploding gradients (resulting in NaNs) or suboptimal approximations. In this study, we focus on PolyKervNets, a technique known for offering improved dynamic approximations in smaller networks but still facing instabilities in larger and more complex networks. Our primary objective is to empirically explore optimization-based training recipes to enhance the performance of PolyKervNets in larger networks. By doing so, we aim to potentially eliminate the need for traditional nonlinear activation functions, thereby advancing the state-of-the-art in privacy-preserving deep neural network architectures. Code can be found on GitHub at: \url{https://github.com/tolusophy/PolyKervNets/}

Instrumental Variables based DREM for Online Asymptotic Identification of Perturbed Linear Systems

  • Authors: Authors: Anton Glushchenko, Konstantin Lastochkin
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2312.15631
  • Pdf link: https://arxiv.org/pdf/2312.15631
  • Abstract Existing online continuous-time parameter estimation laws provide exact (asymptotic/exponential or finite/fixed time) identification of dynamical linear/nonlinear systems parameters only if the external perturbations are equaled to zero or represented as a vanishing and absolutely integrable function of time. However, real systems are generally affected by non-zero and non-decaying disturbances, in the presence of which the above-mentioned identification approaches ensure only boundedness of a parameter estimation error. The main goal of this study is to close this gap and develop a novel online continuous-time parameter estimator, which guarantees exact asymptotic identification of unknown parameters of linear systems in the presence of unknown but bounded perturbations. To achieve the aforementioned goal, it is proposed to augment the deeply investigated Dynamic Regressor Extension and Mixing (DREM) procedure with the novel Instrumental Variables (IV) based extension scheme with averaging. Such an approach allows one to obtain a set of scalar regression equations with asymptotically vanishing perturbation even if the initial disturbance that affects the plant is only bounded. It is rigorously proved that a gradient estimation law designed on the basis of such scalar regressions ensures online asymptotic identification of the parameters of the perturbed linear systems if the disturbance and control input do not include signals with common frequencies, which is a weak assumption for applications. Theoretical results are illustrated and supported with adequate numerical simulations.

Unconditionally stable exponential integrator schemes for the 2D Cahn-Hilliard equation

  • Authors: Authors: Xinyu Cheng
  • Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
  • Arxiv link: https://arxiv.org/abs/2312.15656
  • Pdf link: https://arxiv.org/pdf/2312.15656
  • Abstract Phase field models are gradient flows with their energy naturally dissipating in time. In order to preserve this property, many numerical schemes have been well-studied. In this paper we consider a well-known method, namely the exponential integrator method (EI). In the literature a few works studied several EI schemes for various phase field models and proved the energy dissipation by either requiring a strong Lipschitz condition on the nonlinear source term or certain $L^\infty$ bounds on the numerical solutions (maximum principle). However for phase field models such as the (non-local) Cahn-Hilliard equation, the maximum principle no longer exists. As a result, solving such models via EI schemes remains open for a long time. In this paper we aim to give a systematic approach on applying EI-type schemes to such models by solving the Cahn-Hilliard equation with a first order EI scheme and showing the energy dissipation. In fact second order EI schemes can be handled similarly and we leave the discussion in a subsequent paper. To our best knowledge, this is the first work to handle phase field models without assuming any strong Lipschitz condition or $L^\infty$ boundedness. Furthermore, we will analyze the $L^2$ error and present some numerical simulations to demonstrate the dynamics.

TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient

  • Authors: Authors: Xingzhou Lou, Junge Zhang, Timothy J. Norman, Kaiqi Huang, Yali Du
  • Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.15667
  • Pdf link: https://arxiv.org/pdf/2312.15667
  • Abstract Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions by some agents will affect other agent's policy learning. While using individual critics for policy updates can avoid this issue, they severely limit cooperation among agents. To address this issue, we propose an agent topology framework, which decides whether other agents should be considered in policy gradient and achieves compromise between facilitating cooperation and alleviating the CDM issue. The agent topology allows agents to use coalition utility as learning objective instead of global utility by centralized critics or local utility by individual critics. To constitute the agent topology, various models are studied. We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods. We prove the policy improvement theorem for stochastic TAPE and give a theoretical explanation for the improved cooperation among agents. Experiment results on several benchmarks show the agent topology is able to facilitate agent cooperation and alleviate CDM issue respectively to improve performance of TAPE. Finally, multiple ablation studies and a heuristic graph search algorithm are devised to show the efficacy of the agent topology.

Simultaneous Optimal System and Controller Design for Multibody Systems with Joint Friction using Direct Sensitivities

  • Authors: Authors: Adwait Verulkar, Corina Sandu, Adrian Sandu, Daniel Dopico
  • Subjects: Systems and Control (eess.SY); Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2312.15771
  • Pdf link: https://arxiv.org/pdf/2312.15771
  • Abstract Real-world multibody systems are often subject to phenomena like friction, joint clearances, and external events. These phenomena can significantly impact the optimal design of the system and its controller. This work addresses the gradient-based optimization methodology for multibody dynamic systems with joint friction using a direct sensitivity approach for gradient computation. After a thorough review of various friction models developed over the years, the Brown McPhee model has been found to be the most suitable for the study due to its accuracy for dynamic simulation and its compatibility with sensitivity analysis. The methodology supports co-design of the system and its controller, which is especially relevant for applications like robotics and servo-mechanical systems where the actuation and the design are highly dependent on each other. Numerical results are obtained using a new implementation of the MBSVT (Multi-Body Systems at Virginia Tech) software package; MBSVT 2.0 is reprogrammed in Julia for ease of implementation while maintaining high computational efficiency. Three case studies are provided to demonstrate the attractive properties of simultaneous optimal design and control approach for certain applications.

High Efficiency Inference Accelerating Algorithm for NOMA-based Mobile Edge Computing

  • Authors: Authors: Xin Yuan, Ning Li, Tuo Zhang, Muqing Li, Yuwen Chen, Jose Fernan Martinez Ortega, Song Guo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.15850
  • Pdf link: https://arxiv.org/pdf/2312.15850
  • Abstract Splitting the inference model between device, edge server, and cloud can improve the performance of EI greatly. Additionally, the non-orthogonal multiple access (NOMA), which is the key supporting technologies of B5G/6G, can achieve massive connections and high spectrum efficiency. Motivated by the benefits of NOMA, integrating NOMA with model split in MEC to reduce the inference latency further becomes attractive. However, the NOMA based communication during split inference has not been properly considered in previous works. Therefore, in this paper, we integrate the NOMA into split inference in MEC, and propose the effective communication and computing resource allocation algorithm to accelerate the model inference at edge. Specifically, when the mobile user has a large model inference task needed to be calculated in the NOMA-based MEC, it will take the energy consumption of both device and edge server and the inference latency into account to find the optimal model split strategy, subchannel allocation strategy (uplink and downlink), and transmission power allocation strategy (uplink and downlink). Since the minimum inference delay and energy consumption cannot be satisfied simultaneously, and the variables of subchannel allocation and model split are discrete, the gradient descent (GD) algorithm is adopted to find the optimal tradeoff between them. Moreover, the loop iteration GD approach (Li-GD) is proposed to reduce the complexity of GD algorithm that caused by the parameter discrete. Additionally, the properties of the proposed algorithm are also investigated, which demonstrate the effectiveness of the proposed algorithms.

Towards Squeezing-Averse Virtual Try-On via Sequential Deformation

  • Authors: Authors: Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15861
  • Pdf link: https://arxiv.org/pdf/2312.15861
  • Abstract In this paper, we first investigate a visual quality degradation problem observed in recent high-resolution virtual try-on approach. The tendency is empirically found that the textures of clothes are squeezed at the sleeve, as visualized in the upper row of Fig.1(a). A main reason for the issue arises from a gradient conflict between two popular losses, the Total Variation (TV) and adversarial losses. Specifically, the TV loss aims to disconnect boundaries between the sleeve and torso in a warped clothing mask, whereas the adversarial loss aims to combine between them. Such contrary objectives feedback the misaligned gradients to a cascaded appearance flow estimation, resulting in undesirable squeezing artifacts. To reduce this, we propose a Sequential Deformation (SD-VITON) that disentangles the appearance flow prediction layers into TV objective-dominant (TVOB) layers and a task-coexistence (TACO) layer. Specifically, we coarsely fit the clothes onto a human body via the TVOB layers, and then keep on refining via the TACO layer. In addition, the bottom row of Fig.1(a) shows a different type of squeezing artifacts around the waist. To address it, we further propose that we first warp the clothes into a tucked-out shirts style, and then partially erase the texture from the warped clothes without hurting the smoothness of the appearance flows. Experimental results show that our SD-VITON successfully resolves both types of artifacts and outperforms the baseline methods. Source code will be available at https://github.com/SHShim0513/SD-VITON.

Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation

  • Authors: Authors: Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.15901
  • Pdf link: https://arxiv.org/pdf/2312.15901
  • Abstract Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios. Typically, they learn a very small scale of parameters for pre-trained models in a white-box formulation, which assumes model architectures to be known and parameters to be accessible. However, large models are often not open-source due to considerations of preventing abuse or commercial factors, hence posing a barrier to the deployment of white-box PEFT methods. To alleviate the dependence on model accessibility, we introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models. Specifically, considering that the backpropagation gradients are blocked, we approximate the gradients of textual prompts by analyzing the predictions with perturbed prompts. Secondly, a lightweight adapter is deployed over the output feature of the inaccessible model, further facilitating the model adaptation process. Empowered with these designs, our CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods. Code is released at https://github.com/guozix/cbbt.

Filtered data based estimators for stochastic processes driven by colored noise

  • Authors: Authors: Grigorios A. Pavliotis, Sebastian Reich, Andrea Zanoni
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2312.15975
  • Pdf link: https://arxiv.org/pdf/2312.15975
  • Abstract We consider the problem of estimating unknown parameters in stochastic differential equations driven by colored noise, given continuous-time observations. Colored noise is modelled as a sequence of mean zero Gaussian stationary processes with an exponential autocorrelation function, with decreasing correlation time. Our goal is to infer parameters in the limit equation, driven by white noise, given observations of the colored noise dynamics. As in the case of parameter estimation for multiscale diffusions, the observations are only compatible with the data in the white noise limit, and classic estimators become biased, implying the need of preprocessing the data. We consider both the maximum likelihood and the stochastic gradient descent in continuous time estimators, and we propose modified versions of these methods, in which the observations are filtered using an exponential filter. Both stochastic differential equations with additive and multiplicative noise are considered. We provide a convergence analysis for our novel estimators in the limit of infinite data, and in the white noise limit, showing that the estimators are asymptotically unbiased. We consider in detail the case of multiplicative colored noise, in particular when the L'evy area correction drift appears in the limiting white noise equation. A series of numerical experiments corroborates our theoretical results.

Adaptive Kalman-based hybrid car following strategy using TD3 and CACC

  • Authors: Authors: Yuqi Zheng, Ruidong Yan, Bin Jia, Rui Jiang, Adriana TAPUS, Xiaojing Chen, Shiteng Zheng, Ying Shang
  • Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2312.15993
  • Pdf link: https://arxiv.org/pdf/2312.15993
  • Abstract In autonomous driving, the hybrid strategy of deep reinforcement learning and cooperative adaptive cruise control (CACC) can fully utilize the advantages of the two algorithms and significantly improve the performance of car following. However, it is challenging for the traditional hybrid strategy based on fixed coefficients to adapt to mixed traffic flow scenarios, which may decrease the performance and even lead to accidents. To address the above problems, a hybrid car following strategy based on an adaptive Kalman Filter is proposed by regarding CACC and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Different from traditional hybrid strategy based on fixed coefficients, the Kalman gain H, using as an adaptive coefficient, is derived from multi-timestep predictions and Monte Carlo Tree Search. At the end of study, simulation results with 4157745 timesteps indicate that, compared with the TD3 and HCFS algorithms, the proposed algorithm in this study can substantially enhance the safety of car following in mixed traffic flow without compromising the comfort and efficiency.

Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks

  • Authors: Authors: Juyoung Yun
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16020
  • Pdf link: https://arxiv.org/pdf/2312.16020
  • Abstract In this study, we explore an innovative approach for neural network optimization, focusing on the application of gradient sampling techniques, similar to those in StochGradAdam, during the pruning process. Our primary objective is to maintain high accuracy levels in pruned models, a critical challenge in resource-limited scenarios. Our extensive experiments reveal that models optimized with gradient sampling techniques are more effective at preserving accuracy during pruning compared to those using traditional optimization methods. This finding underscores the significance of gradient sampling in facilitating robust learning and enabling networks to retain crucial information even after substantial reduction in their complexity. We validate our approach across various datasets and neural architectures, demonstrating its broad applicability and effectiveness. The paper also delves into the theoretical aspects, explaining how gradient sampling techniques contribute to the robustness of models during pruning. Our results suggest a promising direction for creating efficient neural networks that do not compromise on accuracy, even in environments with constrained computational resources.

On the Trajectories of SGD Without Replacement

  • Authors: Authors: Pierfrancesco Beneventano
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.16143
  • Pdf link: https://arxiv.org/pdf/2312.16143
  • Abstract This article examines the implicit regularization effect of Stochastic Gradient Descent (SGD). We consider the case of SGD without replacement, the variant typically used to optimize large-scale neural networks. We analyze this algorithm in a more realistic regime than typically considered in theoretical works on SGD, as, e.g., we allow the product of the learning rate and Hessian to be $O(1)$. Our core theoretical result is that optimizing with SGD without replacement is locally equivalent to making an additional step on a novel regularizer. This implies that the trajectory of SGD without replacement diverges from both noise-injected GD and SGD with replacement (in which batches are sampled i.i.d.). Indeed, the two SGDs travel flat regions of the loss landscape in distinct directions and at different speeds. In expectation, SGD without replacement may escape saddles significantly faster and present a smaller variance. Moreover, we find that SGD implicitly regularizes the trace of the noise covariance in the eigendirections of small and negative Hessian eigenvalues. This coincides with penalizing a weighted trace of the Fisher Matrix and the Hessian on several vision tasks, thus encouraging sparsity in the spectrum of the Hessian of the loss in line with empirical observations from prior work. We also propose an explanation for why SGD does not train at the edge of stability (as opposed to GD).

Keyword: super-resolution

There is no result

zoq avatar Dec 27 '23 07:12 zoq