New submissions for Thu, 18 Jan 24

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

Asynchronous Local-SGD Training for Language Modeling

  • Authors: Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2401.09135
  • Pdf link: https://arxiv.org/pdf/2401.09135
  • Abstract Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We conduct a comprehensive investigation by examining how worker hardware heterogeneity, model size, number of workers, and optimizer could impact the learning performance. We find that with naive implementations, asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart despite updating the (global) model parameters more frequently. We identify momentum acceleration on the global parameters when worker gradients are stale as a key challenge. We propose a novel method that utilizes a delayed Nesterov momentum update and adjusts the workers' local training steps based on their computation speed. This approach, evaluated with models up to 150M parameters on the C4 dataset, matches the performance of synchronous Local-SGD in terms of perplexity per update step, and significantly surpasses it in terms of wall clock time.
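
The synchronous baseline described in the first sentence of this abstract (several local SGD updates per communication, then averaging) can be sketched as follows. This is a toy illustration only, not the paper's asynchronous method or its delayed Nesterov update: the 1-D quadratic objectives, the function name `local_sgd`, and every hyperparameter are assumptions made for the example.

```python
import random


def local_sgd(targets, rounds=50, local_steps=4, lr=0.1, seed=0):
    """Synchronous Local-SGD on toy objectives f_k(w) = 0.5 * (w - targets[k])**2.

    Each "worker" k owns one target value. All names and defaults here are
    hypothetical and chosen only to keep the example small.
    """
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        local_results = []
        for target in targets:
            w = w_global  # every worker starts the round from the global parameters
            for _ in range(local_steps):
                # Noisy gradient of the worker's quadratic objective.
                grad = (w - target) + rng.gauss(0.0, 0.01)
                w -= lr * grad
            local_results.append(w)
        # One communication per round: average the workers' parameters.
        w_global = sum(local_results) / len(local_results)
    return w_global


if __name__ == "__main__":
    print(local_sgd([1.0, 2.0, 3.0]))
```

With equal weighting, the averaged objective is minimized at the mean of the targets, so the returned value should land close to 2.0 for the inputs above.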

Keyword: optimization

Representation Learning in a Decomposed Encoder Design for Bio-inspired Hebbian Learning

  • Authors: Achref Jaziri, Sina Ditzel, Iuliia Pliushch, Visvanathan Ramesh
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08603
  • Pdf link: https://arxiv.org/pdf/2401.08603
  • Abstract Modern data-driven machine learning system designs exploit inductive biases on architectural structure, invariance and equivariance requirements, task specific loss functions, and computational optimization tools. Previous works have illustrated that inductive bias in the early layers of the encoder in the form of human specified quasi-invariant filters can serve as a powerful inductive bias to attain better robustness and transparency in learned classifiers. This paper explores this further in the context of representation learning with local plasticity rules i.e. bio-inspired Hebbian learning . We propose a modular framework trained with a bio-inspired variant of contrastive predictive coding (Hinge CLAPP Loss). Our framework is composed of parallel encoders each leveraging a different invariant visual descriptor as an inductive bias. We evaluate the representation learning capacity of our system in a classification scenario on image data of various difficulties (GTSRB, STL10, CODEBRIM) as well as video data (UCF101). Our findings indicate that this form of inductive bias can be beneficial in closing the gap between models with local plasticity rules and backpropagation models as well as learning more robust representations in general.

Conditional Flood Fill Method in Logic Synthesis

  • Authors: Shitian Yang, Junyue Jiang, Yilai Liang, Xiaoyang Chu
  • Subjects: Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2401.08625
  • Pdf link: https://arxiv.org/pdf/2401.08625
  • Abstract In the field of Electronic Design Automation (EDA), logic synthesis plays a pivotal role in optimizing hardware resources. Traditional logic synthesis algorithms, such as the Quine-McCluskey method, face challenges in scalability and efficiency, particularly for higher-dimension problems. This paper introduces a novel heuristic algorithm based on Conditional Flood Fill Method aimed at addressing these limitations. Our method employs count-based adjacent element handling and introduces nine new theorems to guide the logic synthesis process. Experimental results validate the efficacy of our approach, showing significant improvements in computational efficiency and scalability compared to existing algorithms. The algorithm holds potential for future advancements in circuit development and Boolean function optimization.

Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

  • Authors: Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2401.08632
  • Pdf link: https://arxiv.org/pdf/2401.08632
  • Abstract A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms, that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example, that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and thus, is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.

An Efficient Dynamic Transaction Storage Mechanism for Sustainable High Throughput Bitcoin

  • Authors: Xiongfei Zhao, Gerui Zhang, Yain-Whar Si
  • Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2401.08652
  • Pdf link: https://arxiv.org/pdf/2401.08652
  • Abstract As coin-based rewards dwindle, transaction fees play an important role as mining incentives in Bitcoin. In this paper, we propose a novel mechanism called Efficient Dynamic Transaction Storage (EDTS) for dynamically allocating transactions among blocks to achieve efficient storage utilization. By leveraging a combination of Cuckoo Filter and Dynamic Transaction Storage (DTS) strategies, EDTS is able to improve the scalability while remaining sustainable even after the Bitcoin enters a transaction-fee regime. In addition to preventing deviant mining behaviors under the transaction-fee regime, EDTS can also provide differentiated transmission priorities based on transaction fees while allowing the investors to engage in pledging more transaction fees. In EDTS, we applied the multi-objective optimization algorithm U-NSGA-III to find the best DTS strategy and its corresponding attributes. Experimental results show that the EDTS mechanism together with the optimized DTS strategy can achieve a throughput of 325.3 TPS. The experimental results reveal that the scalability improvement of EDTS is superior to the performance of Bitcoin NG, which is the best known on-chain scaling solution, while maintaining the sustainability under the transaction-fee regime.

Risk-anticipatory autonomous driving strategies considering vehicles' weights, based on hierarchical deep reinforcement learning

  • Authors: Di Chen, Hao Li, Zhicheng Jin, Huizhao Tu
  • Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08661
  • Pdf link: https://arxiv.org/pdf/2401.08661
  • Abstract Autonomous vehicles (AVs) have the potential to prevent accidents caused by drivers' error and reduce road traffic risks. Due to the nature of heavy vehicles, whose collisions cause more serious crashes, the weights of vehicles need to be considered when making driving strategies aimed at reducing the potential risks and their consequences in the context of autonomous driving. This study develops an autonomous driving strategy based on risk anticipation, considering the weights of surrounding vehicles and using hierarchical deep reinforcement learning. A risk indicator integrating surrounding vehicles' weights, based on the risk field theory, is proposed and incorporated into autonomous driving decisions. A hybrid action space is designed to allow for left lane changes, right lane changes and car-following, which enables AVs to act more freely and realistically whenever possible. To solve the above hybrid decision-making problem, a hierarchical proximal policy optimization (HPPO) algorithm is developed and an attention mechanism is incorporated, providing great advantages in maintaining stable performance. An indicator, potential collision energy in conflicts (PCEC), is newly proposed to evaluate the performance of the developed AV driving strategy from both the perspectives of the likelihood and the consequences of potential accidents. An application is carried out and the simulation results demonstrate that our model provides driving strategies that reduce both the likelihood and consequences of potential accidents, at the same time maintaining driving efficiency. The developed method is especially meaningful for AVs driving on highways, where heavy vehicles make up a high proportion of the traffic.

Zero-Shot RTL Code Generation with Attention Sink Augmented Large Language Models

  • Authors: Selim Sandal, Ismail Akturk
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2401.08683
  • Pdf link: https://arxiv.org/pdf/2401.08683
  • Abstract The design and optimization of hardware have traditionally been resource-intensive, demanding considerable expertise and dependence on established design automation tools. This paper discusses the possibility of exploiting large language models to streamline the code generation process in hardware design. In contrast to earlier studies, this paper aims to use large language models that accept high-level design specifications through a single prompt to generate corresponding Register-Transfer Level (RTL) code. The ability to use large language models on RTL code generation not only expedites design iteration cycles but also facilitates the exploration of design spaces that have computational challenges for conventional techniques. Through our evaluation, we demonstrate the shortcoming of existing attention mechanisms, and present the abilities of language models to produce functional, optimized, and industry-standard compliant RTL code when a novel attention mechanism is used. These findings underscore the expanding role of large language models in shaping the future landscape of architectural exploration and automation in hardware design.

Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs

  • Authors: Mingzhe Gao, Jieru Zhao, Zhe Lin, Minyi Guo
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08696
  • Pdf link: https://arxiv.org/pdf/2401.08696
  • Abstract High-level synthesis (HLS) notably speeds up the hardware design process by avoiding RTL programming. However, the turnaround time of HLS increases significantly when post-route quality of results (QoR) are considered during optimization. To tackle this issue, we propose a hierarchical post-route QoR prediction approach for FPGA HLS, which features: (1) a modeling flow that directly estimates latency and post-route resource usage from C/C++ programs; (2) a graph construction method that effectively represents the control and data flow graph of source code and effects of HLS pragmas; and (3) a hierarchical GNN training and prediction method capable of capturing the impact of loop hierarchies. Experimental results show that our method presents a prediction error of less than 10% for different types of QoR metrics, which gains tremendous improvement compared with the state-of-the-art GNN methods. By adopting our proposed methodology, the runtime for design space exploration in HLS is shortened to tens of minutes and the achieved ADRS is reduced to 6.91% on average.

Decoupled Prototype Learning for Reliable Test-Time Adaptation

  • Authors: Guowei Wang, Changxing Ding, Wentao Tan, Mingkui Tan
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08703
  • Pdf link: https://arxiv.org/pdf/2401.08703
  • Abstract Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference. One popular approach involves fine-tuning model with cross-entropy loss according to estimated pseudo-labels. However, its performance is significantly affected by noisy pseudo-labels. This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise. To address this issue, we propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation. First, we decouple the optimization of class prototypes. For each class prototype, we reduce its distance with positive samples and enlarge its distance with negative samples in a contrastive manner. This strategy prevents the model from overfitting to noisy pseudo-labels. Second, we propose a memory-based strategy to enhance DPL's robustness for the small batch sizes often encountered in TTA. We update each class's pseudo-feature from a memory in a momentum manner and insert an additional DPL loss. Finally, we introduce a consistency regularization-based approach to leverage samples with unconfident pseudo-labels. This approach transfers feature styles of samples with unconfident pseudo-labels to those with confident pseudo-labels. Thus, more reliable samples for TTA are created. The experimental results demonstrate that our methods achieve state-of-the-art performance on domain generalization benchmarks, and reliably improve the performance of self-training-based methods on image corruption benchmarks. The code will be released.

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

  • Authors: Chenyu Zhang, Lanjun Wang, Anan Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.08725
  • Pdf link: https://arxiv.org/pdf/2401.08725
  • Abstract Recent developments in text-to-image models, particularly Stable Diffusion, have marked significant achievements in various applications. With these advancements, there are growing safety concerns about the vulnerability of the model that malicious entities exploit to generate targeted harmful images. However, the existing methods in the vulnerability of the model mainly evaluate the alignment between the prompt and generated images, but fall short in revealing the vulnerability associated with targeted image generation. In this study, we formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts. Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images. Furthermore, after obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model. Extensive experiments on two targeted attack tasks demonstrate the effectiveness of our method in targeted attacks. The code can be obtained in https://github.com/datar001/Revealing-Vulnerabilities-in-Stable-Diffusion-via-Targeted-Attacks.

Stochastic Subnetwork Annealing: A Regularization Technique for Fine Tuning Pruned Subnetworks

  • Authors: Tim Whitaker, Darrell Whitley
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08830
  • Pdf link: https://arxiv.org/pdf/2401.08830
  • Abstract Pruning methods have recently grown in popularity as an effective way to reduce the size and computational complexity of deep neural networks. Large numbers of parameters can be removed from trained models with little discernible loss in accuracy after a small number of continued training epochs. However, pruning too many parameters at once often causes an initial steep drop in accuracy which can undermine convergence quality. Iterative pruning approaches mitigate this by gradually removing a small number of parameters over multiple epochs. However, this can still lead to subnetworks that overfit local regions of the loss landscape. We introduce a novel and effective approach to tuning subnetworks through a regularization technique we call Stochastic Subnetwork Annealing. Instead of removing parameters in a discrete manner, we instead represent subnetworks with stochastic masks where each parameter has a probabilistic chance of being included or excluded on any given forward pass. We anneal these probabilities over time such that subnetwork structure slowly evolves as mask values become more deterministic, allowing for a smoother and more robust optimization of subnetworks at high levels of sparsity.
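
The idea of stochastic masks whose probabilities are annealed until the mask becomes deterministic can be illustrated with a short sketch. The abstract does not specify the schedule, so the linear decay, the starting probability of 0.5, and the names `keep_probability` and `sample_mask` are all assumptions for illustration, not the authors' implementation.

```python
import random


def keep_probability(step, total_steps, start_prob=0.5):
    """Keep probability for parameters slated for pruning.

    Assumed schedule: linear decay from start_prob at step 0 down to 0.0 at
    total_steps, so the mask is fully deterministic by the end.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_prob * (1.0 - frac)


def sample_mask(in_subnetwork, step, total_steps, rng, start_prob=0.5):
    """Sample one binary mask for a single forward pass.

    Parameters flagged True are always kept; the rest are kept at random
    with the annealed probability.
    """
    p = keep_probability(step, total_steps, start_prob)
    return [1 if keep or rng.random() < p else 0 for keep in in_subnetwork]


if __name__ == "__main__":
    rng = random.Random(0)
    flags = [True, False, True, False, False]
    for step in (0, 5, 10):
        print(step, sample_mask(flags, step, 10, rng))
```

Early in training the excluded parameters still participate in some forward passes; by the final step the sampled mask coincides with the target subnetwork.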

Efficient Neural Representation of Volumetric Data using Coordinate-Based Networks

  • Authors: Sudarshan Devkota, Sumanta Pattanaik
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2401.08840
  • Pdf link: https://arxiv.org/pdf/2401.08840
  • Abstract In this paper, we propose an efficient approach for the compression and representation of volumetric data utilizing coordinate-based networks and multi-resolution hash encoding. Efficient compression of volumetric data is crucial for various applications, such as medical imaging and scientific simulations. Our approach enables effective compression by learning a mapping between spatial coordinates and intensity values. We compare different encoding schemes and demonstrate the superiority of multi-resolution hash encoding in terms of compression quality and training efficiency. Furthermore, we leverage optimization-based meta-learning, specifically using the Reptile algorithm, to learn weight initialization for neural representations tailored to volumetric data, enabling faster convergence during optimization. Additionally, we compare our approach with state-of-the-art methods to showcase improved image quality and compression ratios. These findings highlight the potential of coordinate-based networks and multi-resolution hash encoding for an efficient and accurate representation of volumetric data, paving the way for advancements in large-scale data visualization and other applications.
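
For context on the meta-learning step mentioned above, the Reptile update moves the shared initialization a fraction of the way toward the weights obtained after a few SGD steps on a sampled task. The sketch below applies that rule to toy 1-D quadratic tasks; the task definition, the function names `inner_sgd` and `reptile`, and the hyperparameters are assumptions for illustration and have nothing to do with the paper's volumetric networks.

```python
def inner_sgd(w, target, steps=3, lr=0.1):
    """A few SGD steps on the toy task loss 0.5 * (w - target)**2."""
    for _ in range(steps):
        w -= lr * (w - target)
    return w


def reptile(task_targets, meta_steps=500, meta_lr=0.1, init=0.0):
    """Reptile meta-update: phi <- phi + meta_lr * (adapted - phi).

    Tasks are visited round-robin here to keep the example deterministic;
    sampling tasks at random is the more usual choice.
    """
    phi = init
    for i in range(meta_steps):
        target = task_targets[i % len(task_targets)]
        adapted = inner_sgd(phi, target)  # task-specific adaptation from phi
        phi += meta_lr * (adapted - phi)  # move the initialization toward it
    return phi


if __name__ == "__main__":
    print(reptile([1.0, 3.0]))
```

For these two symmetric tasks the learned initialization settles near their midpoint, from which either task is reachable in few inner steps.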

cedar: Composable and Optimized Machine Learning Input Data Pipelines

  • Authors: Mark Zhao, Emanuel Adamiak, Christos Kozyrakis
  • Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2401.08895
  • Pdf link: https://arxiv.org/pdf/2401.08895
  • Abstract The input data pipeline is an essential component of each machine learning (ML) training job. It is responsible for reading massive amounts of training data, processing batches of samples using complex transformations, and loading them onto training nodes at low latency and high throughput. Performant input data systems are becoming increasingly critical, driven by skyrocketing data volumes and training throughput demands. Unfortunately, current input data systems cannot fully leverage key performance optimizations, resulting in hugely inefficient infrastructures that require significant resources -- or worse -- underutilize expensive accelerators. To address these demands, we present cedar, a programming model and framework that allows users to easily build, optimize, and execute input data pipelines. cedar presents an easy-to-use programming interface, allowing users to define input data pipelines using composable operators that support arbitrary ML frameworks and libraries. Meanwhile, cedar transparently applies a complex and extensible set of optimization techniques (e.g., offloading, caching, prefetching, fusion, and reordering). It then orchestrates processing across a customizable set of local and distributed compute resources in order to maximize processing performance and efficiency, all without user input. On average across six diverse input data pipelines, cedar achieves a 2.49x, 1.87x, 2.18x, and 2.74x higher performance compared to tf.data, tf.data service, Ray Data, and PyTorch's DataLoader, respectively.

Bridging State and History Representations: Understanding Self-Predictive RL

  • Authors: Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.08898
  • Pdf link: https://arxiv.org/pdf/2401.08898
  • Abstract Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of practical guidelines for RL practitioners.

3D Human Pose Analysis via Diffusion Synthesis

  • Authors: Haorui Ji, Hongdong Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.08930
  • Pdf link: https://arxiv.org/pdf/2401.08930
  • Abstract Diffusion models have demonstrated remarkable success in generative modeling. In this paper, we propose PADS (Pose Analysis by Diffusion Synthesis), a novel framework designed to address various challenges in 3D human pose analysis through a unified pipeline. Central to PADS are two distinctive strategies: i) learning a task-agnostic pose prior using a diffusion synthesis process to effectively capture the kinematic constraints in human pose data, and ii) unifying multiple pose analysis tasks like estimation, completion, denoising, etc, as instances of inverse problems. The learned pose prior will be treated as a regularization imposing on task-specific constraints, guiding the optimization process through a series of conditional denoising steps. PADS represents the first diffusion-based framework for tackling general 3D human pose analysis within the inverse problem framework. Its performance has been validated on different benchmarks, signaling the adaptability and robustness of this pipeline.

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

  • Authors: Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.08937
  • Pdf link: https://arxiv.org/pdf/2401.08937
  • Abstract Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires accurate camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they still often rely on decent initial poses which they can refine. Here we aim at removing the requirement for pose initialization. We present Incremental CONfidence (ICON), an optimization procedure for training NeRFs from 2D video frames. ICON only assumes smooth camera motion to estimate an initial guess for poses. Further, ICON introduces "confidence": an adaptive measure of model quality used to dynamically reweight gradients. ICON relies on high-confidence poses to learn NeRF, and high-confidence 3D structure (as encoded by NeRF) to learn poses. We show that ICON, without prior pose initialization, achieves superior performance in both CO3D and HO3D versus methods which use SfM pose.

PINSAT: Parallelized Interleaving of Graph Search and Trajectory Optimization for Kinodynamic Motion Planning

  • Authors: Ramkumar Natarajan, Shohin Mukherjee, Howie Choset, Maxim Likhachev
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2401.08948
  • Pdf link: https://arxiv.org/pdf/2401.08948
  • Abstract Trajectory optimization is a widely used technique in robot motion planning for letting the dynamics and constraints on the system shape and synthesize complex behaviors. Several previous works have shown its benefits in high-dimensional continuous state spaces and under differential constraints. However, long time horizons and planning around obstacles in non-convex spaces pose challenges in guaranteeing convergence or finding optimal solutions. As a result, discrete graph search planners and sampling-based planners are preferred when facing obstacle-cluttered environments. A recently developed algorithm called INSAT effectively combines graph search in the low-dimensional subspace and trajectory optimization in the full-dimensional space for global kinodynamic planning over long horizons. Although INSAT successfully reasoned about and solved complex planning problems, the numerous expensive calls to an optimizer resulted in large planning times, thereby limiting its practical use. Inspired by the recent work on edge-based parallel graph search, we present PINSAT, which introduces systematic parallelization in INSAT to achieve lower planning times and higher success rates, while maintaining significantly lower costs over relevant baselines. We demonstrate PINSAT by evaluating it on 6 DoF kinodynamic manipulation planning with obstacles.

A Unified NOMA Framework in Beam-Hopping Satellite Communication Systems

  • Authors: Xuyang Zhang, Xinwei Yue, Tian Li, Zhihao Han, Yafei Wang, Yong Ding, Rongke Liu
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2401.08956
  • Pdf link: https://arxiv.org/pdf/2401.08956
  • Abstract This paper investigates the application of a unified non-orthogonal multiple access framework in beam hopping (U-NOMA-BH) based satellite communication systems. More specifically, the proposed U-NOMA-BH framework can be applied to code-domain NOMA based BH (CD-NOMA-BH) and power-domain NOMA based BH (PD-NOMA-BH) systems. To satisfy dynamic-uneven traffic demands, we formulate the optimization problem to minimize the square of discrete difference by jointly optimizing power allocation, carrier assignment and beam scheduling. The non-convexity of the objective function and the constraint condition is solved through Dinkelbach's transform and variable relaxation. As a further development, the closed-form and asymptotic expressions of outage probability are derived for CD/PD-NOMA-BH systems. Based on approximated results, the diversity orders of a pair of users are obtained in detail. In addition, the system throughput of U-NOMA-BH is discussed in delay-limited transmission mode. Numerical results verify that: i) The gap between traffic requests of CD/PD-NOMA-BH systems appears to be more closely compared with orthogonal multiple access based BH (OMA-BH); ii) The CD-NOMA-BH system is capable of providing the enhanced traffic request and capacity provision; and iii) The outage behaviors of CD/PD-NOMA-BH are better than that of OMA-BH.

Real-time generative design of diverse, "truly" optimized structures with controllable structural complexities

  • Authors: Zongliang Dua, Xinyu Ma, Wenyu Hao, Yuan Liang, Xiaoyu Zhang, Hongzhi Luo, Xu Guo
  • Subjects: Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2401.08981
  • Pdf link: https://arxiv.org/pdf/2401.08981
  • Abstract Compared with traditional design methods, generative design significantly attracts engineers in various disciplines. In this work, how to achieve the real-time generative design of optimized structures with various diversities and controllable structural complexities is investigated. To this end, a modified Moving Morphable Component (MMC) method together with novel strategies are adopted to generate high-quality dataset. The complexity level of optimized structures is categorized by the topological invariant. By improving the cost function, the WGAN is trained to produce optimized designs with the input of loading position and complexity level in real time. It is found that, diverse designs with a clear load transmission path and crisp boundary, even not requiring further optimization and different from any reference in the dataset, can be generated by the proposed model. This method holds great potential for future applications of machine learning enhanced intelligent design.

MicroNAS: Zero-Shot Neural Architecture Search for MCUs

  • Authors: Ye Qiao, Haocheng Xu, Yifan Zhang, Sitao Huang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.08996
  • Pdf link: https://arxiv.org/pdf/2401.08996
  • Abstract Neural Architecture Search (NAS) effectively discovers new Convolutional Neural Network (CNN) architectures, particularly for accuracy optimization. However, prior approaches often require resource-intensive training on super networks or extensive architecture evaluations, limiting practical applications. To address these challenges, we propose MicroNAS, a hardware-aware zero-shot NAS framework designed for microcontroller units (MCUs) in edge computing. MicroNAS considers target hardware optimality during the search, utilizing specialized performance indicators to identify optimal neural architectures without high computational costs. Compared to previous works, MicroNAS achieves up to 1104x improvement in search efficiency and discovers models with over 3.23x faster MCU inference while maintaining similar accuracy.

An Improved Virtual Force Approach for UAV Deployment and Resource Allocation in Emergency Communications

  • Authors: Hongying Guo, Li Wang, Ruoguang Li, Luyang Hou, Lianming Xu, Aiguo Fei
  • Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2401.09013
  • Pdf link: https://arxiv.org/pdf/2401.09013
  • Abstract In this paper, we consider an unmanned aerial vehicle (UAV)-enabled emergency communication system, which establishes a temporary communication link with user equipment (UEs) in a typical disaster environment with mountainous forest and obstacles. Towards this end, a joint deployment, power allocation, and user association optimization problem is formulated to maximize the total transmission rate, while considering the demand of each UE and the disaster environment characteristics. Then, an alternating optimization algorithm is proposed by integrating coalition game and virtual force approach which captures the impact of the demand priority of UEs and the obstacles to the flight path and consumed power. Simulation results demonstrate that the computation time consumed by our proposed algorithm is only 5.6% of that of the traditional heuristic algorithms, which validates its effectiveness in disaster scenarios.

Improved Consensus ADMM for Cooperative Motion Planning of Large-Scale Connected Autonomous Vehicles with Limited Communication

  • Authors: Haichao Liu, Zhenmin Huang, Zicheng Zhu, Yulin Li, Shaojie Shen, Jun Ma
  • Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2401.09032
  • Pdf link: https://arxiv.org/pdf/2401.09032
  • Abstract This paper investigates a cooperative motion planning problem for large-scale connected autonomous vehicles (CAVs) under limited communications, which addresses the challenges of high communication and computing resource requirements. Our proposed methodology incorporates a parallel optimization algorithm with improved consensus ADMM considering a more realistic locally connected topology network, and time complexity of O(N) is achieved by exploiting the sparsity in the dual update process. To further enhance the computational efficiency, we employ a lightweight evolution strategy for the dynamic connectivity graph of CAVs, and each sub-problem split from the consensus ADMM only requires managing a small group of CAVs. The proposed method implemented with the receding horizon scheme is validated thoroughly, and comparisons with existing numerical solvers and approaches demonstrate the efficiency of our proposed algorithm. Also, simulations on large-scale cooperative driving tasks involving 80 vehicles are performed in the high-fidelity CARLA simulator, which highlights the remarkable computational efficiency, scalability, and effectiveness of our proposed development. Demonstration videos are available at https://henryhcliu.github.io/icadmm_cmp_carla.
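
For readers unfamiliar with consensus ADMM, the following is a minimal sketch of the textbook global-consensus form on a toy scalar problem. It is not the paper's improved algorithm: it assumes a fully shared consensus variable rather than a locally connected topology, and the function name `consensus_admm`, the quadratic local costs, and the parameter values are all illustrative assumptions.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Global-consensus ADMM for min_x sum_i 0.5 * (x - targets[i])**2.

    Toy setting (an assumption, not the paper's formulation): every agent i
    holds one scalar target and all agents must agree on a single value.
    """
    n = len(targets)
    x = [0.0] * n   # local copies, one per agent
    u = [0.0] * n   # scaled dual variables, one per agent
    z = 0.0         # shared consensus variable
    for _ in range(iterations):
        # Local step: each agent minimises its own cost plus the penalty term.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus step: average the local copies plus their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z


# The minimiser of the summed quadratics is the mean of the targets.
print(consensus_admm([1.0, 2.0, 6.0]))
```

The local step is the only part that touches an agent's private cost, which is why the per-agent sub-problems can be solved in parallel.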

UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems

  • Authors: Changshuo Zhang, Sirui Chen, Xiao Zhang, Sunhao Dai, Weijie Yu, Jun Xu
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.09034
  • Pdf link: https://arxiv.org/pdf/2401.09034
  • Abstract Reinforcement learning (RL) has gained traction for enhancing user long-term experiences in recommender systems by effectively exploring users' interests. However, modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration. For example, user behaviors with different activity levels require varying intensity of exploration, while previous studies often overlook this aspect and apply a uniform exploration strategy to all users, which ultimately hurts user experiences in the long run. To address these challenges, we propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups. We first construct a distributional critic which allows policy optimization under varying quantile levels of cumulative reward feedbacks from users, representing user groups with varying activity levels. Guided by this critic, we devise a population of distinct actors aimed at effective and fine-grained exploration within its respective user group. To simultaneously enhance diversity and stability during the exploration process, we further introduce a population-level diversity regularization term and a supervision module. Experimental results on public recommendation datasets demonstrate that our approach outperforms all other baselines in terms of long-term performance, validating its user-oriented exploration effectiveness. Meanwhile, further analyses reveal our approach's benefits of improved performance for low-activity users as well as increased fairness among users.

On Optimization of Next-Generation Microservice-Based Core Networks

  • Authors: Andrea Tassi, Daniel Warren, Yue Wang, Deval Bhamare, Rasoul Behravesh
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2401.09062
  • Pdf link: https://arxiv.org/pdf/2401.09062
  • Abstract Next-generation mobile core networks are required to be scalable and capable of efficiently utilizing heterogeneous bare metal resources that may include edge servers. To this end, microservice-based solutions where control plane procedures are deconstructed in their fundamental building blocks are gaining momentum. This letter proposes an optimization framework delivering the partitioning and mapping of large-scale microservice graphs onto heterogeneous bare metal deployments while minimizing the total network traffic among servers. An efficient heuristic strategy for solving the optimization problem is also provided. Simulation results show that, with the proposed framework, a microservice-based core can consistently support the requested load in heterogeneous bare metal deployments even when alternative architecture fails. Besides, our framework ensures an overall reduction in the control plane-related network traffic if compared to current core architectures.

Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems

  • Authors: Yanmo Hu, Kai Wu, J. Andrew Zhang, Weibo Deng, Y. Jay Guo
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2401.09064
  • Pdf link: https://arxiv.org/pdf/2401.09064
  • Abstract Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particularly for Doppler sensing, have not been fully understood yet. This work endeavors to fill the research gap. Focusing on a single dynamic path in high-SNR scenarios, we derive the closed-form CRB. Then, through analyzing the mutual interference between dynamic and static paths, we simplify the CRB results by deriving close approximations, further unveiling new insights of the impact of numerous physical parameters on Doppler sensing. Moreover, utilizing the new CRB and analyses, we propose novel waveform optimization strategies for noise- and interference-limited sensing scenarios, which are also empowered by closed-form and efficient solutions. Extensive simulation results are provided to validate the preciseness of the derived CRB results and analyses, with the aid of the maximum-likelihood estimator. The results also demonstrate the substantially enhanced Doppler sensing accuracy and the sensing capabilities for low-speed targets achieved by the proposed waveform design.

DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning

  • Authors: Lixiang Han, Zhen Xiao, Zhenjiang Li
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.09068
  • Pdf link: https://arxiv.org/pdf/2401.09068
  • Abstract DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices such as microcontroller units (MCUs). The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML), which explores extending the reach of machine learning to many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak capability of embedded devices, it is necessary to compress models by pruning enough weights before deploying. Although pruning has been studied extensively on many computing platforms, two key issues with pruning methods are exacerbated on MCUs: models need to be deeply compressed without significantly compromising accuracy, and they should perform efficiently after pruning. Current solutions only achieve one of these objectives, but not both. In this paper, we find that pruned models have great potential for efficient deployment and execution on MCUs. Therefore, we propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models. It can be integrated into commercial ML frameworks for practical deployment, and a prototype system has been developed. Extensive experiments on various models show promising gains compared to state-of-the-art methods.

A five field formulation for flow simulations in porous media with fractures and barriers via an optimization based domain decomposition method

  • Authors: Stefano Scialò
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2401.09072
  • Pdf link: https://arxiv.org/pdf/2401.09072
  • Abstract The present work deals with the numerical resolution of coupled 3D-2D problems arising from the simulation of fluid flow in fractured porous media modeled via the Discrete Fracture and Matrix (DFM) model. According to the DFM model, fractures are represented as planar interfaces immersed in a 3D porous matrix and can behave as preferential flow paths, in the case of conductive fractures, or can actually be a barrier for the flow, when, instead, the permeability in the normal-to-fracture direction is small compared to the permeability of the matrix. Consequently, the pressure solution in a DFM can be discontinuous across a barrier, as a result of the geometrical dimensional reduction operated on the fracture. The present work is aimed at developing a numerical scheme suitable for the simulation of the flow in a DFM with fractures and barriers, using a mesh for the 3D matrix non conforming to the fractures and that is ready for domain decomposition. This is achieved starting from a PDE-constrained optimization method, currently available in literature only for conductive fractures in a DFM. First, a novel formulation of the optimization problem is defined to account for non permeable fractures. These are described by a filtration-like coupling at the interface with the surrounding porous matrix. Also the extended finite element method with discontinuous enrichment functions is used to reproduce the pressure solution in the matrix around a barrier. The method is presented here in its simplest form, for clarity of exposition, i.e. considering the case of a single fracture in a 3D domain, also providing a proof of the well posedness of the resulting discrete problem. Four validation examples are proposed to show the viability and the effectiveness of the method.

SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects

  • Authors: Haowen Wang, Zhen Zhao, Zhao Jin, Zhengping Che, Liang Qiao, Yakun Huang, Zhipeng Fan, Xiuquan Qiao, Jian Tang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2401.09133
  • Pdf link: https://arxiv.org/pdf/2401.09133
  • Abstract Reconstructing real-world objects and estimating their movable joint structures are pivotal technologies within the field of robotics. Previous research has predominantly focused on supervised approaches, relying on extensively annotated datasets to model articulated objects within limited categories. However, this approach falls short of effectively addressing the diversity present in the real world. To tackle this issue, we propose a self-supervised interaction perception method, referred to as SM$^3$, which leverages multi-view RGB images captured before and after interaction to model articulated objects, identify the movable parts, and infer the parameters of their rotating joints. By constructing 3D geometries and textures from the captured 2D images, SM$^3$ achieves integrated optimization of movable part and joint parameters during the reconstruction process, obviating the need for annotations. Furthermore, we introduce the MMArt dataset, an extension of PartNet-Mobility, encompassing multi-view and multi-modal data of articulated objects spanning diverse categories. Evaluations demonstrate that SM$^3$ surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been thoroughly validated.

Asynchronous Local-SGD Training for Language Modeling

  • Authors: Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2401.09135
  • Pdf link: https://arxiv.org/pdf/2401.09135
  • Abstract Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We conduct a comprehensive investigation by examining how worker hardware heterogeneity, model size, number of workers, and optimizer could impact the learning performance. We find that with naive implementations, asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart despite updating the (global) model parameters more frequently. We identify momentum acceleration on the global parameters when worker gradients are stale as a key challenge. We propose a novel method that utilizes a delayed Nesterov momentum update and adjusts the workers' local training steps based on their computation speed. This approach, evaluated with models up to 150M parameters on the C4 dataset, matches the performance of synchronous Local-SGD in terms of perplexity per update step, and significantly surpasses it in terms of wall clock time.
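
As a rough illustration of the synchronous baseline this abstract compares against, the sketch below runs Local-SGD on a toy one-dimensional least-squares problem: every worker takes several SGD steps from the current global weight, and the results are averaged once per communication round. It does not implement the paper's asynchronous variant or its delayed Nesterov momentum; the function name `local_sgd`, the toy data, and all hyperparameters are assumptions made for illustration.

```python
import random


def local_sgd(worker_data, rounds=50, local_steps=4, lr=0.05, seed=0):
    """Synchronous Local-SGD for fitting y ≈ w * x with a single scalar w.

    worker_data: one list of (x, y) pairs per worker (hypothetical toy data).
    """
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        local_weights = []
        for data in worker_data:
            w = w_global  # every worker starts from the current global weight
            for _ in range(local_steps):
                x, y = rng.choice(data)        # sample one local example
                grad = 2.0 * (w * x - y) * x   # gradient of (w * x - y) ** 2
                w -= lr * grad
            local_weights.append(w)
        # One communication per round: average the locally trained weights.
        w_global = sum(local_weights) / len(local_weights)
    return w_global


# Two workers whose noise-free data share the true slope w = 3.
workers = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
print(local_sgd(workers))
```

With `local_steps=1` this reduces to averaging single SGD steps; larger values trade communication for more local computation, which is the regime the abstract describes.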

Scalable Resource Provisioning for Multi-user Communications in Next Generation Networks

  • Authors: Augusto Neto, Eduardo Cerqueira, Marilia Curado, Edmundo Monteiro, Paulo Mendes
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2401.09231
  • Pdf link: https://arxiv.org/pdf/2401.09231
  • Abstract The great demand for real-time multimedia sessions encompassing groups of users (multi-user), associated with the limitations of the current Internet in providing quality assurance, has raised challenges for defining the best mechanisms to deploy the Next Generation of Networks (NGN). There is a consensus that an efficient and scalable provisioning of network resources is crucial for the success of the NGN, mainly in what concerns access networks. Previous solutions for the control of multi-user sessions rely mostly on uncoordinated actions to allocate per-flow bandwidth and multicast trees. This paper introduces a Multiuser Aggregated Resource Allocation mechanism (MARA) that coordinates the control of class-based bandwidth and multicast resources in a scalable manner. In comparison with previous work, MARA significantly reduces signaling, state and processing overhead. The performance benefits of MARA are analyzed through simulations, which successfully demonstrated the significant optimization in the network performance.

Bridging the Gap Between General and Down-Closed Convex Sets in Submodular Maximization

  • Authors: Loay Mualem, Murad Tukan, Moran Feldman
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2401.09251
  • Pdf link: https://arxiv.org/pdf/2401.09251
  • Abstract Optimization of DR-submodular functions has experienced a notable surge in significance in recent times, marking a pivotal development within the domain of non-convex optimization. Motivated by real-world scenarios, some recent works have delved into the maximization of non-monotone DR-submodular functions over general (not necessarily down-closed) convex set constraints. Up to this point, these works have all used the minimum $\ell_\infty$ norm of any feasible solution as a parameter. Unfortunately, a recent hardness result due to Mualem & Feldman [mualem2023resolving] shows that this approach cannot yield a smooth interpolation between down-closed and non-down-closed constraints. In this work, we suggest novel offline and online algorithms that provably provide such an interpolation based on a natural decomposition of the convex body constraint into two distinct convex bodies: a down-closed convex body and a general convex body. We also empirically demonstrate the superiority of our proposed algorithms across three offline and two online applications.

A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

  • Authors: Feiyang Ye, Baijiong Lin, Xiaofeng Cao, Yu Zhang, Ivor Tsang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.09257
  • Pdf link: https://arxiv.org/pdf/2401.09257
  • Abstract In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is for scalar optimization. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, which makes them computationally inefficient. To address this, we propose an efficient first-order multi-gradient method for MOBLO, called FORUM. Specifically, we reformulate MOBLO problems as a constrained multi-objective optimization (MOO) problem via the value-function approach. Then we propose a novel multi-gradient aggregation method to solve the challenging constrained MOO problem. Theoretically, we provide the complexity analysis to show the efficiency of the proposed method and a non-asymptotic convergence result. Empirically, extensive experiments demonstrate the effectiveness and efficiency of the proposed FORUM method in different learning problems. In particular, it achieves state-of-the-art performance on three multi-task learning benchmark datasets.

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

  • Authors: Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, Elad Hazan
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.09278
  • Pdf link: https://arxiv.org/pdf/2401.09278
  • Abstract Fast changing states or volatile environments pose a significant challenge to online optimization, which needs to perform rapid adaptation under limited observation. In this paper, we give query and regret optimal bandit algorithms under the strict notion of strongly adaptive regret, which measures the maximum regret over any contiguous interval $I$. Due to its worst-case nature, there is an almost-linear $\Omega(|I|^{1-\epsilon})$ regret lower bound, when only one query per round is allowed [Daniely et al., ICML 2015]. Surprisingly, with just two queries per round, we give Strongly Adaptive Bandit Learner (StABL) that achieves $\tilde{O}(\sqrt{n|I|})$ adaptive regret for multi-armed bandits with $n$ arms. The bound is tight and cannot be improved in general. Our algorithm leverages a multiplicative update scheme of varying stepsizes and a carefully chosen observation distribution to control the variance. Furthermore, we extend our results and provide optimal algorithms in the bandit convex optimization setting. Finally, we empirically demonstrate the superior performance of our algorithms under volatile environments and for downstream tasks, such as algorithm selection for hyperparameter optimization.

Synthesizing Hardware-Software Leakage Contracts for RISC-V Open-Source Processors

  • Authors: Gideon Mohr, Marco Guarnieri, Jan Reineke
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2401.09383
  • Pdf link: https://arxiv.org/pdf/2401.09383
  • Abstract Microarchitectural attacks compromise security by exploiting software-visible artifacts of microarchitectural optimizations such as caches and speculative execution. Defending against such attacks at the software level requires an appropriate abstraction at the instruction set architecture (ISA) level that captures microarchitectural leakage. Hardware-software leakage contracts have recently been proposed as such an abstraction. In this paper, we propose a semi-automatic methodology for synthesizing hardware-software leakage contracts for open-source microarchitectures. For a given ISA, our approach relies on human experts to (a) capture the space of possible contracts in the form of contract templates and (b) devise a test-case generation strategy to explore a microarchitecture's potential leakage. For a given implementation of an ISA, these two ingredients are then used to automatically synthesize the most precise leakage contract that is satisfied by the microarchitecture. We have instantiated this methodology for the RISC-V ISA and applied it to the Ibex and CVA6 open-source processors. Our experiments demonstrate the practical applicability of the methodology and uncover subtle and unexpected leaks.

Keyword: adam

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

  • Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2401.08893
  • Pdf link: https://arxiv.org/pdf/2401.08893
  • Abstract Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and search through it using hyper-gradient descent. Numerical results suggest that MADA is robust against sub-optimally tuned hyper-parameters, and outperforms Adam, Lion, and Adan with their default hyper-parameters, often even with optimized hyper-parameters. We also propose AVGrad, a variant of AMSGrad where the maximum operator is replaced with averaging, and observe that it performs better within MADA. Finally, we provide a convergence analysis to show that interpolation of optimizers (specifically, AVGrad and Adam) can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.
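
To make the "maximum replaced with averaging" remark concrete, here is a hedged sketch of a single AMSGrad-style update in which the aggregation of the second-moment estimate can be switched between the standard running maximum and a running mean. The running-mean form is an assumption about what "averaging" means; the abstract does not give AVGrad's exact update rule, and the names `new_state` and `update` are illustrative.

```python
import math


def new_state():
    """Fresh optimizer state for a single scalar parameter."""
    return {"t": 0, "m": 0.0, "v": 0.0, "v_hat": 0.0}


def update(theta, state, grad, lr=0.01, beta1=0.9, beta2=0.999,
           eps=1e-8, mode="max"):
    """One AMSGrad-style step; `mode` selects how v_hat is aggregated."""
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    if mode == "max":
        # AMSGrad: keep the largest second-moment estimate seen so far.
        state["v_hat"] = max(state["v_hat"], state["v"])
    elif mode == "avg":
        # Assumed averaging variant: running mean of v_1, ..., v_t.
        state["v_hat"] += (state["v"] - state["v_hat"]) / state["t"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return theta - lr * state["m"] / (math.sqrt(state["v_hat"]) + eps)


# Feed the same two gradients to both variants and compare v_hat.
for mode in ("max", "avg"):
    state, theta = new_state(), 1.0
    for g in (2.0, 1.0):
        theta = update(theta, state, g, beta2=0.0, mode=mode)
    print(mode, state["v_hat"])
```

With `beta2=0.0` the estimate `v` equals the latest squared gradient, so the two modes visibly diverge as soon as a smaller gradient follows a larger one: the maximum holds its value while the mean decays.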

Keyword: gradient

Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

  • Authors: Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2401.08632
  • Pdf link: https://arxiv.org/pdf/2401.08632
  • Abstract A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms, that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example, that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and thus, is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

  • Authors: Chenyu Zhang, Lanjun Wang, Anan Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.08725
  • Pdf link: https://arxiv.org/pdf/2401.08725
  • Abstract Recent developments in text-to-image models, particularly Stable Diffusion, have marked significant achievements in various applications. With these advancements, there are growing safety concerns about the vulnerability of the model that malicious entities exploit to generate targeted harmful images. However, the existing methods in the vulnerability of the model mainly evaluate the alignment between the prompt and generated images, but fall short in revealing the vulnerability associated with targeted image generation. In this study, we formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts. Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images. Furthermore, after obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model. Extensive experiments on two targeted attack tasks demonstrate the effectiveness of our method in targeted attacks. The code can be obtained in https://github.com/datar001/Revealing-Vulnerabilities-in-Stable-Diffusion-via-Targeted-Attacks.

Bag of Tricks to Boost Adversarial Transferability

  • Authors: Zeliang Zhang, Rongyi Zhu, Wei Yao, Xiaosen Wang, Chenliang Xu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08734
  • Pdf link: https://arxiv.org/pdf/2401.08734
  • Abstract Deep neural networks are widely known to be vulnerable to adversarial examples. However, vanilla adversarial examples generated under the white-box setting often exhibit low transferability across different models. Since adversarial transferability poses more severe threats to practical applications, various approaches have been proposed for better transferability, including gradient-based, input transformation-based, and model-related attacks, etc. In this work, we find that several tiny changes in the existing adversarial attacks can significantly affect the attack performance, e.g., the number of iterations and step size. Based on careful studies of existing adversarial attacks, we propose a bag of tricks to enhance adversarial transferability, including momentum initialization, scheduled step size, dual example, spectral-based input transformation, and several ensemble strategies. Extensive experiments on the ImageNet dataset validate the high effectiveness of our proposed tricks and show that combining them can further boost adversarial transferability. Our work provides practical insights and techniques to enhance adversarial transferability, and offers guidance to improve the attack performance on the real-world application through simple adjustments.

Robust Localization of Key Fob Using Channel Impulse Response of Ultra Wide Band Sensors for Keyless Entry Systems

  • Authors: Abhiram Kolli, Filippo Casamassima, Horst Possegger, Horst Bischof
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2401.08863
  • Pdf link: https://arxiv.org/pdf/2401.08863
  • Abstract Using neural networks to localize a key fob within and around a car, as a security feature for keyless entry, is fast emerging. In this paper we: 1) study the performance of pre-computed features for neural-network-based UWB (ultra wide band) localization classification, which forms the baseline of our experiments; 2) investigate the inherent robustness of various neural networks, and therefore include a study of robustness to adversarial examples without any adversarial training; 3) propose a multi-head self-supervised neural network architecture which outperforms the baseline neural networks without any adversarial training. The model's performance improved by 67% at certain ranges of adversarial magnitude for the fast gradient sign method and by 37% each for the basic iterative method and the projected gradient descent method.
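
Of the three attacks named in this abstract, the fast gradient sign method is the simplest to sketch. The example below applies it to a hypothetical two-feature logistic-regression classifier rather than the paper's UWB networks; the model, the weights, and the helper names (`sigmoid`, `sign`, `fgsm`) are assumptions chosen only to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm(x, y, w, b, epsilon):
    """Fast gradient sign method against a logistic-regression model.

    x: input features, y: true label (0 or 1), w/b: model parameters,
    epsilon: perturbation size applied per feature.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # Gradient of the binary cross-entropy loss with respect to the input.
    grad_x = [(p - y) * wi for wi in w]
    # Move every feature by epsilon in the direction that increases the loss.
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]


x_adv = fgsm(x=[0.5, 0.5], y=1, w=[1.0, -2.0], b=0.0, epsilon=0.1)
print(x_adv)
```

The basic iterative method and projected gradient descent mentioned alongside it repeat this kind of signed step several times with a smaller step size, keeping the result within the allowed perturbation range.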

MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

  • Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2401.08893
  • Pdf link: https://arxiv.org/pdf/2401.08893
  • Abstract Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and search through it using hyper-gradient descent. Numerical results suggest that MADA is robust against sub-optimally tuned hyper-parameters, and outperforms Adam, Lion, and Adan with their default hyper-parameters, often even with optimized hyper-parameters. We also propose AVGrad, a variant of AMSGrad where the maximum operator is replaced with averaging, and observe that it performs better within MADA. Finally, we provide a convergence analysis to show that interpolation of optimizers (specifically, AVGrad and Adam) can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.

Bridging State and History Representations: Understanding Self-Predictive RL

  • Authors: Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.08898
  • Pdf link: https://arxiv.org/pdf/2401.08898
  • Abstract Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of practical guidelines for RL practitioners.

Characterising Gradients for Unsupervised Accuracy Estimation under Distribution Shift

  • Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko, Jianfeng Zhang, Bo An
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.08909
  • Pdf link: https://arxiv.org/pdf/2401.08909
  • Abstract Estimating test accuracy without access to the ground-truth test labels under varying test environments is a challenging, yet extremely important problem in the safe deployment of machine learning algorithms. Existing works rely on the information from either the outputs or the extracted features of neural networks to formulate an estimation score correlating with the ground-truth test accuracy. In this paper, we investigate, both empirically and theoretically, how the information provided by the gradients can be predictive of the ground-truth test accuracy even under a distribution shift. Specifically, we use the norm of classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over test data. Our key idea is that the model should be adjusted with a higher magnitude of gradients when it does not generalize to the test dataset with a distribution shift. We provide theoretical insights highlighting the main ingredients of such an approach ensuring its empirical success. Extensive experiments conducted on diverse distribution shifts and model structures demonstrate that our method significantly outperforms state-of-the-art algorithms.

ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization

  • Authors: Weiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.08937
  • Pdf link: https://arxiv.org/pdf/2401.08937
  • Abstract Neural Radiance Fields (NeRF) exhibit remarkable performance for Novel View Synthesis (NVS) given a set of 2D images. However, NeRF training requires an accurate camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Recent works have attempted to relax this constraint, but they still often rely on decent initial poses which they can refine. Here we aim at removing the requirement for pose initialization. We present Incremental CONfidence (ICON), an optimization procedure for training NeRFs from 2D video frames. ICON only assumes smooth camera motion to estimate an initial guess for poses. Further, ICON introduces "confidence": an adaptive measure of model quality used to dynamically reweight gradients. ICON relies on high-confidence poses to learn NeRF, and high-confidence 3D structure (as encoded by NeRF) to learn poses. We show that ICON, without prior pose initialization, achieves superior performance in both CO3D and HO3D versus methods which use SfM pose.

Fast parallel sampling under isoperimetry

  • Authors: Nima Anari, Sinho Chewi, Thuy-Duong Vuong
  • Subjects: Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2401.09016
  • Pdf link: https://arxiv.org/pdf/2401.09016
  • Abstract We show how to sample in parallel from a distribution $\pi$ over $\mathbb R^d$ that satisfies a log-Sobolev inequality and has a smooth log-density, by parallelizing the Langevin (resp. underdamped Langevin) algorithms. We show that our algorithm outputs samples from a distribution $\hat\pi$ that is close to $\pi$ in Kullback-Leibler (KL) divergence (resp. total variation (TV) distance), while using only $\log(d)^{O(1)}$ parallel rounds and $\widetilde{O}(d)$ (resp. $\widetilde O(\sqrt d)$) gradient evaluations in total. This constitutes the first parallel sampling algorithms with TV distance guarantees. For our main application, we show how to combine the TV distance guarantees of our algorithms with prior works and obtain RNC sampling-to-counting reductions for families of discrete distributions on the hypercube $\{\pm 1\}^n$ that are closed under exponential tilts and have bounded covariance. Consequently, we obtain an RNC sampler for directed Eulerian tours and asymmetric determinantal point processes, resolving open questions raised in prior works.
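
For context, the sequential baseline that the abstract above parallelizes is the (unadjusted) Langevin algorithm. The sketch below runs that plain sequential scheme in one dimension, writing $V = -\log \pi$ for the potential. It does not implement the paper's parallel algorithm, and the Gaussian target, step size, and burn-in length are assumptions chosen for the example.

```python
import math
import random

def langevin_samples(grad_potential, x0, step, n_steps, seed=0):
    """Unadjusted Langevin: x <- x - step * grad V(x) + sqrt(2 * step) * N(0, 1)."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * noise
        out.append(x)
    return out

# Hypothetical target: standard Gaussian, V(x) = x**2 / 2, so grad V(x) = x.
samples = langevin_samples(lambda x: x, x0=0.0, step=0.1, n_steps=20000)
kept = samples[1000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

Because the scheme is not Metropolis-adjusted, its stationary variance for this target is $1/(1 - h/2)$ with step size $h$, slightly above 1; this discretization bias is the kind of error that KL and TV guarantees quantify.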

Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation

  • Authors: Tong Xie, Haoyu Li, Andrew Bai, Cho-Jui Hsieh
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.09031
  • Pdf link: https://arxiv.org/pdf/2401.09031
  • Abstract Data attribution methods trace model behavior back to its training dataset, offering an effective approach to better understand "black-box" neural networks. While prior research has established quantifiable links between model output and training data in diverse settings, interpreting diffusion model outputs in relation to training samples remains underexplored. In particular, diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts, posing a significant challenge to extend existing frameworks to diffusion models directly. Notably, we present Diffusion-TracIn that incorporates these temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep. This trend leads to a prominent bias in influence estimation, and is particularly noticeable for samples trained on large-norm-inducing timesteps, causing them to be generally influential. To mitigate this effect, we introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest, facilitating a localized measurement of influence and considerably more intuitive visualization. We demonstrate the efficacy of our approach through various evaluation metrics and auxiliary tasks, reducing the number of generally influential samples to $\frac{1}{3}$ of its original quantity.

Asynchronous Local-SGD Training for Language Modeling

  • Authors: Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
  • Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2401.09135
  • Pdf link: https://arxiv.org/pdf/2401.09135
  • Abstract Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We conduct a comprehensive investigation by examining how worker hardware heterogeneity, model size, number of workers, and optimizer could impact the learning performance. We find that with naive implementations, asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart despite updating the (global) model parameters more frequently. We identify momentum acceleration on the global parameters when worker gradients are stale as a key challenge. We propose a novel method that utilizes a delayed Nesterov momentum update and adjusts the workers' local training steps based on their computation speed. This approach, evaluated with models up to 150M parameters on the C4 dataset, matches the performance of synchronous Local-SGD in terms of perplexity per update step, and significantly surpasses it in terms of wall clock time.
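
The synchronous scheme that the abstract above uses as its reference point can be sketched in a few lines: each worker takes several local steps, then the results are averaged once per communication round. The sketch is a toy under stated assumptions (scalar parameter, exact rather than stochastic gradients, hypothetical per-worker quadratic losses); it does not include the paper's asynchronous updates or its delayed Nesterov momentum.

```python
def local_sgd(targets, w0, lr, local_steps, rounds):
    """Synchronous Local-SGD: several local steps per worker, then one averaging step."""
    w = w0
    for _ in range(rounds):
        local_results = []
        for t in targets:  # one worker per target; worker k minimizes 0.5 * (w - t_k)**2
            w_local = w
            for _ in range(local_steps):
                w_local -= lr * (w_local - t)  # exact gradient step on the local loss
            local_results.append(w_local)
        # One communication per round: average the workers' parameters.
        w = sum(local_results) / len(local_results)
    return w

targets = [1.0, 2.0, 3.0]  # hypothetical per-worker optima
w_final = local_sgd(targets, w0=0.0, lr=0.1, local_steps=4, rounds=50)
```

On this toy problem the averaged parameter contracts toward the mean of the targets by a factor of `(1 - lr) ** local_steps` per round, so `w_final` lands very close to 2.0.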

Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer

  • Authors: Junhao Zheng, Qianli Ma, Zhen Liu, Binquan Wu, Huawen Feng
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.09181
  • Pdf link: https://arxiv.org/pdf/2401.09181
  • Abstract Multimodal Continual Instruction Tuning (MCIT) enables Multimodal Large Language Models (MLLMs) to meet continuously emerging requirements without expensive retraining. MCIT faces two major obstacles: catastrophic forgetting (where old knowledge is forgotten) and negative forward transfer (where the performance of future tasks is degraded). Although existing methods have greatly alleviated catastrophic forgetting, they still suffer from negative forward transfer. By performing singular value decomposition (SVD) on input embeddings, we discover a large discrepancy in different input embeddings. The discrepancy results in the model learning irrelevant information for old and pre-trained tasks, which leads to catastrophic forgetting and negative forward transfer. To address these issues, we propose Fwd-Prompt, a prompt-based method projecting prompt gradient to the residual space to minimize the interference between tasks and to the pre-trained subspace for reusing pre-trained knowledge. Our experiments demonstrate that Fwd-Prompt achieves state-of-the-art performance while updating fewer parameters and requiring no old samples. Our research sheds light on the potential of continuously adapting MLLMs to new tasks under the instruction tuning paradigm and encourages future studies to explore MCIT. The code will soon be publicly available.

A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

  • Authors: Feiyang Ye, Baijiong Lin, Xiaofeng Cao, Yu Zhang, Ivor Tsang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.09257
  • Pdf link: https://arxiv.org/pdf/2401.09257
  • Abstract In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is a scalar optimization problem. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, which makes them computationally inefficient. To address this, we propose an efficient first-order multi-gradient method for MOBLO, called FORUM. Specifically, we reformulate MOBLO problems as a constrained multi-objective optimization (MOO) problem via the value-function approach. Then we propose a novel multi-gradient aggregation method to solve the challenging constrained MOO problem. Theoretically, we provide a complexity analysis to show the efficiency of the proposed method and a non-asymptotic convergence result. Empirically, extensive experiments demonstrate the effectiveness and efficiency of the proposed FORUM method in different learning problems. In particular, it achieves state-of-the-art performance on three multi-task learning benchmark datasets.

Randomized Kaczmarz with geometrically smoothed momentum

  • Authors: Seth J. Alderman, Roan W. Luikart, Nicholas F. Marshall
  • Subjects: Numerical Analysis (math.NA); Probability (math.PR); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2401.09415
  • Pdf link: https://arxiv.org/pdf/2401.09415
  • Abstract This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least squares loss. We present several numerical examples illustrating the utility of our result and pose several questions.
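
As background for the abstract above, here is a minimal sketch of the plain randomized Kaczmarz iteration, without the geometrically smoothed momentum the paper studies. The matrix `A`, the solution `x_true`, and the iteration count are hypothetical, and rows are sampled with probability proportional to their squared norms, a common choice rather than something stated in the abstract.

```python
import random

def randomized_kaczmarz(A, b, n_iters, seed=0):
    """Plain randomized Kaczmarz for a consistent linear system A x = b."""
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(n_iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        # Project x onto the hyperplane {z : A[i] . z = b[i]}.
        x = [xj + residual / row_norms_sq[i] * a for xj, a in zip(x, A[i])]
    return x

# Hypothetical consistent system built from a known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b, n_iters=500)
```

Each update is a stochastic gradient step on the least squares loss restricted to one row, with the step size chosen so that the sampled equation is satisfied exactly.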

Keyword: super-resolution

Robust DOA estimation using deep acoustic imaging

  • Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello
  • Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2401.08717
  • Pdf link: https://arxiv.org/pdf/2401.08717
  • Abstract Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution microphone arrays, yet most DoAE datasets use low-resolution ones. Therefore, we first propose a super-resolution method to upsample low-resolution microphones. Next, we benchmark DoAE models that use SIMs as input. We arrive at a model that uses SIMs for DoAE and outperforms a baseline and a state-of-the-art model. Our study highlights the relevance of acoustic imaging for DoAE tasks.

Efficient Image Super-Resolution via Symmetric Visual Attention Network

  • Authors: Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, Jing Hu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
  • Arxiv link: https://arxiv.org/abs/2401.08913
  • Pdf link: https://arxiv.org/pdf/2401.08913
  • Abstract An important development direction for Single-Image Super-Resolution (SISR) algorithms is improving their efficiency. Recent efficient Super-Resolution (SR) research focuses on reducing model complexity and improving efficiency through improved deep small kernel convolution, which leads to a small receptive field. The large receptive field obtained by large kernel convolution can significantly improve image quality, but the computational cost is too high. To improve the reconstruction details of efficient super-resolution reconstruction, we propose a Symmetric Visual Attention Network (SVAN) that applies large receptive fields. The SVAN decomposes a large kernel convolution into three different combinations of convolution operations and combines them with an attention mechanism to form a Symmetric Large Kernel Attention Block (SLKAB). The SLKAB is the basic component of the SVAN: arranged by the size of the receptive field in each convolution combination, it forms a symmetric attention block with a bottleneck structure that extracts depth features effectively. Our network gets a large receptive field while minimizing the number of parameters and improving the perceptual ability of the model. The experimental results show that the proposed SVAN can obtain high-quality super-resolution reconstruction results using only about 30% of the parameters of existing SOTA methods.

zoq, Jan 18 '24 07:01