New submissions for Fri, 29 Dec 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning

  • Authors: Yan Fan, Yu Wang, Pengfei Zhu, Qinghua Hu
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16409
  • Pdf link: https://arxiv.org/pdf/2312.16409
  • Abstract Continual learning (CL) has shown promising results and comparable performance to learning at once in a fully supervised manner. However, CL strategies typically require a large number of labeled samples, making their real-life deployment challenging. In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories. We provide a comprehensive analysis of SSCL and demonstrate that unreliable distributions of unlabeled data lead to unstable training and refinement of the progressing stages. This problem severely impacts the performance of SSCL. To address the limitations, we propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for semi-supervised continual learning, which leverages both semantic and structural information to achieve more stable knowledge distillation on unlabeled data and exhibit robustness against distribution bias. Firstly, we formalize a general model of structural distillation and design a dynamic graph construction for the continual learning progress. Next, we define a structure distillation vector and design a dynamic sub-graph distillation algorithm, which enables end-to-end training and adaptability to scale up tasks. The entire proposed method is adaptable to various CL methods and supervision settings. Finally, experiments conducted on three datasets CIFAR10, CIFAR100, and ImageNet-100, with varying supervision ratios, demonstrate the effectiveness of our proposed approach in mitigating the catastrophic forgetting problem in semi-supervised continual learning scenarios.

Keyword: optimization

GreenFlow: A Computation Allocation Framework for Building Environmentally Sound Recommendation System

  • Authors: Xingyu Lu, Zhining Liu, Yanchu Guan, Hongxuan Zhang, Chenyi Zhuang, Wenqi Ma, Yize Tan, Jinjie Gu, Guannan Zhang
  • Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16176
  • Pdf link: https://arxiv.org/pdf/2312.16176
  • Abstract Given the enormous number of users and items, industrial cascade recommendation systems (RS) are continuously expanded in size and complexity to deliver relevant items, such as news, services, and commodities, to the appropriate users. In a real-world scenario with hundreds of thousands of requests per second, significant computation is required to infer personalized results for each request, resulting in massive energy consumption and carbon emissions that raise concern. This paper proposes GreenFlow, a practical computation allocation framework for RS that considers both accuracy and carbon emission during inference. For each stage (e.g., recall, pre-ranking, ranking, etc.) of a cascade RS, when a user triggers a request, we define two actions that determine the computation: (1) the trained instances of models with different computational complexity; and (2) the number of items to be inferred in the stage. We refer to the combinations of actions in all stages as action chains. A reward score is estimated for each action chain, followed by dynamic primal-dual optimization considering both the reward and computation budget. Extensive experiments verify the effectiveness of the framework, reducing computation consumption by 41% in an industrial mobile application while maintaining commercial revenue. Moreover, the proposed framework saves approximately 5000 kWh of electricity and cuts carbon emissions by 3 tons per day.

A Method for Auto-Differentiation of the Voronoi Tessellation

  • Authors: Sergei Shumilin, Alexander Ryabov, Evgeny Burnaev, Vladimir Vanovskii
  • Subjects: Computational Geometry (cs.CG); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16192
  • Pdf link: https://arxiv.org/pdf/2312.16192
  • Abstract Voronoi tessellation, also known as the Voronoi diagram, is an important computational geometry technique that has applications in various scientific disciplines. It involves dividing a given space into regions based on proximity to a set of points. Autodifferentiation is a powerful tool for solving optimization tasks. It relies on constructing a computational graph that allows gradients to be computed using the backpropagation algorithm. However, the Voronoi tessellation often remains the only non-differentiable part of a pipeline, prohibiting end-to-end differentiation. We present a method for autodifferentiation of the 2D Voronoi tessellation. The method allows one to construct the Voronoi tessellation and pass gradients, making the construction end-to-end differentiable. We provide the implementation details and present several important applications. To the best of our knowledge, this is the first autodifferentiable realization of the Voronoi tessellation providing the full set of Voronoi geometrical parameters in a differentiable way.
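
The proximity rule mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's differentiable construction; it is only the brute-force nearest-site assignment that defines which Voronoi region a query point falls in. The function names `nearest_site` and `voronoi_labels` and the sample coordinates are hypothetical, chosen for illustration.

```python
import math

def nearest_site(point, sites):
    """Return the index of the site closest to `point` (Euclidean distance)."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def voronoi_labels(points, sites):
    """Label each query point with the index of its Voronoi region.

    A point belongs to the region of the site it is nearest to. This is a
    brute-force O(len(points) * len(sites)) sketch, not a tessellation builder.
    """
    return [nearest_site(p, sites) for p in points]

# Hypothetical sites and query points in the plane.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
points = [(1.0, 1.0), (3.0, 0.5), (0.5, 3.5)]
labels = voronoi_labels(points, sites)
print(labels)
```

The `min` over sites is a hard, piecewise-constant choice, which hints at why a tessellation step is usually the non-differentiable part of a pipeline.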

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

  • Authors: Christian Simon, Sen He, Juan-Manuel Perez-Rua, Frost Xu, Amine Benhalloum, Tao Xiang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16218
  • Pdf link: https://arxiv.org/pdf/2312.16218
  • Abstract Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. We adjust the weights of the SDF network conditioned on an input image at test-time to allow model adaptation to novel scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts derived from the synthesized views, we propose the use of a volume transformer module to improve the aggregation of image features instead of processing each viewpoint separately. Through our proposed method, dubbed as Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.

Revisiting Knowledge Distillation under Distribution Shift

  • Authors: Songming Zhang, Ziyu Lyu, Xiaofeng Chen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16242
  • Pdf link: https://arxiv.org/pdf/2312.16242
  • Abstract Knowledge distillation transfers knowledge from large models into small models, and has recently made remarkable achievements. However, few studies have investigated the mechanism of knowledge distillation against distribution shift. Distribution shift refers to drift in the data distribution between the training and testing phases. In this paper, we reconsider the paradigm of knowledge distillation by reformulating the objective function in shift situations. Under real scenarios, we propose a unified and systematic framework to benchmark knowledge distillation against two general distributional shifts, namely diversity and correlation shift. The evaluation benchmark covers more than 30 methods from algorithmic, data-driven, and optimization perspectives for five benchmark datasets. Overall, we conduct extensive experiments on the student model. We reveal intriguing observations of poor teaching performance under distribution shifts; in particular, complex algorithms and data augmentation offer limited gains in many cases.

iKUN: Speak to Trackers without Retraining

  • Authors: Yunhao Du, Cheng Lei, Zhicheng Zhao, Fei Su
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16245
  • Pdf link: https://arxiv.org/pdf/2312.16245
  • Abstract Referring multi-object tracking (RMOT) aims to track multiple objects based on input textual descriptions. Previous works realize it by simply integrating an extra textual module into the multi-object tracker. However, they typically need to retrain the entire framework and have difficulties in optimization. In this work, we propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers in a plug-and-play manner. Concretely, a knowledge unification module (KUM) is designed to adaptively extract visual features based on textual guidance. Meanwhile, to improve the localization accuracy, we present a neural version of the Kalman filter (NKF) to dynamically adjust process noise and observation noise based on the current motion status. Moreover, to address the problem of the open-set long-tail distribution of textual descriptions, a test-time similarity calibration method is proposed to refine the confidence score with pseudo frequency. Extensive experiments on the Refer-KITTI dataset verify the effectiveness of our framework. Finally, to speed up the development of RMOT, we also contribute a more challenging dataset, Refer-Dance, by extending the public DanceTrack dataset with motion and dressing descriptions. The code and dataset will be released at https://github.com/dyhBUPT/iKUN.

360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception

  • Authors: Zhijie Shen, Chunyu Lin, Junsong Zhang, Lang Nie, Kang Liao, Yao Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16268
  • Pdf link: https://arxiv.org/pdf/2312.16268
  • Abstract Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results as the compression process often muddles the semantics between various planes. Besides, these data-driven approaches impose an urgent demand for massive data annotations, which are laborious and time-consuming. For the first problem, we propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. DOPNet consists of three modules that are integrated to deliver distortion-free, semantics-clean, and detail-sharp disentangled representations, which benefit the subsequent layout recovery. For the second problem, we present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Concretely, we introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation, both of which are designed to fully exploit the geometric consistency across multiple perspectives. The optimizer provides a reliable set of pseudo-labels for network training, while the 1D cost volume enriches each view with comprehensive scene information derived from other perspectives. Extensive experiments demonstrate that our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.

Coordination and Machine Learning in Multi-Robot Systems: Applications in Robotic Soccer

  • Authors: Luis Paulo Reis
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2312.16273
  • Pdf link: https://arxiv.org/pdf/2312.16273
  • Abstract This paper presents the concepts of Artificial Intelligence, Multi-Agent-Systems, Coordination, Intelligent Robotics and Deep Reinforcement Learning. Emphasis is given to how AI and DRL may be used to create efficient robot skills and coordinated robotic teams, capable of performing very complex actions and tasks, such as playing a game of soccer. The paper also presents the concept of robotic soccer and the vision and structure of the RoboCup initiative, with emphasis on the Humanoid Simulation 3D league and the new challenges this competition poses. The final topics presented in the paper are based on the research developed/coordinated by the author throughout the last 22 years in the context of the FCPortugal project. The paper presents a short description of the coordination methodologies developed, such as Strategy, Tactics, Formations, Setplays, and Coaching Languages, and the use of Machine Learning to optimize the use of these concepts. The topics presented also include novel stochastic search algorithms for black box optimization and their use in the optimization of omnidirectional walking skills, robotic multi-agent learning and the creation of a humanoid kick with controlled distance. Finally, new applications using variations of the Proximal Policy Optimization algorithm and advanced modelling for robot and multi-robot learning are briefly explained, with emphasis on our new humanoid sprinting and running skills and an amazing humanoid robot soccer dribbling skill. The FCPortugal project enabled us to publish more than 100 papers and win several competitions in different leagues and many scientific awards at RoboCup. In total, our team won more than 40 awards in international competitions, including a clear victory in the Simulation 3D League at the RoboCup 2022 competition, scoring 84 goals and conceding only 2.

Unifying Static and Dynamic Intermediate Languages for Accelerator Generators

  • Authors: Caleb Kim, Pai Li, Anshuman Mohan, Andrew Butt, Adrian Sampson, Rachit Nigam
  • Subjects: Programming Languages (cs.PL); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2312.16300
  • Pdf link: https://arxiv.org/pdf/2312.16300
  • Abstract Compilers for accelerator design languages (ADLs) translate high-level languages into application-specific hardware. ADL compilers rely on a hardware control interface to compose hardware units. There are two choices: static control, which relies on cycle-level timing; or dynamic control, which uses explicit signalling to avoid depending on timing details. Static control is efficient but brittle; dynamic control incurs hardware costs to support compositional reasoning. Piezo is an ADL compiler that unifies static and dynamic control in a single intermediate language (IL). Its key insight is that the IL's static fragment is a refinement of its dynamic fragment: static code admits a subset of the run-time behaviors of the dynamic equivalent. Piezo can optimize code by combining facts from static and dynamic submodules, and it opportunistically converts code from dynamic to static control styles. We implement Piezo as an extension to an existing dynamic ADL compiler, Calyx. We use Piezo to implement an MLIR frontend, a systolic array generator, and a packet-scheduling hardware generator to demonstrate its optimizations and the static-dynamic interactions it enables.

In-Hand 3D Object Reconstruction from a Monocular RGB Video

  • Authors: Shijian Jiang, Qi Ye, Rengan Xie, Yuchi Huo, Xiang Li, Yang Zhou, Jiming Chen
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16425
  • Pdf link: https://arxiv.org/pdf/2312.16425
  • Abstract Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera. Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results in the visible part of the object. However, these methods falter in accurately capturing the shape within the hand-object contact region due to occlusion. In this paper, we propose a novel method that deals with surface reconstruction under occlusion by incorporating priors of 2D occlusion elucidation and physical contact constraints. For the former, we introduce an object amodal completion network to infer the 2D complete mask of objects under occlusion. To ensure the accuracy and view consistency of the predicted 2D amodal masks, we devise a joint optimization method for both amodal mask refinement and 3D reconstruction. For the latter, we impose penetration and attraction constraints on the local geometry in contact regions. We evaluate our approach on the HO3D and HOD datasets and demonstrate that it outperforms the state-of-the-art methods in terms of reconstruction surface quality, with an improvement of 52% on HO3D and 20% on HOD. Project webpage: https://east-j.github.io/ihor.

Preference as Reward, Maximum Preference Optimization with Importance Sampling

  • Authors: Zaifan Jiang, Xing Huang, Chao Wei
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16430
  • Pdf link: https://arxiv.org/pdf/2312.16430
  • Abstract Preference learning is a key technology for aligning language models with human values. Reinforcement Learning from Human Feedback (RLHF) is a model-based algorithm for optimizing preference learning, which first fits a reward model for the preference score and then optimizes the generating policy with the on-policy PPO algorithm to maximize the reward. The RLHF process is complex, time-consuming, and unstable. The Direct Preference Optimization (DPO) algorithm uses an off-policy algorithm to directly optimize the generating policy and eliminates the need for a reward model, which makes it data-efficient and stable. DPO uses the Bradley-Terry model and a log-loss, which leads to over-fitting to the preference data at the expense of ignoring the KL-regularization term when preferences are near-deterministic. IPO uses a root-finding pairwise MSE loss to solve the problem of the ignored KL-regularization and to learn an optimal policy. But IPO's pairwise loss still cannot make the KL-regularization work. In this paper, we design a simple and intuitive off-policy preference optimization algorithm from an importance sampling view, and add an off-policy KL-regularization term that makes KL-regularization truly effective. To simplify the learning process and save memory, we can generate regularization data in advance, which eliminates the need for both a reward model and a reference policy in the optimization stage.
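
To make the contrast between the two baseline losses in this abstract concrete, here is a minimal sketch of the commonly published per-pair DPO log-loss and IPO squared loss. It is not the paper's proposed algorithm. The function names, the default `beta` and `tau` values, and the scalar log-probability inputs are assumptions for illustration.

```python
import math

def log_ratio_margin(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Policy-vs-reference log-ratio of the preferred response minus that of
    the rejected response (w = preferred, l = rejected)."""
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)

def dpo_loss(margin, beta=0.1):
    """DPO log-loss: -log(sigmoid(beta * margin)), computed stably."""
    z = beta * margin
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def ipo_loss(margin, tau=0.1):
    """IPO squared loss: regress the margin onto the finite target 1/(2*tau)."""
    return (margin - 1.0 / (2.0 * tau)) ** 2

# Hypothetical log-probabilities for one preference pair.
m = log_ratio_margin(logp_w=-1.0, logp_l=-3.0, ref_logp_w=-2.0, ref_logp_l=-2.0)
print(m, dpo_loss(m), ipo_loss(m))
```

In this sketch the DPO loss keeps decreasing as the margin grows without bound, whereas the IPO loss is minimized at the finite margin `1/(2*tau)`; that difference is one way to read the abstract's remark about DPO over-fitting when preferences are near-deterministic.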

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

  • Authors: Guojian Wang, Faguo Wu, Xiao Zhang, Ning Guo, Zhiming Zheng
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16456
  • Pdf link: https://arxiv.org/pdf/2312.16456
  • Abstract Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the policy of the agent away from suboptimal solutions by leveraging incomplete offline demonstrations as references. This approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptively clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a deduction of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. Notably, the specific metrics and quantifiable results further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.

Multi-Contact Whole Body Force Control for Position-Controlled Robots

  • Authors: Quentin Rouxel (LARSEN), Serena Ivaldi (LARSEN), Jean-Baptiste Mouret (LARSEN)
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2312.16465
  • Pdf link: https://arxiv.org/pdf/2312.16465
  • Abstract Many humanoid and multi-legged robots are controlled in positions rather than in torques, preventing direct control of contact forces, and hampering their ability to create multiple contacts to enhance their balance, such as placing a hand on a wall or a handrail. This paper introduces the SEIKO (Sequential Equilibrium Inverse Kinematic Optimization) pipeline, drawing inspiration from flexibility models used in serial elastic actuators to indirectly control contact forces on traditional position-controlled robots. SEIKO formulates whole-body retargeting from Cartesian commands and admittance control using two quadratic programs solved in real time. We validated our pipeline with experiments on the real, full-scale humanoid robot Talos in various multicontact scenarios, including pushing tasks, far-reaching tasks, stair climbing, and stepping on sloped surfaces. This work opens the possibility of stable, contact-rich behaviors while getting around many of the challenges of torque-controlled robots. Code and videos are available at https://hucebot.github.io/seiko_controller_website/ .

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

  • Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, Jingdong Wang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16478
  • Pdf link: https://arxiv.org/pdf/2312.16478
  • Abstract Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice. Recently, to alleviate expensive data collection, co-occurring pairs from the Internet are automatically harvested for training. However, this inevitably includes mismatched pairs, i.e., noisy correspondences, undermining supervision reliability and degrading performance. Current methods leverage deep neural networks' memorization effect to address noisy correspondences, which overconfidently focus on similarity-guided training with hard negatives and suffer from self-reinforcing errors. In light of the above, we introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM). Specifically, by viewing sample matching as classification tasks within the batch, we generate classification logits for the given sample. Instead of a single similarity score, we refine sample filtration through energy uncertainty and estimate the model's sensitivity of selected clean samples using swapped classification entropy, in view of the overall prediction distribution. Additionally, we propose cross-modal biased complementary learning to leverage negative matches overlooked in hard-negative training, further improving model optimization stability and curbing self-reinforcing errors. Extensive experiments on challenging benchmarks affirm the efficacy and efficiency of SREM.

A Theoretical Analysis of Efficiency Constrained Utility-Privacy Bi-Objective Optimization in Federated Learning

  • Authors: Hanlin Gu (1), Xinyuan Zhao (2), Yuxing Han (2), Yan Kang (1), Lixin Fan (1), Qiang Yang (1 and 3) ((1) WeBank, China, (2) Tsinghua University, China, (3) Hong Kong University of Science and Technology, China)
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16554
  • Pdf link: https://arxiv.org/pdf/2312.16554
  • Abstract Federated learning (FL) enables multiple clients to collaboratively learn a shared model without sharing their individual data. Concerns about utility, privacy, and training efficiency in FL have garnered significant research attention. Differential privacy has emerged as a prevalent technique in FL, safeguarding the privacy of individual user data while impacting utility and training efficiency. Within Differential Privacy Federated Learning (DPFL), previous studies have primarily focused on the utility-privacy trade-off, neglecting training efficiency, which is crucial for timely completion. Moreover, differential privacy achieves privacy by introducing controlled randomness (noise) on selected clients in each communication round. Previous work has mainly examined the impact of noise level ($\sigma$) and communication rounds ($T$) on the privacy-utility dynamic, overlooking other influential factors like the sample ratio ($q$, the proportion of selected clients). This paper systematically formulates an efficiency-constrained utility-privacy bi-objective optimization problem in DPFL, focusing on $\sigma$, $T$, and $q$. We provide a comprehensive theoretical analysis, yielding analytical solutions for the Pareto front. Extensive empirical experiments verify the validity and efficacy of our analysis, offering valuable guidance for low-cost parameter design in DPFL.
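
The roles of the noise level $\sigma$ and the sample ratio $q$ named in this abstract can be sketched as one communication round, repeated $T$ times in training. This is a generic clip-and-add-Gaussian-noise illustration, not the paper's analysis or its exact mechanism. The function name `dp_round`, the clipping norm `clip`, and the fixed-size client sampling are assumptions made here.

```python
import random

def dp_round(client_updates, q, sigma, clip=1.0, rng=None):
    """One hypothetical DPFL round: sample a fraction q of clients, clip each
    selected update to L2 norm `clip`, add Gaussian noise with standard
    deviation sigma * clip per coordinate, and average the noisy updates."""
    rng = rng or random.Random(0)
    n = len(client_updates)
    k = max(1, round(q * n))            # number of selected clients
    selected = rng.sample(range(n), k)  # sampling without replacement
    noisy = []
    for i in selected:
        u = client_updates[i]
        norm = sum(x * x for x in u) ** 0.5
        scale = min(1.0, clip / norm) if norm > 0 else 1.0
        noisy.append([x * scale + rng.gauss(0.0, sigma * clip) for x in u])
    dim = len(client_updates[0])
    avg = [sum(v[j] for v in noisy) / k for j in range(dim)]
    return avg, selected

# Four hypothetical clients with identical 2-D updates of norm 5.
updates = [[3.0, 4.0] for _ in range(4)]
avg, selected = dp_round(updates, q=0.5, sigma=0.0)
print(avg, selected)
```

With `sigma=0.0` the round reduces to averaging clipped updates, which makes the effect of clipping easy to check before noise is switched on.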

Optimal Beamforming Structure and Efficient Optimization Algorithms for Generalized Multi-Group Multicast Beamforming Optimization

  • Authors: Tianyu Fang, Yijie Mao
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2312.16559
  • Pdf link: https://arxiv.org/pdf/2312.16559
  • Abstract In this work, we focus on solving non-smooth non-convex maximization problems in multi-group multicast transmission. Leveraging Karush-Kuhn-Tucker (KKT) optimality conditions and successive incumbent transcending (SIT) duality, we thoroughly analyze the optimal beamforming structure for a set of optimization problems characterized by a general utility-based objective function. By exploiting the identified optimal structure, we further unveil inherent low-dimensional beamforming structures within the problems, which are asymptotically optimal in various regimes of transmit signal-to-noise ratios (SNRs) or the number of transmit antennas. Building upon the discovered optimal and low-dimensional beamforming structures, we then propose highly efficient and toolbox-free optimization algorithms to solve a specific multi-group multicast optimization problem based on the weighted sum rate (WSR) utility function. The proposed algorithms first use the cyclic maximization (CM) framework to decompose the problem into multiple subproblems, each has an optimal or low-dimensional closed-form beamforming solution structure. Then, we propose the projected adaptive gradient descent (PAGD) algorithm to compute the optimal Lagrangian dual variables for each subproblem. Numerical results show that the proposed algorithms maintain comparable or improved WSR performance compared to baseline algorithms, while dramatically reducing the computational complexity. Notably, the proposed ultra-low-complexity algorithms based on low-dimensional beamforming structures achieve near optimal WSR performance with extremely low computational complexity. This complexity remains independent of the number of transmit antennas, making them promising and practical for extremely large multiple-input multiple-output (XL-MIMO) applications in 6G.

GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection

  • Authors: Hefei Mei, Taijin Zhao, Shiyuan Tang, Heqian Qiu, Lanxiao Wang, Minjian Zhang, Fanman Meng, Hongliang Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16571
  • Pdf link: https://arxiv.org/pdf/2312.16571
  • Abstract Few-shot object detection (FSOD) aims to achieve object detection only using a few novel class training data. Most of the existing methods usually adopt a transfer-learning strategy to construct the novel class distribution by transferring the base class knowledge. However, this direct way easily results in confusion between the novel class and other similar categories in the decision space. To address the problem, we propose generating local reverse samples (LRSamples) in Prototype Reference Frames to adaptively adjust the center position and boundary range of the novel class distribution to learn more discriminative novel class samples for FSOD. Firstly, we propose a Center Calibration Variance Augmentation (CCVA) module, which contains the selection rule of LRSamples, the generator of LRSamples, and augmentation on the calibrated distribution centers. Specifically, we design an intra-class feature converter (IFC) as the generator of CCVA to learn the selecting rule. By transferring the knowledge of IFC from the base training to fine-tuning, the IFC generates plentiful novel samples to calibrate the novel class distribution. Moreover, we propose a Feature Density Boundary Optimization (FDBO) module to adaptively adjust the importance of samples depending on their distance from the decision boundary. It can emphasize the importance of the high-density area of the similar class (closer decision boundary area) and reduce the weight of the low-density area of the similar class (farther decision boundary area), thus optimizing a clearer decision boundary for each category. We conduct extensive experiments to demonstrate the effectiveness of our proposed method. Our method achieves consistent improvement on the Pascal VOC and MS COCO datasets based on DeFRCN and MFDC baselines.

Observation-based Optimal Control Law Learning with LQR Reconstruction

  • Authors: Chendi Qu, Jianping He, Xiaoming Duan
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2312.16572
  • Pdf link: https://arxiv.org/pdf/2312.16572
  • Abstract Designing controllers to generate various trajectories has been studied for years, while recently, recovering an optimal controller from trajectories receives increasing attention. In this paper, we reveal that the inherent linear quadratic regulator (LQR) problem of a moving agent can be reconstructed based on its trajectory observations only, which enables one to learn the optimal control law of the agent autonomously. Specifically, the reconstruction of the optimization problem requires estimation of three unknown parameters including the target state, weighting matrices in the objective function and the control horizon. Our algorithm considers two types of objective function settings and identifies the weighting matrices with proposed novel inverse optimal control methods, providing the well-posedness and identifiability proof. We obtain the optimal estimate of the control horizon using binary search and finally reconstruct the LQR problem with above estimates. The strength of learning control law with optimization problem recovery lies in less computation consumption and strong generalization ability. We apply our algorithm to the future control input prediction and the discrepancy loss is further derived. Numerical simulations and hardware experiments on a self-designed robot platform illustrate the effectiveness of our work.

Dual-Functional Artificial Noise (DFAN) Aided Robust Covert Communications in Integrated Sensing and Communications

  • Authors: Runzhe Tang, Long Yang, Lv Lu, Zheng Zhang, Yuanwei Liu, Jian Chen
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.16621
  • Pdf link: https://arxiv.org/pdf/2312.16621
  • Abstract This paper investigates covert communications in an integrated sensing and communications system, where a dual-functional base station (called Alice) covertly transmits signals to a covert user (called Bob) while sensing multiple targets, with one of them acting as a potential watcher (called Willie) and maliciously eavesdropping on legitimate communications. To shelter the covert communications, Alice transmits additional dual-functional artificial noise (DFAN) with a varying power not only to create uncertainty at Willie's signal reception to confuse Willie but also to sense the targets simultaneously. Based on this framework, the weighted sum of the sensing beampattern mean square error (MSE) and cross correlation is minimized by jointly optimizing the covert communication and DFAN signals subject to the minimum covert rate requirement. The robust design considers both cases of imperfect Willie's CSI (WCSI) and statistical WCSI. Under the worst-case assumption that Willie can adaptively adjust the detection threshold to achieve the best detection performance, the minimum detection error probability (DEP) at Willie is analytically derived in the closed-form expression. The formulated covertness constrained optimization problems are tackled by a feasibility-checking based difference-of-convex relaxation (DC) algorithm utilizing the S-procedure, Bernstein-type inequality, and the DC method. Simulation results validate the feasibility of the proposed scheme and demonstrate the covertness performance gains achieved by our proposed design over various benchmarks.

Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

  • Authors: Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16682
  • Pdf link: https://arxiv.org/pdf/2312.16682
  • Abstract Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type response A is preferred to response B for a given input. Perhaps less commonly, methods have also been developed for binary feedback, i.e. training models given labels of type response A is good or bad. We show how an existing performant binary feedback method, the Cringe Loss (Adolphs et al., 2022), can be generalized to the pairwise preference setting using a simple soft margin extension. Pairwise Cringe Loss is straightforward to implement and efficient to train, and we find it outperforms state-of-the-art preference optimization algorithms such as PPO and DPO on the AlpacaFarm benchmark.

Performance Comparison of Session-based Recommendation Algorithms based on GNNs

  • Authors: Faisal Shehzad, Dietmar Jannach
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16695
  • Pdf link: https://arxiv.org/pdf/2312.16695
  • Abstract In session-based recommendation settings, a recommender system has to base its suggestions on the user interactions that are observed in an ongoing session. Since such sessions can consist of only a small set of interactions, various approaches based on Graph Neural Networks (GNN) were recently proposed, as they allow us to integrate various types of side information about the items in a natural way. Unfortunately, a variety of evaluation settings are used in the literature, e.g., in terms of protocols, metrics and baselines, making it difficult to assess what represents the state of the art. In this work, we present the results of an evaluation of eight recent GNN-based approaches that were published in high-quality outlets. For a fair comparison, all models are systematically tuned and tested under identical conditions using three common datasets. We furthermore include k-nearest-neighbor and sequential rules-based models as baselines, as such models have previously exhibited competitive performance results for similar settings. To our surprise, the evaluation showed that the simple models outperform all recent GNN models in terms of the Mean Reciprocal Rank, which we used as an optimization criterion, and were only outperformed in three cases in terms of the Hit Rate. Additional analyses furthermore reveal that several other factors that are often not deeply discussed in papers, e.g., random seeds, can markedly impact the performance of GNN-based models. Our results therefore (a) point to continuing issues in the community in terms of research methodology and (b) indicate that there is ample room for improvement in session-based recommendation.
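The two metrics named in the abstract, Mean Reciprocal Rank and Hit Rate, can be sketched as follows for next-item prediction with a cutoff `k`. This is a generic illustration rather than the paper's evaluation code; the function names and the toy sessions are assumptions.

```python
def reciprocal_rank(ranked_items, target, k):
    """Return 1/position of the target within the top-k list, or 0.0 if absent."""
    for position, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def evaluate(predictions, targets, k):
    """Compute (MRR@k, HitRate@k) over parallel lists of rankings and true next items."""
    n = len(targets)
    mrr = sum(reciprocal_rank(p, t, k) for p, t in zip(predictions, targets)) / n
    hit_rate = sum(1 for p, t in zip(predictions, targets) if t in p[:k]) / n
    return mrr, hit_rate


# Toy data: three sessions, each with a ranked recommendation list and the true next item.
predictions = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "d"]
mrr, hit_rate = evaluate(predictions, targets, k=3)
```

Hit Rate only asks whether the true item appears in the top k, while MRR also rewards ranking it higher, which is one reason a model can lead on one metric and trail on the other.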

HMP: Hand Motion Priors for Pose and Shape Estimation from Video

  • Authors: Enes Duran, Muhammed Kocabas, Vasileios Choutas, Zicong Fan, Michael J. Black
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16737
  • Pdf link: https://arxiv.org/pdf/2312.16737
  • Abstract Understanding how humans interact with the world necessitates accurate 3D hand pose estimation, a task complicated by the hand's high degree of articulation, frequent occlusions, self-occlusions, and rapid motions. While most existing methods rely on single-image inputs, videos have useful cues to address aforementioned issues. However, existing video-based 3D hand datasets are insufficient for training feedforward models to generalize to in-the-wild scenarios. On the other hand, we have access to large human motion capture datasets which also include hand motions, e.g. AMASS. Therefore, we develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions. This motion prior is then employed for video-based 3D hand motion estimation following a latent optimization approach. Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios. It produces stable, temporally consistent results that surpass conventional single-frame methods. We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets, with special emphasis on an occlusion-focused subset of HO3D. Code is available at https://hmp.is.tue.mpg.de

Bayesian Sensor Placement for Multi-source Localization of Pathogens in Wastewater Networks

  • Authors: Kalvik Jakkala, Srinivas Akella
  • Subjects: Social and Information Networks (cs.SI); Computational Engineering, Finance, and Science (cs.CE); Physics and Society (physics.soc-ph)
  • Arxiv link: https://arxiv.org/abs/2312.16750
  • Pdf link: https://arxiv.org/pdf/2312.16750
  • Abstract Wastewater monitoring is an effective approach for the early detection of viral and bacterial disease outbreaks. It has recently been used to identify the presence of individuals infected with COVID-19. To monitor large communities and accurately localize buildings with infected individuals with a limited number of sensors, one must carefully choose the sampling locations in wastewater networks. We also have to account for concentration requirements on the collected wastewater samples to ensure reliable virus presence test results. We model this as a sensor placement problem. Although sensor placement for source localization arises in numerous problems, most approaches use application-specific heuristics and fail to consider multiple source scenarios. To address these limitations, we develop a novel approach that combines Bayesian networks and discrete optimization to efficiently identify informative sensor placements and accurately localize virus sources. Our approach also takes into account concentration requirements on wastewater samples during optimization. Our simulation experiments demonstrate the quality of our sensor placements and the accuracy of our source localization approach. Furthermore, we show the robustness of our approach to discrepancies between the virus outbreak model and the actual outbreak rates.

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

  • Authors: Thomy Phan, Taoan Huang, Bistra Dilkina, Sven Koenig
  • Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2312.16767
  • Pdf link: https://arxiv.org/pdf/2312.16767
  • Abstract Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in large-scale multi-agent systems. State-of-the-art anytime MAPF is based on Large Neighborhood Search (LNS), where a fast initial solution is iteratively optimized by destroying and repairing a fixed number of parts, i.e., the neighborhood, of the solution, using randomized destroy heuristics and prioritized planning. Despite their recent success in various MAPF instances, current LNS-based approaches lack exploration and flexibility due to greedy optimization with a fixed neighborhood size which can lead to low quality solutions in general. So far, these limitations have been addressed with extensive prior effort in tuning or offline machine learning beyond actual planning. In this paper, we focus on online learning in LNS and propose Bandit-based Adaptive LArge Neighborhood search Combined with Exploration (BALANCE). BALANCE uses a bi-level multi-armed bandit scheme to adapt the selection of destroy heuristics and neighborhood sizes on the fly during search. We evaluate BALANCE on multiple maps from the MAPF benchmark set and empirically demonstrate cost improvements of at least 50% compared to state-of-the-art anytime MAPF in large-scale scenarios. We find that Thompson Sampling performs particularly well compared to alternative multi-armed bandit algorithms.
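To illustrate the bandit component mentioned in the abstract, here is a minimal single-level Thompson Sampling sketch with Beta posteriors. It is not the BALANCE algorithm itself, which is bi-level and uses rewards derived from the search; here each arm stands in for a hypothetical destroy heuristic and the Bernoulli success probabilities are invented for the example.

```python
import random


def thompson_sampling(success_probs, rounds, seed=0):
    """Run Bernoulli Thompson Sampling and return how often each arm was pulled.

    success_probs are hypothetical per-arm probabilities that applying the arm
    (e.g. a destroy heuristic) improves the current solution.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior for every arm
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Sample one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1
        # Observe a simulated Bernoulli reward and update the posterior.
        if rng.random() < success_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
```

Because arms are chosen by sampling from the posterior rather than by its mean, weak arms keep being tried occasionally early on, and the selection concentrates on the best arm as evidence accumulates.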

Hidden Minima in Two-Layer ReLU Networks

  • Authors: Yossi Arjevani
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.16819
  • Pdf link: https://arxiv.org/pdf/2312.16819
  • Abstract The optimization problem associated to fitting two-layer ReLU networks having $d$~inputs, $k$~neurons, and labels generated by a target network, is considered. Two categories of infinite families of minima, giving one minimum per $d$ and $k$, were recently found. The loss at minima belonging to the first category converges to zero as $d$ increases. In the second category, the loss remains bounded away from zero. That being so, how may one avoid minima belonging to the latter category? Fortunately, such minima are never detected by standard optimization methods. Motivated by questions concerning the nature of this phenomenon, we develop methods to study distinctive analytic properties of hidden minima. By existing analyses, the Hessian spectrum of both categories agree modulo $O(d^{-1/2})$-terms -- not promising. Thus, rather, our investigation proceeds by studying curves along which the loss is minimized or maximized, referred to as tangency arcs. We prove that pure, seemingly remote, group representation-theoretic considerations concerning the arrangement of subspaces invariant to the action of subgroups of $S_d$, the symmetry group over $d$ symbols, relative to ones fixed by the action yield a precise description of all finitely many admissible types of tangency arcs. The general results applied for the loss function reveal that arcs emanating from hidden minima differ, characteristically, by their structure and symmetry, precisely on account of the $O(d^{-1/2})$-eigenvalue terms absent in previous work, indicating the subtlety of the analysis. The theoretical results, stated and proved for o-minimal structures, show that the set comprising all tangency arcs is topologically sufficiently tame, permitting a numerical construction of tangency arcs, and ultimately, a comparison of how minima from both categories are positioned relative to adjacent critical points.

Pareto-based Multi-Objective Recommender System with Forgetting Curve

  • Authors: Jipeng Jin, Zhaoxiang Zhang, Zhiheng Li, Xiaofeng Gao, Xiongwei Yang, Lei Xiao, Jie Jiang
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2312.16868
  • Pdf link: https://arxiv.org/pdf/2312.16868
  • Abstract Recommender systems with cascading architecture play an increasingly significant role in online recommendation platforms, where the approach to dealing with negative feedback is a vital issue. For instance, in short video platforms, users tend to quickly slip away from candidates that they find aversive, and recommender systems are expected to receive this explicit negative feedback and make adjustments to avoid these recommendations. Considering the recency effect in memory, we propose a forgetting model based on the Ebbinghaus Forgetting Curve to cope with negative feedback. In addition, we introduce a Pareto optimization solver to guarantee a better trade-off between recency and model performance. In conclusion, we propose Pareto-based Multi-Objective Recommender System with forgetting curve (PMORS), which can be applied to any multi-objective recommendation and shows sufficient superiority when facing explicit negative feedback. We have conducted evaluations of PMORS and achieved favorable outcomes in short-video scenarios on both a public dataset and an industrial dataset. After being deployed on an online short video platform named WeChat Channels in May 2023, PMORS has not only demonstrated promising results for both consistency and recency but also achieved an improvement of up to +1.45% GMV.
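The forgetting idea in the abstract can be sketched with the textbook exponential form of the Ebbinghaus curve, R = exp(-t / S). The abstract does not give the exact model used in PMORS, so the functional form, the `strength` parameter, and the way feedback ages are aggregated below are assumptions for illustration only.

```python
import math


def retention(elapsed, strength):
    """Exponential forgetting curve: fraction of a memory retained after `elapsed` time units."""
    return math.exp(-elapsed / strength)


def negative_feedback_penalty(feedback_ages, strength):
    """Sum time-decayed weights of past negative feedback events (hypothetical aggregation).

    feedback_ages holds the time elapsed since each negative feedback event,
    so recent dislikes contribute more than old ones.
    """
    return sum(retention(age, strength) for age in feedback_ages)


# A dislike from 1 hour ago outweighs two dislikes from 24 and 48 hours ago.
recent = negative_feedback_penalty([1.0], strength=6.0)
stale = negative_feedback_penalty([24.0, 48.0], strength=6.0)
```

A larger `strength` makes the system remember negative feedback longer, which is the kind of recency versus performance trade-off the abstract assigns to the Pareto solver.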

TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools

  • Authors: Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Jiawei Wang, Amir M. Mir, Li Li, Eric Bodden
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2312.16882
  • Pdf link: https://arxiv.org/pdf/2312.16882
  • Abstract In light of the growing interest in type inference research for Python, both researchers and practitioners require a standardized process to assess the performance of various type inference techniques. This paper introduces TypeEvalPy, a comprehensive micro-benchmarking framework for evaluating type inference tools. TypeEvalPy contains 154 code snippets with 845 type annotations across 18 categories that target various Python features. The framework manages the execution of containerized tools, transforms inferred types into a standardized format, and produces meaningful metrics for assessment. Through our analysis, we compare the performance of six type inference tools, highlighting their strengths and limitations. Our findings provide a foundation for further research and optimization in the domain of Python type inference.

RLPlanner: Reinforcement Learning based Floorplanning for Chiplets with Fast Thermal Analysis

  • Authors: Yuanyuan Duan, Xingchen Liu, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu
  • Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
  • Arxiv link: https://arxiv.org/abs/2312.16895
  • Pdf link: https://arxiv.org/pdf/2312.16895
  • Abstract Chiplet-based systems have gained significant attention in recent years due to their low cost and competitive performance. As the complexity and compactness of a chiplet-based system increase, careful consideration must be given to microbump assignments, interconnect delays, and thermal limitations during the floorplanning stage. This paper introduces RLPlanner, an efficient early-stage floorplanning tool for chiplet-based systems with a novel fast thermal evaluation method. RLPlanner employs advanced reinforcement learning to jointly minimize total wirelength and temperature. To alleviate the time-consuming thermal calculations, RLPlanner incorporates the developed fast thermal evaluation method to expedite the iterations and optimizations. Comprehensive experiments demonstrate that our proposed fast thermal evaluation method achieves a mean absolute error (MAE) of 0.25 K and delivers over 120x speed-up compared to the open-source thermal solver HotSpot. When integrated with our fast thermal evaluation method, RLPlanner achieves an average improvement of 20.28% in minimizing the target objective (a combination of wirelength and temperature), within a similar running time, compared to the classic simulated annealing method with HotSpot.

DOEPatch: Dynamically Optimized Ensemble Model for Adversarial Patches Generation

  • Authors: Wenyi Tan, Yang Li, Chenxing Zhao, Zhunga Liu, Quan Pan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.16907
  • Pdf link: https://arxiv.org/pdf/2312.16907
  • Abstract Object detection is a fundamental task in various applications ranging from autonomous driving to intelligent security systems. However, recognition of a person can be hindered when their clothing is decorated with carefully designed graffiti patterns, leading to the failure of object detection. To achieve greater attack potential against unknown black-box models, adversarial patches capable of affecting the outputs of multiple-object detection models are required. While ensemble models have proven effective, current research in the field of object detection typically focuses on the simple fusion of the outputs of all models, with limited attention being given to developing general adversarial patches that can function effectively in the physical world. In this paper, we introduce the concept of energy and treat the adversarial patches generation process as an optimization of the adversarial patches to minimize the total energy of the ``person'' category. Additionally, by adopting adversarial training, we construct a dynamically optimized ensemble model. During training, the weight parameters of the attacked target models are adjusted to find the balance point at which the generated adversarial patches can effectively attack all target models. We carried out six sets of comparative experiments and tested our algorithm on five mainstream object detection models. The adversarial patches generated by our algorithm can reduce the recognition accuracy of YOLOv2 and YOLOv3 to 13.19% and 29.20%, respectively. In addition, we conducted experiments to test the effectiveness of T-shirts covered with our adversarial patches in the physical world and could achieve that people are not recognized by the object detection model. Finally, leveraging the Grad-CAM tool, we explored the attack mechanism of adversarial patches from an energetic perspective.

A GAN-based Semantic Communication for Text without CSI

  • Authors: Jin Mao, Ke Xiong, Ming Liu, Zhijin Qin, Wei Chen, Pingyi Fan, Khaled Ben Letaief
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2312.16909
  • Pdf link: https://arxiv.org/pdf/2312.16909
  • Abstract Recently, semantic communication (SC) has been regarded as one of the potential paradigms of 6G. Current SC frameworks require channel state information (CSI) to handle severe signal distortion induced by channel fading. Since the channel estimation overhead for obtaining CSI cannot be neglected, we therefore propose a generative adversarial network (GAN) based SC framework (Ti-GSC) that doesn't require CSI. In Ti-GSC, two main modules, i.e., an autoencoder-based encoder-decoder module (AEDM) and a GAN-based signal distortion suppression module (GSDSM) are included where AEDM first encodes the data at the source before transmission, and then GSDSM suppresses the distortion of the received signals in both syntactic and semantic dimensions at the destination. At last, AEDM decodes the distortion-suppressed signal at the destination. To measure signal distortion, syntactic distortion and semantic distortion terms are newly added to the total loss function. To achieve better training results, joint optimization-based training (JOT) and alternating optimization-based training (AOT) are designed for the proposed Ti-GSC. Experimental results show that JOT is more efficient for Ti-GSC. Moreover, without CSI, bilingual evaluation understudy (BLEU) score achieved by Ti-GSC is about 40% and 62% higher than that achieved by existing SC frameworks in Rician and Rayleigh fading, respectively. (*Due to the notification of arXiv "The Abstract field cannot be longer than 1,920 characters", the appeared Abstract is shortened. For the full Abstract, please download the Article.)

Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G

  • Authors: Qingqing Wu, Beixiong Zheng, Changsheng You, Lipeng Zhu, Kaiming Shen, Xiaodan Shao, Weidong Mei, Boya Di, Hongliang Zhang, Ertugrul Basar, Lingyang Song, Marco Di Renzo, Zhi-Quan Luo, Rui Zhang
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.16918
  • Pdf link: https://arxiv.org/pdf/2312.16918
  • Abstract Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities from passive reflection to active amplification, simultaneous reflection and refraction, as well as holographic beamforming. However, the research on ISs is still in rapid progress and there have been recent technological advances in ISs and their emerging applications that are worthy of a timely review. Thus, we provide in this paper a comprehensive survey on the recent development and advances of ISs aided wireless networks. Specifically, we start with an overview on the anticipated use cases of ISs in future wireless networks such as 6G, followed by a summary of the recent standardization activities related to ISs. Then, the main design issues of the commonly adopted reflection-based IS and their state-of-the-art solutions are presented in detail, including reflection optimization, deployment, signal modulation, wireless sensing, and integrated sensing and communications. Finally, recent progress and new challenges in advanced IS architectures are discussed to inspire future research.

Efficient High-Quality Clustering for Large Bipartite Graphs

  • Authors: Renchi Yang, Jieming Shi
  • Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16926
  • Pdf link: https://arxiv.org/pdf/2312.16926
  • Abstract A bipartite graph contains inter-set edges between two disjoint vertex sets, and is widely used to model real-world data, such as user-item purchase records, author-article publications, and biological interactions between drugs and proteins. k-Bipartite Graph Clustering (k-BGC) is to partition the target vertex set in a bipartite graph into k disjoint clusters. The clustering quality is important to the utility of k-BGC in various applications like social network analysis, recommendation systems, text mining, and bioinformatics, to name a few. Existing approaches to k-BGC either output clustering results with compromised quality due to inadequate exploitation of high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art performance on large-scale bipartite graphs. HOPE obtains high scalability and effectiveness through a new k-BGC problem formulation based on the novel notion of high-order perspective (HOP) vectors and an efficient technique for low-rank approximation of HOP vectors. HOPE+ further elevates the k-BGC performance to another level with a judicious problem transformation and a highly efficient two-stage optimization framework. Two variants, HOPE+ (FNEM) and HOPE+ (SNEM) are designed when either the Frobenius norm or spectral norm is applied in the transformation. Extensive experiments, comparing HOPE and HOPE+ against 13 competitors on 10 real-world datasets, exhibit that our solutions, especially HOPE+, are superior to existing methods in terms of result quality, while being up to orders of magnitude faster. On the largest dataset MAG with 1.1 billion edges, HOPE+ is able to produce clusters with the highest clustering accuracy within 31 minutes, which is unmatched by any existing solution for k-BGC.

A modified AAA algorithm for learning stable reduced-order models from data

  • Authors: Tommaso Bradde, Stefano Grivet-Talocia, Quirin Aumann, Ion Victor Gosea
  • Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2312.16978
  • Pdf link: https://arxiv.org/pdf/2312.16978
  • Abstract In recent years, the Adaptive Antoulas-Anderson (AAA) algorithm has established itself as the method of choice for solving rational approximation problems. Data-driven Model Order Reduction (MOR) of large-scale Linear Time-Invariant (LTI) systems represents one of the many applications in which this algorithm has proven to be successful since it typically generates reduced-order models (ROMs) efficiently and in an automated way. Despite its effectiveness and numerical reliability, the classical AAA algorithm is not guaranteed to return a ROM that retains the same structural features of the underlying dynamical system, such as the stability of the dynamics. In this paper, we propose a novel algebraic characterization for the stability of ROMs with transfer function obeying the AAA barycentric structure. We use this characterization to formulate a set of convex constraints on the free coefficients of the AAA model that, whenever verified, guarantee by construction the asymptotic stability of the resulting ROM. We suggest how to embed such constraints within the AAA optimization routine, and we validate experimentally the effectiveness of the resulting algorithm, named stabAAA, over a set of relevant MOR applications.

PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance

  • Authors: Taicai Chen, Yue Duan, Dong Li, Lei Qi, Yinghuan Shi, Yang Gao
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16983
  • Pdf link: https://arxiv.org/pdf/2312.16983
  • Abstract Variational Autoencoder based Bayesian Optimization (VAE-BO) has demonstrated its excellent performance in addressing high-dimensional structured optimization problems. However, current mainstream methods overlook the potential of utilizing a pool of unlabeled data to construct the latent space, while only concentrating on designing sophisticated models to leverage the labeled data. Despite their effective usage of labeled data, these methods often require extra network structures, additional procedure, resulting in computational inefficiency. To address this issue, we propose a novel method to effectively utilize unlabeled data with the guidance of labeled data. Specifically, we tailor the pseudo-labeling technique from semi-supervised learning to explicitly reveal the relative magnitudes of optimization objective values hidden within the unlabeled data. Based on this technique, we assign appropriate training weights to unlabeled data to enhance the construction of a discriminative latent space. Furthermore, we treat the VAE encoder and the Gaussian Process (GP) in Bayesian optimization as a unified deep kernel learning process, allowing the direct utilization of labeled data, which we term as Gaussian Process guidance. This directly and effectively integrates the goal of improving GP accuracy into the VAE training, thereby guiding the construction of the latent space. The extensive experiments demonstrate that our proposed method outperforms existing VAE-BO algorithms in various optimization scenarios. Our code will be published at https://github.com/TaicaiChen/PG-LBO.

Battery model impact on time-optimal co-design for electric racing cars: review and application

  • Authors: Giorgio Riva, Stefano Radrizzani, Giulio Panzani
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2312.17003
  • Pdf link: https://arxiv.org/pdf/2312.17003
  • Abstract The sustainable mobility trend touches the racing world as well, from the hybridization of Formula 1 (F1) and Le Mans Hypercars to the fully electric Formula E racing class. In this scenario, the research community is studying how to push electric racing vehicles to their limit, combining vehicle dynamics and energy management, to successfully solve the minimum lap time problem. Recently, this class of problems has been enlarged towards optimal sizing, with a particular interest in batteries, which represent the main bottleneck for electric vehicle performance. In this work, starting from a thorough review of literature approaches, we define a general optimization framework of minimum lap and race time problems for electric vehicles, suitable to figure out the impact of different modeling choices on both problem structure and optimal variables profiles. Exploiting a case study on Generation 3 (Gen 3) of Formula E cars, we delve into the impact of battery models' complexity on both optimal sizing and optimal battery usage. We show how highly detailed models are necessary to study the evolution of both battery and vehicle control variables during the race, while simple models are more than sufficient to address the battery sizing problem.

On the rate of convergence of an over-parametrized Transformer classifier learned by gradient descent

  • Authors: Michael Kohler, Adam Krzyzak
  • Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.17007
  • Pdf link: https://arxiv.org/pdf/2312.17007
  • Abstract One of the most recent and fascinating breakthroughs in artificial intelligence is ChatGPT, a chatbot which can simulate human conversation. ChatGPT is an instance of GPT4, which is a language model based on generative pre-trained transformers. So if one wants to study from a theoretical point of view how powerful such artificial intelligence can be, one approach is to consider transformer networks and to study which problems one can solve with these networks theoretically. Here it is not only important what kind of models these networks can approximate, or how they can generalize their knowledge learned by choosing the best possible approximation to a concrete data set, but also how well optimization of such a transformer network based on a concrete data set works. In this article we consider all three of these aspects simultaneously and show a theoretical upper bound on the misclassification probability of a transformer network fitted to the observed data. For simplicity we focus in this context on transformer encoder networks which can be applied to define an estimate in the context of a classification problem involving natural language.

DreamGaussian4D: Generative 4D Gaussian Splatting

  • Authors: Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
  • Arxiv link: https://arxiv.org/abs/2312.17142
  • Pdf link: https://arxiv.org/pdf/2312.17142
  • Abstract Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines.

Keyword: adam

Spectral approximation of $\psi$-fractional differential equation based on mapped Jacobi functions

  • Authors: Tinggang Zhao, Zhenyu Zhao, Changpin Li, Dongxia Li
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2312.16426
  • Pdf link: https://arxiv.org/pdf/2312.16426
  • Abstract Fractional calculus with respect to a function $\psi$, also known as $\psi$-fractional calculus, generalizes the Hadamard and the Riemann-Liouville fractional calculi, which poses challenges for numerical treatment. In this paper we study spectral-type methods using mapped Jacobi functions (MJFs) as basis functions and obtain efficient algorithms to solve $\psi$-fractional differential equations. In particular, we set up the Petrov-Galerkin spectral method and the spectral collocation method for initial and boundary value problems involving $\psi$-fractional derivatives. We develop basic approximation theory for the MJFs and derive error estimates for the resulting methods. We also establish a recurrence relation to evaluate the collocation differentiation matrix for implementing the spectral collocation algorithm. Numerical examples confirm the theoretical results and demonstrate the effectiveness of the spectral and collocation methods.

Keyword: gradient

Learning to Infer Unobserved Behaviors: Estimating User's Preference for a Site over Other Sites

  • Authors: Atanu R Sinha, Tanay Anand, Paridhi Maheshwari, A V Lakshmy, Vishal Jain
  • Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.16177
  • Pdf link: https://arxiv.org/pdf/2312.16177
  • Abstract A site's recommendation system relies on knowledge of its users' preferences to offer relevant recommendations to them. These preferences are for attributes that comprise items and content shown on the site, and are estimated from the data of users' interactions with the site. Another form of users' preferences is material too, namely, users' preferences for the site over other sites, since that shows users' base level propensities to engage with the site. Estimating users' preferences for the site, however, faces major obstacles because (a) the focal site usually has no data on its users' interactions with other sites; these interactions are users' unobserved behaviors for the focal site; and (b) the Machine Learning literature in recommendation does not offer a model of this situation. Even if (b) is resolved, the problem in (a) persists since without access to data on its users' interactions with other sites, there is no ground truth for evaluation. Moreover, it is most useful when (c) users' preferences for the site can be estimated at the individual level, since the site can then personalize recommendations to individual users. We offer a method to estimate an individual user's preference for a focal site, under this premise. In particular, we compute the focal site's share of a user's online engagements without any data from other sites. We show an evaluation framework for the model using only the focal site's data, allowing the site to test the model. We rely upon a Hierarchical Bayes Method and perform estimation in two different ways - Markov Chain Monte Carlo and Stochastic Gradient with Langevin Dynamics. Our results find good support for the approach to computing personalized share of engagement and for its evaluation.

A Method for Auto-Differentiation of the Voronoi Tessellation

  • Authors: Sergei Shumilin, Alexander Ryabov, Evgeny Burnaev, Vladimir Vanovskii
  • Subjects: Computational Geometry (cs.CG); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16192
  • Pdf link: https://arxiv.org/pdf/2312.16192
  • Abstract Voronoi tessellation, also known as the Voronoi diagram, is an important computational geometry technique that has applications in various scientific disciplines. It involves dividing a given space into regions based on the proximity to a set of points. Autodifferentiation is a powerful tool for solving optimization tasks. Autodifferentiation relies on constructing a computational graph that allows gradients to be computed using the backpropagation algorithm. However, often the Voronoi tessellation remains the only non-differentiable part of a pipeline, prohibiting end-to-end differentiation. We present a method for autodifferentiation of the 2D Voronoi tessellation. The method allows one to construct the Voronoi tessellation and pass gradients, making the construction end-to-end differentiable. We provide the implementation details and present several important applications. To the best of our knowledge this is the first autodifferentiable realization of the Voronoi tessellation providing the full set of Voronoi geometrical parameters in a differentiable way.

Alternate Training of Shared and Task-Specific Parameters for Multi-Task Neural Networks

  • Authors: Stefania Bellavia, Francesco Della Santa, Alessandra Papini
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2312.16340
  • Pdf link: https://arxiv.org/pdf/2312.16340
  • Abstract This paper introduces novel alternate training procedures for hard-parameter sharing Multi-Task Neural Networks (MTNNs). Traditional MTNN training faces challenges in managing conflicting loss gradients, often yielding sub-optimal performance. The proposed alternate training method updates shared and task-specific weights alternately, exploiting the multi-head architecture of the model. This approach reduces computational costs, enhances training regularization, and improves generalization. Convergence properties similar to those of the classical stochastic gradient method are established. Empirical experiments demonstrate delayed overfitting, improved prediction, and reduced computational demands. In summary, our alternate training procedures offer a promising advancement for the training of hard-parameter sharing MTNNs.

GAD-PVI: A General Accelerated Dynamic-Weight Particle-Based Variational Inference Framework

  • Authors: Fangyikang Wang, Huminhao Zhu, Chao Zhang, Hanbin Zhao, Hui Qian
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16429
  • Pdf link: https://arxiv.org/pdf/2312.16429
  • Abstract Particle-based Variational Inference (ParVI) methods approximate the target distribution by iteratively evolving finite weighted particle systems. Recent advances of ParVI methods reveal the benefits of accelerated position update strategies and dynamic weight adjustment approaches. In this paper, we propose the first ParVI framework that possesses both accelerated position update and dynamical weight adjustment simultaneously, named the General Accelerated Dynamic-Weight Particle-based Variational Inference (GAD-PVI) framework. Generally, GAD-PVI simulates the semi-Hamiltonian gradient flow on a novel Information-Fisher-Rao space, which yields an additional decrease on the local functional dissipation. GAD-PVI is compatible with different dissimilarity functionals and associated smoothing approaches under three information metrics. Experiments on both synthetic and real-world data demonstrate the faster convergence and reduced approximation error of GAD-PVI methods over the state-of-the-art.

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

  • Authors: Guojian Wang, Faguo Wu, Xiao Zhang, Ning Guo, Zhiming Zheng
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16456
  • Pdf link: https://arxiv.org/pdf/2312.16456
  • Abstract Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the policy of the agent away from suboptimal solutions by leveraging incomplete offline demonstrations as references. This approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptively clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a deduction of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. Notably, the specific metrics and quantifiable results further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.

Mobility and Cost Aware Inference Accelerating Algorithm for Edge Intelligence

  • Authors: Xin Yuan, Ning Li, Kang Wei, Wenchao Xu, Quan Chen, Hao Chen, Song Guo
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16497
  • Pdf link: https://arxiv.org/pdf/2312.16497
  • Abstract Edge intelligence (EI) has been widely applied recently. Splitting the model between device, edge server, and cloud can greatly improve the performance of EI. Model segmentation without user mobility has been investigated in depth by previous works. However, in most use cases of EI, the end devices are mobile. Only a few works have been carried out on this aspect. These works still have many issues, such as ignoring the energy consumption of the mobile device, inappropriate network assumptions, and low effectiveness in adapting to user mobility. Therefore, to address the disadvantages of model segmentation and resource allocation in previous works, we propose a mobility and cost aware model segmentation and resource allocation algorithm for accelerating inference at the edge (MCSA). Specifically, in the scenario without user mobility, the loop iteration gradient descent (Li-GD) algorithm is provided. When the mobile user has a large model inference task to be computed, it takes the energy consumption of the mobile user, the communication and computing resource renting cost, and the inference delay into account to find the optimal model segmentation and resource allocation strategy. In the scenario with user mobility, the mobility aware Li-GD (MLi-GD) algorithm is proposed to calculate the optimal strategy. Then, the properties of the proposed algorithms are investigated, including convergence, complexity, and approximation ratio. The experimental results demonstrate the effectiveness of the proposed algorithms.

Optimal Beamforming Structure and Efficient Optimization Algorithms for Generalized Multi-Group Multicast Beamforming Optimization

  • Authors: Tianyu Fang, Yijie Mao
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2312.16559
  • Pdf link: https://arxiv.org/pdf/2312.16559
  • Abstract In this work, we focus on solving non-smooth non-convex maximization problems in multi-group multicast transmission. Leveraging Karush-Kuhn-Tucker (KKT) optimality conditions and successive incumbent transcending (SIT) duality, we thoroughly analyze the optimal beamforming structure for a set of optimization problems characterized by a general utility-based objective function. By exploiting the identified optimal structure, we further unveil inherent low-dimensional beamforming structures within the problems, which are asymptotically optimal in various regimes of transmit signal-to-noise ratios (SNRs) or the number of transmit antennas. Building upon the discovered optimal and low-dimensional beamforming structures, we then propose highly efficient and toolbox-free optimization algorithms to solve a specific multi-group multicast optimization problem based on the weighted sum rate (WSR) utility function. The proposed algorithms first use the cyclic maximization (CM) framework to decompose the problem into multiple subproblems, each of which has an optimal or low-dimensional closed-form beamforming solution structure. Then, we propose the projected adaptive gradient descent (PAGD) algorithm to compute the optimal Lagrangian dual variables for each subproblem. Numerical results show that the proposed algorithms maintain comparable or improved WSR performance compared to baseline algorithms, while dramatically reducing the computational complexity. Notably, the proposed ultra-low-complexity algorithms based on low-dimensional beamforming structures achieve near-optimal WSR performance with extremely low computational complexity. This complexity remains independent of the number of transmit antennas, making them promising and practical for extremely large multiple-input multiple-output (XL-MIMO) applications in 6G.

Inverse Reinforcement Learning with Unknown Reward Model based on Structural Risk Minimization

  • Authors: Chendi Qu, Jianping He, Xiaoming Duan, Jiming Chen
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16566
  • Pdf link: https://arxiv.org/pdf/2312.16566
  • Abstract Inverse reinforcement learning (IRL) usually assumes the model of the reward function is pre-specified and estimates the parameter only. However, how to determine a proper reward model is nontrivial. A simplistic model is less likely to contain the real reward function, while a model with high complexity leads to substantial computation cost and risks overfitting. This paper addresses this trade-off in IRL model selection by introducing the structural risk minimization (SRM) method from statistical learning. SRM selects an optimal reward function class from a hypothesis set minimizing both estimation error and model complexity. To formulate an SRM scheme for IRL, we estimate policy gradient by demonstration serving as empirical risk and establish the upper bound of Rademacher complexity of hypothesis classes as model penalty. The learning guarantee is further presented. In particular, we provide explicit SRM for the common linear weighted sum setting in IRL. Simulations demonstrate the performance and efficiency of our scheme.

Exploiting hidden structures in non-convex games for convergence to Nash equilibrium

  • Authors: Iosif Sakos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras
  • Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16609
  • Pdf link: https://arxiv.org/pdf/2312.16609
  • Abstract A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence to equilibrium. Driven by this observation, our paper proposes a flexible first-order method that successfully exploits such "hidden structures" and achieves convergence under minimal assumptions for the transformation connecting the players' control variables to the game's latent, convex-structured layer. The proposed method - which we call preconditioned hidden gradient descent (PHGD) - hinges on a judiciously chosen gradient preconditioning scheme related to natural gradient methods. Importantly, we make no separability assumptions for the game's hidden structure, and we provide explicit convergence rate guarantees for both deterministic and stochastic environments.

Exploring intra-task relations to improve meta-learning algorithms

  • Authors: Prabhat Agarwal, Shreya Singh
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16612
  • Pdf link: https://arxiv.org/pdf/2312.16612
  • Abstract Meta-learning has emerged as an effective methodology to model several real-world tasks and problems due to its extraordinary effectiveness in the low-data regime. There are many scenarios, ranging from the classification of rare diseases to language modelling of uncommon languages, where the availability of large datasets is rare. Similarly, for broader scenarios like self-driving, an autonomous vehicle needs to be trained to handle every situation well. This requires training the ML model on a variety of tasks with good quality data. Often, however, we find that the data distribution across various tasks is skewed, i.e., the data follows a long-tail distribution. This leads to the model performing well on some tasks and poorly on others, which causes model robustness issues. Meta-learning has recently emerged as a potential learning paradigm which can effectively learn from one task and generalize that learning to unseen tasks. In this study, we aim to exploit external knowledge of task relations to improve training stability via effective mini-batching of tasks. We hypothesize that selecting a diverse set of tasks in a mini-batch will lead to a better estimate of the full gradient and hence will lead to a reduction of noise in training.

Agnostically Learning Multi-index Models with Queries

  • Authors: Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
  • Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.16616
  • Pdf link: https://arxiv.org/pdf/2312.16616
  • Abstract We study the power of query access for the task of agnostic learning under the Gaussian distribution. In the agnostic model, no assumptions are made on the labels and the goal is to compute a hypothesis that is competitive with the best-fit function in a known class, i.e., it achieves error $\mathrm{opt}+\epsilon$, where $\mathrm{opt}$ is the error of the best function in the class. We focus on a general family of Multi-Index Models (MIMs), which are $d$-variate functions that depend only on a few relevant directions, i.e., have the form $g(\mathbf{W} \mathbf{x})$ for an unknown link function $g$ and a $k \times d$ matrix $\mathbf{W}$. Multi-index models cover a wide range of commonly studied function classes, including constant-depth neural networks with ReLU activations, and intersections of halfspaces. Our main result shows that query access gives significant runtime improvements over random examples for agnostically learning MIMs. Under standard regularity assumptions for the link function (namely, bounded variation or surface area), we give an agnostic query learner for MIMs with complexity $O(k)^{\mathrm{poly}(1/\epsilon)} \, \mathrm{poly}(d)$. In contrast, algorithms that rely only on random examples inherently require $d^{\mathrm{poly}(1/\epsilon)}$ samples and runtime, even for the basic problem of agnostically learning a single ReLU or a halfspace. Our algorithmic result establishes a strong computational separation between the agnostic PAC and the agnostic PAC+Query models under the Gaussian distribution. Prior to our work, no such separation was known -- even for the special case of agnostically learning a single halfspace, for which it was an open problem first posed by Feldman. Our results are enabled by a general dimension-reduction technique that leverages query access to estimate gradients of (a smoothed version of) the underlying label function.

MIM4DD: Mutual Information Maximization for Dataset Distillation

  • Authors: Yuzhang Shang, Zhihang Yuan, Yan Yan
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16627
  • Pdf link: https://arxiv.org/pdf/2312.16627
  • Abstract Dataset distillation (DD) aims to synthesize a small dataset whose test performance is comparable to a full dataset using the same model. State-of-the-art (SoTA) methods optimize synthetic datasets primarily by matching heuristic indicators, such as gradients and training trajectories, extracted from two networks: one from real data and one from synthetic data (see Fig.1, Left). DD is essentially a compression problem that emphasizes maximizing the preservation of information contained in the data. We argue that well-defined metrics which measure the amount of shared information between variables in information theory are necessary for success measurement but are never considered by previous works. Thus, we introduce mutual information (MI) as the metric to quantify the shared information between the synthetic and the real datasets, and devise MIM4DD, which numerically maximizes the MI via a newly designed optimizable objective within a contrastive learning framework to update the synthetic dataset. Specifically, we designate the samples in different datasets that share the same labels as positive pairs, and those with different labels as negative pairs. Then we respectively pull and push those samples in positive and negative pairs into contrastive space via minimizing NCE loss. As a result, the targeted MI can be transformed into a lower bound represented by feature maps of samples, which is numerically feasible. Experiment results show that MIM4DD can be implemented as an add-on module to existing SoTA DD methods.

Adversarial Attacks on LoRa Device Identification and Rogue Signal Detection with Deep Learning

  • Authors: Yalin E. Sagduyu, Tugba Erpek
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.16715
  • Pdf link: https://arxiv.org/pdf/2312.16715
  • Abstract Low-Power Wide-Area Network (LPWAN) technologies, such as LoRa, have gained significant attention for their ability to enable long-range, low-power communication for Internet of Things (IoT) applications. However, the security of LoRa networks remains a major concern, particularly in scenarios where device identification and classification of legitimate and spoofed signals are crucial. This paper studies a deep learning framework to address these challenges, considering LoRa device identification and legitimate vs. rogue LoRa device classification tasks. A deep neural network (DNN), either a convolutional neural network (CNN) or feedforward neural network (FNN), is trained for each task by utilizing real experimental I/Q data for LoRa signals, while rogue signals are generated by using kernel density estimation (KDE) of received signals by rogue devices. Fast Gradient Sign Method (FGSM)-based adversarial attacks are considered for LoRa signal classification tasks using deep learning models. The impact of these attacks is assessed on the performance of two tasks, namely device identification and legitimate vs. rogue device classification, by utilizing separate or common perturbations against these signal classification tasks. Results presented in this paper quantify the level of transferability of adversarial attacks on different LoRa signal classification tasks as a major vulnerability and highlight the need to make IoT applications robust to adversarial attacks.

GUITAR: Gradient Pruning toward Fast Neural Ranking

  • Authors: Weijie Zhao, Shulong Tan, Ping Li
  • Subjects: Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2312.16828
  • Pdf link: https://arxiv.org/pdf/2312.16828
  • Abstract With the continuous popularity of deep learning and representation learning, fast vector search becomes a vital task in various ranking/retrieval based applications, such as recommendation, ads ranking and question answering. Neural network based ranking is widely adopted due to its powerful capacity in modeling complex relationships, such as between users and items, questions and answers. However, it is usually exploited in offline or re-ranking manners because it is computationally time-consuming. Online neural network ranking--so-called fast neural ranking--is considered challenging because neural network measures are usually non-convex and asymmetric. Traditional Approximate Nearest Neighbor (ANN) search, which usually focuses on metric ranking measures, is not applicable to these advanced measures. In this paper, we introduce a novel graph searching framework to accelerate the searching in the fast neural ranking problem. The proposed graph searching algorithm is bi-level: we first construct a probable candidate set; then we only evaluate the neural network measure over the probable candidate set instead of evaluating the neural network over all neighbors. Specifically, we propose a gradient-based algorithm that approximates the rank of the neural network matching score to construct the probable candidate set; and we present an angle-based heuristic procedure to adaptively identify the proper size of the probable candidate set. Empirical results on public data confirm the effectiveness of our proposed algorithms.

Adversarial Attacks on Image Classification Models: Analysis and Defense

  • Authors: Jaydip Sen, Abhiraj Sen, Ananda Chatterjee
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.16880
  • Pdf link: https://arxiv.org/pdf/2312.16880
  • Abstract The notion of adversarial attacks on image classification models based on convolutional neural networks (CNN) is introduced in this work. To classify images, deep learning models called CNNs are frequently used. However, when the networks are subject to adversarial attacks, extremely potent and previously trained CNN models that perform quite effectively on image datasets for image classification tasks may perform poorly. In this work, one well-known adversarial attack known as the fast gradient sign method (FGSM) is explored and its adverse effects on the performances of image classification models are examined. The FGSM attack is simulated on three pre-trained image classifier CNN architectures, ResNet-101, AlexNet, and RegNetY 400MF using randomly chosen images from the ImageNet dataset. The classification accuracies of the models are computed in the absence and presence of the attack to demonstrate the detrimental effect of the attack on the performances of the classifiers. Finally, a mechanism is proposed to defend against the FGSM attack based on a modified defensive distillation-based approach. Extensive results are presented for the validation of the proposed scheme.
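
For readers unfamiliar with the fast gradient sign method named in this abstract, the following is a minimal sketch of the attack on a toy logistic-regression classifier. It is not the paper's setup: the model, the weights `w` and `b`, the input `x`, and the budget `eps` are all hypothetical values chosen for illustration, whereas the paper attacks pre-trained CNNs on ImageNet images. FGSM perturbs the input by `eps` times the sign of the loss gradient with respect to the input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy loss of a logistic-regression model on one example."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def fgsm(w, b, x, y, eps):
    """Return x + eps * sign(dL/dx) for the logistic-regression loss."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # closed-form gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad_x)


# Hypothetical toy model and input (assumptions, not taken from the paper).
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 1.0])
y = 1.0

x_adv = fgsm(w, b, x, y, eps=0.1)
print("clean loss:", bce_loss(w, b, x, y))
print("adversarial loss:", bce_loss(w, b, x_adv, y))
```

Each coordinate of the input moves by at most `eps`, so the perturbation is bounded in the max norm while the loss on the true label increases.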

Spike No More: Stabilizing the Pre-training of Large Language Models

  • Authors: Sho Takase, Shun Kiyono, Sosuke Kobayashi, Jun Suzuki
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.16903
  • Pdf link: https://arxiv.org/pdf/2312.16903
  • Abstract The loss spike often occurs during pre-training of a large language model. The spikes degrade the performance of a large language model, and sometimes ruin the pre-training. Since the pre-training needs a vast computational budget, we should avoid such spikes. To investigate a cause of loss spikes, we focus on gradients of internal layers in this study. Through theoretical analyses, we introduce two causes of the exploding gradients, and provide requirements to prevent the explosion. In addition, we introduce the combination of the initialization method and a simple modification to embeddings as a method to satisfy the requirements. We conduct various experiments to verify our theoretical analyses empirically. Experimental results indicate that the combination is effective in preventing spikes during pre-training.

Reformulation and generalisation of the air-gap element

  • Authors: Herbert De Gersem, Thomas Weiland
  • Subjects: Computational Engineering, Finance, and Science (cs.CE)
  • Arxiv link: https://arxiv.org/abs/2312.16984
  • Pdf link: https://arxiv.org/pdf/2312.16984
  • Abstract The air-gap macro element is reformulated such that rotation, rotor or stator skewing and rotor eccentricity can be incorporated easily. The air-gap element is evaluated using Fast Fourier Transforms, which in combination with the Conjugate Gradient algorithm leads to a highly efficient and memory-inexpensive iterative solution scheme. The improved air-gap element features beneficial approximation properties and is competitive with moving-band and sliding-surface techniques.

Projected Langevin Monte Carlo algorithms in non-convex and super-linear setting

  • Authors: Chenxu Pang, Xiaojie Wang, Yue Wu
  • Subjects: Numerical Analysis (math.NA); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2312.17077
  • Pdf link: https://arxiv.org/pdf/2312.17077
  • Abstract It is of significant interest in many applications to sample from a high-dimensional target distribution $\pi$ with the density $\pi(\text{d} x) \propto e^{-U(x)} (\text{d} x) $, based on the temporal discretization of the Langevin stochastic differential equations (SDEs). In this paper, we propose an explicit projected Langevin Monte Carlo (PLMC) algorithm with non-convex potential $U$ and super-linear gradient of $U$ and investigate the non-asymptotic analysis of its sampling error in total variation distance. Equipped with time-independent regularity estimates for the corresponding Kolmogorov equation, we derive the non-asymptotic bounds on the total variation distance between the target distribution of the Langevin SDEs and the law induced by the PLMC scheme with order $\mathcal{O}(h |\ln h|)$. Moreover, for a given precision $\epsilon$, the smallest number of iterations of the classical Langevin Monte Carlo (LMC) scheme with the non-convex potential $U$ and the globally Lipschitz gradient of $U$ can be guaranteed by order ${\mathcal{O}}\big(\tfrac{d^{3/2}}{\epsilon} \cdot \ln (\tfrac{d}{\epsilon}) \cdot \ln (\tfrac{1}{\epsilon}) \big)$. Numerical experiments are provided to confirm the theoretical findings.
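
As background for the classical Langevin Monte Carlo (LMC) scheme mentioned in this abstract, here is a minimal sketch of the standard Euler discretization $x_{k+1} = x_k - h \nabla U(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, 1)$. It does not implement the paper's projected variant (PLMC). The potential $U(x) = x^2/2$, the step size, the number of chains, and the function names are all assumptions chosen so that the target is a standard Gaussian and the output is easy to sanity-check.

```python
import numpy as np


def grad_U(x):
    """Gradient of the illustrative potential U(x) = x^2 / 2 (standard Gaussian target)."""
    return x


def lmc(n_chains, n_steps, h, seed=0):
    """Run independent 1D chains of classical (unadjusted) Langevin Monte Carlo."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        x = x - h * grad_U(x) + np.sqrt(2.0 * h) * noise
    return x


# Hypothetical settings: 5000 chains, 500 steps, step size h = 0.05.
samples = lmc(n_chains=5000, n_steps=500, h=0.05)
print("sample mean:", samples.mean())
print("sample variance:", samples.var())
```

For this quadratic potential the discretized chain has stationary variance $1/(1 - h/2)$ rather than exactly 1, which is a small instance of the discretization bias that non-asymptotic bounds in the step size $h$ quantify.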

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution

  • Authors: Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.17174
  • Pdf link: https://arxiv.org/pdf/2312.17174
  • Abstract Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of such models when applied to safety-critical domains such as healthcare. Crucially, unlike commonly used unimodal attribution methods, M2IB does not require ground truth labels, making it possible to audit representations of vision-language pretrained models when multiple modalities but no ground-truth data is available. Using CLIP as an example, we demonstrate the effectiveness of M2IB attribution and show that it outperforms gradient-based, perturbation-based, and attention-based attribution methods both qualitatively and quantitatively.

Gradient-based Planning with World Models

  • Authors: Jyothir S V, Siddhartha Jalagam, Yann LeCun, Vlad Sobal
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.17227
  • Pdf link: https://arxiv.org/pdf/2312.17227
  • Abstract The enduring challenge in the field of artificial intelligence has been the control of systems to achieve desired behaviours. While for systems governed by straightforward dynamics equations, methods like Linear Quadratic Regulation (LQR) have historically proven highly effective, most real-world tasks, which require a general problem-solver, demand world models with dynamics that cannot be easily described by simple equations. Consequently, these models must be learned from data using neural networks. Most model predictive control (MPC) algorithms designed for visual world models have traditionally explored gradient-free population-based optimisation methods, such as Cross Entropy and Model Predictive Path Integral (MPPI) for planning. However, we present an exploration of a gradient-based alternative that fully leverages the differentiability of the world model. In our study, we conduct a comparative analysis between our method and other MPC-based alternatives, as well as policy-based algorithms. In a sample-efficient setting, our method achieves on par or superior performance compared to the alternative approaches in most tasks. Additionally, we introduce a hybrid model that combines policy networks and gradient-based MPC, which outperforms pure policy-based methods, thereby holding promise for gradient-based planning with world models in complex real-world tasks.
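
To make the idea of planning by gradient descent through a differentiable model concrete, here is a toy sketch. The hand-written one-dimensional dynamics `x + dt * a` stand in for a learned world model, and the gradient is derived analytically rather than by automatic differentiation; all names and parameter values are assumptions for illustration and do not come from the paper.

```python
def rollout(x0, actions, dt=0.1):
    # Toy differentiable "world model": the position moves by dt * action each step.
    x = x0
    for a in actions:
        x = x + dt * a
    return x


def plan(x0, goal, horizon=10, iters=200, lr=0.5, lam=0.001, dt=0.1):
    # Minimise J = (x_H - goal)**2 + lam * sum(a_t**2) over the action sequence
    # by plain gradient descent on the actions themselves.
    actions = [0.0] * horizon
    for _ in range(iters):
        err = rollout(x0, actions, dt) - goal
        # For these linear dynamics, dJ/da_t = 2*dt*err + 2*lam*a_t.
        grads = [2.0 * dt * err + 2.0 * lam * a for a in actions]
        actions = [a - lr * g for a, g in zip(actions, grads)]
    return actions


planned = plan(x0=0.0, goal=1.0)
final_position = rollout(0.0, planned)
```

A population-based planner such as the cross-entropy method would instead sample many candidate action sequences and keep the best ones; the gradient-based variant updates a single sequence using the model's derivatives.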

Keyword: super-resolution

Learning from small data sets: Patch-based regularizers in inverse problems for image reconstruction

  • Authors: Moritz Piening, Fabian Altekrüger, Johannes Hertrich, Paul Hagemann, Andrea Walther, Gabriele Steidl
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Probability (math.PR)
  • Arxiv link: https://arxiv.org/abs/2312.16611
  • Pdf link: https://arxiv.org/pdf/2312.16611
  • Abstract The solution of inverse problems is of fundamental interest in medical and astronomical imaging, geophysics as well as engineering and life sciences. Recent advances were made by using methods from machine learning, in particular deep neural networks. Most of these methods require a huge amount of (paired) data and computer capacity to train the networks, which often may not be available. Our paper addresses the issue of learning from small data sets by taking patches of very few images into account. We focus on the combination of model-based and data-driven methods by approximating just the image prior, also known as the regularizer in the variational model. We review two methodically different approaches, namely optimizing the maximum log-likelihood of the patch distribution, and penalizing Wasserstein-like discrepancies of whole empirical patch distributions. From the point of view of Bayesian inverse problems, we show how we can achieve uncertainty quantification by approximating the posterior using Langevin Monte Carlo methods. We demonstrate the power of the methods in computed tomography, image super-resolution, and inpainting. Indeed, the approach also provides high-quality results in zero-shot super-resolution, where only a low-resolution image is available. The paper is accompanied by a GitHub repository containing implementations of all methods as well as data examples so that the reader can get their own insight into the performance.

KeDuSR: Real-World Dual-Lens Super-Resolution via Kernel-Free Matching

  • Authors: Huanjing Yue, Zifan Cui, Kun Li, Jingyu Yang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.17050
  • Pdf link: https://arxiv.org/pdf/2312.17050
  • Abstract Dual-lens super-resolution (SR) is a practical scenario for reference (Ref) based SR by utilizing the telephoto image (Ref) to assist the super-resolution of the low-resolution wide-angle image (LR input). Different from general RefSR, the Ref in dual-lens SR only covers the overlapped field of view (FoV) area. However, current dual-lens SR methods rarely utilize these specific characteristics and directly perform dense matching between the LR input and Ref. Due to the resolution gap between LR and Ref, the matching may miss the best-matched candidate and destroy the consistent structures in the overlapped FoV area. Different from them, we propose to first align the Ref with the center region (namely the overlapped FoV area) of the LR input by combining global warping and local warping so that the aligned Ref is sharp and consistent. Then, we formulate the aligned Ref and LR center as value-key pairs, and the corner region of the LR is formulated as queries. In this way, we propose a kernel-free matching strategy by matching between the LR-corner (query) and LR-center (key) regions, and the corresponding aligned Ref (value) can be warped to the corner region of the target. Our kernel-free matching strategy avoids the resolution gap between LR and Ref, which gives our network better generalization ability. In addition, we construct a DuSR-Real dataset with (LR, Ref, HR) triples, where the LR and HR are well aligned. Experiments on three datasets demonstrate that our method outperforms the second-best method by a large margin. Our code and dataset are available at https://github.com/Craigie-Hill/KeDuSR.

zoq, Dec 29 '23 07:12