
New submissions for Mon, 22 Jan 24

zoq opened this issue 1 year ago.

Keyword: sgd

Tight Group-Level DP Guarantees for DP-SGD with Sampling via Mixture of Gaussians Mechanisms

  • Authors: Arun Ganesh
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10294
  • Pdf link: https://arxiv.org/pdf/2401.10294
  • Abstract We give a procedure for computing group-level $(\epsilon, \delta)$-DP guarantees for DP-SGD, when using Poisson sampling or fixed batch size sampling. Up to discretization errors in the implementation, the DP guarantees computed by this procedure are tight (assuming we release every intermediate iterate).
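The following is a rough illustration of where a mixture of Gaussians comes from. Consider a single DP-SGD step with Poisson sampling rate q, clipping norm 1, and noise standard deviation σ. Assume that all of a group's clipped gradients point in the same direction. If a group of k examples is present, the number J of its members that get sampled is Binomial(k, q). Along that direction, the noisy sum is therefore the mixture Σ_j P(J = j) N(j, σ²), compared with N(0, σ²) without the group.

The sketch below computes the hockey-stick divergence δ(ε) for that one step, in one direction only. It has several limits:
- it does not compose over iterations;
- it does not cover fixed batch size sampling;
- it is not the paper's procedure.

The helper name `group_delta` and the one-dimensional reduction are assumptions made for illustration.

```python
import math


def _norm_sf(z: float) -> float:
    """Standard normal survival function, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def _log_ratio(x: float, weights: list[tuple[int, float]], sigma: float) -> float:
    """log(P(x) / Q(x)) for P = sum_j w_j N(j, sigma^2) and Q = N(0, sigma^2)."""
    terms = [math.log(w) + (2 * j * x - j * j) / (2 * sigma ** 2)
             for j, w in weights if w > 0]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def group_delta(eps: float, group_size: int, q: float, sigma: float) -> float:
    """Hockey-stick divergence H_eps(P || Q) for one Poisson-sampled Gaussian step.

    Hypothetical helper: P mixes N(j, sigma^2) with Binomial(group_size, q)
    weights, i.e. j of the group's (aligned, norm-1) gradients were sampled.
    """
    if not (0 < q <= 1 and sigma > 0 and eps >= 0 and group_size >= 1):
        raise ValueError("need 0 < q <= 1, sigma > 0, eps >= 0, group_size >= 1")
    weights = [(j, math.comb(group_size, j) * q ** j * (1 - q) ** (group_size - j))
               for j in range(group_size + 1)]
    # The likelihood ratio P/Q is increasing in x, so {P > e^eps Q} is the
    # half-line (x_star, inf). Bracket x_star, then bisect.
    lo, hi = -1.0, 1.0
    while _log_ratio(lo, weights, sigma) >= eps:
        lo *= 2
    while _log_ratio(hi, weights, sigma) < eps:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _log_ratio(mid, weights, sigma) < eps:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    p_tail = sum(w * _norm_sf((x_star - j) / sigma) for j, w in weights)
    return p_tail - math.exp(eps) * _norm_sf(x_star / sigma)
```

For instance, `group_delta(1.0, 4, 0.01, 1.0)` gives the single-step δ at ε = 1 for a group of four examples. A full accountant would still have to compose the privacy loss over every training step and check the reverse direction as well.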

Keyword: optimization

Non-Terrestrial Network (NTN): a Novel Alternate Fractional Programming for the Downlink Channels Power Allocation

  • Authors: Mahfuzur Rahman, Zoheb Hassan, Jeffrey H. Reed, Lingjia Liu
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2401.10251
  • Pdf link: https://arxiv.org/pdf/2401.10251
  • Abstract Non-terrestrial network (NTN) communication has garnered considerable attention from government entities, industries, and academia in recent times. NTN networks encompass a variety of systems, including Low Earth Orbit (LEO) satellites, Medium Earth Orbit (MEO) satellites, Geostationary Earth Orbit (GEO) satellites, High Altitude Platforms (HAPS), and Low Altitude Platforms (LAPS). Furthermore, the deployment of high-throughput satellites (HTS/VHTS) in the GEO space has gained momentum. While LEO and MEO satellites offer advantages such as low latency and reduced launching costs compared to GEO satellites, this study focuses on GEO satellites due to their stationary nature and broader coverage. In traditional cellular networks, each user equipment (UE) is allocated at least one resource block (RB), which is not shared with other UEs. However, in NTN communications, where the coverage area is extensive, dedicating an RB to only one UE is an inefficient utilization of radio resources. To address this challenge, fractional programming (FP), cognitive radio, and rate splitting multiple access (RSMA) are existing technologies. This paper aims to maximize spectral efficiency, average RBG rate, and sum rate for GEO satellite systems. However, achieving this objective involves dealing with a non-convex, NP-hard problem, as it requires the logarithmic sum of different fractions. Finding a global solution to such an NP-hard problem presents significant challenges. This paper introduces a novel alternate fractional programming algorithm specifically designed to tackle these complex NP-hard problems in the context of GEO NTN cellular networks. By employing this innovative approach, the study seeks to contribute to the optimization of NTN communication systems, enabling efficient resource allocation and improved network performance.
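The sketch below gives a concrete sense of what fractional programming does in power allocation. It applies the standard Lagrangian dual transform and quadratic transform to a toy sum-rate problem. This is a generic FP scheme from the literature, not the alternate FP algorithm proposed in this paper.

The problem is to maximize Σ log2(1 + SINR_i) over transmit powers p_i in [0, p_max], where SINR_i = g[i][i] p_i / (noise + Σ_{j≠i} g[i][j] p_j). The gain matrix, the noise level, and the function name `fp_power_allocation` are hypothetical. Each block update has a closed form, so the sum rate should not decrease from one iteration to the next.

```python
import math


def sum_rate(p, g, noise):
    """Sum of log2(1 + SINR_i); g[i][j] is the gain from transmitter j to receiver i."""
    n = len(p)
    total = 0.0
    for i in range(n):
        interference = noise + sum(g[i][j] * p[j] for j in range(n) if j != i)
        total += math.log2(1 + g[i][i] * p[i] / interference)
    return total


def fp_power_allocation(g, noise, p_max, iters=50):
    """Hypothetical FP sketch: alternate closed-form updates of gamma, y and p."""
    n = len(g)
    p = [p_max] * n  # start from full power
    history = [sum_rate(p, g, noise)]
    for _ in range(iters):
        signal = [g[i][i] * p[i] for i in range(n)]
        total_rx = [noise + sum(g[i][j] * p[j] for j in range(n)) for i in range(n)]
        # Lagrangian dual transform: the optimal auxiliary gamma_i is the current SINR.
        gamma = [signal[i] / (total_rx[i] - signal[i]) for i in range(n)]
        # Quadratic transform: the optimal auxiliary y_i for each ratio term.
        y = [math.sqrt((1 + gamma[i]) * signal[i]) / total_rx[i] for i in range(n)]
        # With gamma and y fixed, the objective is concave and separable in p,
        # so setting the derivative to zero and clipping to [0, p_max] is exact.
        new_p = []
        for j in range(n):
            denom = sum(y[i] ** 2 * g[i][j] for i in range(n))
            p_j = y[j] ** 2 * (1 + gamma[j]) * g[j][j] / denom ** 2
            new_p.append(min(p_max, max(0.0, p_j)))
        p = new_p
        history.append(sum_rate(p, g, noise))
    return p, history
```

The returned `history` makes the monotone behaviour easy to inspect. A real downlink satellite model would replace the toy interference structure with the system's own resource-block and beam model.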

Hybrid-Task Meta-Learning: A Graph Neural Network Approach for Scalable and Transferable Bandwidth Allocation

  • Authors: Xin Hao, Changyang She, Phee Lep Yeoh, Yuhong Liu, Branka Vucetic, Yonghui Li
  • Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10253
  • Pdf link: https://arxiv.org/pdf/2401.10253
  • Abstract In this paper, we develop a deep learning-based bandwidth allocation policy that is: 1) scalable with the number of users and 2) transferable to different communication scenarios, such as non-stationary wireless channels, different quality-of-service (QoS) requirements, and dynamically available resources. To support scalability, the bandwidth allocation policy is represented by a graph neural network (GNN), with which the number of training parameters does not change with the number of users. To enable the generalization of the GNN, we develop a hybrid-task meta-learning (HML) algorithm that trains the initial parameters of the GNN with different communication scenarios during meta-training. Next, during meta-testing, a few samples are used to fine-tune the GNN with unseen communication scenarios. Simulation results demonstrate that our HML approach can improve the initial performance by 8.79% and sampling efficiency by 73%, compared with existing benchmarks. After fine-tuning, our near-optimal GNN-based policy can achieve close to the same reward with much lower inference complexity compared to the optimal policy obtained using iterative optimization.
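The scalability claim rests on weight sharing. A message-passing layer applies the same small set of weights to every user, so adding users adds nodes, not parameters.

The sketch below illustrates only that property, under several assumptions:
- hypothetical per-user features (for example, channel gain and QoS demand);
- a fully connected user graph;
- mean aggregation;
- a softmax output that splits one unit of bandwidth.

It is not the paper's architecture or its HML training procedure.

```python
import math
import random


def init_params(feat_dim, hidden_dim, seed=0):
    """Shared weights; their count depends only on feat_dim and hidden_dim."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w_self": mat(hidden_dim, feat_dim),
            "w_neigh": mat(hidden_dim, feat_dim),
            "w_out": [rng.gauss(0, 0.5) for _ in range(hidden_dim)]}


def num_params(params):
    return (sum(len(row) for row in params["w_self"])
            + sum(len(row) for row in params["w_neigh"])
            + len(params["w_out"]))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def allocate_bandwidth(features, params):
    """One message-passing layer on a complete graph, then a softmax over users."""
    n = len(features)
    scores = []
    for i in range(n):
        others = [features[j] for j in range(n) if j != i]
        if others:  # mean of the other users' features
            mean = [sum(col) / len(others) for col in zip(*others)]
        else:       # a single user has no neighbours
            mean = [0.0] * len(features[i])
        pre = zip(matvec(params["w_self"], features[i]), matvec(params["w_neigh"], mean))
        h = [max(0.0, a + b) for a, b in pre]  # ReLU
        scores.append(sum(w * x for w, x in zip(params["w_out"], h)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # bandwidth fractions summing to 1
```

The same `params` dictionary can be applied to 3 users or 30 users unchanged. That invariance is what lets a meta-learned initialization transfer across network sizes.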

Migrating Birds Optimization-Based Feature Selection for Text Classification

  • Authors: Cem Kaya, Zeynep Hilal Kilimci, Mitat Uysal, Murat Kaya
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10270
  • Pdf link: https://arxiv.org/pdf/2401.10270
  • Abstract This research introduces a novel approach, MBO-NB, that leverages Migrating Birds Optimization (MBO) coupled with Naive Bayes as an internal classifier to address feature selection challenges in text classification with a large number of features. Focusing on computational efficiency, we preprocess raw data using the Information Gain algorithm, strategically reducing the feature count from an average of 62221 to 2089. Our experiments demonstrate MBO-NB's superior effectiveness in feature reduction compared to other existing techniques, along with increased classification accuracy. The successful integration of Naive Bayes within MBO presents a well-rounded solution. In individual comparisons with Particle Swarm Optimization (PSO), MBO-NB consistently outperforms PSO by an average of 6.9% across four setups. This research offers valuable insights into enhancing feature selection methods, providing a scalable and effective solution for text classification.
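Information Gain ranks a feature by how much knowing it reduces uncertainty about the class: IG(C; X) = H(C) - H(C | X). The sketch below computes it for binary term-presence features and keeps the top k, the kind of filtering step described before MBO runs. The function names and the choice of k are hypothetical. The MBO search and the Naive Bayes evaluation are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(C; X) = H(C) - sum_v P(X = v) H(C | X = v) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(docs, labels, k):
    """docs: list of binary term-presence vectors; returns indices of the k best terms."""
    n_features = len(docs[0])
    scores = [information_gain([d[f] for d in docs], labels) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:k]
```

A term that perfectly separates two balanced classes scores 1 bit, and a term independent of the class scores 0.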

Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil Scheduling

  • Authors: Wanting Zhang, Wei Du, Guo Yu, Renchu He, Wenli Du, Yaochu Jin
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.10274
  • Pdf link: https://arxiv.org/pdf/2401.10274
  • Abstract With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are difficult to solve with traditional optimization methods. To solve LSCOSPs, we take the crude oil scheduling of a marine-access refinery as a practical example and start by modeling LSCOSPs from crude unloading, transportation, crude distillation unit processing, and inventory management of intermediate products. On the basis of the proposed model, a dual-stage evolutionary algorithm driven by heuristic rules (denoted by DSEA/HR) is developed, where the dual-stage search mechanism consists of global search and local refinement. In the global search stage, we devise several heuristic rules based on empirical operating knowledge to generate a well-performing initial population and accelerate convergence in the mixed variables space. In the local refinement stage, a repair strategy is proposed to move the infeasible solutions towards feasible regions by further optimizing the local continuous variables. During the whole evolutionary process, the proposed dual-stage framework plays a crucial role in balancing exploration and exploitation. Experimental results have shown that DSEA/HR outperforms the state-of-the-art and widely-used mathematical programming methods and metaheuristic algorithms on LSCOSP instances within a reasonable time.

Hybrid Quantum Solvers in Production: how to succeed in the NISQ era?

  • Authors: Eneko Osaba, Esther Villar-Rodriguez, Aitor Gomez-Tejedor, Izaskun Oregi
  • Subjects: Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2401.10302
  • Pdf link: https://arxiv.org/pdf/2401.10302
  • Abstract Hybrid quantum computing is considered the present and the future within the field of quantum computing. Far from being a passing fad, this trend cannot be considered just a stopgap to address the limitations of NISQ-era devices. The foundations linking both computing paradigms will remain robust over time. Despite buoyant research activity, the challenges in hybrid computing are still countless, ranging from the proper characterization of current solvers to the establishment of appropriate methodologies for the design and fair evaluation of hybrid algorithms. The contribution of this work is twofold: first, we describe and categorize some of the most frequently used hybrid solvers, resorting to two different taxonomies recently published in the literature. Second, we put a special focus on two solvers that are currently deployed in real production and that have proven to be close to real industrial use. These solvers are the LeapHybridBQMSampler contained in D-Wave's Hybrid Solver Service and Quantagonia's Hybrid Solver. We analyze the performance of both hybrid methods using as benchmarks four well-known combinatorial optimization problems: the Traveling Salesman Problem, Vehicle Routing Problem, Bin Packing Problem, and Maximum Cut Problem. Thanks to the contributions presented in this paper, the reader gains insight into the performance of the hybridization strategies currently in production and close to industrial markets.
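Solvers such as the ones compared here take problems in binary quadratic form. The sketch below shows how one of the four benchmarks, Maximum Cut, maps onto that form. It builds a QUBO as a plain Python dictionary and checks it by brute force on tiny graphs.

It does not use the D-Wave or Quantagonia APIs. The `{(i, j): coefficient}` layout is an assumption that mirrors a common convention.

```python
from itertools import product


def maxcut_qubo(edges):
    """QUBO whose minimum equals -(max cut): each edge adds -x_i - x_j + 2 x_i x_j."""
    q = {}
    for i, j in edges:
        q[(i, i)] = q.get((i, i), 0) - 1   # diagonal terms act linearly on binary x
        q[(j, j)] = q.get((j, j), 0) - 1
        q[(i, j)] = q.get((i, j), 0) + 2
    return q


def energy(q, x):
    """Evaluate sum of c * x_i * x_j for a 0/1 assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in q.items())


def brute_force_min(q, n):
    """Exhaustive search over all 2^n assignments; only feasible for tiny n."""
    return min((energy(q, x), x) for x in product((0, 1), repeat=n))
```

An edge whose endpoints fall on different sides contributes -1 and an uncut edge contributes 0, so the minimum energy is minus the maximum cut size.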

Hacking Predictors Means Hacking Cars: Using Sensitivity Analysis to Identify Trajectory Prediction Vulnerabilities for Autonomous Driving Security

  • Authors: Marsalis Gibson, David Babazadeh, Claire Tomlin, Shankar Sastry
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2401.10313
  • Pdf link: https://arxiv.org/pdf/2401.10313
  • Abstract Adversarial attacks on learning-based trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on trajectory predictor inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. We observe that between all inputs, almost all of the perturbation sensitivities for Trajectron++ lie only within the most recent state history time point, while perturbation sensitivities for AgentFormer are spread across state histories over time. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models. Even though image maps may contribute slightly to the prediction output of both models, this result reveals that rather than being robust to adversarial image perturbations, trajectory predictors are susceptible to image attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how this vulnerability can cause a vehicle to come to a sudden stop from moderate driving speeds.
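The Fast Gradient Sign Method perturbs an input by a fixed step ε in the direction of the sign of the loss gradient: x_adv = x + ε · sign(∇_x L). The sketch below applies it to a toy logistic-regression scorer with an analytic gradient. It does not use an image map or a trajectory predictor such as Trajectron++ or AgentFormer. The model, weights, and ε are hypothetical.

```python
import math


def logistic_loss(w, x, y):
    """Loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def loss_grad_x(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(margin))  # d/dm log(1 + e^-m) = -1 / (1 + e^m)
    return [scale * wi for wi in w]


def _sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, x, y, eps):
    """One FGSM step: move each coordinate by eps in the sign of the loss gradient."""
    return [xi + eps * _sign(gi) for xi, gi in zip(x, loss_grad_x(w, x, y))]
```

Each coordinate moves by exactly ε (an L-infinity budget). This is why such perturbations can remain visually undetectable while still raising the loss.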

LangProp: A code optimization framework using Language Models applied to driving

  • Authors: Shu Ishida, Gianluca Corrado, George Fedoseev, Hudson Yeo, Lloyd Russell, Jamie Shotton, João F. Henriques, Anthony Hu
  • Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2401.10314
  • Pdf link: https://arxiv.org/pdf/2401.10314
  • Abstract LangProp is a framework for iteratively optimizing code generated by large language models (LLMs) in a supervised/reinforcement learning setting. While LLMs can generate sensible solutions zero-shot, the solutions are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code performance on a dataset of input-output pairs, as well as catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one could easily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA, showing that LangProp can generate interpretable and transparent driving policies that can be verified and improved in a metric- and data-driven way. Our code will be open-sourced and is available at https://github.com/shuishida/LangProp.
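The evaluation half of the loop described above can be sketched without any LLM. That half runs candidate code on input-output pairs, catches exceptions, and turns the outcome into feedback.

In the sketch below, candidate programs are source strings that define a function named `policy`. The LLM call that would rewrite them is left out. The function names, the scoring rule, and the feedback format are assumptions, not LangProp's actual API.

```python
import traceback


def evaluate_candidate(source, dataset):
    """Run `policy` from `source` on (input, expected) pairs; return score and feedback."""
    namespace = {}
    try:
        exec(source, namespace)  # assumed to define policy(x); sandbox this in practice
        policy = namespace["policy"]
    except Exception:
        return 0.0, ["failed to load: " + traceback.format_exc(limit=1)]
    correct, feedback = 0, []
    for x, expected in dataset:
        try:
            out = policy(x)
        except Exception as exc:  # exceptions become feedback instead of crashing the loop
            feedback.append(f"input {x!r}: raised {type(exc).__name__}: {exc}")
            continue
        if out == expected:
            correct += 1
        else:
            feedback.append(f"input {x!r}: expected {expected!r}, got {out!r}")
    return correct / len(dataset), feedback


def select_best(candidates, dataset):
    """Rank candidates by score; each candidate's feedback would be sent back to the LLM."""
    scored = [(evaluate_candidate(src, dataset), src) for src in candidates]
    scored.sort(key=lambda item: item[0][0], reverse=True)
    return scored
```

Executing generated code with `exec` like this is only acceptable inside a sandbox. The feedback strings are what would be fed back to the model for the next rewrite.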

Joint Processing and Transmission Energy Optimization for ISAC in Cell-Free Massive MIMO with URLLC

  • Authors: Zinat Behdad, Özlem Tuğfe Demir, Ki Won Sung, Cicek Cavdar
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2401.10315
  • Pdf link: https://arxiv.org/pdf/2401.10315
  • Abstract In this paper, we explore the concept of integrated sensing and communication (ISAC) within a downlink cell-free massive MIMO (multiple-input multiple-output) system featuring multi-static sensing and users requiring ultra-reliable low-latency communications (URLLC). Our focus involves the formulation of two non-convex algorithms that jointly solve power and blocklength allocation for end-to-end (E2E) minimization. The objectives are to jointly minimize sensing/communication processing and transmission energy consumption, while simultaneously meeting the requirements for sensing and URLLC. To address the inherent non-convexity of these optimization problems, we utilize techniques such as the Feasible Point Pursuit - Successive Convex Approximation (FPP-SCA), Concave-Convex Programming (CCP), and fractional programming. We conduct a comparative analysis of the performance of these algorithms in ISAC scenarios and against a URLLC-only scenario where sensing is not integrated. Our numerical results highlight the superior performance of the E2E energy minimization algorithm, especially in scenarios without sensing capability. Additionally, our study underscores the increasing prominence of energy consumption associated with sensing processing tasks as the number of sensing receive access points rises. Furthermore, the results emphasize that a higher sensing signal-to-interference-plus-noise ratio threshold is associated with an escalation in E2E energy consumption, thereby narrowing the performance gap between the two proposed algorithms.

PAC Code Rate-Profile Design Using Search-Constrained Optimization Algorithms

  • Authors: Mohsen Moradi, David G. M. Mitchell
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2401.10376
  • Pdf link: https://arxiv.org/pdf/2401.10376
  • Abstract In this paper, we introduce a novel rate-profile design based on search-constrained optimization techniques to assess the performance of polarization-adjusted convolutional (PAC) codes under Fano (sequential) decoding. The results demonstrate that the resulting PAC code offers much reduced computational complexity compared to a construction based on a conventional genetic algorithm, without a loss in error-correction performance. As the fitness function of our algorithm, we propose an adaptive successive cancellation list decoding algorithm to determine the weight distribution of the rate profiles. The simulation results indicate that, for a PAC(256, 128) code, only 8% of the population requires that their fitness function be evaluated with a large list size. This represents an improvement of almost 92% over a conventional evolutionary algorithm. For a PAC(64, 32) code, this improvement is about 99%. We also plot the performance of the high-rate PAC(128, 105) and PAC(64, 51) codes, and the results show that they exhibit superior performance compared to other algorithms.
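For readers new to PAC codes, the encoder itself is short. It has three steps:
1. The K data bits are placed at the positions selected by the rate profile, which is what this paper designs.
2. The resulting length-N vector is convolved over GF(2) with a connection polynomial.
3. The result goes through the polar transform x = u·F^{⊗n}, with F = [[1, 0], [1, 1]].

The sketch below follows that pipeline. The rate profile in the example is arbitrary. The connection polynomial 1011011 (octal 133) is a choice often seen in the PAC literature, but it is an assumption here. Fano decoding is not shown.

```python
def polar_transform(u):
    """x = u F^(kron n) over GF(2), with F = [[1, 0], [1, 1]], via in-place butterflies."""
    x = list(u)
    n, step = len(x), 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def convolve_gf2(v, c):
    """u_i = XOR over k of c_k * v_{i-k}: a rate-1 shift-register convolution."""
    return [sum(c[k] & v[i - k] for k in range(len(c)) if i - k >= 0) % 2
            for i in range(len(v))]


def pac_encode(data_bits, rate_profile, n, c=(1, 0, 1, 1, 0, 1, 1)):
    """Place data on the rate-profile positions, convolve, then polar-transform."""
    if len(data_bits) != len(rate_profile):
        raise ValueError("one data bit per rate-profile position")
    v = [0] * n
    for pos, bit in zip(sorted(rate_profile), data_bits):
        v[pos] = bit
    return polar_transform(convolve_gf2(v, c))
```

Every step is linear over GF(2), so the whole encoder is linear. The rate profile only decides which of the N input positions carry information.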

Bypassing a Reactive Jammer via NOMA-Based Transmissions in Critical Missions

  • Authors: Mohammadreza Amini, Ghazal Asemian, Michel Kulhandjian, Burak Kantarci, Claude D'Amours, Melike Erol-Kantarci
  • Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2401.10387
  • Pdf link: https://arxiv.org/pdf/2401.10387
  • Abstract Wireless networks can be vulnerable to radio jamming attacks. The quality of service under a jamming attack is not guaranteed, and service requirements such as reliability, latency, and effective rate, specifically in mission-critical military applications, can be deeply affected by the jammer's actions. This paper analyzes the effect of a reactive jammer. Particularly, reliability, average transmission delay, and the effective sum rate (ESR) for a NOMA-based scheme with finite blocklength transmissions are mathematically derived, taking the detection probability of the jammer into account. Furthermore, the effect of the UEs' allocated power and blocklength on the network metrics is explored. Contrary to the existing literature, results show that the gNB can mitigate the impact of reactive jamming by decreasing transmit power, making the transmissions covert at the jammer side. Finally, an optimization problem is formulated to maximize the ESR under reliability, delay, and transmit power constraints. It is shown that by adjusting the transmit power allocated to the UEs, the gNB can bypass the jammer effect to fulfill 0.99999 reliability and a latency of 5 ms without the need for packet re-transmission.
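Reliability with finite blocklength transmissions is commonly evaluated with the normal approximation of Polyanskiy, Poor, and Verdú. For an AWGN link with SINR γ, blocklength n, and k information bits, ε ≈ Q((n·C(γ) - k) / sqrt(n·V(γ))), with C(γ) = log2(1 + γ) and V(γ) = (1 - (1 + γ)^-2)(log2 e)².

The sketch below combines that formula with the standard two-user downlink NOMA SINRs, where the near user removes the far user's signal by successive interference cancellation. All parameter values are hypothetical. The jammer and its detection probability, which are central to the paper, are not modelled.

```python
import math


def q_func(z):
    """Gaussian tail function Q(z) = P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fbl_error(sinr, n, k):
    """Normal approximation of block error probability for k bits in n channel uses."""
    capacity = math.log2(1.0 + sinr)
    dispersion = (1.0 - (1.0 + sinr) ** -2) * math.log2(math.e) ** 2
    return q_func((n * capacity - k) / math.sqrt(n * dispersion))


def noma_two_user_errors(p_total, a_far, h_near, h_far, noise, n, k):
    """Block error probabilities (far, near) in two-user downlink NOMA.

    Assumes an SIC failure makes the near user fail and that the SIC and own
    decoding events are independent.
    """
    a_near = 1.0 - a_far
    # Far user treats the near user's signal as interference.
    sinr_far = a_far * p_total * h_far / (a_near * p_total * h_far + noise)
    # Near user first decodes (and cancels) the far user's signal, then its own.
    sinr_sic = a_far * p_total * h_near / (a_near * p_total * h_near + noise)
    sinr_near = a_near * p_total * h_near / noise
    eps_far = fbl_error(sinr_far, n, k)
    eps_near = 1.0 - (1.0 - fbl_error(sinr_sic, n, k)) * (1.0 - fbl_error(sinr_near, n, k))
    return eps_far, eps_near
```

Sweeping `a_far`, `p_total`, or `n` with this helper gives a feel for the power/blocklength trade-offs the abstract describes, before any jamming is added.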

Distribution Consistency based Self-Training for Graph Neural Networks with Sparse Labels

  • Authors: Fali Wang, Tianxiang Zhao, Suhang Wang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.10394
  • Pdf link: https://arxiv.org/pdf/2401.10394
  • Abstract Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely popular framework to leverage the abundance of unlabeled data, which expands the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. Therefore, in this work, we explore the potential of explicitly bridging the distribution shift between the expanded training set and test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework to identify pseudo-labeled nodes that are both informative and capable of redeeming the distribution discrepancy and formulate it as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model's generalizability in assigning pseudo labels. We evaluate our proposed method on four publicly available benchmark datasets and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines.

Analyzing and Mitigating Bias for Vulnerable Classes: Towards Balanced Representation in Dataset

  • Authors: Dewant Katare, David Solans Noguero, Souneil Park, Nicolas Kourtellis, Marijn Janssen, Aaron Yi Ding
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.10397
  • Pdf link: https://arxiv.org/pdf/2401.10397
  • Abstract The accuracy and fairness of perception systems in autonomous driving are crucial, particularly for vulnerable road users. Mainstream research has looked into improving the performance metrics for classification accuracy. However, the hidden traits of bias inheritance in the AI models, class imbalances and disparities in the datasets are often overlooked. In this context, our study examines the class imbalances for vulnerable road users by focusing on class distribution analysis, performance evaluation, and bias impact assessment. We identify the concern of imbalances in class representation, leading to potential biases in detection accuracy. Utilizing popular CNN models and Vision Transformers (ViTs) with the nuScenes dataset, our performance evaluation reveals detection disparities for underrepresented classes. We propose a methodology for model optimization and bias mitigation, which includes data augmentation, resampling, and metric-specific learning. Using the proposed mitigation approaches, we see improvement in IoU(%) and NDS(%) metrics from 71.3 to 75.6 and 80.6 to 83.7 respectively, for the CNN model. Similarly, for ViT, we observe improvement in IoU and NDS metrics from 74.9 to 79.2 and 83.8 to 87.1 respectively. This research contributes to developing more reliable models and datasets, enhancing inclusiveness for minority classes.
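IoU stands for intersection over union. For two axis-aligned boxes the computation is short, and averaging it per class is one way to expose gaps for underrepresented classes. The paper's IoU(%) on nuScenes may be computed differently, for example over bird's-eye-view masks or 3D boxes. Treat this 2D box version, the (x_min, y_min, x_max, y_max) convention, and the helper names as assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def per_class_mean_iou(matches):
    """matches: list of (class_name, pred_box, gt_box); returns mean IoU per class."""
    totals, counts = {}, {}
    for cls, pred, gt in matches:
        totals[cls] = totals.get(cls, 0.0) + box_iou(pred, gt)
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: totals[cls] / counts[cls] for cls in totals}
```

Reporting per-class means, rather than one pooled figure, keeps a rare class such as a vulnerable road user from being averaged away.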

Learning-assisted Stochastic Capacity Expansion Planning: A Bayesian Optimization Approach

  • Authors: Aron Brenner, Rahman Khorramfar, Dharik Mallapragada, Saurabh Amin
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10451
  • Pdf link: https://arxiv.org/pdf/2401.10451
  • Abstract Solving large-scale capacity expansion problems (CEPs) is central to cost-effective decarbonization of regional-scale energy systems. To ensure the intended outcomes of CEPs, modeling uncertainty due to weather-dependent variable renewable energy (VRE) supply and energy demand becomes crucially important. However, the resulting stochastic optimization models are often less computationally tractable than their deterministic counterparts. Here, we propose a learning-assisted approximate solution method to tractably solve two-stage stochastic CEPs. Our method identifies low-cost planning decisions by constructing and solving a sequence of tractable temporally aggregated surrogate problems. We adopt a Bayesian optimization approach to searching the space of time series aggregation hyperparameters and compute approximate solutions that minimize costs on a validation set of supply-demand projections. Importantly, we evaluate solved planning outcomes on a held-out set of test projections. We apply our approach to generation and transmission expansion planning for a joint power-gas system spanning New England. We show that our approach yields an estimated cost savings of up to 3.8% in comparison to benchmark time series aggregation approaches.

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

  • Authors: Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He
  • Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
  • Arxiv link: https://arxiv.org/abs/2401.10460
  • Pdf link: https://arxiv.org/pdf/2401.10460
  • Abstract Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run in real time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT) and is therefore an order of magnitude faster than any neural vocoder. A DSP vocoder often has lower audio quality because it consumes over-smoothed acoustic model predictions of approximate representations of the vocal tract. In this paper, we propose an ultra-lightweight differential DSP (DDSP) vocoder that uses an acoustic model jointly optimized with a DSP vocoder and learns without an extracted spectral feature for the vocal tract. The model achieves audio quality comparable to neural vocoders, with a high average MOS of 4.36, while being as efficient as a DSP vocoder. Our C++ implementation, without any hardware-specific optimization, runs at 15 MFLOPS, surpasses MB-MelGAN by 340 times in terms of FLOPS, and achieves a vocoder-only RTF of 0.003 and an overall RTF of 0.044 while running single-threaded on a 2GHz Intel Xeon CPU.

A match made in consistency heaven: when large language models meet evolutionary algorithms

  • Authors: Wang Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, Shuyuan Yang
  • Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10510
  • Pdf link: https://arxiv.org/pdf/2401.10510
  • Abstract Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the common collective and directionality of text sequence generation and evolution, this paper illustrates the strong consistency of LLMs and EAs, which includes multiple one-to-one key characteristics: token embedding and genotype-phenotype mapping, position encoding and fitness shaping, position embedding and selection, attention and crossover, feed-forward neural network and mutation, model training and parameter update, and multi-task learning and multi-objective optimization. Based on this consistency perspective, existing coupling studies are analyzed, including evolutionary fine-tuning and LLM-enhanced EAs. Leveraging these insights, we outline a fundamental roadmap for future research in coupling LLMs and EAs, while highlighting key challenges along the way. The consistency not only reveals the evolution mechanism behind LLMs but also facilitates the development of evolved artificial agents that approach or surpass biological organisms.

Quality-Diversity Algorithms Can Provably Be Helpful for Optimization

  • Authors: Chao Qian, Ke Xue, Ren-Jian Wang
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2401.10539
  • Pdf link: https://arxiv.org/pdf/2401.10539
  • Abstract Quality-Diversity (QD) algorithms are a new type of Evolutionary Algorithm (EA), aiming to find a set of high-performing, yet diverse solutions. They have found many successful applications in reinforcement learning and robotics, helping improve robustness in complex environments. Furthermore, they often empirically find a better overall solution than traditional search algorithms, which explicitly search for a single highest-performing solution. However, their theoretical analysis lags far behind, leaving many fundamental questions unexplored. In this paper, we try to shed some light on the optimization ability of QD algorithms via rigorous running time analysis. By comparing the popular QD algorithm MAP-Elites with $(\mu+1)$-EA (a typical EA focusing on finding better objective values only), we prove that on two NP-hard problem classes with wide applications, i.e., monotone approximately submodular maximization with a size constraint, and set cover, MAP-Elites can achieve the (asymptotically) optimal polynomial-time approximation ratio, while $(\mu+1)$-EA requires exponential expected time on some instances. This provides theoretical justification that QD algorithms can be helpful for optimization, and shows that the simultaneous search for high-performing solutions with diverse behaviors can provide stepping stones to good overall solutions and help avoid local optima.
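MAP-Elites keeps an archive with one elite per behaviour cell. It repeatedly mutates a randomly chosen elite and inserts the offspring if the offspring's cell is empty or the offspring beats the current elite.

The sketch below is a generic version on bit strings with standard bit-wise mutation. It uses the number of 1-bits as the behaviour descriptor. That choice is in the spirit of the size-constrained setting above, but it is an assumption here. The toy fitness function and all parameters are hypothetical.

```python
import random


def map_elites(fitness, n_bits, iterations, seed=0):
    """Generic MAP-Elites on bit strings; behaviour = number of ones (0..n_bits)."""
    rng = random.Random(seed)
    start = tuple([0] * n_bits)
    archive = {0: (fitness(start), start)}  # cell -> (fitness, solution)
    for _ in range(iterations):
        _, parent = archive[rng.choice(list(archive))]  # uniform over occupied cells
        # Standard bit-wise mutation: flip each bit independently with prob 1/n.
        child = tuple(b ^ (rng.random() < 1.0 / n_bits) for b in parent)
        cell, f = sum(child), fitness(child)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)
    return archive
```

Because every cell keeps its own elite, a good small solution in a low cell can serve as a stepping stone toward larger ones. This is the mechanism the running-time analysis exploits.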

PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization

  • Authors: Ziqi Yuan, Haoyi Zhou, Tianyu Chen, Jianxin Li
  • Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2401.10547
  • Pdf link: https://arxiv.org/pdf/2401.10547
  • Abstract A multitude of toxic online behaviors, ranging from network attacks to anonymous traffic and spam, have severely disrupted the smooth operation of networks. Due to the inherent sender-receiver nature of network behaviors, graph-based frameworks are commonly used for detecting anomalous behaviors. However, in real-world scenarios, the boundary between normal and anomalous behaviors tends to be ambiguous. The local heterophily of graphs interferes with the detection, and existing methods based on nodes or edges introduce unwanted noise into representation results, thereby impacting the effectiveness of detection. To address these issues, we propose PhoGAD, a graph-based anomaly detection framework. PhoGAD leverages persistent homology optimization to clarify behavioral boundaries. Building upon this, the weights of adjacent edges are designed to mitigate the effects of local heterophily. Subsequently, to tackle the noise problem, we conduct a formal analysis and propose a disentangled representation-based explicit embedding method, ultimately achieving anomaly behavior detection. Experiments on intrusion, traffic, and spam datasets verify that PhoGAD has surpassed the performance of state-of-the-art (SOTA) frameworks in detection efficacy. Notably, PhoGAD demonstrates robust detection even with diminished anomaly proportions, highlighting its applicability to real-world scenarios. The analysis of persistent homology demonstrates its effectiveness in capturing the topological structure formed by normal edge features. Additionally, ablation experiments validate the effectiveness of the innovative mechanisms integrated within PhoGAD.

PHOENIX: Open-Source Language Adaption for Direct Preference Optimization

  • Authors: Matthias Uhlig, Sigurd Schacht, Sudarshan Kamath Barkur
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2401.10580
  • Pdf link: https://arxiv.org/pdf/2401.10580
  • Abstract Large language models have gained immense importance in recent years and have demonstrated outstanding results in solving various tasks. However, despite these achievements, many questions remain unanswered in the context of large language models. Besides the optimal use of the models for inference and the alignment of the results to the desired specifications, the transfer of models to other languages is still an underdeveloped area of research. The recent publication of models such as Llama-2 and Zephyr has provided new insights into architectural improvements and the use of human feedback. However, insights into adapting these techniques to other languages remain scarce. In this paper, we build on the latest improvements and apply the Direct Preference Optimization (DPO) approach to the German language. The model is available at https://huggingface.co/DRXD1000/Phoenix.
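Direct Preference Optimization replaces an explicit reward model with a loss on preference pairs. For a prompt with a preferred response y_w and a rejected response y_l, the loss is:

L = -log σ(β[(log π_θ(y_w) - log π_ref(y_w)) - (log π_θ(y_l) - log π_ref(y_l))])

Here π_ref is a frozen reference model and β controls how far the policy may move away from it.

The sketch below evaluates that loss from summed sequence log-probabilities. The numbers and the default β = 0.1 are hypothetical. In practice the log-probabilities come from the policy and reference language models.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities of each response."""
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(z)) = softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def mean_dpo_loss(batch, beta=0.1):
    """Average over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*row, beta=beta) for row in batch) / len(batch)
```

When the policy equals the reference, the loss is log 2. It drops below log 2 once the policy raises the preferred response's likelihood relative to the rejected one.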

Rethinking the Soft Conflict Pseudo Boolean Constraint on MaxSAT Local Search Solvers

  • Authors: Jiongzhi Zheng, Zhuo Chen, Chu-Min Li, Kun He
  • Subjects: Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.10589
  • Pdf link: https://arxiv.org/pdf/2401.10589
  • Abstract MaxSAT is an optimization version of the famous NP-complete Satisfiability problem (SAT). Algorithms for MaxSAT mainly include complete solvers and local search incomplete solvers. In many complete solvers, once a better solution is found, a Soft conflict Pseudo Boolean (SPB) constraint is generated to force the algorithm to find better solutions. In many local search algorithms, clause weighting is a key technique for effectively guiding the search directions. In this paper, we propose to transfer the SPB constraint into the clause weighting system of the local search method, leading the algorithm to better solutions. We further propose an adaptive clause weighting strategy that breaks the tradition of using constant values to adjust clause weights. Based on the above methods, we propose a new local search algorithm called SPB-MaxSAT that provides new perspectives for clause weighting on MaxSAT local search solvers. Extensive experiments demonstrate the excellent performance of the proposed methods.

Interventional Fairness on Partially Known Causal Graphs: A Constrained Optimization Approach

  • Authors: Aoqi Zuo, Yiqing Li, Susan Wei, Mingming Gong
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10632
  • Pdf link: https://arxiv.org/pdf/2401.10632
  • Abstract Fair machine learning aims to prevent discrimination against individuals or sub-populations based on sensitive attributes such as gender and race. In recent years, causal inference methods have been increasingly used in fair machine learning to measure unfairness by causal effects. However, current methods assume that the true causal graph is given, which is often not true in real-world applications. To address this limitation, this paper proposes a framework for achieving causal fairness based on the notion of interventions when the true causal graph is partially known. The proposed approach involves modeling fair prediction using a Partially Directed Acyclic Graph (PDAG), specifically, a class of causal DAGs that can be learned from observational data combined with domain knowledge. The PDAG is used to measure causal fairness, and a constrained optimization problem is formulated to balance between fairness and accuracy. Results on both simulated and real-world datasets demonstrate the effectiveness of this method.

Maximizing Real-Time Video QoE via Bandwidth Sharing under Markovian setting

  • Authors: Sushi Anna George, Vinay Joseph
  • Subjects: Networking and Internet Architecture (cs.NI)
  • Arxiv link: https://arxiv.org/abs/2401.10681
  • Pdf link: https://arxiv.org/pdf/2401.10681
  • Abstract We consider the problem of optimizing Quality of Experience (QoE) of clients streaming real-time video, served by networks managed by different operators that can share bandwidth with each other. The abundance of real-time video traffic is evident in the popularity of applications like video conferencing and video streaming of live events, which have increased significantly since the recent pandemic. We model the problem as a joint optimization of resource allocation for the clients and bandwidth sharing across the operators, with special attention to how the resource allocation impacts clients' perceived video quality. We propose an online policy as a solution, which involves dynamically sharing a portion of one operator's bandwidth with another operator. We provide strong theoretical optimality guarantees for the policy. We also use extensive simulations to demonstrate the policy's substantial performance improvements (of up to ninety percent), and identify insights into key system parameters (e.g., imbalance in arrival rates or channel conditions of the operators) that dictate the improvements.

Manipulating Sparse Double Descent

  • Authors: Ya Shi Zhang
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10686
  • Pdf link: https://arxiv.org/pdf/2401.10686
  • Abstract This paper investigates the double descent phenomenon in two-layer neural networks, focusing on the role of L1 regularization and representation dimensions. It explores an alternative double descent phenomenon, named sparse double descent. The study emphasizes the complex relationship between model complexity, sparsity, and generalization, and suggests further research into more diverse models and datasets. The findings contribute to a deeper understanding of neural network training and optimization.

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

  • Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2401.10700
  • Pdf link: https://arxiv.org/pdf/2401.10700
  • Abstract Safe offline RL is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below predetermined thresholds. This can lead to potentially unsafe outcomes, which is unacceptable in safety-critical scenarios. An alternative is to enforce the hard constraint of zero violation. However, this can be challenging in the offline setting, as it needs to strike the right balance among three highly intricate and correlated aspects: safety constraint satisfaction, reward maximization, and behavior regularization imposed by offline datasets. Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset. This seamlessly converts the original trilogy problem to a feasibility-dependent objective, i.e., maximizing reward value within the feasible region while minimizing safety risks in the infeasible region. Inspired by these, we propose FISOR (FeasIbility-guided Safe Offline RL), which allows safety constraint adherence, reward maximization, and offline policy learning to be realized via three decoupled processes, while offering strong safety performance and stability. In FISOR, the optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning. Thus, we propose a novel energy-guided diffusion model that does not require training a complicated time-dependent classifier to extract the policy, greatly simplifying the training. We compare FISOR against baselines on the DSRL benchmark for safe offline RL. Evaluation results show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.

Empowering Aggregators with Practical Data-Driven Tools: Harnessing Aggregated and Disaggregated Flexibility for Demand Response

  • Authors: Costas Mylonas, Donata Boric, Leila Luttenberger Maric, Alexandros Tsitsanis, Eleftheria Petrianou, Magda Foti
  • Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10726
  • Pdf link: https://arxiv.org/pdf/2401.10726
  • Abstract This study explores the crucial interplay between aggregators and building occupants in activating flexibility through Demand Response (DR) programs, with a keen focus on achieving robust decarbonization and fortifying the resilience of the energy system amidst the uncertainties presented by Renewable Energy Sources (RES). Firstly, it introduces a methodology for optimizing aggregated flexibility provision strategies in environments with limited data, utilizing Discrete Fourier Transformation (DFT) and clustering techniques to identify building occupants' activity patterns. Secondly, the study assesses the disaggregated flexibility provision of Heating Ventilation and Air Conditioning (HVAC) systems during DR events, employing machine learning and optimization techniques for precise, device-level analysis. The first approach offers a non-intrusive pathway for aggregators to provide flexibility services in environments with a single smart meter for the whole building's consumption, while the second approach carefully considers building occupants' thermal comfort profiles and maximizes flexibility when dedicated smart meters are available for the HVAC systems. Through the application of data-driven techniques and encompassing case studies from both industrial and residential buildings, this paper not only unveils pivotal opportunities for aggregators in the balancing and emerging flexibility markets but also successfully develops end-to-end practical tools for aggregators. Furthermore, the efficacy of these tools is validated through detailed case studies, substantiating their operational capability and contributing to the evolution of a resilient and efficient energy system.
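To illustrate how a DFT can expose a recurring activity pattern in consumption data, the sketch below computes DFT magnitudes of a load profile and reports the period of the strongest non-DC component. The hourly sampling, the synthetic profile, and the function names are assumptions for illustration; the paper's feature extraction and clustering steps are not reproduced here.

```python
import cmath
import math
from typing import List


def dft_magnitudes(signal: List[float]) -> List[float]:
    """Naive O(n^2) DFT; returns |X_k| for k = 0 .. n//2."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def dominant_period(signal: List[float]) -> float:
    """Period (in samples) of the strongest frequency component, ignoring the DC term."""
    mags = dft_magnitudes(signal)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(signal) / k


# Hypothetical hourly load for one week with a daily (24 h) occupancy cycle.
week_load = [1.0 + math.sin(2 * math.pi * h / 24) for h in range(24 * 7)]
period_hours = dominant_period(week_load)
```

For this synthetic week the strongest bin is k = 7, giving a period of 24 hours; real profiles would be noisier and are typically analyzed with an FFT routine rather than this naive loop.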

Dynamic Q&A of Clinical Documents with Large Language Models

  • Authors: Ran Elgedawy, Sudarshan Srinivasan, Ioana Danciu
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.10733
  • Pdf link: https://arxiv.org/pdf/2401.10733
  • Abstract Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes challenging. This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes. Our chatbot, powered by Langchain and transformer-based LLMs, allows users to query in natural language, receiving relevant answers from clinical notes. Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands. Model optimization, including weight quantization, improves latency by approximately 48 times. Promising results indicate potential, yet challenges such as model hallucinations and limited diverse medical case evaluations remain. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.
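Weight quantization maps floating-point weights to low-bit integers plus a scale factor. The abstract does not say which scheme or bit width was used, so the sketch below assumes symmetric per-tensor int8 quantization purely for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [scale * qi for qi in q]
```

Each weight is reconstructed to within half a quantization step (`scale / 2`), which is the accuracy traded for smaller, faster integer arithmetic.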

A Low-Frequency-Stable Higher-Order Spline-Based Integral Equation Method

  • Authors: Maximilian Nolte, Riccardo Torchio, Sebastian Schöps, Jürgen Dölz, Felix Wolf, Albert E. Ruehli
  • Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2401.10735
  • Pdf link: https://arxiv.org/pdf/2401.10735
  • Abstract This contribution investigates the connection between Isogeometric Analysis and Integral Equation methods for full-wave electromagnetic problems. The proposed spline-based integral equation method allows for an exact representation of the model geometry described in terms of Non-Uniform Rational B-Splines without meshing. This is particularly useful when high accuracy is required or when meshing is cumbersome for instance during optimization of electric components. The Augmented Electric Field Integral Equation is adopted, so the low-frequency breakdown is avoided. The extension to higher-order basis functions is analyzed and the convergence rate discussed. The analogy with the Partial Element Equivalent Circuit method for the lowest-order case is established, allowing for a circuit interpretation while maintaining the exact representation of geometry even for coarse discretizations. Numerical experiments on academic and realistic test cases demonstrate the high accuracy of the proposed approach.

Fast gradient-free activation maximization for neurons in spiking neural networks

  • Authors: Nikita Pospelov, Andrei Chertkov, Maxim Beketov, Ivan Oseledets, Konstantin Anokhin
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10748
  • Pdf link: https://arxiv.org/pdf/2401.10748
  • Abstract Neural networks (NNs), both living and artificial, work due to being complex systems of neurons, each having its own specialization. Revealing these specializations is important for understanding NNs' inner working mechanisms. The only way to do this for a living system, the neural response of which to a stimulus is not a known (let alone differentiable) function, is to build a feedback loop of exposing it to stimuli, the properties of which can be iteratively varied aiming in the direction of maximal response. To test such a loop on a living network, one should first learn how to run it quickly and efficiently, reaching the most effective stimuli (ones that maximize certain neurons' activation) in the least possible number of iterations. We present a framework with an effective design of such a loop, successfully testing it on an artificial spiking neural network (SNN, a model that mimics the behaviour of NNs in living brains). Our optimization method used for activation maximization (AM) was based on low-rank tensor decomposition (Tensor Train, TT) of the activation function's discretization over its domain, the latent parameter space of stimuli (CIFAR10-size color images, generated by either VQ-VAE or SN-GAN from their latent description vectors, fed to the SNN). To our knowledge, the present work is the first attempt to perform effective AM for SNNs. The source code of our framework, MANGO (for Maximization of neural Activation via Non-Gradient Optimization), is available on GitHub.

BoolGebra: Attributed Graph-learning for Boolean Algebraic Manipulation

  • Authors: Yingjie Li, Anthony Agnesina, Yanqing Zhang, Haoxing Ren, Cunxi Yu
  • Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10753
  • Pdf link: https://arxiv.org/pdf/2401.10753
  • Abstract Boolean algebraic manipulation is at the core of logic synthesis in the Electronic Design Automation (EDA) design flow. Existing methods struggle to fully exploit optimization opportunities, and often suffer from an explosive search space and limited scalability efficiency. This work presents BoolGebra, a novel attributed graph-learning approach for Boolean algebraic manipulation that aims to improve fundamental logic synthesis. BoolGebra incorporates Graph Neural Networks (GNNs) and takes initial feature embeddings from both structural and functional information as inputs. A fully connected neural network is employed as the predictor for direct optimization result predictions, significantly reducing the search space and efficiently locating the optimization space. The experiments involve training the BoolGebra model w.r.t. design-specific and cross-design inferences using the trained model, where BoolGebra demonstrates generalizability for cross-design inference and its potential to scale from small, simple training datasets to large, complex inference datasets. Finally, BoolGebra is integrated with the existing synthesis tool ABC to perform end-to-end logic minimization evaluation w.r.t. SOTA baselines.

Semantic-Aware Resource Allocation in Constrained Networks with Limited User Participation

  • Authors: Ouiame Marnissi, Hajar EL Hammouti, El Houcine Bergou
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2401.10766
  • Pdf link: https://arxiv.org/pdf/2401.10766
  • Abstract Semantic communication has gained attention as a key enabler for intelligent and context-aware communication. However, one of the key challenges of semantic communications is the need to tailor the resource allocation to meet the specific requirements of semantic transmission. In this paper, we focus on networks with limited resources where devices are constrained to transmit with limited bandwidth and power over large distances. Specifically, we devise an efficient strategy to select the most pertinent semantic features and participating users, taking into account the channel quality, the transmission time, and the recovery accuracy. To this end, we formulate an optimization problem with the goal of selecting the most relevant and accurate semantic features over devices while satisfying constraints on transmission time and quality of the channel. This involves optimizing communication resources, identifying participating users, and choosing specific semantic information for transmission. The underlying problem is inherently complex due to its non-convex nature and combinatorial constraints. To overcome this challenge, we efficiently approximate the optimal solution by solving a series of integer linear programming problems. Our numerical findings illustrate the effectiveness and efficiency of our approach in managing semantic communications in networks with limited resources.

Reconfigurable Intelligent Surface (RIS)-Assisted Entanglement Distribution in FSO Quantum Networks

  • Authors: Mahdi Chehimi, Mohamed Elhattab, Walid Saad, Gayane Vardoyan, Nitish K. Panigrahy, Chadi Assi, Don Towsley
  • Subjects: Networking and Internet Architecture (cs.NI); Quantum Physics (quant-ph)
  • Arxiv link: https://arxiv.org/abs/2401.10823
  • Pdf link: https://arxiv.org/pdf/2401.10823
  • Abstract Quantum networks (QNs) relying on free-space optical (FSO) quantum channels can support quantum applications in environments wherein establishing an optical fiber infrastructure is challenging and costly. However, FSO-based QNs require a clear line-of-sight (LoS) between users, which is challenging due to blockages and natural obstacles. In this paper, a reconfigurable intelligent surface (RIS)-assisted FSO-based QN is proposed as a cost-efficient framework providing a virtual LoS between users for entanglement distribution. A novel modeling of the quantum noise and losses experienced by quantum states over FSO channels defined by atmospheric losses, turbulence, and pointing errors is derived. Then, the joint optimization of entanglement distribution and RIS placement problem is formulated, under heterogeneous entanglement rate and fidelity constraints. This problem is solved using a simulated annealing metaheuristic algorithm. Simulation results show that the proposed framework effectively meets the minimum fidelity requirements of all users' quantum applications. This is in stark contrast to baseline algorithms that lead to a drop of at least 83% in users' end-to-end fidelities. The proposed framework also achieves a 64% enhancement in the fairness level between users compared to baseline rate maximizing frameworks. Finally, the weather conditions, e.g., rain, are observed to have a more significant effect than pointing errors and turbulence.
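The placement problem above is solved with a simulated annealing metaheuristic. A generic sketch of simulated annealing over a one-dimensional grid of candidate positions follows; the toy cost function, the plus-or-minus-one neighbor move, and the geometric cooling schedule are illustrative assumptions and do not model the paper's fidelity or entanglement-rate constraints.

```python
import math
import random
from typing import Callable, Tuple


def simulated_annealing(n_positions: int, cost: Callable[[int], float],
                        t0: float = 10.0, cooling: float = 0.95,
                        iters: int = 500, seed: int = 0) -> Tuple[int, float]:
    """Minimize `cost` over candidate positions 0 .. n_positions - 1."""
    rng = random.Random(seed)
    current = rng.randrange(n_positions)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(iters):
        # Neighbor move: step to an adjacent position, clamped to the grid.
        proposal = min(max(current + rng.choice((-1, 1)), 0), n_positions - 1)
        proposal_cost = cost(proposal)
        delta = proposal_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-9)  # geometric cooling, kept strictly positive
    return best, best_cost


# Hypothetical toy cost: position 37 is the best spot for the surface.
best_pos, best_val = simulated_annealing(100, lambda p: (p - 37) ** 2)
```

Accepting some uphill moves while the temperature is high lets the search escape local minima; as it cools, the procedure becomes effectively greedy.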

Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems

  • Authors: Junhao Shen, Hong Qian, Wei Zhang, Aimin Zhou
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10840
  • Pdf link: https://arxiv.org/pdf/2401.10840
  • Abstract Cognitive diagnosis assessment is a fundamental and crucial task for student learning. It models the student-exercise interaction, and discovers the students' proficiency levels on each knowledge attribute. In real-world intelligent education systems, generalization and interpretability of cognitive diagnosis methods are of equal importance. However, most existing methods can hardly make the best of both worlds due to the complicated student-exercise interaction. To this end, this paper proposes a symbolic cognitive diagnosis (SCD) framework to simultaneously enhance generalization and interpretability. The SCD framework incorporates the symbolic tree to explicably represent the complicated student-exercise interaction function, and utilizes gradient-based optimization methods to effectively learn the student and exercise parameters. Meanwhile, the accompanying challenge is that we need to tunnel the discrete symbolic representation and continuous parameter optimization. To address this challenge, we propose to hybridly optimize the representation and parameters in an alternating manner. To fulfill SCD, it alternately learns the symbolic tree by derivative-free genetic programming and learns the student and exercise parameters via gradient-based Adam. The extensive experimental results on various real-world datasets show the superiority of SCD on both generalization and interpretability. The ablation study verifies the efficacy of each ingredient in SCD, and the case study explicitly showcases how the interpretable ability of SCD works.
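SCD learns the student and exercise parameters with Adam. For reference, a single Adam update in its standard bias-corrected form is sketched below on a plain list of parameters; the list representation and default hyperparameters are illustrative simplifications, not SCD's implementation.

```python
import math
from typing import List, Tuple


def adam_step(params: List[float], grads: List[float],
              m: List[float], v: List[float], t: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update; `t` is the 1-based step count used for bias correction."""
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g        # running mean of gradients
        vi = beta2 * vi + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = mi / (1 - beta1 ** t)            # correct the bias from zero initialization
        v_hat = vi / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's magnitude.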

Reinforcement learning for question answering in programming domain using public community scoring as a human feedback

  • Authors: Alexey Gorbatovski, Sergey Kovalchuk
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2401.10882
  • Pdf link: https://arxiv.org/pdf/2401.10882
  • Abstract In this study, we investigate the enhancement of the GPT Neo 125M performance in Community Question Answering (CQA) with a focus on programming, through the integration of Reinforcement Learning from Human Feedback (RLHF) and the utilization of scores from Stack Overflow. Two distinct reward model training strategies are employed for fine-tuning with Proximal Policy Optimization (PPO). Notably, the improvements in performance achieved through this method are comparable to those of the GPT Neo 2.7B parameter variant. Additionally, an auxiliary scoring mechanism is introduced, which demonstrates the limitations of conventional linguistic metrics in evaluating responses in the programming domain. Through accurate analysis, this paper looks at the divergence between traditional linguistic metrics and our human-preferences-based reward model, underscoring the imperative for domain-specific evaluation methods. By elucidating the complexities involved in applying RLHF to programming CQA and accentuating the significance of context-aware evaluation, this study contributes to the ongoing efforts in refining Large Language Models through focused human feedback.
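PPO fine-tuning against a reward model maximizes the clipped surrogate objective. A per-sample sketch of the standard PPO-Clip term follows, assuming the probability ratio and advantage estimate are already computed; `clip_eps=0.2` is a common default, not a value reported in the paper.

```python
def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO-Clip surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s). The objective is maximized,
    so a training loss would use its negative.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage the gain is capped at `(1 + clip_eps) * A`; with a negative advantage the pessimistic (smaller) value is kept, which limits how far one update can move the policy.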

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

  • Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.10891
  • Pdf link: https://arxiv.org/pdf/2401.10891
  • Abstract This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circumstances. To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos. It demonstrates impressive generalization ability. Further, through fine-tuning it with metric depth information from NYUv2 and KITTI, new SOTAs are set. Our better depth model also results in a better depth-conditioned ControlNet. Our models are released at https://github.com/LiheYoung/Depth-Anything.

Keyword: adam

Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems

  • Authors: Junhao Shen, Hong Qian, Wei Zhang, Aimin Zhou
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10840
  • Pdf link: https://arxiv.org/pdf/2401.10840
  • Abstract Cognitive diagnosis assessment is a fundamental and crucial task for student learning. It models the student-exercise interaction, and discovers the students' proficiency levels on each knowledge attribute. In real-world intelligent education systems, generalization and interpretability of cognitive diagnosis methods are of equal importance. However, most existing methods can hardly make the best of both worlds due to the complicated student-exercise interaction. To this end, this paper proposes a symbolic cognitive diagnosis (SCD) framework to simultaneously enhance generalization and interpretability. The SCD framework incorporates the symbolic tree to explicably represent the complicated student-exercise interaction function, and utilizes gradient-based optimization methods to effectively learn the student and exercise parameters. Meanwhile, the accompanying challenge is that we need to tunnel the discrete symbolic representation and continuous parameter optimization. To address this challenge, we propose to hybridly optimize the representation and parameters in an alternating manner. To fulfill SCD, it alternately learns the symbolic tree by derivative-free genetic programming and learns the student and exercise parameters via gradient-based Adam. The extensive experimental results on various real-world datasets show the superiority of SCD on both generalization and interpretability. The ablation study verifies the efficacy of each ingredient in SCD, and the case study explicitly showcases how the interpretable ability of SCD works.

Keyword: gradient

Zero Bubble Pipeline Parallelism

  • Authors: Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10241
  • Pdf link: https://arxiv.org/pdf/2401.10241
  • Abstract Pipeline parallelism is one of the key components for large-scale distributed training, yet its efficiency suffers from pipeline bubbles which were deemed inevitable. In this work, we introduce a scheduling strategy that, to our knowledge, is the first to successfully achieve zero pipeline bubbles under synchronous training semantics. The key idea behind this improvement is to split the backward computation into two parts, one that computes gradient for the input and another that computes for the parameters. Based on this idea, we handcraft novel pipeline schedules that significantly outperform the baseline methods. We further develop an algorithm that automatically finds an optimal schedule based on specific model configuration and memory limit. Additionally, to truly achieve zero bubble, we introduce a novel technique to bypass synchronizations during the optimizer step. Experimental evaluations show that our method outperforms the 1F1B schedule by up to 23% in throughput under a similar memory limit. This number can be further pushed to 31% when the memory constraint is relaxed. We believe our results mark a major step forward in harnessing the true potential of pipeline parallelism. We open-sourced our implementation, based on the popular Megatron-LM repository, at https://github.com/sail-sg/zero-bubble-pipeline-parallelism.
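The backward split can be seen on a single linear layer Y = XW: the input gradient dX = dY·Wᵀ is needed right away by the previous pipeline stage, while the weight gradient dW = Xᵀ·dY only feeds the optimizer step and can be deferred into otherwise idle slots. The plain-Python sketch below computes the two parts as separate functions; the helper names are hypothetical, and it illustrates only the split, not the paper's schedules.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def forward(x: Matrix, w: Matrix) -> Matrix:
    """Linear layer Y = X W (X: batch x d_in, W: d_in x d_out)."""
    return matmul(x, w)


def backward_input(dy: Matrix, w: Matrix) -> Matrix:
    """Input-gradient part: dX = dY W^T. On the critical path, since the previous stage waits for it."""
    return matmul(dy, transpose(w))


def backward_weight(x: Matrix, dy: Matrix) -> Matrix:
    """Weight-gradient part: dW = X^T dY. Only the optimizer needs it, so it can run later."""
    return matmul(transpose(x), dy)
```

Because `backward_weight` depends only on the stored activation `x` and the incoming `dy`, a scheduler is free to run it after later micro-batches' input-gradient passes.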

Multi-Source Collaborative Gradient Discrepancy Minimization for Federated Domain Generalization

  • Authors: Yikang Wei, Yahong Han
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2401.10272
  • Pdf link: https://arxiv.org/pdf/2401.10272
  • Abstract Federated Domain Generalization aims to learn a domain-invariant model from multiple decentralized source domains for deployment on unseen target domains. Due to privacy concerns, the data from different source domains are kept isolated, which poses challenges in bridging the domain gap. To address this issue, we propose a Multi-source Collaborative Gradient Discrepancy Minimization (MCGDM) method for federated domain generalization. Specifically, we propose intra-domain gradient matching between the original images and augmented images to avoid overfitting the domain-specific information within isolated domains. Additionally, we propose inter-domain gradient matching with the collaboration of other domains, which can further reduce the domain shift across decentralized domains. Combining intra-domain and inter-domain gradient matching, our method enables the learned model to generalize well on unseen domains. Furthermore, our method can be extended to the federated domain adaptation task by fine-tuning the target model on the pseudo-labeled target domain. The extensive experiments on federated domain generalization and adaptation indicate that our method outperforms the state-of-the-art methods significantly.
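Gradient matching is often implemented as a penalty on the dissimilarity between two flattened gradient vectors. The sketch below uses one minus cosine similarity as that discrepancy; this measure and the function names are assumptions for illustration, since the abstract does not state which discrepancy MCGDM uses.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # undefined for a zero gradient; treat as uncorrelated
    return dot / (na * nb)


def gradient_discrepancy(grad_original: Sequence[float], grad_augmented: Sequence[float]) -> float:
    """Hypothetical intra-domain matching loss: 0 when the two gradients point the same way,
    1 when orthogonal, 2 when opposite."""
    return 1.0 - cosine_similarity(grad_original, grad_augmented)
```

Minimizing this term encourages the model to update in the same direction for an image and its augmented view, rather than fitting features that only one of them contains.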

Hacking Predictors Means Hacking Cars: Using Sensitivity Analysis to Identify Trajectory Prediction Vulnerabilities for Autonomous Driving Security

  • Authors: Marsalis Gibson, David Babazadeh, Claire Tomlin, Shankar Sastry
  • Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2401.10313
  • Pdf link: https://arxiv.org/pdf/2401.10313
  • Abstract Adversarial attacks on learning-based trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on trajectory predictor inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. We observe that between all inputs, almost all of the perturbation sensitivities for Trajectron++ lie only within the most recent state history time point, while perturbation sensitivities for AgentFormer are spread across state histories over time. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models. Even though image maps may contribute slightly to the prediction output of both models, this result reveals that rather than being robust to adversarial image perturbations, trajectory predictors are susceptible to image attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how this vulnerability can cause a vehicle to come to a sudden stop from moderate driving speeds.
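The Fast Gradient Sign Method takes one step of size ε in the direction of the sign of the loss gradient with respect to the input, x_adv = x + ε·sign(∇ₓL). The sketch below applies it to a toy linear predictor with squared-error loss so the gradient has a closed form; the model, loss, and ε are illustrative assumptions, not the image-map attack on Trajectron++ or AgentFormer.

```python
from typing import List


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def fgsm(x: List[float], w: List[float], y: float, eps: float) -> List[float]:
    """One FGSM step against a toy linear predictor w . x with loss (w . x - y)^2."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    residual = pred - y
    # Gradient of (w . x - y)^2 with respect to x is 2 * residual * w.
    grad = [2.0 * residual * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]
```

Every input coordinate changes by at most ε, which is why such perturbations can stay visually small while still increasing the prediction error.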

Hierarchical Federated Learning in Multi-hop Cluster-Based VANETs

  • Authors: M. Saeid HaghighiFard, Sinem Coleri
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2401.10361
  • Pdf link: https://arxiv.org/pdf/2401.10361
  • Abstract The usage of federated learning (FL) in Vehicular Ad hoc Networks (VANET) has garnered significant interest in research due to the advantages of reducing transmission overhead and protecting user privacy by communicating local dataset gradients instead of raw data. However, implementing FL in VANETs faces challenges, including limited communication resources, high vehicle mobility, and the statistical diversity of data distributions. In order to tackle these issues, this paper introduces a novel framework for hierarchical federated learning (HFL) over multi-hop clustering-based VANET. The proposed method utilizes a weighted combination of the average relative speed and cosine similarity of FL model parameters as a clustering metric to consider both data diversity and high vehicle mobility. This metric ensures convergence with minimum changes in cluster heads while tackling the complexities associated with non-independent and identically distributed (non-IID) data scenarios. Additionally, the framework includes a novel mechanism to manage seamless transitions of cluster heads (CHs), followed by transferring the most recent FL model parameter to the designated CH. Furthermore, the proposed approach considers the option of merging CHs, aiming to reduce their count and, consequently, mitigate associated overhead. Through extensive simulations, the proposed hierarchical federated learning over clustered VANET has been demonstrated to improve accuracy and convergence time significantly while maintaining an acceptable level of packet overhead compared to previously proposed clustering algorithms and non-clustered VANET.
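The abstract describes the clustering metric only as a weighted combination of average relative speed and cosine similarity of FL model parameters. One hypothetical form is sketched below: the weight `alpha`, the normalization by a maximum speed `v_max`, and the convention that a higher score marks a better cluster-head candidate are all assumptions for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def clustering_score(own_speed: float, neighbor_speeds: Sequence[float],
                     own_params: Sequence[float], neighbor_params: Sequence[Sequence[float]],
                     alpha: float = 0.5, v_max: float = 40.0) -> float:
    """Hypothetical cluster-head score: favors vehicles that move like their neighbors
    (low average relative speed) and whose model parameters resemble theirs."""
    if not neighbor_speeds or not neighbor_params:
        return 0.0
    avg_rel_speed = sum(abs(own_speed - s) for s in neighbor_speeds) / len(neighbor_speeds)
    speed_term = 1.0 - min(avg_rel_speed / v_max, 1.0)   # 1.0 means identical speeds
    avg_cos = sum(cosine_similarity(own_params, p) for p in neighbor_params) / len(neighbor_params)
    return alpha * speed_term + (1.0 - alpha) * avg_cos
```

The speed term targets cluster stability under high mobility, while the parameter-similarity term groups vehicles whose local data (and hence models) are alike, which is how such a metric can address non-IID data.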

Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning

  • Authors: Eli Chien, Haoyu Wang, Ziang Chen, Pan Li
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10371
  • Pdf link: https://arxiv.org/pdf/2401.10371
  • Abstract Machine unlearning has raised significant interest with the adoption of laws ensuring the "right to be forgotten". Researchers have provided a probabilistic notion of approximate unlearning under a definition similar to that of Differential Privacy (DP), where privacy is defined as statistical indistinguishability to retraining from scratch. We propose Langevin unlearning, an unlearning framework based on noisy gradient descent with privacy guarantees for approximate unlearning problems. Langevin unlearning unifies the DP learning process and the privacy-certified unlearning process with many algorithmic benefits. These include approximate certified unlearning for non-convex problems, complexity savings compared to retraining, and sequential and batch unlearning for multiple unlearning requests. We verify the practicality of Langevin unlearning by studying its privacy-utility-complexity trade-off via experiments on benchmark datasets, and also demonstrate its superiority over gradient-descent-plus-output-perturbation based approximate unlearning.
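
The framework builds on noisy gradient descent. The sketch below shows the generic projected Langevin-style update θ ← Π_R(θ − η∇L(θ) + √(2η)·σ·ξ), with Gaussian ξ and projection onto an L2 ball of radius R. Under that reading, "unlearning" means continuing the same noisy iterations on the loss of the retained data, starting from the learned parameters rather than retraining from scratch. The step size, noise scale, radius, step counts, and toy mean-estimation loss are all hypothetical. This is not the paper's privacy-calibrated schedule.

```python
import math
import random

def project_l2(theta, radius):
    """Project theta onto the L2 ball of the given radius."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]

def noisy_gd(theta, grad_fn, steps, lr=0.1, sigma=0.01, radius=10.0, rng=None):
    """Projected noisy gradient descent (Langevin-style), starting from theta."""
    rng = rng or random.Random(0)
    noise_std = math.sqrt(2.0 * lr) * sigma  # sqrt(2 * eta * sigma^2)
    for _ in range(steps):
        g = grad_fn(theta)
        theta = [t - lr * gi + noise_std * rng.gauss(0.0, 1.0) for t, gi in zip(theta, g)]
        theta = project_l2(theta, radius)
    return theta

def make_grad(data):
    """Gradient of the toy loss L(theta) = 0.5 * mean_i ||theta - x_i||^2."""
    def grad(theta):
        n = len(data)
        return [theta[j] - sum(x[j] for x in data) / n for j in range(len(theta))]
    return grad

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
theta = noisy_gd([0.0, 0.0], make_grad(data), steps=200)            # learning
retained = data[:-1]                                                  # delete last point
theta_unlearned = noisy_gd(theta, make_grad(retained), steps=50)      # unlearning
```

The unlearning phase takes far fewer steps than the learning phase because it starts close to the new optimum. This mirrors the "complexity saving compared to retraining" claim in spirit, though not its formal guarantee.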

PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks

  • Authors: Ping Guo, Zhiyuan Yang, Xi Lin, Qingchuan Zhao, Qingfu Zhang
  • Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10586
  • Pdf link: https://arxiv.org/pdf/2401.10586
  • Abstract Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model's architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
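
The core mechanism is random patch-wise purification with an ensemble of purifiers. The sketch below shows that randomization structure only. The image is split into non-overlapping patches, and each patch is passed through a purifier drawn at random on every call, so repeated attacker queries see different transformations. The two purifiers are hypothetical stand-ins. The paper's purifiers are learned local implicit-function models, which are not reproduced here.

```python
import random

def purify_patchwise(image, purifiers, patch=2, rng=None):
    """Apply a randomly chosen purifier to each non-overlapping patch.

    image: 2D list of floats (H x W). purifiers: functions mapping a 2D patch
    to a same-shaped 2D patch. A fresh random choice is made per patch per call.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # do not mutate the input
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            block = [row[c:c + patch] for row in image[r:r + patch]]
            cleaned = rng.choice(purifiers)(block)
            for i, row in enumerate(cleaned):
                out[r + i][c:c + len(row)] = row
    return out

# Hypothetical stand-in purifiers (not the paper's learned models).
def mean_filter(block):
    """Replace a patch by its mean value."""
    vals = [v for row in block for v in row]
    m = sum(vals) / len(vals)
    return [[m] * len(row) for row in block]

def clip01(block):
    """Clip pixel values to [0, 1]."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in block]
```

Because the choice is resampled on every query, two identical queries can return different outputs. This is the source of the randomness that the abstract says slows the convergence of query-based attacks.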

Towards End-to-End GPS Localization with Neural Pseudorange Correction

  • Authors: Xu Weng, KV Ling, Haochen Liu, Kun Cao
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2401.10685
  • Pdf link: https://arxiv.org/pdf/2401.10685
  • Abstract Pseudorange errors are the root cause of localization inaccuracy in GPS. Previous data-driven methods regress and eliminate pseudorange errors using handcrafted intermediate labels. Unlike them, we propose an end-to-end GPS localization framework, E2E-PrNet, to train a neural network for pseudorange correction (PrNet) directly using the final task loss calculated with the ground truth of GPS receiver states. The gradients of the loss with respect to learnable parameters are backpropagated through a differentiable nonlinear least squares optimizer to PrNet. The feasibility is verified with GPS data collected by Android phones, showing that E2E-PrNet outperforms the state-of-the-art end-to-end GPS localization methods.
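
The pipeline feeds corrected pseudoranges into a nonlinear least squares (NLS) position solver. The toy 2D Gauss-Newton solver below illustrates that NLS step. It estimates receiver position and clock bias from pseudoranges after subtracting predicted errors. Here `corrections` is a hypothetical stand-in for PrNet's output. Real GPS uses 3D ECEF coordinates and usually weighted least squares. In an autodiff framework, these same iterations would be differentiable with respect to `corrections`, and that is what allows a position loss to train the correction network end to end.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def locate(sats, pseudoranges, corrections, iters=10):
    """Gauss-Newton estimate of (x, y, clock_bias) in 2D.

    Measurement model: rho_i - corr_i = ||p - s_i|| + b.
    """
    est = [0.0, 0.0, 0.0]
    rho = [p - c for p, c in zip(pseudoranges, corrections)]
    for _ in range(iters):
        jac, res = [], []
        for (sx, sy), r in zip(sats, rho):
            d = math.hypot(est[0] - sx, est[1] - sy)
            jac.append([(est[0] - sx) / d, (est[1] - sy) / d, 1.0])
            res.append(r - (d + est[2]))
        # Normal equations: (J^T J) delta = J^T res
        jtj = [[sum(row[i] * row[j] for row in jac) for j in range(3)] for i in range(3)]
        jtr = [sum(row[i] * rk for row, rk in zip(jac, res)) for i in range(3)]
        delta = solve_linear(jtj, jtr)
        est = [e + d for e, d in zip(est, delta)]
    return est

# Toy scenario: true position (3, 4), clock bias 5, known per-satellite errors.
sats = [(100.0, 0.0), (0.0, 100.0), (-100.0, 0.0), (0.0, -100.0), (70.0, 70.0)]
errs = [1.0, -2.0, 0.5, 0.0, 3.0]
pr = [math.hypot(3.0 - sx, 4.0 - sy) + 5.0 + e for (sx, sy), e in zip(sats, errs)]
est = locate(sats, pr, errs)  # perfect corrections recover (3, 4, 5)
```

If the corrections were wrong, the position estimate would shift. In an end-to-end setup, the gradient of the position error flows back through these iterations to the network that predicted them.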

Fast gradient-free activation maximization for neurons in spiking neural networks

  • Authors: Nikita Pospelov, Andrei Chertkov, Maxim Beketov, Ivan Oseledets, Konstantin Anokhin
  • Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10748
  • Pdf link: https://arxiv.org/pdf/2401.10748
  • Abstract Neural networks (NNs), both living and artificial, work due to being complex systems of neurons, each having its own specialization. Revealing these specializations is important for understanding NNs' inner working mechanisms. The only way to do this for a living system, whose neural response to a stimulus is not a known (let alone differentiable) function, is to build a feedback loop that exposes it to stimuli whose properties can be iteratively varied in the direction of maximal response. To test such a loop on a living network, one should first learn how to run it quickly and efficiently, reaching the most effective stimuli (ones that maximize certain neurons' activation) in the least possible number of iterations. We present a framework with an effective design of such a loop, successfully testing it on an artificial spiking neural network (SNN, a model that mimics the behaviour of NNs in living brains). Our optimization method for activation maximization (AM) was based on a low-rank tensor decomposition (Tensor Train, TT) of the activation function's discretization over its domain, the latent parameter space of stimuli (CIFAR10-size color images, generated by either VQ-VAE or SN-GAN from their latent description vectors and fed to the SNN). To our knowledge, the present work is the first attempt to perform effective AM for SNNs. The source code of our framework, MANGO (for Maximization of neural Activation via Non-Gradient Optimization), is available on GitHub.
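
The feedback loop described above treats the neuron as a black box over a discretized latent grid. The sketch below shows the shape of that loop: propose a stimulus, query the response, keep improvements. For simplicity it swaps in a plain (1+1) local search, which is not the Tensor Train method the paper uses. The function `response` is a hypothetical stand-in for "generate an image from latent indices, feed it to the SNN, read a neuron's activation".

```python
import random

def maximize_activation(response, dims, levels, budget, rng=None):
    """Gradient-free activation maximization over a discretized latent grid.

    response: maps a tuple of `dims` grid indices (each in [0, levels)) to a
    scalar activation; it is only queried, never differentiated.
    budget: total number of response queries allowed.
    """
    rng = rng or random.Random(0)
    best = tuple(rng.randrange(levels) for _ in range(dims))
    best_val = response(best)
    for _ in range(budget - 1):
        cand = list(best)
        cand[rng.randrange(dims)] = rng.randrange(levels)  # mutate one coordinate
        cand = tuple(cand)
        val = response(cand)
        if val > best_val:  # keep strictly better stimuli only
            best, best_val = cand, val
    return best, best_val
```

The `budget` parameter reflects the abstract's emphasis on reaching effective stimuli in as few queries as possible. A structured optimizer such as the paper's TT-based one aims to use that budget far more efficiently than this naive search.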

Early alignment in two-layer networks training is a two-edged sword

  • Authors: Etienne Boursier, Nicolas Flammarion
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10791
  • Pdf link: https://arxiv.org/pdf/2401.10791
  • Abstract Training neural networks with first-order optimisation methods is at the core of the empirical success of deep learning. The scale of initialisation is a crucial factor, as small initialisations are generally associated with a feature learning regime, in which gradient descent is implicitly biased towards simple solutions. This work provides a general and quantitative description of the early alignment phase, originally introduced by Maennel et al. (2018). For small initialisation and networks with one hidden ReLU layer, the early stage of the training dynamics leads to an alignment of the neurons towards key directions. This alignment induces a sparse representation of the network, which is directly related to the implicit bias of gradient flow at convergence. This sparsity-inducing alignment, however, comes at the expense of difficulties in minimising the training objective: we also provide a simple data example for which overparameterised networks fail to converge towards global minima and only converge to a spurious stationary point instead.

Neglected Hessian component explains mysteries in Sharpness regularization

  • Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10809
  • Pdf link: https://arxiv.org/pdf/2401.10809
  • Abstract Recent work has shown that methods like SAM, which either explicitly or implicitly penalize second-order information, can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition of the Hessian can be quantitatively interpreted as separating feature exploitation from feature exploration. The feature exploration, which can be described by the Nonlinear Modeling Error matrix (NME), is commonly neglected in the literature since it vanishes at interpolation. Our work shows that the NME is in fact important, as it can explain why gradient penalties are sensitive to the choice of activation function. Using this insight we design interventions to improve performance. We also provide evidence that challenges the long-held equivalence of weight noise and gradient penalties. This equivalence relies on the assumption that the NME can be ignored, which we find does not hold for modern networks since they involve significant feature learning. We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.
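
The "common decomposition of the Hessian" is, in its standard per-example form, the Gauss-Newton split written below. The notation here is ours and may differ from the paper's: a loss ℓ is applied to network outputs z = f(θ, x), and J is the output Jacobian. Per the abstract, the first term corresponds to feature exploitation and the second term, the NME, to feature exploration.

```latex
\nabla^2_\theta\, \ell\big(f(\theta, x), y\big)
  = \underbrace{J^\top \big(\nabla^2_z \ell\big)\, J}_{\text{Gauss-Newton term}}
  \;+\; \underbrace{\sum_{k} \frac{\partial \ell}{\partial z_k}\, \nabla^2_\theta z_k}_{\text{NME}},
  \qquad z = f(\theta, x), \quad J = \frac{\partial z}{\partial \theta}.
```

For a squared-error loss at interpolation, every ∂ℓ/∂z_k is zero on the training data, so the second term vanishes. This is why it is often dropped. The abstract argues that doing so discards information that matters for gradient penalties.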

Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education Systems

  • Authors: Junhao Shen, Hong Qian, Wei Zhang, Aimin Zhou
  • Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2401.10840
  • Pdf link: https://arxiv.org/pdf/2401.10840
  • Abstract Cognitive diagnosis assessment is a fundamental and crucial task for student learning. It models the student-exercise interaction and discovers students' proficiency levels on each knowledge attribute. In real-world intelligent education systems, generalization and interpretability of cognitive diagnosis methods are of equal importance. However, most existing methods can hardly achieve both, due to the complicated student-exercise interaction. To this end, this paper proposes a symbolic cognitive diagnosis (SCD) framework to simultaneously enhance generalization and interpretability. The SCD framework incorporates a symbolic tree to explicably represent the complicated student-exercise interaction function, and uses gradient-based optimization methods to effectively learn the student and exercise parameters. The accompanying challenge is that we need to bridge the discrete symbolic representation and continuous parameter optimization. To address this challenge, we propose a hybrid optimization that alternates between the representation and the parameters. Concretely, SCD alternately learns the symbolic tree by derivative-free genetic programming and learns the student and exercise parameters via gradient-based Adam. Extensive experimental results on various real-world datasets show the superiority of SCD in both generalization and interpretability. The ablation study verifies the efficacy of each ingredient in SCD, and the case study explicitly showcases how SCD's interpretability works.

Keyword: super-resolution

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

  • Authors: Xin Yuan, Jinoo Baek, Keyang Xu, Omer Tov, Hongliang Fei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2401.10404
  • Pdf link: https://arxiv.org/pdf/2401.10404
  • Abstract We propose an efficient diffusion-based text-to-video super-resolution (SR) tuning approach that leverages the readily learned capacity of a pixel-level image diffusion model to capture spatial information for video generation. To accomplish this goal, we design an efficient architecture by inflating the weights of the text-to-image SR model into our video generation framework. Additionally, we incorporate a temporal adapter to ensure temporal coherence across video frames. We investigate different tuning approaches based on our inflated architecture and report trade-offs between computational costs and super-resolution quality. Empirical evaluation, both quantitative and qualitative, on the Shutterstock video dataset demonstrates that our approach is able to perform text-to-video SR generation with good visual quality and temporal consistency. To evaluate temporal coherence, we also present visualizations in video format at https://drive.google.com/drive/folders/1YVc-KMSJqOrEUdQWVaI-Yfu8Vsfu_1aO?usp=sharing .
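
The abstract does not specify how the image model's weights are inflated. One well-known scheme is I3D-style kernel inflation, shown in the sketch below as an illustration of the general idea, not as the paper's exact method. A 2D kernel is repeated T times along a new time axis and scaled by 1/T. On a video whose frames are all identical, the inflated 3D kernel then reproduces the original 2D response, so the video model starts from the image model's behavior.

```python
def inflate_kernel(k2d, t):
    """I3D-style inflation: repeat a 2D kernel t times along time, scaled by 1/t."""
    return [[[w / t for w in row] for row in k2d] for _ in range(t)]

def conv2d_at(image, k2d, r0, c0):
    """Response of a 2D kernel at one (row, col) position, no padding."""
    return sum(w * image[r0 + dr][c0 + dc]
               for dr, row in enumerate(k2d) for dc, w in enumerate(row))

def conv3d_at(video, k3d, t0, r0, c0):
    """Response of a 3D kernel at one (time, row, col) position, no padding."""
    return sum(conv2d_at(video[t0 + dt], plane, r0, c0) for dt, plane in enumerate(k3d))

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k2d = [[1.0, 2.0], [3.0, 4.0]]
static_video = [image] * 4                       # 4 identical frames
k3d = inflate_kernel(k2d, 4)
same = conv3d_at(static_video, k3d, 0, 0, 0) == conv2d_at(image, k2d, 0, 0)
```

Any temporal behavior then has to be learned on top of this starting point. That is the role the abstract assigns to the temporal adapter.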

zoq, Jan 22 '24 07:01