arxiv-updates icon indicating copy to clipboard operation
arxiv-updates copied to clipboard

New submissions for Thu, 21 Dec 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

Spectral Prompt Tuning:Unveiling Unseen Classes for Zero-Shot Semantic Segmentation

  • Authors: Authors: Wenhao Xu, Rongtao Xu, Changwei Wang, Shibiao Xu, Li Guo, Man Zhang, Xiaopeng Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2312.12754
  • Pdf link: https://arxiv.org/pdf/2312.12754
  • Abstract Recently, CLIP has found practical utility in the domain of pixel-level zero-shot segmentation tasks. The present landscape features two-stage methodologies beset by issues such as intricate pipelines and elevated computational costs. While current one-stage approaches alleviate these concerns and incorporate Visual Prompt Training (VPT) to uphold CLIP's generalization capacity, they still fall short in fully harnessing CLIP's potential for pixel-level unseen class demarcation and precise pixel predictions. To further stimulate CLIP's zero-shot dense prediction capability, we propose SPT-SEG, a one-stage approach that improves CLIP's adaptability from image to pixel. Specifically, we initially introduce Spectral Prompt Tuning (SPT), incorporating spectral prompts into the CLIP visual encoder's shallow layers to capture structural intricacies of images, thereby enhancing comprehension of unseen classes. Subsequently, we introduce the Spectral Guided Decoder (SGD), utilizing both high and low-frequency information to steer the network's spatial focus towards more prominent classification features, enabling precise pixel-level prediction outcomes. Through extensive experiments on two public datasets, we demonstrate the superiority of our method over state-of-the-art approaches, performing well across all classes and particularly excelling in handling unseen classes. Code is available at:https://github.com/clearxu/SPT.

Keyword: optimization

Democratize with Care: The need for fairness specific features in user-interface based open source AutoML tools

  • Authors: Authors: Sundaraparipurnan Narayanan
  • Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.12460
  • Pdf link: https://arxiv.org/pdf/2312.12460
  • Abstract AI is increasingly playing a pivotal role in businesses and organizations, impacting the outcomes and interests of human users. Automated Machine Learning (AutoML) streamlines the machine learning model development process by automating repetitive tasks and making data-driven decisions, enabling even non-experts to construct high-quality models efficiently. This democratization allows more users (including non-experts) to access and utilize state-of-the-art machine-learning expertise. However, AutoML tools may also propagate bias in the way these tools handle the data, model choices, and optimization approaches adopted. We conducted an experimental study of User-interface-based open source AutoML tools (DataRobot, H2O Studio, Dataiku, and Rapidminer Studio) to examine if they had features to assist users in developing fairness-aware machine learning models. The experiments covered the following considerations for the evaluation of features: understanding use case context, data representation, feature relevance and sensitivity, data bias and preprocessing techniques, data handling capabilities, training-testing split, hyperparameter handling, and constraints, fairness-oriented model development, explainability and ability to download and edit models by the user. The results revealed inadequacies in features that could support in fairness-aware model development. Further, the results also highlight the need to establish certain essential features for promoting fairness in AutoML tools.

Learning to Reweight for Graph Neural Network

  • Authors: Authors: Zhengyu Chen, Teng Xiao, Kun Kuang, Zheqi Lv, Min Zhang, Jinluan Yang, Chengqiang Lu, Hongxia Yang, Fei Wu
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.12475
  • Pdf link: https://arxiv.org/pdf/2312.12475
  • Abstract Graph Neural Networks (GNNs) show promising results for graph tasks. However, existing GNNs' generalization ability will degrade when there exist distribution shifts between testing and training graph data. The cardinal impetus underlying the severe degeneration is that the GNNs are architected predicated upon the I.I.D assumptions. In such a setting, GNNs are inclined to leverage imperceptible statistical correlations subsisting in the training set to predict, albeit it is a spurious correlation. In this paper, we study the problem of the generalization ability of GNNs in Out-Of-Distribution (OOD) settings. To solve this problem, we propose the Learning to Reweight for Generalizable Graph Neural Network (L2R-GNN) to enhance the generalization ability for achieving satisfactory performance on unseen testing graphs that have different distributions with training graphs. We propose a novel nonlinear graph decorrelation method, which can substantially improve the out-of-distribution generalization ability and compares favorably to previous methods in restraining the over-reduced sample size. The variables of the graph representation are clustered based on the stability of the correlation, and the graph decorrelation method learns weights to remove correlations between the variables of different clusters rather than any two variables. Besides, we interpose an efficacious stochastic algorithm upon bi-level optimization for the L2R-GNN framework, which facilitates simultaneously learning the optimal weights and GNN parameters, and avoids the overfitting problem. Experimental results show that L2R-GNN greatly outperforms baselines on various graph prediction benchmarks under distribution shifts.

Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective

  • Authors: Authors: HyeongGwon Hong, Yooshin Cho, Hanbyel Cho, Jaesung Ahn, Junmo Kim
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12488
  • Pdf link: https://arxiv.org/pdf/2312.12488
  • Abstract Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking varies with the loss function used. Gradient norm, which is commonly used as a vulnerability proxy for gradient inversion attack, cannot explain this as it remains constant regardless of the loss function for gradient matching. In this paper, we propose a loss-aware vulnerability proxy (LAVP) for the first time. LAVP refers to either the maximum or minimum eigenvalue of the Hessian with respect to gradient matching loss at ground truth. This suggestion is based on our theoretical findings regarding the local optimization of the gradient inversion in proximity to the ground truth, which corresponds to the worst case attack scenario. We demonstrate the effectiveness of LAVP on various architectures and datasets, showing its consistent superiority over the gradient norm in capturing sample vulnerabilities. The performance of each proxy is measured in terms of Spearman's rank correlation with respect to several similarity scores. This work will contribute to enhancing FL security against any potential loss functions beyond L2 or cosine distance in the future.

Tensor Train Decomposition for Adversarial Attacks on Computer Vision Models

  • Authors: Authors: Andrei Chertkov, Ivan Oseledets
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2312.12556
  • Pdf link: https://arxiv.org/pdf/2312.12556
  • Abstract Deep neural networks (DNNs) are widely used today, but they are vulnerable to adversarial attacks. To develop effective methods of defense, it is important to understand the potential weak spots of DNNs. Often attacks are organized taking into account the architecture of models (white-box approach) and based on gradient methods, but for real-world DNNs this approach in most cases is impossible. At the same time, several gradient-free optimization algorithms are used to attack black-box models. However, classical methods are often ineffective in the multidimensional case. To organize black-box attacks for computer vision models, in this work, we propose the use of an optimizer based on the low-rank tensor train (TT) format, which has gained popularity in various practical multidimensional applications in recent years. Combined with the attribution of the target image, which is built by the auxiliary (white-box) model, the TT-based optimization method makes it possible to organize an effective black-box attack by small perturbation of pixels in the target image. The superiority of the proposed approach over three popular baselines is demonstrated for five modern DNNs on the ImageNet dataset.

A Faster Combinatorial Algorithm for Maximum Bipartite Matching

  • Authors: Authors: Julia Chuzhoy, Sanjeev Khanna
  • Subjects: Data Structures and Algorithms (cs.DS)
  • Arxiv link: https://arxiv.org/abs/2312.12584
  • Pdf link: https://arxiv.org/pdf/2312.12584
  • Abstract The maximum bipartite matching problem is among the most fundamental and well-studied problems in combinatorial optimization. A beautiful and celebrated combinatorial algorithm of Hopcroft and Karp (1973) shows that maximum bipartite matching can be solved in $O(m \sqrt{n})$ time on a graph with $n$ vertices and $m$ edges. For the case of very dense graphs, a fast matrix multiplication based approach gives a running time of $O(n^{2.371})$. These results represented the fastest known algorithms for the problem until 2013, when Madry introduced a new approach based on continuous techniques achieving much faster runtime in sparse graphs. This line of research has culminated in a spectacular recent breakthrough due to Chen et al. (2022) that gives an $m^{1+o(1)}$ time algorithm for maximum bipartite matching (and more generally, for min cost flows). This raises a natural question: are continuous techniques essential to obtaining fast algorithms for the bipartite matching problem? Our work makes progress on this question by presenting a new, purely combinatorial algorithm for bipartite matching, that runs in $\tilde{O}(m^{1/3}n^{5/3})$ time, and hence outperforms both Hopcroft-Karp and the fast matrix multiplication based algorithms on moderately dense graphs. Using a standard reduction, we also obtain an $\tilde{O}(m^{1/3}n^{5/3})$ time deterministic algorithm for maximum vertex-capacitated $s$-$t$ flow in directed graphs when all vertex capacities are identical.

Optimizing Neural Networks with Gradient Lexicase Selection

  • Authors: Authors: Li Ding, Lee Spector
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2312.12606
  • Pdf link: https://arxiv.org/pdf/2312.12606
  • Abstract One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations. Our source code is available on GitHub: https://github.com/ld-ing/gradient-lexicase.

IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance

  • Authors: Authors: Hongyi He, Longjun Liu, Haonan Zhang, Nanning Zheng
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12648
  • Pdf link: https://arxiv.org/pdf/2312.12648
  • Abstract Among existing Neural Architecture Search methods, DARTS is known for its efficiency and simplicity. This approach applies continuous relaxation of network representation to construct a weight-sharing supernet and enables the identification of excellent subnets in just a few GPU days. However, performance collapse in DARTS results in deteriorating architectures filled with parameter-free operations and remains a great challenge to the robustness. To resolve this problem, we reveal that the fundamental reason is the biased estimation of the candidate importance in the search space through theoretical and experimental analysis, and more precisely select operations via information-based measurements. Furthermore, we demonstrate that the excessive concern over the supernet and inefficient utilization of data in bi-level optimization also account for suboptimal results. We adopt a more realistic objective focusing on the performance of subnets and simplify it with the help of the information-based measurements. Finally, we explain theoretically why progressively shrinking the width of the supernet is necessary and reduce the approximation error of optimal weights in DARTS. Our proposed method, named IS-DARTS, comprehensively improves DARTS and resolves the aforementioned problems. Extensive experiments on NAS-Bench-201 and DARTS-based search space demonstrate the effectiveness of IS-DARTS.

The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

  • Authors: Authors: Tolga Ergen, Mert Pilanci
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.12657
  • Pdf link: https://arxiv.org/pdf/2312.12657
  • Abstract Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to refine the training process of neural networks and provide a better interpretation of their optimal weights. We focus on training two-layer neural networks with piecewise linear activations and demonstrate that they can be formulated as a finite-dimensional convex program. These programs include a regularization term that promotes sparsity, which constitutes a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite width neural networks and then we express these architectures equivalently as high dimensional convex sparse recovery models. Remarkably, the worst-case complexity to solve the convex program is polynomial in the number of samples and number of neurons when the rank of the data matrix is bounded, which is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. We also show that all the stationary of the nonconvex training objective can be characterized as the global optimum of a subsampled convex program. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyper-parameter tuning unlike non-convex methods. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters.

Mini-GPTs: Efficient Large Language Models through Contextual Pruning

  • Authors: Authors: Tim Valicenti, Justice Vidal, Ritik Patnaik
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.12682
  • Pdf link: https://arxiv.org/pdf/2312.12682
  • Abstract In AI research, the optimization of Large Language Models (LLMs) remains a significant challenge, crucial for advancing the field's practical applications and sustainability. Building upon the foundational work of Professor Song Han's lab at MIT, this paper introduces a novel approach in developing Mini-GPTs via contextual pruning. Our methodology strategically prunes the computational architecture of traditional LLMs, like Phi-1.5, focusing on retaining core functionalities while drastically reducing model sizes. We employ the technique across diverse and complex datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics articles. The results underscore the efficiency and effectiveness of contextual pruning, not merely as a theoretical concept but as a practical tool in developing domain-specific, resource-efficient LLMs. Contextual pruning is a promising method for building domain-specific LLMs, and this research is a building block towards future development with more hardware compute, refined fine-tuning, and quantization.

AdvST: Revisiting Data Augmentations for Single Domain Generalization

  • Authors: Authors: Guangtao Zheng, Mengdi Huai, Aidong Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12720
  • Pdf link: https://arxiv.org/pdf/2312.12720
  • Abstract Single domain generalization (SDG) aims to train a robust model against unknown target domain shifts using data from a single source domain. Data augmentation has been proven an effective approach to SDG. However, the utility of standard augmentations, such as translate, or invert, has not been fully exploited in SDG; practically, these augmentations are used as a part of a data preprocessing procedure. Although it is intuitive to use many such augmentations to boost the robustness of a model to out-of-distribution domain shifts, we lack a principled approach to harvest the benefit brought from multiple these augmentations. Here, we conceptualize standard data augmentations with learnable parameters as semantics transformations that can manipulate certain semantics of a sample, such as the geometry or color of an image. Then, we propose Adversarial learning with Semantics Transformations (AdvST) that augments the source domain data with semantics transformations and learns a robust model with the augmented data. We theoretically show that AdvST essentially optimizes a distributionally robust optimization objective defined on a set of semantics distributions induced by the parameters of semantics transformations. We demonstrate that AdvST can produce samples that expand the coverage on target domain data. Compared with the state-of-the-art methods, AdvST, despite being a simple method, is surprisingly competitive and achieves the best average SDG performance on the Digits, PACS, and DomainNet datasets. Our code is available at https://github.com/gtzheng/AdvST.

Parallel Ranking of Ads and Creatives in Real-Time Advertising Systems

  • Authors: Authors: Zhiguang Yang, Lu Wang, Chun Gan, Liufang Sang, Haoran Wang, Wenlong Chen, Jie He, Changping Peng, Zhangang Lin, Jingping Shao
  • Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.12750
  • Pdf link: https://arxiv.org/pdf/2312.12750
  • Abstract "Creativity is the heart and soul of advertising services". Effective creatives can create a win-win scenario: advertisers can reach target users and achieve marketing objectives more effectively, users can more quickly find products of interest, and platforms can generate more advertising revenue. With the advent of AI-Generated Content, advertisers now can produce vast amounts of creative content at a minimal cost. The current challenge lies in how advertising systems can select the most pertinent creative in real-time for each user personally. Existing methods typically perform serial ranking of ads or creatives, limiting the creative module in terms of both effectiveness and efficiency. In this paper, we propose for the first time a novel architecture for online parallel estimation of ads and creatives ranking, as well as the corresponding offline joint optimization model. The online architecture enables sophisticated personalized creative modeling while reducing overall latency. The offline joint model for CTR estimation allows mutual awareness and collaborative optimization between ads and creatives. Additionally, we optimize the offline evaluation metrics for the implicit feedback sorting task involved in ad creative ranking. We conduct extensive experiments to compare ours with two state-of-the-art approaches. The results demonstrate the effectiveness of our approach in both offline evaluations and real-world advertising platforms online in terms of response time, CTR, and CPM.

Mutual-modality Adversarial Attack with Semantic Perturbation

  • Authors: Authors: Jingwen Ye, Ruonan Yu, Songhua Liu, Xinchao Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12768
  • Pdf link: https://arxiv.org/pdf/2312.12768
  • Abstract Adversarial attacks constitute a notable threat to machine learning systems, given their potential to induce erroneous predictions and classifications. However, within real-world contexts, the essential specifics of the deployed model are frequently treated as a black box, consequently mitigating the vulnerability to such attacks. Thus, enhancing the transferability of the adversarial samples has become a crucial area of research, which heavily relies on selecting appropriate surrogate models. To address this challenge, we propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme. Our approach is accomplished by leveraging the pre-trained CLIP model. Firstly, we conduct a visual attack on the clean image that causes semantic perturbations on the aligned embedding space with the other textual modality. Then, we apply the corresponding defense on the textual modality by updating the prompts, which forces the re-matching on the perturbed embedding space. Finally, to enhance the attack transferability, we utilize the iterative training strategy on the visual attack and the textual defense, where the two processes optimize from each other. We evaluate our approach on several benchmark datasets and demonstrate that our mutual-modal attack strategy can effectively produce high-transferable attacks, which are stable regardless of the target networks. Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.

Fast Cell Library Characterization for Design Technology Co-Optimization Based on Graph Neural Networks

  • Authors: Authors: Tianliang Ma, Zhihui Deng, Xuguang Sun, Leilai Shao
  • Subjects: Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.12784
  • Pdf link: https://arxiv.org/pdf/2312.12784
  • Abstract Design technology co-optimization (DTCO) plays a critical role in achieving optimal power, performance, and area (PPA) for advanced semiconductor process development. Cell library characterization is essential in DTCO flow, but traditional methods are time-consuming and costly. To overcome these challenges, we propose a graph neural network (GNN)-based machine learning model for rapid and accurate cell library characterization. Our model incorporates cell structures and demonstrates high prediction accuracy across various process-voltage-temperature (PVT) corners and technology parameters. Validation with 512 unseen technology corners and over one million test data points shows accurate predictions of delay, power, and input pin capacitance for 33 types of cells, with a mean absolute percentage error (MAPE) $\le$ 0.95% and a speed-up of 100X compared with SPICE simulations. Additionally, we investigate system-level metrics such as worst negative slack (WNS), leakage power, and dynamic power using predictions obtained from the GNN-based model on unseen corners. Our model achieves precise predictions, with absolute error $\le$3.0 ps for WNS, percentage errors $\le$0.60% for leakage power, and $\le$0.99% for dynamic power, when compared to golden reference. With the developed model, we further proposed a fine-grained drive strength interpolation methodology to enhance PPA for small-to-medium-scale designs, resulting in an approximate 1-3% improvement.

Model-Based Control with Sparse Neural Dynamics

  • Authors: Authors: Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.12791
  • Pdf link: https://arxiv.org/pdf/2312.12791
  • Abstract Learning predictive models from observations using deep neural networks (DNNs) is a promising new approach to many real-world planning and control problems. However, common DNNs are too unstructured for effective planning, and current control methods typically rely on extensive sampling or local gradient descent. In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms. Specifically, we start with a ReLU neural model of the system dynamics and, with minimal losses in prediction accuracy, we gradually sparsify it by removing redundant neurons. This discrete sparsification process is approximated as a continuous problem, enabling an end-to-end optimization of both the model architecture and the weight parameters. The sparsified model is subsequently used by a mixed-integer predictive controller, which represents the neuron activations as binary variables and employs efficient branch-and-bound algorithms. Our framework is applicable to a wide variety of DNNs, from simple multilayer perceptrons to complex graph neural dynamics. It can efficiently handle tasks involving complicated contact dynamics, such as object pushing, compositional object sorting, and manipulation of deformable objects. Numerical and hardware experiments show that, despite the aggressive sparsification, our framework can deliver better closed-loop performance than existing state-of-the-art methods.

Joint Trading and Scheduling among Coupled Carbon-Electricity-Heat-Gas Industrial Clusters

  • Authors: Authors: Dafeng Zhu, Bo Yang, Yu Wu, Haoran Deng, Zhaoyang Dong, Kai Ma, Xinping Guan
  • Subjects: Systems and Control (eess.SY)
  • Arxiv link: https://arxiv.org/abs/2312.12795
  • Pdf link: https://arxiv.org/pdf/2312.12795
  • Abstract This paper presents a carbon-energy coupling management framework for an industrial park, where the carbon flow model accompanying multi-energy flows is adopted to track and suppress carbon emissions on the user side. To deal with the quadratic constraint of gas flows, a bound tightening algorithm for constraints relaxation is adopted. The synergies among the carbon capture, energy storage, power-to-gas further consume renewable energy and reduce carbon emissions. Aiming at carbon emissions disparities and supply-demand imbalances, this paper proposes a carbon trading ladder reward and punishment mechanism and an energy trading and scheduling method based on Lyapunov optimization and matching game to maximize the long-term benefits of each industrial cluster without knowing the prior information of random variables. Case studies show that our proposed trading method can reduce overall costs and carbon emissions while relieving energy pressure, which is important for Environmental, Social and Governance (ESG).

Selecting Source Code Generation Tools Based on Bandit Algorithms

  • Authors: Authors: Ryoto Shima, Masateru Tsunoda, Yukasa Murakami, Akito Monden, Amjed Tahir, Kwabena Ebo Bennin, Koji Toda, Keitaro Nakasai
  • Subjects: Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2312.12813
  • Pdf link: https://arxiv.org/pdf/2312.12813
  • Abstract Background: Recently, code generation tools such as ChatGPT have drawn attention to their performance. Generally, a prior analysis of their performance is needed to select new code-generation tools from a list of candidates. Without such analysis, there is a higher risk of selecting an ineffective tool, negatively affecting software development productivity. Additionally, conducting prior analysis of new code generation tools takes time and effort. Aim: To use a new code generation tool without prior analysis but with low risk, we propose to evaluate the new tools during software development (i.e., online optimization). Method: We apply the bandit algorithm (BA) approach to help select the best code-generation tool among candidates. Developers evaluate whether the result of the tool is correct or not. When code generation and evaluation are repeated, the evaluation results are saved. We utilize the stored evaluation results to select the best tool based on the BA approach. Our preliminary analysis evaluated five code generation tools with 164 code generation cases using BA. Result: The BA approach selected ChatGPT as the best tool as the evaluation proceeded, and during the evaluation, the average accuracy by the BA approach outperformed the second-best performing tool. Our results reveal the feasibility and effectiveness of BA in assisting the selection of best-performing code generation tools.

Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering

  • Authors: Authors: Zhangbin Li, Dan Guo, Jinxing Zhou, Jing Zhang, Meng Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12816
  • Pdf link: https://arxiv.org/pdf/2312.12816
  • Abstract This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to answer questions derived from untrimmed audible videos. To generate accurate answers, an AVQA model is expected to find the most informative audio-visual clues relevant to the given questions. In this paper, we propose to explicitly consider fine-grained visual objects in video frames (object-level clues) and explore the multi-modal relations(i.e., the object, audio, and question) in terms of feature interaction and model optimization. For the former, we present an end-to-end object-oriented network that adopts a question-conditioned clue discovery module to concentrate audio/visual modalities on respective keywords of the question and designs a modality-conditioned clue collection module to highlight closely associated audio segments or visual objects. For model optimization, we propose an object-aware adaptive-positivity learning strategy that selects the highly semantic-matched multi-modal pair as positivity. Specifically, we design two object-aware contrastive loss functions to identify the highly relevant question-object pairs and audio-object pairs, respectively. These selected pairs are constrained to have larger similarity values than the mismatched pairs. The positivity-selecting process is adaptive as the positivity pairs selected in each video frame may be different. These two object-aware objectives help the model understand which objects are exactly relevant to the question and which are making sounds. Extensive experiments on the MUSIC-AVQA dataset demonstrate the proposed method is effective in finding favorable audio-visual clues and also achieves new state-of-the-art question-answering performance.

Causal Discovery under Identifiable Heteroscedastic Noise Model

  • Authors: Authors: Naiyu Yin, Tian Gao, Yue Yu, Qiang Ji
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
  • Arxiv link: https://arxiv.org/abs/2312.12844
  • Pdf link: https://arxiv.org/pdf/2312.12844
  • Abstract Capturing the underlying structural causal relations represented by Directed Acyclic Graphs (DAGs) has been a fundamental task in various AI disciplines. Causal DAG learning via the continuous optimization framework has recently achieved promising performance in terms of both accuracy and efficiency. However, most methods make strong assumptions of homoscedastic noise, i.e., exogenous noises have equal variances across variables, observations, or even both. The noises in real data usually violate both assumptions due to the biases introduced by different data collection processes. To address the issue of heteroscedastic noise, we introduce relaxed and implementable sufficient conditions, proving the identifiability of a general class of SEM subject to these conditions. Based on the identifiable general SEM, we propose a novel formulation for DAG learning that accounts for the variation in noise variance across variables and observations. We then propose an effective two-phase iterative DAG learning algorithm to address the increasing optimization difficulties and to learn a causal DAG from data with heteroscedastic variable noise under varying variance. We show significant empirical gains of the proposed approaches over state-of-the-art methods on both synthetic data and real data.

Safe Multi-Agent Reinforcement Learning for Formation Control without Individual Reference Targets

  • Authors: Authors: Murad Dawood, Sicong Pan, Nils Dengler, Siqi Zhou, Angela P. Schoellig, Maren Bennewitz
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2312.12861
  • Pdf link: https://arxiv.org/pdf/2312.12861
  • Abstract In recent years, formation control of unmanned vehicles has received considerable interest, driven by the progress in autonomous systems and the imperative for multiple vehicles to carry out diverse missions. In this paper, we address the problem of behavior-based formation control of mobile robots, where we use safe multi-agent reinforcement learning~(MARL) to ensure the safety of the robots by eliminating all collisions during training and execution. To ensure safety, we implemented distributed model predictive control safety filters to override unsafe actions. We focus on achieving behavior-based formation without having individual reference targets for the robots, and instead use targets for the centroid of the formation. This formulation facilitates the deployment of formation control on real robots and improves the scalability of our approach to more robots. The task cannot be addressed through optimization-based controllers without specific individual reference targets for the robots and information about the relative locations of each robot to the others. That is why, for our formulation we use MARL to train the robots. Moreover, in order to account for the interactions between the agents, we use attention-based critics to improve the training process. We train the agents in simulation and later on demonstrate the resulting behavior of our approach on real Turtlebot robots. We show that despite the agents having very limited information, we can still safely achieve the desired behavior.

Federated Learning While Providing Model as a Service: Joint Training and Inference Optimization

  • Authors: Authors: Pengchao Han, Shiqiang Wang, Yang Jiao, Jianwei Huang
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.12863
  • Pdf link: https://arxiv.org/pdf/2312.12863
  • Abstract While providing machine learning model as a service to process users' inference requests, online applications can periodically upgrade the model utilizing newly collected data. Federated learning (FL) is beneficial for enabling the training of models across distributed clients while keeping the data locally. However, existing work has overlooked the coexistence of model training and inference under clients' limited resources. This paper focuses on the joint optimization of model training and inference to maximize inference performance at clients. Such an optimization faces several challenges. The first challenge is to characterize the clients' inference performance when clients may partially participate in FL. To resolve this challenge, we introduce a new notion of age of model (AoM) to quantify client-side model freshness, based on which we use FL's global model convergence error as an approximate measure of inference performance. The second challenge is the tight coupling among clients' decisions, including participation probability in FL, model download probability, and service rates. Toward the challenges, we propose an online problem approximation to reduce the problem complexity and optimize the resources to balance the needs of model training and inference. Experimental results demonstrate that the proposed algorithm improves the average inference accuracy by up to 12%.

Parameterized Projected Bellman Operator

  • Authors: Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.12869
  • Pdf link: https://arxiv.org/pdf/2312.12869
  • Abstract Approximate value iteration~(AVI) is a family of algorithms for reinforcement learning~(RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii) avoid the computationally intensive projection step. For this reason, we call our novel operator projected Bellman operator (PBO). We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems. Furthermore, we theoretically study our approach under the lens of AVI and devise algorithmic implementations to learn PBO in offline and online settings by leveraging neural network parameterizations. Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems.

Deep-Unfolding-Based Joint Activity and Data Detection for Grant-Free Transmission in Cell-Free Communication Systems

  • Authors: Authors: Gangle Sun, Wenjin Wang, Wei Xu, Christoph Studer
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.12874
  • Pdf link: https://arxiv.org/pdf/2312.12874
  • Abstract Massive grant-free transmission and cell-free wireless communication systems have emerged as pivotal enablers for massive machine-type communication. This paper proposes a deep-unfolding-based joint activity and data detection (DU-JAD) algorithm for massive grant-free transmission in cell-free systems. We first formulate a joint activity and data detection optimization problem, which we solve approximately using forward-backward splitting (FBS). We then apply deep unfolding to FBS to optimize algorithm parameters using machine learning. In order to improve data detection (DD) performance, reduce algorithm complexity, and enhance active user detection (AUD), we employ a momentum strategy, an approximate posterior mean estimator, and a novel soft-output AUD module, respectively. Simulation results confirm the efficacy of DU-JAD for AUD and DD.

BSL: Understanding and Improving Softmax Loss for Recommendation

  • Authors: Authors: Junkang Wu, Jiawei Chen, Jiancan Wu, Wentao Shi, Jizhi Zhang, Xiang Wang
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
  • Arxiv link: https://arxiv.org/abs/2312.12882
  • Pdf link: https://arxiv.org/pdf/2312.12882
  • Abstract Loss functions steer the optimization direction of recommendation models and are critical to model performance, but have received relatively little attention in recent recommendation research. Among various losses, we find Softmax loss (SL) stands out for not only achieving remarkable accuracy but also better robustness and fairness. Nevertheless, the current literature lacks a comprehensive explanation for the efficacy of SL. Toward addressing this research gap, we conduct theoretical analyses on SL and uncover three insights: 1) Optimizing SL is equivalent to performing Distributionally Robust Optimization (DRO) on the negative data, thereby learning against perturbations on the negative distribution and yielding robustness to noisy negatives. 2) Comparing with other loss functions, SL implicitly penalizes the prediction variance, resulting in a smaller gap between predicted values and and thus producing fairer results. Building on these insights, we further propose a novel loss function Bilateral SoftMax Loss (BSL) that extends the advantage of SL to both positive and negative sides. BSL augments SL by applying the same Log-Expectation-Exp structure to positive examples as is used for negatives, making the model robust to the noisy positives as well. Remarkably, BSL is simple and easy-to-implement -- requiring just one additional line of code compared to SL. Experiments on four real-world datasets and three representative backbones demonstrate the effectiveness of our proposal. The code is available at https://github.com/junkangwu/BSL

Machine Mindset: An MBTI Exploration of Large Language Models

  • Authors: Authors: Jiaxi Cui, Liuzhenghao Lv, Jing Wen, Jing Tang, YongHong Tian, Li Yuan
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2312.12999
  • Pdf link: https://arxiv.org/pdf/2312.12999
  • Abstract We present a novel approach for integrating Myers-Briggs Type Indicator (MBTI) personality traits into large language models (LLMs), addressing the challenges of personality consistency in personalized AI. Our method, "Machine Mindset," involves a two-phase fine-tuning and Direct Preference Optimization (DPO) to embed MBTI traits into LLMs. This approach ensures that models internalize these traits, offering a stable and consistent personality profile. We demonstrate the effectiveness of our models across various domains, showing alignment between model performance and their respective MBTI traits. The paper highlights significant contributions in the development of personality datasets and a new training methodology for personality integration in LLMs, enhancing the potential for personalized AI applications. We also open-sourced our model and part of the data at \url{https://github.com/PKU-YuanGroup/Machine-Mindset}.

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis

  • Authors: Authors: Yuming Gu, Hongyi Xu, You Xie, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Lingjie Luo
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.13016
  • Pdf link: https://arxiv.org/pdf/2312.13016
  • Abstract We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait. Specifically, given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views with retained both identity and facial expression. In lieu of time-consuming optimization and fine-tuning, our zero-shot method generalizes well to arbitrary face portraits with unposed camera views, extreme facial expressions, and diverse artistic depictions. At its core, we leverage the generative prior of 2D diffusion models pre-trained on large-scale image datasets as our rendering backbone, while the denoising is guided with disentangled attentive control of appearance and camera pose. To achieve this, we first inject the appearance context from the reference image into the self-attention layers of the frozen UNets. The rendering view is then manipulated with a novel conditional control module that interprets the camera pose by watching a condition image of a crossed subject from the same view. Furthermore, we insert a trainable cross-view attention module to enhance view consistency, which is further strengthened with a novel 3D-aware noise generation process during inference. We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.

MIMO Integrated Sensing and Communication Exploiting Prior Information

  • Authors: Authors: Chan Xu, Shuowen Zhang
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2312.13048
  • Pdf link: https://arxiv.org/pdf/2312.13048
  • Abstract In this paper, we study a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system where one multi-antenna base station (BS) sends information to a user with multiple antennas in the downlink and simultaneously senses the location parameter of a target based on its reflected echo signals received back at the BS receive antennas. We focus on the case where the location parameter to be sensed is unknown and random, for which the prior distribution information is available for exploitation. First, we propose to adopt the posterior Cram'er-Rao bound (PCRB) as the sensing performance metric with prior information, which quantifies a lower bound of the mean-squared error (MSE). Since the PCRB is in a complicated form, we derive a tight upper bound of it to draw more insights. Based on this, we analytically show that by exploiting the prior distribution information, the PCRB is always no larger than the CRB averaged over random location realizations without prior information exploitation. Next, we formulate the transmit covariance matrix optimization problem to minimize the sensing PCRB under a communication rate constraint. We obtain the optimal solution and derive useful properties on its rank. Then, by considering the derived PCRB upper bound as the objective function, we propose a low-complexity suboptimal solution in semi-closed form. Numerical results demonstrate the effectiveness of our proposed designs in MIMO ISAC exploiting prior information.

LRS: Enhancing Adversarial Transferability through Lipschitz Regularized Surrogate

  • Authors: Authors: Tao Wu, Tie Luo, Donald C. Wunsch
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2312.13118
  • Pdf link: https://arxiv.org/pdf/2312.13118
  • Abstract The transferability of adversarial examples is of central importance to transfer-based black-box adversarial attacks. Previous works for generating transferable adversarial examples focus on attacking \emph{given} pretrained surrogate models while the connections between surrogate models and adversarial trasferability have been overlooked. In this paper, we propose {\em Lipschitz Regularized Surrogate} (LRS) for transfer-based black-box attacks, a novel approach that transforms surrogate models towards favorable adversarial transferability. Using such transformed surrogate models, any existing transfer-based black-box attack can run without any change, yet achieving much better performance. Specifically, we impose Lipschitz regularization on the loss landscape of surrogate models to enable a smoother and more controlled optimization process for generating more transferable adversarial examples. In addition, this paper also sheds light on the connection between the inner properties of surrogate models and adversarial transferability, where three factors are identified: smaller local Lipschitz constant, smoother loss landscape, and stronger adversarial robustness. We evaluate our proposed LRS approach by attacking state-of-the-art standard deep neural networks and defense models. The results demonstrate significant improvement on the attack success rates and transferability. Our code is available at https://github.com/TrustAIoT/LRS.

Tuning the activation function to optimize the forecast horizon of a reservoir computer

  • Authors: Authors: Lauren A. Hurley, Juan G. Restrepo, Sean E. Shaheen
  • Subjects: Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2312.13151
  • Pdf link: https://arxiv.org/pdf/2312.13151
  • Abstract Reservoir computing is a machine learning framework where the readouts from a nonlinear system (the reservoir) are trained so that the output from the reservoir, when forced with an input signal, reproduces a desired output signal. A common implementation of reservoir computers is to use a recurrent neural network as the reservoir. The design of this network can have significant effects on the performance of the reservoir computer. In this paper we study the effect of the node activation function on the ability of reservoir computers to learn and predict chaotic time series. We find that the Forecast Horizon (FH), the time during which the reservoir's predictions remain accurate, can vary by an order of magnitude across a set of 16 activation functions used in machine learning. By using different functions from this set, and by modifying their parameters, we explore whether the entropy of node activation levels or the curvature of the activation functions determine the predictive ability of the reservoirs. We find that the FH is low when the activation function is used in a region where it has low curvature, and a positive correlation between curvature and FH. For the activation functions studied we find that the largest FH generally occurs at intermediate levels of the entropy of node activation levels. Our results show that the performance of reservoir computers is very sensitive to the activation function shape. Therefore, modifying this shape in hyperparameter optimization algorithms can lead to improvements in reservoir computer performance.

Experiences Building an MLIR-based SYCL Compiler

  • Authors: Authors: Ettore Tiotto, Víctor Pérez, Whitney Tsang, Lukas Sommer, Julian Oppermann, Victor Lomüller, Mehdi Goli, James Brodman
  • Subjects: Programming Languages (cs.PL); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
  • Arxiv link: https://arxiv.org/abs/2312.13170
  • Pdf link: https://arxiv.org/pdf/2312.13170
  • Abstract Similar to other programming models, compilers for SYCL, the open programming model for heterogeneous computing based on C++, would benefit from access to higher-level intermediate representations. The loss of high-level structure and semantics caused by premature lowering to low-level intermediate representations and the inability to reason about host and device code simultaneously present major challenges for SYCL compilers. The MLIR compiler framework, through its dialect mechanism, allows to model domain-specific, high-level intermediate representations and provides the necessary facilities to address these challenges. This work therefore describes practical experience with the design and implementation of an MLIR-based SYCL compiler. By modeling key elements of the SYCL programming model in host and device code in the MLIR dialect framework, the presented approach enables the implementation of powerful device code optimizations as well as analyses across host and device code. Compared to two LLVM-based SYCL implementations, this yields speedups of up to 4.3x on a collection of SYCL benchmark applications. Finally, this work also discusses challenges encountered in the design and implementation and how these could be addressed in the future.

Learning Fair Policies for Multi-stage Selection Problems from Observational Data

  • Authors: Authors: Zhuangzhuang Jia, Grani A. Hanasusanto, Phebe Vayanos, Weijun Xie
  • Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2312.13173
  • Pdf link: https://arxiv.org/pdf/2312.13173
  • Abstract We consider the problem of learning fair policies for multi-stage selection problems from observational data. This problem arises in several high-stakes domains such as company hiring, loan approval, or bail decisions where outcomes (e.g., career success, loan repayment, recidivism) are only observed for those selected. We propose a multi-stage framework that can be augmented with various fairness constraints, such as demographic parity or equal opportunity. This problem is a highly intractable infinite chance-constrained program involving the unknown joint distribution of covariates and outcomes. Motivated by the potential impact of selection decisions on people's lives and livelihoods, we propose to focus on interpretable linear selection rules. Leveraging tools from causal inference and sample average approximation, we obtain an asymptotically consistent solution to this selection problem by solving a mixed binary conic optimization problem, which can be solved using standard off-the-shelf solvers. We conduct extensive computational experiments on a variety of datasets adapted from the UCI repository on which we show that our proposed approaches can achieve an 11.6% improvement in precision and a 38% reduction in the measure of unfairness compared to the existing selection policy.

DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization

  • Authors: Authors: Rahul Chand, Yashoteja Prabhu, Pratyush Kumar
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2312.13211
  • Pdf link: https://arxiv.org/pdf/2312.13211
  • Abstract With the tremendous success of large transformer models in natural language understanding, down-sizing them for cost-effective deployments has become critical. Recent studies have explored the low-rank weight factorization techniques which are efficient to train, and apply out-of-the-box to any transformer architecture. Unfortunately, the low-rank assumption tends to be over-restrictive and hinders the expressiveness of the compressed model. This paper proposes, DSFormer, a simple alternative factorization scheme which expresses a target weight matrix as the product of a small dense and a semi-structured sparse matrix. The resulting approximation is more faithful to the weight distribution in transformers and therefore achieves a stronger efficiency-accuracy trade-off. Another concern with existing factorizers is their dependence on a task-unaware initialization step which degrades the accuracy of the resulting model. DSFormer addresses this issue through a novel Straight-Through Factorizer (STF) algorithm that jointly learns all the weight factorizations to directly maximize the final task accuracy. Extensive experiments on multiple natural language understanding benchmarks demonstrate that DSFormer obtains up to 40% better compression than the state-of-the-art low-rank factorizers, leading semi-structured sparsity baselines and popular knowledge distillation approaches. Our approach is also orthogonal to mainstream compressors and offers up to 50% additional compression when added to popular distilled, layer-shared and quantized transformers. We empirically evaluate the benefits of STF over conventional optimization practices.

StableKD: Breaking Inter-block Optimization Entanglement for Stable Knowledge Distillation

  • Authors: Authors: Shiu-hong Kao, Jierun Chen, S.H. Gary Chan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.13223
  • Pdf link: https://arxiv.org/pdf/2312.13223
  • Abstract Knowledge distillation (KD) has been recognized as an effective tool to compress and accelerate models. However, current KD approaches generally suffer from an accuracy drop and/or an excruciatingly long distillation process. In this paper, we tackle the issue by first providing a new insight into a phenomenon that we call the Inter-Block Optimization Entanglement (IBOE), which makes the conventional end-to-end KD approaches unstable with noisy gradients. We then propose StableKD, a novel KD framework that breaks the IBOE and achieves more stable optimization. StableKD distinguishes itself through two operations: Decomposition and Recomposition, where the former divides a pair of teacher and student networks into several blocks for separate distillation, and the latter progressively merges them back, evolving towards end-to-end distillation. We conduct extensive experiments on CIFAR100, Imagewoof, and ImageNet datasets with various teacher-student pairs. Compared to other KD approaches, our simple yet effective StableKD greatly boosts the model accuracy by 1% ~ 18%, speeds up the convergence up to 10 times, and outperforms them with only 40% of the training data.

Keyword: adam

There is no result

Keyword: gradient

Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

  • Authors: Authors: Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12480
  • Pdf link: https://arxiv.org/pdf/2312.12480
  • Abstract Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions, addressing real-world dynamism. Existing CTTA methods mainly rely on entropy minimization or teacher-student pseudo-labeling schemes for knowledge extraction in unlabeled target domains. However, dynamic data distributions cause miscalibrated predictions and noisy pseudo-labels in existing self-supervised learning methods, hindering the effective mitigation of error accumulation and catastrophic forgetting problems during the continual adaptation process. To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts. Specifically, we propose a Distribution-aware Masking (DaM) mechanism to adaptively sample masked positions, followed by establishing consistency constraints between the masked target samples and the original target samples. Additionally, for masked tokens, we utilize an efficient decoder to reconstruct a hand-crafted feature descriptor (e.g., Histograms of Oriented Gradients), leveraging its invariant properties to boost task-relevant representations. Through conducting extensive experiments on four widely recognized benchmarks, our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.

Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective

  • Authors: Authors: HyeongGwon Hong, Yooshin Cho, Hanbyel Cho, Jaesung Ahn, Junmo Kim
  • Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12488
  • Pdf link: https://arxiv.org/pdf/2312.12488
  • Abstract Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking varies with the loss function used. Gradient norm, which is commonly used as a vulnerability proxy for gradient inversion attack, cannot explain this as it remains constant regardless of the loss function for gradient matching. In this paper, we propose a loss-aware vulnerability proxy (LAVP) for the first time. LAVP refers to either the maximum or minimum eigenvalue of the Hessian with respect to gradient matching loss at ground truth. This suggestion is based on our theoretical findings regarding the local optimization of the gradient inversion in proximity to the ground truth, which corresponds to the worst case attack scenario. We demonstrate the effectiveness of LAVP on various architectures and datasets, showing its consistent superiority over the gradient norm in capturing sample vulnerabilities. The performance of each proxy is measured in terms of Spearman's rank correlation with respect to several similarity scores. This work will contribute to enhancing FL security against any potential loss functions beyond L2 or cosine distance in the future.

Tensor Train Decomposition for Adversarial Attacks on Computer Vision Models

  • Authors: Authors: Andrei Chertkov, Ivan Oseledets
  • Subjects: Numerical Analysis (math.NA)
  • Arxiv link: https://arxiv.org/abs/2312.12556
  • Pdf link: https://arxiv.org/pdf/2312.12556
  • Abstract Deep neural networks (DNNs) are widely used today, but they are vulnerable to adversarial attacks. To develop effective methods of defense, it is important to understand the potential weak spots of DNNs. Often attacks are organized taking into account the architecture of models (white-box approach) and based on gradient methods, but for real-world DNNs this approach in most cases is impossible. At the same time, several gradient-free optimization algorithms are used to attack black-box models. However, classical methods are often ineffective in the multidimensional case. To organize black-box attacks for computer vision models, in this work, we propose the use of an optimizer based on the low-rank tensor train (TT) format, which has gained popularity in various practical multidimensional applications in recent years. Combined with the attribution of the target image, which is built by the auxiliary (white-box) model, the TT-based optimization method makes it possible to organize an effective black-box attack by small perturbation of pixels in the target image. The superiority of the proposed approach over three popular baselines is demonstrated for five modern DNNs on the ImageNet dataset.

Optimizing Neural Networks with Gradient Lexicase Selection

  • Authors: Authors: Li Ding, Lee Spector
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
  • Arxiv link: https://arxiv.org/abs/2312.12606
  • Pdf link: https://arxiv.org/pdf/2312.12606
  • Abstract One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations. Our source code is available on GitHub: https://github.com/ld-ing/gradient-lexicase.

Towards Efficient Verification of Quantized Neural Networks

  • Authors: Authors: Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
  • Arxiv link: https://arxiv.org/abs/2312.12679
  • Pdf link: https://arxiv.org/pdf/2312.12679
  • Abstract Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying properties of quantized neural networks. Our baseline technique is based on integer linear programming which guarantees both soundness and completeness. We then show how efficiency can be improved by utilizing gradient-based heuristic search methods and also bound-propagation techniques. We evaluate our approach on perception networks quantized with PyTorch. Our results show that we can verify quantized networks with better scalability and efficiency than the previous state of the art.

ALMANACS: A Simulatability Benchmark for Language Model Explainability

  • Authors: Authors: Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
  • Arxiv link: https://arxiv.org/abs/2312.12747
  • Pdf link: https://arxiv.org/pdf/2312.12747
  • Abstract How do we measure the efficacy of language model explainability methods? While many explainability methods have been developed, they are typically evaluated on bespoke tasks, preventing an apples-to-apples comparison. To help fill this gap, we present ALMANACS, a language model explainability benchmark. ALMANACS scores explainability methods on simulatability, i.e., how well the explanations improve behavior prediction on new inputs. The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations. By using another language model to predict behavior based on the explanations, ALMANACS is a fully automated benchmark. We use ALMANACS to evaluate counterfactuals, rationalizations, attention, and Integrated Gradients explanations. Our results are sobering: when averaged across all topics, no explanation method outperforms the explanation-free control. We conclude that despite modest successes in prior work, developing an explanation method that aids simulatability in ALMANACS remains an open challenge.

Model-Based Control with Sparse Neural Dynamics

  • Authors: Authors: Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2312.12791
  • Pdf link: https://arxiv.org/pdf/2312.12791
  • Abstract Learning predictive models from observations using deep neural networks (DNNs) is a promising new approach to many real-world planning and control problems. However, common DNNs are too unstructured for effective planning, and current control methods typically rely on extensive sampling or local gradient descent. In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms. Specifically, we start with a ReLU neural model of the system dynamics and, with minimal losses in prediction accuracy, we gradually sparsify it by removing redundant neurons. This discrete sparsification process is approximated as a continuous problem, enabling an end-to-end optimization of both the model architecture and the weight parameters. The sparsified model is subsequently used by a mixed-integer predictive controller, which represents the neuron activations as binary variables and employs efficient branch-and-bound algorithms. Our framework is applicable to a wide variety of DNNs, from simple multilayer perceptrons to complex graph neural dynamics. It can efficiently handle tasks involving complicated contact dynamics, such as object pushing, compositional object sorting, and manipulation of deformable objects. Numerical and hardware experiments show that, despite the aggressive sparsification, our framework can deliver better closed-loop performance than existing state-of-the-art methods.

Dynamic Fairness-Aware Spectrum Auction for Enhanced Licensed Shared Access in 6G Networks

  • Authors: Authors: Mina Khadem, Maryam Ansarifard, Nader Mokari, Mohammadreza Javan, Hamid Saeedi, Eduard A. Jorswieck
  • Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)
  • Arxiv link: https://arxiv.org/abs/2312.12867
  • Pdf link: https://arxiv.org/pdf/2312.12867
  • Abstract This article introduces a new approach to address the spectrum scarcity challenge in 6G networks by implementing the enhanced licensed shared access (ELSA) framework. Our proposed auction mechanism aims to ensure fairness in spectrum allocation to mobile network operators (MNOs) through a novel weighted auction called the fair Vickery-Clarke-Groves (FVCG) mechanism. Through comparison with traditional methods, the study demonstrates that the proposed auction method improves fairness significantly. We suggest using spectrum sensing and integrating UAV-based networks to enhance efficiency of the LSA system. This research employs two methods to solve the problem. We first propose a novel greedy algorithm, named market share based weighted greedy algorithm (MSWGA) to achieve better fairness compared to the traditional auction methods and as the second approach, we exploit deep reinforcement learning (DRL) algorithms, to optimize the auction policy and demonstrate its superiority over other methods. Simulation results show that the deep deterministic policy gradient (DDPG) method performs superior to soft actor critic (SAC), MSWGA, and greedy methods. Moreover, a significant improvement is observed in fairness index compared to the traditional greedy auction methods. This improvement is as high as about 27% and 35% when deploying the MSWGA and DDPG methods, respectively.

HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments

  • Authors: Authors: Neeraj Kumar Singh, Koyel Ghosh, Joy Mahapatra, Utpal Garain, Apurbalal Senapati
  • Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2312.13193
  • Pdf link: https://arxiv.org/pdf/2312.13193
  • Abstract Warning: This paper contains examples of the language that some people may find offensive. Detecting and reducing hateful, abusive, offensive comments is a critical and challenging task on social media. Moreover, few studies aim to mitigate the intensity of hate speech. While studies have shown that context-level semantics are crucial for detecting hateful comments, most of this research focuses on English due to the ample datasets available. In contrast, low-resource languages, like Indian languages, remain under-researched because of limited datasets. Contrary to hate speech detection, hate intensity reduction remains unexplored in high-resource and low-resource languages. In this paper, we propose a novel end-to-end model, HCDIR, for Hate Context Detection, and Hate Intensity Reduction in social media posts. First, we fine-tuned several pre-trained language models to detect hateful comments to ascertain the best-performing hateful comments detection model. Then, we identified the contextual hateful words. Identification of such hateful words is justified through the state-of-the-art explainable learning model, i.e., Integrated Gradient (IG). Lastly, the Masked Language Modeling (MLM) model has been employed to capture domain-specific nuances to reduce hate intensity. We masked the 50% hateful words of the comments identified as hateful and predicted the alternative words for these masked terms to generate convincing sentences. An optimal replacement for the original hate comments from the feasible sentences is preferred. Extensive experiments have been conducted on several recent datasets using automatic metric-based evaluation (BERTScore) and thorough human evaluation. To enhance the faithfulness in human evaluation, we arranged a group of three human annotators with varied expertise.

StableKD: Breaking Inter-block Optimization Entanglement for Stable Knowledge Distillation

  • Authors: Authors: Shiu-hong Kao, Jierun Chen, S.H. Gary Chan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.13223
  • Pdf link: https://arxiv.org/pdf/2312.13223
  • Abstract Knowledge distillation (KD) has been recognized as an effective tool to compress and accelerate models. However, current KD approaches generally suffer from an accuracy drop and/or an excruciatingly long distillation process. In this paper, we tackle the issue by first providing a new insight into a phenomenon that we call the Inter-Block Optimization Entanglement (IBOE), which makes the conventional end-to-end KD approaches unstable with noisy gradients. We then propose StableKD, a novel KD framework that breaks the IBOE and achieves more stable optimization. StableKD distinguishes itself through two operations: Decomposition and Recomposition, where the former divides a pair of teacher and student networks into several blocks for separate distillation, and the latter progressively merges them back, evolving towards end-to-end distillation. We conduct extensive experiments on CIFAR100, Imagewoof, and ImageNet datasets with various teacher-student pairs. Compared to other KD approaches, our simple yet effective StableKD greatly boosts the model accuracy by 1% ~ 18%, speeds up the convergence up to 10 times, and outperforms them with only 40% of the training data.

Keyword: super-resolution

How Good Are Deep Generative Models for Solving Inverse Problems?

  • Authors: Authors: Shichong Peng, Alireza Moazeni, Ke Li
  • Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2312.12691
  • Pdf link: https://arxiv.org/pdf/2312.12691
  • Abstract Deep generative models, such as diffusion models, GANs, and IMLE, have shown impressive capability in tackling inverse problems. However, the validity of model-generated solutions w.r.t. the forward problem and the reliability of associated uncertainty estimates remain understudied. This study evaluates recent diffusion-based, GAN-based, and IMLE-based methods on three inverse problems, i.e., $16\times$ super-resolution, colourization, and image decompression. We assess the validity of these models' outputs as solutions to the inverse problems and conduct a thorough analysis of the reliability of the models' estimates of uncertainty over the solution. Overall, we find that the IMLE-based CHIMLE method outperforms other methods in terms of producing valid solutions and reliable uncertainty estimates.

zoq avatar Dec 21 '23 07:12 zoq