arxiv-updates New submissions for Wed, 11 Oct 23

New submissions for Wed, 11 Oct 23

Open zoq opened this issue 1 year ago • 0 comments

Keyword: sgd

Augmenting Vision-Based Human Pose Estimation with Rotation Matrix

Authors: Authors: Milad Vazan, Fatemeh Sadat Masoumi, Ruizhi Ou, Reza Rawassizadeh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06068
Pdf link: https://arxiv.org/pdf/2310.06068
Abstract Fitness applications are commonly used to monitor activities within the gym, but they often fail to automatically track indoor activities inside the gym. This study proposes a model that utilizes pose estimation combined with a novel data augmentation method, i.e., rotation matrix. We aim to enhance the classification accuracy of activity recognition based on pose estimation data. Through our experiments, we experiment with different classification algorithms along with image augmentation approaches. Our findings demonstrate that the SVM with SGD optimization, using data augmentation with the Rotation Matrix, yields the most accurate results, achieving a 96% accuracy rate in classifying five physical activities. Conversely, without implementing the data augmentation techniques, the baseline accuracy remains at a modest 64%.

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

Authors: Authors: Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06301
Pdf link: https://arxiv.org/pdf/2310.06301
Abstract We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory (SLT). We derive a closed formula for the theoretical loss and, in the case of two hidden dimensions, discover that regular $k$-gons are critical points. We present supporting theory indicating that the local learning coefficient (a geometric invariant) of these $k$-gons determines phase transitions in the Bayesian posterior as a function of training sample size. We then show empirically that the same $k$-gon critical points also determine the behavior of SGD training. The picture that emerges adds evidence to the conjecture that the SGD learning trajectory is subject to a sequential learning mechanism. Specifically, we find that the learning process in TMS, be it through SGD or Bayesian learning, can be characterized by a journey through parameter space from regions of high loss and low complexity to regions of low loss and high complexity.

Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory

Authors: Authors: Danni Yang, Yun Ji, Zhoubin Kou, Xiaoxiong Zhong, Sheng Zhang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.06448
Pdf link: https://arxiv.org/pdf/2310.06448
Abstract To address the challenges posed by the heterogeneity inherent in federated learning (FL) and to attract high-quality clients, various incentive mechanisms have been employed. However, existing incentive mechanisms are typically utilized in conventional synchronous aggregation, resulting in significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory. Within the incentive mechanism, we strive to maximize the utility of the task publisher by adaptively adjusting clients' local model training epochs, taking into account factors such as time delay and test accuracy. In the asynchronous scheme, considering client quality, we devise aggregation weights and an access control algorithm to facilitate asynchronous aggregation. Through experiments conducted on the MNIST dataset, the simulation results demonstrate that the test accuracy achieved by our framework is 3.12% and 5.84% higher than that achieved by FedAvg and FedProx without any attacks, respectively. The framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks. Furthermore, aiming for the same target accuracy, our framework demands notably less computation time than both FedAvg and FedProx.

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

Authors: Authors: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.06771
Pdf link: https://arxiv.org/pdf/2310.06771
Abstract Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choice of the correlation function, giving precise analytical bounds for linear regression and as the solution to a convex program for general convex functions. We show, using these bounds, how correlated noise provably improves upon vanilla DP-SGD as a function of problem parameters such as the effective dimension and condition number. Moreover, our analytical expression for the near-optimal correlation function circumvents the cubic complexity of the semi-definite program used to optimize the noise correlation matrix in previous work. We validate our theory with experiments on private deep learning. Our work matches or outperforms prior work while being efficient both in terms of compute and memory.

Keyword: optimization

Augmenting Vision-Based Human Pose Estimation with Rotation Matrix

Authors: Authors: Milad Vazan, Fatemeh Sadat Masoumi, Ruizhi Ou, Reza Rawassizadeh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06068
Pdf link: https://arxiv.org/pdf/2310.06068
Abstract Fitness applications are commonly used to monitor activities within the gym, but they often fail to automatically track indoor activities inside the gym. This study proposes a model that utilizes pose estimation combined with a novel data augmentation method, i.e., rotation matrix. We aim to enhance the classification accuracy of activity recognition based on pose estimation data. Through our experiments, we experiment with different classification algorithms along with image augmentation approaches. Our findings demonstrate that the SVM with SGD optimization, using data augmentation with the Rotation Matrix, yields the most accurate results, achieving a 96% accuracy rate in classifying five physical activities. Conversely, without implementing the data augmentation techniques, the baseline accuracy remains at a modest 64%.

High Dimensional Causal Inference with Variational Backdoor Adjustment

Authors: Authors: Daniel Israel, Aditya Grover, Guy Van den Broeck
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.06100
Pdf link: https://arxiv.org/pdf/2310.06100
Abstract Backdoor adjustment is a technique in causal inference for estimating interventional quantities from purely observational data. For example, in medical settings, backdoor adjustment can be used to control for confounding and estimate the effectiveness of a treatment. However, high dimensional treatments and confounders pose a series of potential pitfalls: tractability, identifiability, optimization. In this work, we take a generative modeling approach to backdoor adjustment for high dimensional treatments and confounders. We cast backdoor adjustment as an optimization problem in variational inference without reliance on proxy variables and hidden confounders. Empirically, our method is able to estimate interventional likelihood in a variety of high dimensional settings, including semi-synthetic X-ray medical data. To the best of our knowledge, this is the first application of backdoor adjustment in which all the relevant variables are high dimensional.

OptiMUS: Optimization Modeling Using mip Solvers and large language models

Authors: Authors: Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06116
Pdf link: https://arxiv.org/pdf/2310.06116
Abstract Optimization problems are pervasive across various sectors, from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, as the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve MILP problems from their natural language descriptions. OptiMUS is capable of developing mathematical models, writing and debugging solver code, developing tests, and checking the validity of generated solutions. To benchmark our agent, we present NLP4LP, a novel dataset of linear programming (LP) and mixed integer linear programming (MILP) problems. Our experiments demonstrate that OptiMUS is able to solve 67% more problems compared to a basic LLM prompting strategy. OptiMUS code and NLP4LP dataset are available at \href{https://github.com/teshnizi/OptiMUS}{https://github.com/teshnizi/OptiMUS}

Delay-Optimal Service Chain Forwarding and Offloading in Collaborative Edge Computing

Authors: Authors: Jinkun Zhang, Edmund Yeh
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.06141
Pdf link: https://arxiv.org/pdf/2310.06141
Abstract Collaborative edge computing (CEC) is an emerging paradigm for heterogeneous devices to collaborate on edge computation jobs. For congestible links and computing units, delay-optimal forwarding and offloading for service chain tasks (e.g., DNN with vertical split) in CEC remains an open problem. In this paper, we formulate the service chain forwarding and offloading in CEC with arbitrary topology and heterogeneous transmission/computation capability, and aim to minimize the network aggregated cost. We consider congestion-aware nonlinear cost functions that cover various performance metrics and constraints, such as average queueing delay with limited processor capacity. We solve the non-convex optimization problem globally by analyzing the KKT condition and proposing a sufficiency optimality condition. We propose a polynomial-time distributed algorithm that converges to the global optimum. The algorithm adapts to changes in input rates and network topology, and can be implemented as an online algorithm. Numerical evaluation shows that our method significantly outperforms baselines in multiple network instances, especially in congested scenarios.

Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond

Authors: Authors: Hao Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06147
Pdf link: https://arxiv.org/pdf/2310.06147
Abstract Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H) responses can largely be attributed to the technique of Reinforcement Learning from Human Feedback (RLHF). In this paper, we aim to link the research in conventional RL to RL techniques used in LLM research. Demystify this technique by discussing why, when, and how RL excels. Furthermore, we explore potential future avenues that could either benefit from or contribute to RLHF research. Highlighted Takeaways: 1. RLHF is Online Inverse RL with Offline Demonstration Data. 2. RLHF $>$ SFT because Imitation Learning (and Inverse RL) $>$ Behavior Cloning (BC) by alleviating the problem of compounding error. 3. The RM step in RLHF generates a proxy of the expensive human feedback, such an insight can be generalized to other LLM tasks such as prompting evaluation and optimization where feedback is also expensive. 4. The policy learning in RLHF is more challenging than conventional problems studied in IRL due to their high action dimensionality and feedback sparsity. 5. The main superiority of PPO over off-policy value-based methods is its stability gained from (almost) on-policy data and conservative policy updates.

Multi-Robot Task Assignment and Path Finding for Time-Sensitive Missions with Online Task Generation

Authors: Authors: David Thorne, Brett T. Lopez
Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06153
Pdf link: https://arxiv.org/pdf/2310.06153
Abstract Executing time-sensitive multi-robot missions involves two distinct problems: Multi-Robot Task Assignment (MRTA) and Multi-Agent Path Finding (MAPF). Computing safe paths that complete every task and minimize the time to mission completion, or makespan, is a significant computational challenge even for small teams. In many missions, tasks can be generated during execution which is typically handled by either recomputing task assignments and paths from scratch, or by modifying existing plans using approximate approaches. While performing task reassignment and path finding from scratch produces theoretically optimal results, the computational load makes it too expensive for online implementation. In this work, we present Time-Sensitive Online Task Assignment and Navigation (TSOTAN), a framework which can quickly incorporate online generated tasks while guaranteeing bounded suboptimal task assignment makespans. It does this by assessing the quality of partial task reassignments and only performing a complete reoptimization when the makespan exceeds a user specified suboptimality bound. Through experiments in 2D environments we demonstrate TSOTAN's ability to produce quality solutions with computation times suitable for online implementation.

Motion Memory: Leveraging Past Experiences to Accelerate Future Motion Planning

Authors: Authors: Dibyendu Das, Yuanjie Lu, Erion Plaku, Xuesu Xiao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06198
Pdf link: https://arxiv.org/pdf/2310.06198
Abstract When facing a new motion-planning problem, most motion planners solve it from scratch, e.g., via sampling and exploration or starting optimization from a straight-line path. However, most motion planners have to experience a variety of planning problems throughout their lifetimes, which are yet to be leveraged for future planning. In this paper, we present a simple but efficient method called Motion Memory, which allows different motion planners to accelerate future planning using past experiences. Treating existing motion planners as either a closed or open box, we present a variety of ways that Motion Memory can contribute to reduce the planning time when facing a new planning problem. We provide extensive experiment results with three different motion planners on three classes of planning problems with over 30,000 problem instances and show that planning speed can be significantly reduced by up to 89% with the proposed Motion Memory technique and with increasing past planning experiences.

CAT-RRT: Motion Planning that Admits Contact One Link at a Time

Authors: Authors: Nataliya Nechyporenko, Caleb Escobedo, Shreyas Kadekodi, Alessandro Roncone
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06210
Pdf link: https://arxiv.org/pdf/2310.06210
Abstract Current motion planning approaches rely on binary collision checking to evaluate the validity of a state and thereby dictate where the robot is allowed to move. This approach leaves little room for robots to engage in contact with an object, as is often necessary when operating in densely cluttered spaces. In this work, we propose an alternative method that considers contact states as high-cost states that the robot should avoid but can traverse if necessary to complete a task. More specifically, we introduce Contact Admissible Transition-based Rapidly exploring Random Trees (CAT-RRT), a planner that uses a novel per-link cost heuristic to find a path by traversing high-cost obstacle regions. Through extensive testing, we find that state-of-the-art optimization planners tend to over-explore low-cost states, which leads to slow and inefficient convergence to contact regions. Conversely, CAT-RRT searches both low and high-cost regions simultaneously with an adaptive thresholding mechanism carried out at each robot link. This leads to paths with a balance between efficiency, path length, and contact cost.

Federated Multi-Level Optimization over Decentralized Networks

Authors: Authors: Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.06217
Pdf link: https://arxiv.org/pdf/2310.06217
Abstract Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization. In this paper, we study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors. This setting is motivated by the need for distributed optimization in large-scale systems, where centralized optimization may not be practical or feasible. To address this problem, we propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale and share information through network propagation. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications, including hyper-parameter tuning, decentralized reinforcement learning, and risk-averse optimization.

Transmission Investment Coordination using MILP Lagrange Dual Decomposition and Auxiliary Problem Principle

Authors: Authors: Sambuddha Chakrabarti, Hosna Khajeh, Thomas R Nudell, Mohammad Reza Hesamzadeh, Ross Baldick
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06231
Pdf link: https://arxiv.org/pdf/2310.06231
Abstract This paper considers the investment coordination problem for the long term transmission capacity expansion in a situation where there are multiple regional Transmission Planners (TPs), each acting in order to maximize the utility in only its own region. In such a setting, any particular TP does not normally have any incentive to cooperate with the neighboring TP(s), although the optimal investment decision of each TP is contingent upon those of the neighboring TPs. A game-theoretic interaction among the TPs does not necessarily lead to this overall social optimum. We, therefore, introduce a social planner and call it the Transmission Planning Coordinator (TPC) whose goal is to attain the optimal possible social welfare for the bigger geographical region. In order to achieve this goal, this paper introduces a new incentive mechanism, based on distributed optimization theory. This incentive mechanism can be viewed as a set of rules of the transmission expansion investment coordination game, set by the social planner TPC, such that, even if the individual TPs act selfishly, it will still lead to the TPC's goal of attaining overall social optimum. Finally, the effectiveness of our approach is demonstrated through several simulation studies.

Spiking PointNet: Spiking Neural Networks for Point Clouds

Authors: Authors: Dayong Ren, Zhe Ma, Yuanpei Chen, Weihang Peng, Xiaode Liu, Yuhan Zhang, Yufei Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.06232
Pdf link: https://arxiv.org/pdf/2310.06232
Abstract Recently, Spiking Neural Networks (SNNs), enjoying extreme energy efficiency, have drawn much research attention on 2D visual recognition and shown gradually increasing application potential. However, it still remains underexplored whether SNNs can be generalized to 3D recognition. To this end, we present Spiking PointNet in the paper, the first spiking neural model for efficient deep learning on point clouds. We discover that the two huge obstacles limiting the application of SNNs in point clouds are: the intrinsic optimization obstacle of SNNs that impedes the training of a big spiking model with large time steps, and the expensive memory and computation cost of PointNet that makes training a big spiking point model unrealistic. To solve the problems simultaneously, we present a trained-less but learning-more paradigm for Spiking PointNet with theoretical justifications and in-depth experimental analysis. In specific, our Spiking PointNet is trained with only a single time step but can obtain better performance with multiple time steps inference, compared to the one trained directly with multiple time steps. We conduct various experiments on ModelNet10, ModelNet40 to demonstrate the effectiveness of Spiking PointNet. Notably, our Spiking PointNet even can outperform its ANN counterpart, which is rare in the SNN field thus providing a potential research direction for the following work. Moreover, Spiking PointNet shows impressive speedup and storage saving in the training phase.

Distributed Multi-Time Slot Power Balancing Control of Power Systems with Energy Storage Devices

Authors: Authors: Luwei Yang, Tao Liu, David J. Hill
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06240
Pdf link: https://arxiv.org/pdf/2310.06240
Abstract This paper studies a crucial problem in power system balancing control, i.e., the multi-time slot economic dispatch (MTSED) problem, for power grids with substantial renewables, synchronous generators (SGs), and energy storage devices (ESDs). The target of MTSED is to optimally coordinate active/reactive power outputs of all controllable units to meet a forecast net demand profile over multiple time slots within a receding finite time horizon. Firstly, the MTSED is formulated as an optimization problem with operational constraints, including the limits on the output of each controllable unit, ramping rates of SGss, energy levels of ESDs, and bus voltages. Then, a novel projection-based algorithm is developed to solve the problem in a distributed way. In particular, the distributed algorithm is not limited to solving the MTSED problem but also applies to more general optimization problems with both generic convex objective functions and hard feasibility constraints. Finally, case studies verify the effectiveness of the proposed method.

Sample-Efficient Multi-Agent RL: An Optimization Perspective

Authors: Authors: Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2310.06243
Pdf link: https://arxiv.org/pdf/2310.06243
Abstract We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation. In order to find the minimum assumption for sample-efficient learning, we introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs. Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium, Coarse Correlated Equilibrium, and Correlated Equilibrium for both model-based and model-free MARL problems with low MADC. We also show that our algorithm provides comparable sublinear regret to the existing works. Moreover, our algorithm combines an equilibrium-solving oracle with a single objective optimization subprocedure that solves for the regularized payoff of each deterministic joint policy, which avoids solving constrained optimization problems within data-dependent constraints (Jin et al. 2020; Wang et al. 2023) or executing sampling procedures with complex multi-objective optimization problems (Foster et al. 2023), thus being more amenable to empirical implementation.

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Authors: Authors: Ran Wei, Nathan Lambert, Anthony McDonald, Alfredo Garcia, Roberto Calandra
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06253
Pdf link: https://arxiv.org/pdf/2310.06253
Abstract Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable by learning an explicit model of the environment. While the capabilities of MBRL agents have significantly improved in recent years, how to best learn the model is still an unresolved question. The majority of MBRL algorithms aim at training the model to make accurate predictions about the environment and subsequently using the model to determine the most rewarding actions. However, recent research has shown that model predictive accuracy is often not correlated with action quality, tracing the root cause to the \emph{objective mismatch} between accurate dynamics model learning and policy optimization of rewards. A number of interrelated solution categories to the objective mismatch problem have emerged as MBRL continues to mature as a research area. In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research.

Bi-Level Offline Policy Optimization with Limited Exploration

Authors: Authors: Wenzhuo Zhou
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2310.06268
Pdf link: https://arxiv.org/pdf/2310.06268
Abstract We study offline reinforcement learning (RL) which seeks to learn a good policy based on a fixed, pre-collected dataset. A fundamental challenge behind this task is the distributional shift due to the dataset lacking sufficient exploration, especially under function approximation. To tackle this issue, we propose a bi-level structured policy optimization algorithm that models a hierarchical interaction between the policy (upper-level) and the value function (lower-level). The lower level focuses on constructing a confidence set of value estimates that maintain sufficiently small weighted average Bellman errors, while controlling uncertainty arising from distribution mismatch. Subsequently, at the upper level, the policy aims to maximize a conservative value estimate from the confidence set formed at the lower level. This novel formulation preserves the maximum flexibility of the implicitly induced exploratory data distribution, enabling the power of model extrapolation. In practice, it can be solved through a computationally efficient, penalized adversarial estimation procedure. Our theoretical regret guarantees do not rely on any data-coverage and completeness-type assumptions, only requiring realizability. These guarantees also demonstrate that the learned policy represents the "best effort" among all policies, as no other policies can outperform it. We evaluate our model using a blend of synthetic, benchmark, and real-world datasets for offline RL, showing that it performs competitively with state-of-the-art methods.

Reducing Detailed Vehicle Energy Dynamics to Physics-Like Models

Authors: Authors: Nour Khoudari, Sulaiman Almatrudi, Rabie Ramadan, Joy Carpio, Mengsha Yao, Kenneth Butts, Alexandre M. Bayen, Jonathan W. Lee, Benjamin Seibold
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06297
Pdf link: https://arxiv.org/pdf/2310.06297
Abstract The energy demand of vehicles, particularly in unsteady drive cycles, is affected by complex dynamics internal to the engine and other powertrain components. Yet, in many applications, particularly macroscopic traffic flow modeling and optimization, structurally simple approximations to the complex vehicle dynamics are needed that nevertheless reproduce the correct effective energy behavior. This work presents a systematic model reduction pipeline that starts from complex vehicle models based on the Autonomie software and derives a hierarchy of simplified models that are fast to evaluate, easy to disseminate in open-source frameworks, and compatible with optimization frameworks. The pipeline, based on a virtual chassis dynamometer and subsequent approximation strategies, is reproducible and is applied to six different vehicle classes to produce concrete explicit energy models that represent an average vehicle in each class and leverage the accuracy and validation work of the Autonomie software.

CrowdRec: 3D Crowd Reconstruction from Single Color Images

Authors: Authors: Buzhen Huang, Jingyi Ju, Yangang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.06332
Pdf link: https://arxiv.org/pdf/2310.06332
Abstract This is a technical report for the GigaCrowd challenge. Reconstructing 3D crowds from monocular images is a challenging problem due to mutual occlusions, server depth ambiguity, and complex spatial distribution. Since no large-scale 3D crowd dataset can be used to train a robust model, the current multi-person mesh recovery methods can hardly achieve satisfactory performance in crowded scenes. In this paper, we exploit the crowd features and propose a crowd-constrained optimization to improve the common single-person method on crowd images. To avoid scale variations, we first detect human bounding-boxes and 2D poses from the original images with off-the-shelf detectors. Then, we train a single-person mesh recovery network using existing in-the-wild image datasets. To promote a more reasonable spatial distribution, we further propose a crowd constraint to refine the single-person network parameters. With the optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image using a single-person backbone. The code will be publicly available at~\url{https://github.com/boycehbz/CrowdRec}.

Mutual Information Metrics for Uplink MIMO-OFDM Integrated Sensing and Communication System

Authors: Authors: Jinghui Piao, Zhiqing Wei, Xin Yuan, Xiaoyu Yang, Huici Wu, Zhiyong Feng
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.06382
Pdf link: https://arxiv.org/pdf/2310.06382
Abstract As the uplink sensing has the advantage of easy implementation, it attracts great attention in integrated sensing and communication (ISAC) system. This paper presents an uplink ISAC system based on multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) technology. The mutual information (MI) is introduced as a unified metric to evaluate the performance of communication and sensing. In this paper, firstly, the upper and lower bounds of communication and sensing MI are derived in details based on the interaction between communication and sensing. And the ISAC waveform is optimized by maximizing the weighted sum of sensing and communication MI. The Monte Carlo simulation results show that, compared with other waveform optimization schemes, the proposed ISAC scheme has the best overall performance.

Lo-Hi: Practical ML Drug Discovery Benchmark

Authors: Authors: Simon Steshin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06399
Pdf link: https://arxiv.org/pdf/2310.06399
Abstract Finding new drugs is getting harder and harder. One of the hopes of drug discovery is to use machine learning models to predict molecular properties. That is why models for molecular property prediction are being developed and tested on benchmarks such as MoleculeNet. However, existing benchmarks are unrealistic and are too different from applying the models in practice. We have created a new practical \emph{Lo-Hi} benchmark consisting of two tasks: Lead Optimization (Lo) and Hit Identification (Hi), corresponding to the real drug discovery process. For the Hi task, we designed a novel molecular splitting algorithm that solves the Balanced Vertex Minimum $k$-Cut problem. We tested state-of-the-art and classic ML models, revealing which works better under practical settings. We analyzed modern benchmarks and showed that they are unrealistic and overoptimistic. Review: https://openreview.net/forum?id=H2Yb28qGLV Lo-Hi benchmark: https://github.com/SteshinSS/lohi_neurips2023 Lo-Hi splitter library: https://github.com/SteshinSS/lohi_splitter

Plane Constraints Aided Multi-Vehicle Cooperative Positioning Using Factor Graph Optimization

Authors: Authors: Chen Zhuang, Hongbo Zhao
Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06414
Pdf link: https://arxiv.org/pdf/2310.06414
Abstract The development of vehicle-to-vehicle (V2V) communication facil-itates the study of cooperative positioning (CP) techniques for vehicular applications. The CP methods can improve the posi-tioning availability and accuracy by inter-vehicle ranging and data exchange between vehicles. However, the inter-vehicle rang-ing can be easily interrupted due to many factors such as obsta-cles in-between two cars. Without inter-vehicle ranging, the other cooperative data such as vehicle positions will be wasted, leading to performance degradation of range-based CP methods. To fully utilize the cooperative data and mitigate the impact of inter-vehicle ranging loss, a novel cooperative positioning method aided by plane constraints is proposed in this paper. The positioning results received from cooperative vehicles are used to construct the road plane for each vehicle. The plane parameters are then introduced into CP scheme to impose constraints on positioning solutions. The state-of-art factor graph optimization (FGO) algo-rithm is employed to integrate the plane constraints with raw data of Global Navigation Satellite Systems (GNSS) as well as inter-vehicle ranging measurements. The proposed CP method has the ability to resist the interruptions of inter-vehicle ranging since the plane constraints are computed by just using position-related data. A vehicle can still benefit from the position data of cooperative vehicles even if the inter-vehicle ranging is unavaila-ble. The experimental results indicate the superiority of the pro-posed CP method in positioning performance over the existing methods, especially when the inter-ranging interruptions occur.

Approaches to the Algorithmic Allocation of Public Resources: A Cross-disciplinary Review

Authors: Authors: Saba Esnaashari, Jonathan Bright, John Francis, Youmna Hashem, Vincent Straub, Deborah Morgan
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2310.06475
Pdf link: https://arxiv.org/pdf/2310.06475
Abstract Allocation of scarce resources is a recurring challenge for the public sector: something that emerges in areas as diverse as healthcare, disaster recovery, and social welfare. The complexity of these policy domains and the need for meeting multiple and sometimes conflicting criteria has led to increased focus on the use of algorithms in this type of decision. However, little engagement between researchers across these domains has happened, meaning a lack of understanding of common problems and techniques for approaching them. Here, we performed a cross disciplinary literature review to understand approaches taken for different areas of algorithmic allocation including healthcare, organ transplantation, homelessness, disaster relief, and welfare. We initially identified 1070 papers by searching the literature, then six researchers went through them in two phases of screening resulting in 176 and 75 relevant papers respectively. We then analyzed the 75 papers from the lenses of optimization goals, techniques, interpretability, flexibility, bias, ethical considerations, and performance. We categorized approaches into human-oriented versus resource-oriented perspective, and individual versus aggregate and identified that 76% of the papers approached the problem from a human perspective and 60% from an aggregate level using optimization techniques. We found considerable potential for performance gains, with optimization techniques often decreasing waiting times and increasing success rate by as much as 50%. However, there was a lack of attention to responsible innovation: only around one third of the papers considered ethical issues in choosing the optimization goals while just a very few of them paid attention to the bias issues. Our work can serve as a guide for policy makers and researchers wanting to use an algorithm for addressing a resource allocation problem.

Memory efficient location recommendation through proximity-aware representation

Authors: Authors: Xuan Luo, Rui Lv, Hui Zhao
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06484
Pdf link: https://arxiv.org/pdf/2310.06484
Abstract Sequential location recommendation plays a huge role in modern life, which can enhance user experience, bring more profit to businesses and assist in government administration. Although methods for location recommendation have evolved significantly thanks to the development of recommendation systems, there is still limited utilization of geographic information, along with the ongoing challenge of addressing data sparsity. In response, we introduce a Proximity-aware based region representation for Sequential Recommendation (PASR for short), built upon the Self-Attention Network architecture. We tackle the sparsity issue through a novel loss function employing importance sampling, which emphasizes informative negative samples during optimization. Moreover, PASR enhances the integration of geographic information by employing a self-attention-based geography encoder to the hierarchical grid and proximity grid at each GPS point. To further leverage geographic information, we utilize the proximity-aware negative samplers to enhance the quality of negative samples. We conducted evaluations using three real-world Location-Based Social Networking (LBSN) datasets, demonstrating that PASR surpasses state-of-the-art sequential location recommendation methods

Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Authors: Authors: Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06511
Pdf link: https://arxiv.org/pdf/2310.06511
Abstract Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is \textit{biased} due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the \textit{self-supervised target model}. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.

An Edge-Aware Graph Autoencoder Trained on Scale-Imbalanced Data for Travelling Salesman Problems

Authors: Authors: Shiqing Liu, Xueming Yan, Yaochu Jin
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06543
Pdf link: https://arxiv.org/pdf/2310.06543
Abstract Recent years have witnessed a surge in research on machine learning for combinatorial optimization since learning-based approaches can outperform traditional heuristics and approximate exact solvers at a lower computation cost. However, most existing work on supervised neural combinatorial optimization focuses on TSP instances with a fixed number of cities and requires large amounts of training samples to achieve a good performance, making them less practical to be applied to realistic optimization scenarios. This work aims to develop a data-driven graph representation learning method for solving travelling salesman problems (TSPs) with various numbers of cities. To this end, we propose an edge-aware graph autoencoder (EdgeGAE) model that can learn to solve TSPs after being trained on solution data of various sizes with an imbalanced distribution. We formulate the TSP as a link prediction task on sparse connected graphs. A residual gated encoder is trained to learn latent edge embeddings, followed by an edge-centered decoder to output link predictions in an end-to-end manner. To improve the model's generalization capability of solving large-scale problems, we introduce an active sampling strategy into the training process. In addition, we generate a benchmark dataset containing 50,000 TSP instances with a size from 50 to 500 cities, following an extremely scale-imbalanced distribution, making it ideal for investigating the model's performance for practical applications. We conduct experiments using different amounts of training data with various scales, and the experimental results demonstrate that the proposed data-driven approach achieves a highly competitive performance among state-of-the-art learning-based methods for solving TSPs.

Zero-Level-Set Encoder for Neural Distance Fields

Authors: Authors: Stefan Rhys Jeske, Jonathan Klein, Dominik L. Michels, Jan Bender
Subjects: Machine Learning (cs.LG); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2310.06644
Pdf link: https://arxiv.org/pdf/2310.06644
Abstract Neural shape representation generally refers to representing 3D geometry using neural networks, e.g., to compute a signed distance or occupancy value at a specific spatial position. Previous methods tend to rely on the auto-decoder paradigm, which often requires densely-sampled and accurate signed distances to be known during training and testing, as well as an additional optimization loop during inference. This introduces a lot of computational overhead, in addition to having to compute signed distances analytically, even during testing. In this paper, we present a novel encoder-decoder neural network for embedding 3D shapes in a single forward pass. Our architecture is based on a multi-scale hybrid system incorporating graph-based and voxel-based components, as well as a continuously differentiable decoder. Furthermore, the network is trained to solve the Eikonal equation and only requires knowledge of the zero-level set for training and inference. Additional volumetric samples can be generated on-the-fly, and incorporated in an unsupervised manner. This means that in contrast to most previous work, our network is able to output valid signed distance fields without explicit prior knowledge of non-zero distance values or shape occupancy. In other words, our network computes approximate solutions to the boundary-valued Eikonal equation. It also requires only a single forward pass during inference, instead of the common latent code optimization. We further propose a modification of the loss function in case that surface normals are not well defined, e.g., in the context of non-watertight surface-meshes and non-manifold geometry. We finally demonstrate the efficacy, generalizability and scalability of our method on datasets consisting of deforming 3D shapes, single class encoding and multiclass encoding, showcasing a wide range of possible applications.

Diversity from Human Feedback

Authors: Authors: Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.06648
Pdf link: https://arxiv.org/pdf/2310.06648
Abstract Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.

A Parallelized, Adam-Based Solver for Reserve and Security Constrained AC Unit Commitment

Authors: Authors: Samuel Chevalier
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06650
Pdf link: https://arxiv.org/pdf/2310.06650
Abstract Power system optimization problems which include the nonlinear AC power flow equations require powerful and robust numerical solution algorithms. Within this sub-field of nonlinear optimization, interior point methods have come to dominate the solver landscape. Over the last decade, however, a number of efficient numerical optimizers have emerged from the field of Machine Learning (ML). One algorithm in particular, Adam, has become the optimizer-of-choice for a massive percentage of ML training problems (including, e.g., the training of GPT-3), solving some of the largest unconstrained optimization problems ever conceived of. Inspired by such progress, this paper designs a parallelized Adam-based numerical solver to overcome one of the most challenging power system optimization problems: security and reserve constrained AC Unit Commitment. The resulting solver, termed quasiGrad, recently competed in the third ARPA-E Grid Optimization (GO3) competition. In the day-ahead market clearing category (with systems ranging from 3 to 23,643 buses over 48 time periods), quasiGrad's aggregated market surplus scores were within 5% of the winningest market surplus scores. The quasiGrad solver is now released as an open-source Julia package: quasiGrad.jl. The internal gradient-based solver (Adam) can easily be substituted for other ML-inspired solvers (e.g., AdaGrad, AdaDelta, RMSProp, etc.). Test results from large experiments are provided.

Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

Authors: Authors: Ian Gemp, Luke Marris, Georgios Piliouras
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.06689
Pdf link: https://arxiv.org/pdf/2310.06689
Abstract We propose the first, to our knowledge, loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that stochastic gradient descent can outperform previous state-of-the-art approaches.

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Authors: Authors: Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Long Quan, Ying Shan, Yonghong Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.06744
Pdf link: https://arxiv.org/pdf/2310.06744
Abstract Recent advances in text-to-image diffusion models have enabled 3D generation from a single image. However, current image-to-3D methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold: First, we propose a reference-guided novel view enhancement technique that substantially reduces the quality gap between synthesized and reference views. Second, capitalizing on the novel view enhancement, we present a novel reference-guided state distillation loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive evaluations demonstrate the effectiveness of our approach over existing methods, both qualitatively and quantitatively.

Comparing AI Algorithms for Optimizing Elliptic Curve Cryptography Parameters in Third-Party E-Commerce Integrations: A Pre-Quantum Era Analysis

Authors: Authors: Felipe Tellez, Jorge Ortiz
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06752
Pdf link: https://arxiv.org/pdf/2310.06752
Abstract This paper presents a comparative analysis between the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), two vital artificial intelligence algorithms, focusing on optimizing Elliptic Curve Cryptography (ECC) parameters. These encompass the elliptic curve coefficients, prime number, generator point, group order, and cofactor. The study provides insights into which of the bio-inspired algorithms yields better optimization results for ECC configurations, examining performances under the same fitness function. This function incorporates methods to ensure robust ECC parameters, including assessing for singular or anomalous curves and applying Pollard's rho attack and Hasse's theorem for optimization precision. The optimized parameters generated by GA and PSO are tested in a simulated e-commerce environment, contrasting with well-known curves like secp256k1 during the transmission of order messages using Elliptic Curve-Diffie Hellman (ECDH) and Hash-based Message Authentication Code (HMAC). Focusing on traditional computing in the pre-quantum era, this research highlights the efficacy of GA and PSO in ECC optimization, with implications for enhancing cybersecurity in third-party e-commerce integrations. We recommend the immediate consideration of these findings before quantum computing's widespread adoption.

Efficient Graduated Non-Convexity for Pose Graph Optimization

Authors: Authors: Wonseok Kang, Jaehyun Kim, Jiseong Chung, Seungwon Choi, Tae-wan Kim
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06765
Pdf link: https://arxiv.org/pdf/2310.06765
Abstract We propose a novel approach to Graduated Non-Convexity (GNC) and demonstrate its efficacy through its application in robust pose graph optimization, a key component in SLAM backends. Traditional GNC methods often rely on heuristic methods for GNC schedule, updating control parameter {\mu} for escalating the non-convexity. In contrast, our approach leverages the properties of convex functions and convex optimization to identify the boundary points beyond which convexity is no longer guaranteed, thereby eliminating redundant optimization steps in existing methodologies and enhancing both speed and robustness. We show that our method outperforms the state-of-the-art method in terms of speed and accuracy when used for robust back-end pose graph optimization via GNC. Our work builds upon and enhances the open-source riSAM framework. Our implementation can be accessed from: https://github.com/SNU-DLLAB/EGNC-PGO

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

Authors: Authors: Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06794
Pdf link: https://arxiv.org/pdf/2310.06794
Abstract Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to sub-optimal policies if the reward is misaligned. Moreover, recent works have demonstrated that effective shaping rewards for a particular problem can depend on the underlying learning algorithm. This paper introduces a novel way to encourage exploration called $f$-Policy Gradients, or $f$-PG. $f$-PG minimizes the f-divergence between the agent's state visitation distribution and the goal, which we show can lead to an optimal policy. We derive gradients for various f-divergences to optimize this objective. Our learning paradigm provides dense learning signals for exploration in sparse reward settings. We further introduce an entropy-regularized policy optimization objective, that we call $state$-MaxEnt RL (or $s$-MaxEnt RL) as a special case of our objective. We show that several metric-based shaping rewards like L2 can be used with $s$-MaxEnt RL, providing a common ground to study such metric-based shaping rewards with efficient exploration. We find that $f$-PG has better performance compared to standard policy gradient methods on a challenging gridworld as well as the Point Maze and FetchReach environments. More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html.

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Authors: Authors: Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06827
Pdf link: https://arxiv.org/pdf/2310.06827
Abstract Large language models (LLMs) frequently hallucinate on abstractive summarization tasks such as document-based question-answering, meeting summarization, and clinical report generation, even though all necessary information is included in context. However, optimizing LLMs to hallucinate less on these tasks is challenging, as hallucination is hard to efficiently evaluate at each optimization step. In this work, we show that reducing hallucination on a synthetic task can also reduce hallucination on real-world downstream tasks. Our method, SynTra, first designs a synthetic task where hallucinations are easy to elicit and measure. It next optimizes the LLM's system message via prefix-tuning on the synthetic task, and finally transfers the system message to realistic, hard-to-optimize tasks. Across three realistic abstractive summarization tasks, SynTra reduces hallucination for two 13B-parameter LLMs using only a synthetic retrieval task for supervision. We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination. Overall, SynTra demonstrates that the extra flexibility of working with synthetic data can help mitigate undesired behaviors in practice.

Keyword: adam

A Parallelized, Adam-Based Solver for Reserve and Security Constrained AC Unit Commitment

Authors: Authors: Samuel Chevalier
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06650
Pdf link: https://arxiv.org/pdf/2310.06650
Abstract Power system optimization problems which include the nonlinear AC power flow equations require powerful and robust numerical solution algorithms. Within this sub-field of nonlinear optimization, interior point methods have come to dominate the solver landscape. Over the last decade, however, a number of efficient numerical optimizers have emerged from the field of Machine Learning (ML). One algorithm in particular, Adam, has become the optimizer-of-choice for a massive percentage of ML training problems (including, e.g., the training of GPT-3), solving some of the largest unconstrained optimization problems ever conceived of. Inspired by such progress, this paper designs a parallelized Adam-based numerical solver to overcome one of the most challenging power system optimization problems: security and reserve constrained AC Unit Commitment. The resulting solver, termed quasiGrad, recently competed in the third ARPA-E Grid Optimization (GO3) competition. In the day-ahead market clearing category (with systems ranging from 3 to 23,643 buses over 48 time periods), quasiGrad's aggregated market surplus scores were within 5% of the winningest market surplus scores. The quasiGrad solver is now released as an open-source Julia package: quasiGrad.jl. The internal gradient-based solver (Adam) can easily be substituted for other ML-inspired solvers (e.g., AdaGrad, AdaDelta, RMSProp, etc.). Test results from large experiments are provided.

Keyword: gradient

Fingerprint Attack: Client De-Anonymization in Federated Learning

Authors: Authors: Qiongkai Xu, Trevor Cohn, Olga Ohrimenko
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05960
Pdf link: https://arxiv.org/pdf/2310.05960
Abstract Federated Learning allows collaborative training without data sharing in settings where participants do not trust the central server and one another. Privacy can be further improved by ensuring that communication between the participants and the server is anonymized through a shuffle; decoupling the participant identity from their data. This paper seeks to examine whether such a defense is adequate to guarantee anonymity, by proposing a novel fingerprinting attack over gradients sent by the participants to the server. We show that clustering of gradients can easily break the anonymization in an empirical study of learning federated language models on two language corpora. We then show that training with differential privacy can provide a practical defense against our fingerprint attack.

Learning Layer-wise Equivariances Automatically using Gradients

Authors: Authors: Tycho F.A. van der Ouderaa, Alexander Immer, Mark van der Wilk
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.06131
Pdf link: https://arxiv.org/pdf/2310.06131
Abstract Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance. However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and can not be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from scratch is difficult for two reasons. First, it requires efficient and flexible parameterisations of layer-wise equivariances. Secondly, symmetries act as constraints and are therefore not encouraged by training losses measuring data fit. To overcome these challenges, we improve parameterisations of soft equivariance and learn the amount of equivariance in layers by optimising the marginal likelihood, estimated using differentiable Laplace approximations. The objective balances data fit and model complexity enabling layer-wise symmetry discovery in deep networks. We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry.

Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization

Authors: Authors: Cong Ma, Xingyu Xu, Tian Tong, Yuejie Chi
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.06159
Pdf link: https://arxiv.org/pdf/2310.06159
Abstract Many problems encountered in science and engineering can be formulated as estimating a low-rank object (e.g., matrices and tensors) from incomplete, and possibly corrupted, linear measurements. Through the lens of matrix and tensor factorization, one of the most popular approaches is to employ simple iterative algorithms such as gradient descent (GD) to recover the low-rank factors directly, which allow for small memory and computation footprints. However, the convergence rate of GD depends linearly, and sometimes even quadratically, on the condition number of the low-rank object, and therefore, GD slows down painstakingly when the problem is ill-conditioned. This chapter introduces a new algorithmic approach, dubbed scaled gradient descent (ScaledGD), that provably converges linearly at a constant rate independent of the condition number of the low-rank object, while maintaining the low per-iteration cost of gradient descent for a variety of tasks including sensing, robust principal component analysis and completion. In addition, ScaledGD continues to admit fast global convergence to the minimax-optimal solution, again almost independent of the condition number, from a small random initialization when the rank is over-specified in the presence of Gaussian noise. In total, ScaledGD highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the symmetry in low-rank factorization without hurting generalization.

DockGame: Cooperative Games for Multimeric Rigid Protein Docking

Authors: Authors: Vignesh Ram Somnath, Pier Giuseppe Sessa, Maria Rodriguez Martinez, Andreas Krause
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06177
Pdf link: https://arxiv.org/pdf/2310.06177
Abstract Protein interactions and assembly formation are fundamental to most biological processes. Predicting the assembly structure from constituent proteins -- referred to as the protein docking task -- is thus a crucial step in protein design applications. Most traditional and deep learning methods for docking have focused mainly on binary docking, following either a search-based, regression-based, or generative modeling paradigm. In this paper, we focus on the less-studied multimeric (i.e., two or more proteins) docking problem. We introduce DockGame, a novel game-theoretic framework for docking -- we view protein docking as a cooperative game between proteins, where the final assembly structure(s) constitute stable equilibria w.r.t. the underlying game potential. Since we do not have access to the true potential, we consider two approaches - i) learning a surrogate game potential guided by physics-based energy functions and computing equilibria by simultaneous gradient updates, and ii) sampling from the Gibbs distribution of the true potential by learning a diffusion generative model over the action spaces (rotations and translations) of all proteins. Empirically, on the Docking Benchmark 5.5 (DB5.5) dataset, DockGame has much faster runtimes than traditional docking methods, can generate multiple plausible assembly structures, and achieves comparable performance to existing binary docking baselines, despite solving the harder task of coordinating multiple protein chains.

CAST: Cluster-Aware Self-Training for Tabular Data

Authors: Authors: Minwook Kim, Juseong Kim, Kibeom Kim, Donggil Kang, Giltae Song
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06380
Pdf link: https://arxiv.org/pdf/2310.06380
Abstract Self-training has gained attraction because of its simplicity and versatility, yet it is vulnerable to noisy pseudo-labels. Several studies have proposed successful approaches to tackle this issue, but they have diminished the advantages of self-training because they require specific modifications in self-training algorithms or model architectures. Furthermore, most of them are incompatible with gradient boosting decision trees, which dominate the tabular domain. To address this, we revisit the cluster assumption, which states that data samples that are close to each other tend to belong to the same class. Inspired by the assumption, we propose Cluster-Aware Self-Training (CAST) for tabular data. CAST is a simple and universally adaptable approach for enhancing existing self-training algorithms without significant modifications. Concretely, our method regularizes the confidence of the classifier, which represents the value of the pseudo-label, forcing the pseudo-labels in low-density regions to have lower confidence by leveraging prior knowledge for each class within the training data. Extensive empirical evaluations on up to 20 real-world datasets confirm not only the superior performance of CAST but also its robustness in various setups in self-training contexts.

Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory

Authors: Authors: Hilal AlQuabeh, Bhaskar Mukhoty, Bin Gu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06483
Pdf link: https://arxiv.org/pdf/2310.06483
Abstract Pairwise learning is essential in machine learning, especially for problems involving loss functions defined on pairs of training examples. Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially. However, the pairwise nature of the problem makes scalability challenging, as the gradient computation for a new sample involves all past samples. Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$. However, these approaches are primarily limited to linear models and have induced variance. In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret. Specifically, we establish a clear connection between the variance of online gradients and the regret, and construct online gradients using the most recent stratified samples with a limited buffer of size of $s$ representing all past data, which have a complexity of $O(sT)$ and employs $O(\sqrt{T}\log{T})$ random Fourier features for kernel approximation. Importantly, our theoretical results demonstrate that the variance-reduced online gradients lead to an improved sublinear regret bound. The experiments on real-world datasets demonstrate the superiority of our algorithm over both kernelized and linear online pairwise learning algorithms.

Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

Authors: Authors: Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06511
Pdf link: https://arxiv.org/pdf/2310.06511
Abstract Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL). We first prove that a gradient of synthetic samples with respect to a SSL objective in naive bilevel optimization is \textit{biased} due to the randomness originating from data augmentations or masking. To address this issue, we propose to minimize the mean squared error (MSE) between a model's representations of the synthetic examples and their corresponding learnable target feature representations for the inner objective, which does not introduce any randomness. Our primary motivation is that the model obtained by the proposed inner optimization can mimic the \textit{self-supervised target model}. To achieve this, we also introduce the MSE between representations of the inner model and the self-supervised target model on the original full dataset for outer optimization. Lastly, assuming that a feature extractor is fixed, we only optimize a linear head on top of the feature extractor, which allows us to reduce the computational cost and obtain a closed-form solution of the head with kernel ridge regression. We empirically validate the effectiveness of our method on various applications involving transfer learning.

The Lattice Overparametrization Paradigm for the Machine Learning of Lattice Operators

Authors: Authors: Diego Marcondes, Junior Barrera
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06639
Pdf link: https://arxiv.org/pdf/2310.06639
Abstract The machine learning of lattice operators has three possible bottlenecks. From a statistical standpoint, it is necessary to design a constrained class of operators based on prior information with low bias, and low complexity relative to the sample size. From a computational perspective, there should be an efficient algorithm to minimize an empirical error over the class. From an understanding point of view, the properties of the learned operator need to be derived, so its behavior can be theoretically understood. The statistical bottleneck can be overcome due to the rich literature about the representation of lattice operators, but there is no general learning algorithm for them. In this paper, we discuss a learning paradigm in which, by overparametrizing a class via elements in a lattice, an algorithm for minimizing functions in a lattice is applied to learn. We present the stochastic lattice gradient descent algorithm as a general algorithm to learn on constrained classes of operators as long as a lattice overparametrization of it is fixed, and we discuss previous works which are proves of concept. Moreover, if there are algorithms to compute the basis of an operator from its overparametrization, then its properties can be deduced and the understanding bottleneck is also overcome. This learning paradigm has three properties that modern methods based on neural networks lack: control, transparency and interpretability. Nowadays, there is an increasing demand for methods with these characteristics, and we believe that mathematical morphology is in a unique position to supply them. The lattice overparametrization paradigm could be a missing piece for it to achieve its full potential within modern machine learning.

A Parallelized, Adam-Based Solver for Reserve and Security Constrained AC Unit Commitment

Authors: Authors: Samuel Chevalier
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06650
Pdf link: https://arxiv.org/pdf/2310.06650
Abstract Power system optimization problems which include the nonlinear AC power flow equations require powerful and robust numerical solution algorithms. Within this sub-field of nonlinear optimization, interior point methods have come to dominate the solver landscape. Over the last decade, however, a number of efficient numerical optimizers have emerged from the field of Machine Learning (ML). One algorithm in particular, Adam, has become the optimizer-of-choice for a massive percentage of ML training problems (including, e.g., the training of GPT-3), solving some of the largest unconstrained optimization problems ever conceived of. Inspired by such progress, this paper designs a parallelized Adam-based numerical solver to overcome one of the most challenging power system optimization problems: security and reserve constrained AC Unit Commitment. The resulting solver, termed quasiGrad, recently competed in the third ARPA-E Grid Optimization (GO3) competition. In the day-ahead market clearing category (with systems ranging from 3 to 23,643 buses over 48 time periods), quasiGrad's aggregated market surplus scores were within 5% of the winningest market surplus scores. The quasiGrad solver is now released as an open-source Julia package: quasiGrad.jl. The internal gradient-based solver (Adam) can easily be substituted for other ML-inspired solvers (e.g., AdaGrad, AdaDelta, RMSProp, etc.). Test results from large experiments are provided.

Latent Diffusion Counterfactual Explanations

Authors: Authors: Karim Farid, Simon Schrodi, Max Argus, Thomas Brox
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.06668
Pdf link: https://arxiv.org/pdf/2310.06668
Abstract Counterfactual explanations have emerged as a promising method for elucidating the behavior of opaque black-box models. Recently, several works leveraged pixel-space diffusion models for counterfactual generation. To handle noisy, adversarial gradients during counterfactual generation -- causing unrealistic artifacts or mere adversarial perturbations -- they required either auxiliary adversarially robust models or computationally intensive guidance schemes. However, such requirements limit their applicability, e.g., in scenarios with restricted access to the model's training data. To address these limitations, we introduce Latent Diffusion Counterfactual Explanations (LDCE). LDCE harnesses the capabilities of recent class- or text-conditional foundation latent diffusion models to expedite counterfactual generation and focus on the important, semantic parts of the data. Furthermore, we propose a novel consensus guidance mechanism to filter out noisy, adversarial gradients that are misaligned with the diffusion model's implicit classifier. We demonstrate the versatility of LDCE across a wide spectrum of models trained on diverse datasets with different learning paradigms. Finally, we showcase how LDCE can provide insights into model errors, enhancing our understanding of black-box model behavior.

Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

Authors: Authors: Ian Gemp, Luke Marris, Georgios Piliouras
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.06689
Pdf link: https://arxiv.org/pdf/2310.06689
Abstract We propose the first, to our knowledge, loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that stochastic gradient descent can outperform previous state-of-the-art approaches.

A Supervised Embedding and Clustering Anomaly Detection method for classification of Mobile Network Faults

Authors: Authors: R. Mosayebi, H. Kia, A. Kianpour Raki
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06779
Pdf link: https://arxiv.org/pdf/2310.06779
Abstract The paper introduces Supervised Embedding and Clustering Anomaly Detection (SEMC-AD), a method designed to efficiently identify faulty alarm logs in a mobile network and alleviate the challenges of manual monitoring caused by the growing volume of alarm logs. SEMC-AD employs a supervised embedding approach based on deep neural networks, utilizing historical alarm logs and their labels to extract numerical representations for each log, effectively addressing the issue of imbalanced classification due to a small proportion of anomalies in the dataset without employing one-hot encoding. The robustness of the embedding is evaluated by plotting the two most significant principle components of the embedded alarm logs, revealing that anomalies form distinct clusters with similar embeddings. Multivariate normal Gaussian clustering is then applied to these components, identifying clusters with a high ratio of anomalies to normal alarms (above 90%) and labeling them as the anomaly group. To classify new alarm logs, we check if their embedded vectors' two most significant principle components fall within the anomaly-labeled clusters. If so, the log is classified as an anomaly. Performance evaluation demonstrates that SEMC-AD outperforms conventional random forest and gradient boosting methods without embedding. SEMC-AD achieves 99% anomaly detection, whereas random forest and XGBoost only detect 86% and 81% of anomalies, respectively. While supervised classification methods may excel in labeled datasets, the results demonstrate that SEMC-AD is more efficient in classifying anomalies in datasets with numerous categorical features, significantly enhancing anomaly detection, reducing operator burden, and improving network maintenance.

Enhancing Predictive Capabilities in Data-Driven Dynamical Modeling with Automatic Differentiation: Koopman and Neural ODE Approaches

Authors: Authors: C. Ricardo Constante-Amores, Alec J. Linot, Michael D. Graham
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06790
Pdf link: https://arxiv.org/pdf/2310.06790
Abstract Data-driven approximations of the Koopman operator are promising for predicting the time evolution of systems characterized by complex dynamics. Among these methods, the approach known as extended dynamic mode decomposition with dictionary learning (EDMD-DL) has garnered significant attention. Here we present a modification of EDMD-DL that concurrently determines both the dictionary of observables and the corresponding approximation of the Koopman operator. This innovation leverages automatic differentiation to facilitate gradient descent computations through the pseudoinverse. We also address the performance of several alternative methodologies. We assess a 'pure' Koopman approach, which involves the direct time-integration of a linear, high-dimensional system governing the dynamics within the space of observables. Additionally, we explore a modified approach where the system alternates between spaces of states and observables at each time step -- this approach no longer satisfies the linearity of the true Koopman operator representation. For further comparisons, we also apply a state space approach (neural ODEs). We consider systems encompassing two and three-dimensional ordinary differential equation systems featuring steady, oscillatory, and chaotic attractors, as well as partial differential equations exhibiting increasingly complex and intricate behaviors. Our framework significantly outperforms EDMD-DL. Furthermore, the state space approach offers superior performance compared to the 'pure' Koopman approach where the entire time evolution occurs in the space of observables. When the temporal evolution of the Koopman approach alternates between states and observables at each time step, however, its predictions become comparable to those of the state space approach.

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

Authors: Authors: Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06794
Pdf link: https://arxiv.org/pdf/2310.06794
Abstract Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to sub-optimal policies if the reward is misaligned. Moreover, recent works have demonstrated that effective shaping rewards for a particular problem can depend on the underlying learning algorithm. This paper introduces a novel way to encourage exploration called $f$-Policy Gradients, or $f$-PG. $f$-PG minimizes the f-divergence between the agent's state visitation distribution and the goal, which we show can lead to an optimal policy. We derive gradients for various f-divergences to optimize this objective. Our learning paradigm provides dense learning signals for exploration in sparse reward settings. We further introduce an entropy-regularized policy optimization objective, that we call $state$-MaxEnt RL (or $s$-MaxEnt RL) as a special case of our objective. We show that several metric-based shaping rewards like L2 can be used with $s$-MaxEnt RL, providing a common ground to study such metric-based shaping rewards with efficient exploration. We find that $f$-PG has better performance compared to standard policy gradient methods on a challenging gridworld as well as the Point Maze and FetchReach environments. More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html.

Keyword: super-resolution

There is no result

Oct 11 '23 06:10 zoq

arxiv-updates arxiv-updates copied to clipboard

New submissions for Wed, 11 Oct 23

Keyword: sgd

Augmenting Vision-Based Human Pose Estimation with Rotation Matrix

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

Keyword: optimization

Augmenting Vision-Based Human Pose Estimation with Rotation Matrix

High Dimensional Causal Inference with Variational Backdoor Adjustment

OptiMUS: Optimization Modeling Using mip Solvers and large language models

Delay-Optimal Service Chain Forwarding and Offloading in Collaborative Edge Computing

Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond

Multi-Robot Task Assignment and Path Finding for Time-Sensitive Missions with Online Task Generation

Motion Memory: Leveraging Past Experiences to Accelerate Future Motion Planning

CAT-RRT: Motion Planning that Admits Contact One Link at a Time

Federated Multi-Level Optimization over Decentralized Networks

Transmission Investment Coordination using MILP Lagrange Dual Decomposition and Auxiliary Problem Principle

Spiking PointNet: Spiking Neural Networks for Point Clouds

Distributed Multi-Time Slot Power Balancing Control of Power Systems with Energy Storage Devices

Sample-Efficient Multi-Agent RL: An Optimization Perspective

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Bi-Level Offline Policy Optimization with Limited Exploration

Reducing Detailed Vehicle Energy Dynamics to Physics-Like Models

CrowdRec: 3D Crowd Reconstruction from Single Color Images

Mutual Information Metrics for Uplink MIMO-OFDM Integrated Sensing and Communication System

Lo-Hi: Practical ML Drug Discovery Benchmark

Plane Constraints Aided Multi-Vehicle Cooperative Positioning Using Factor Graph Optimization

Approaches to the Algorithmic Allocation of Public Resources: A Cross-disciplinary Review

Memory efficient location recommendation through proximity-aware representation

Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

An Edge-Aware Graph Autoencoder Trained on Scale-Imbalanced Data for Travelling Salesman Problems

Zero-Level-Set Encoder for Neural Distance Fields

Diversity from Human Feedback

A Parallelized, Adam-Based Solver for Reserve and Security Constrained AC Unit Commitment

Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Comparing AI Algorithms for Optimizing Elliptic Curve Cryptography Parameters in Third-Party E-Commerce Integrations: A Pre-Quantum Era Analysis

Efficient Graduated Non-Convexity for Pose Graph Optimization

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Keyword: adam

A Parallelized, Adam-Based Solver for Reserve and Security Constrained AC Unit Commitment

Keyword: gradient

Fingerprint Attack: Client De-Anonymization in Federated Learning

Learning Layer-wise Equivariances Automatically using Gradients

Provably Accelerating Ill-Conditioned Low-rank Estimation via Scaled Gradient Descent, Even with Overparameterization

DockGame: Cooperative Games for Multimeric Rigid Protein Docking

CAST: Cluster-Aware Self-Training for Tabular Data

Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory

Self-Supervised Set Representation Learning for Unsupervised Meta-Learning

The Lattice Overparametrization Paradigm for the Machine Learning of Lattice Operators

A Parallelized, Adam-Based Solver for Reserve and Security Constrained AC Unit Commitment

Latent Diffusion Counterfactual Explanations

Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

A Supervised Embedding and Clustering Anomaly Detection method for classification of Mobile Network Faults

Enhancing Predictive Capabilities in Data-Driven Dynamical Modeling with Automatic Differentiation: Koopman and Neural ODE Approaches

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

Keyword: super-resolution

arxiv-updates
arxiv-updates copied to clipboard