DroneAgent explanation
Hello, I have some concrete questions about the DroneAgent. Firstly, thanks again for providing such a great example of how ML-Agents can be used for drone exploration. I have been investing a lot of time in understanding how the code works and looking for potential improvements, and I have a few questions about the DroneAgent script.
// QUESTION: why does the position need to be divided by the leaf node size in GetVector3Int?
public override void CollectObservations(VectorSensor sensor)
{
    Vector3 pos = Drone.Position;
    // If the drone is in a new grid position, add it to Data.
    if (IsNewGridPosition(pos))
    {
        Data.AddPoint(new Point(PointType.DronePos, pos, Time.time));
    }
    // QUESTION: why does the drone also add the scan point?
    Data.AddPoint(scanPoint);
    // Check how many new leaf nodes were created by this scan.
    int nodeCount = Data.Tree.Intersect(pos, scanPoint.Position);
    // QUESTION: reward = nodeCount / look radius, what's the intuition here?
    float scanReward = (nodeCount * 0.1f) / Data.LookRadius;
    AddReward(scanReward);
    // QUESTION: what is StepUpdate?
    Data.StepUpdate(pos);
    // The drone gets a penalty for lingering in the same grid block.
    float linger = lingerCount / 100f; // 0 - 2
    float lingerPenalty = -linger * 0.1f;
    AddReward(lingerPenalty);
    Vector3 velocity = Drone.VelocityNorm;
    Vector4 proximity = Drone.GetForwardProximity();
    // QUESTION: the drone gets a penalty for proximity, but why is it multiplied by velocity?
    float proxPenalty = (1f - 1f / Mathf.Max(proximity.w, 0.1f)) * velocity.sqrMagnitude * 0.25f;
    AddReward(proxPenalty);
    // Observations.
    // QUESTION: why observe "linger - 1f"?
    sensor.AddObservation(linger - 1f); // 1
    sensor.AddObservation(velocity); // 3
    // QUESTION: how does the proximity observation bring anything to the drone?
    sensor.AddObservation((Vector3)proximity); // 3
    // QUESTION: what is proximity.w * 2f - 1f?
    sensor.AddObservation(proximity.w * 2f - 1f); // 1
    // QUESTION: what is the point of LookRadiusNorm as an observation?
    sensor.AddObservation(Data.LookRadiusNorm); // 1
    sensor.AddObservation(Data.NodeDensities); // 8
    // QUESTION: what are the IntersectRatios?
    sensor.AddObservation(Data.IntersectRatios); // 8
    // QUESTION: what is the ScanBuffer? the whole scanned area?
    sensor.AddObservation(Drone.ScanBuffer.ToArray()); // 30
}
// QUESTION: how do the above observations motivate the drone to explore better?
// I think it could already scan very well with half of these observations.
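To make the proximity penalty more concrete, here is the term from the code above evaluated at a few sample values. Treating proximity.w as a normalized forward distance in [0, 1] (with 1 meaning nothing detected ahead) is my assumption; the multiplication by velocity.sqrMagnitude means a nearby obstacle is penalized much harder when approached at speed than when hovering.

using UnityEngine;

// Sketch: the penalty term from CollectObservations, isolated.
// With sqrSpeed = 1:
//   w = 1.0  ->  (1f - 1f / 1.0f) * 0.25f =  0      (open space, no penalty)
//   w = 0.5  ->  (1f - 1f / 0.5f) * 0.25f = -0.25   (mild penalty)
//   w = 0.1  ->  (1f - 1f / 0.1f) * 0.25f = -2.25   (very close, strong penalty)
static float ProxPenalty(float w, float sqrSpeed)
{
    return (1f - 1f / Mathf.Max(w, 0.1f)) * sqrSpeed * 0.25f;
}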
And lastly, in OnActionReceived():
// QUESTION: when would World.StepUpdate() return true?
if (!World.StepUpdate())
{
    OnEpisodeBegin();
}
I also see this:
// Depending on settings, there might be holes the drone can slip through.
return blocks2D[dronePos].InnerBounds.Contains(drone.LocalPosition);
Does this mean that the world ends at some point?
Hi @Ademord! Sorry, but I can't go into much more detail beyond what I've written on YouTube without having to reverse engineer the code myself ;) Thing is, I made this project over two years ago, and it was among my very first experiments with Unity and ML-Agents. There are probably a lot of things I'd do differently now. That's why I haven't put a readme page on the repo; I really just kept it up here for posterity.

I'd like to update the project at some point though. Ideally, I would want to figure out how agents could observe spatial data and process it using CNNs (I started a forum thread on this topic, feel free to post any thoughts you might have on it!). Back then, I added somewhat crude observations, basically just scan densities for the adjacent octants, giving the agent some rough sense of preferred direction. However, the basic rewarding approach would likely stay the same: assign rewards proportional to the number of spatial nodes the agent has detected.

I think you're right about there being unnecessarily many rewards. If the agent knows it can score by moving around and pointing its ray in different directions, it might not need a linger penalty after all. If I remember correctly, I used the buffer to store and observe the last couple of ray directions, so that the agent knows where it has already looked.
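For what it's worth, here is a minimal sketch of that kind of buffer, as I understand it. This is a reconstruction, not the actual ScanBuffer code; the capacity of 10 directions, which would match the 30 observation floats in the snippet above, is a guess.

using System.Collections.Generic;
using UnityEngine;

// Hypothetical reconstruction: a fixed-size FIFO of the most recent ray
// directions, flattened to floats for the vector sensor.
public class ScanBuffer
{
    readonly Queue<Vector3> directions = new Queue<Vector3>();
    readonly int capacity;

    public ScanBuffer(int capacity = 10) { this.capacity = capacity; }

    public void Add(Vector3 rayDirection)
    {
        directions.Enqueue(rayDirection.normalized);
        if (directions.Count > capacity)
        {
            directions.Dequeue(); // drop the oldest direction
        }
    }

    // Flatten to a float array so it can be passed to sensor.AddObservation.
    public float[] ToArray()
    {
        var result = new float[capacity * 3];
        int i = 0;
        foreach (Vector3 d in directions)
        {
            result[i++] = d.x;
            result[i++] = d.y;
            result[i++] = d.z;
        }
        return result; // unfilled slots stay 0 until the buffer is warm
    }
}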
Thanks, this week I will decide how to move forward with my thesis: either continue with an approach similar to yours, or a variant with some influence from it. I'll keep you posted here and on the forum. Let me know if you have a Discord as well.
Hello @mbaske, I just found out about your grid sensor, and I'm going to read the docs tomorrow to see if I can use it in any way.
I was wondering if you knew a way I could get depth info and/or point cloud info from a game camera? I am trying to teach an ML agent to do some scene reconstruction (KinectFusion maybe?), and I can't seem to get past this wall of "how to get depth / point cloud info in Unity". I found that the XR kit has a subsystem for this, but I don't know how to get that info and pass it to an agent either... any help would be gold.
I'm currently updating the grid sensor for ML-Agents release 2. Should be done sometime next week.

It's possible to get depth info from a camera using depth textures: https://docs.unity3d.com/2020.1/Documentation/Manual/SL-DepthTextures.html My guess is you would need to write a shader and send its output to the render texture sensor: https://docs.unity3d.com/Packages/[email protected]/api/Unity.MLAgents.Sensors.RenderTextureSensor.html

Both approaches have the same limitation though, namely that observations are always just projections of a 3D scene. That's why I was hoping CNNs would support more dimensions at some point, so they could detect features in spatial data. If that were possible, one should be able to do something like lidar sensing with raycasts, store the detected points, and then feed all of them to the CNN. I found this overview listing some methods for handling spatial data with deep learning: https://arxiv.org/pdf/2006.12567.pdf
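To sketch the idea, assuming the built-in render pipeline and a hypothetical depthMaterial whose shader converts _CameraDepthTexture to a grayscale depth image (writing that shader is the part left out here):

using UnityEngine;
using Unity.MLAgents.Sensors;

[RequireComponent(typeof(Camera))]
public class DepthToSensor : MonoBehaviour
{
    public Material depthMaterial;               // hypothetical depth-visualizing shader
    public RenderTexture target;                 // texture the sensor observes
    public RenderTextureSensorComponent sensor;  // ML-Agents sensor component

    void Awake()
    {
        // Make Unity generate a depth texture for this camera.
        GetComponent<Camera>().depthTextureMode = DepthTextureMode.Depth;
        sensor.RenderTexture = target;
    }

    // Convert the camera image to depth and store it where the sensor reads it.
    void OnRenderImage(RenderTexture src, RenderTexture dest)
    {
        Graphics.Blit(src, target, depthMaterial);
        Graphics.Blit(src, dest); // keep the normal view unchanged
    }
}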
~~A bit of a silly question from my side: why would I need to send the depth to a render texture sensor?~~
~~In my first iteration I will not send spatial data to the agent (only the raycasts it sees).~~
~~But I think you propose to add a RenderTextureSensor to an agent for it to collect observations, or...? I am missing some detail in your explanation here, I think.~~
~~@mbaske also, another silly question: when I camera > get depth > pass to ICP to create a point cloud, where do I reward the agent? In OnActionReceived or in Update after moving?~~ ~~In some very basic pseudocode my reward would be~~ ~~addReward(get_reward_from_number_of_points(newPointCloud - oldPointCloud)) # proportional to how many new points somehow~~
@mbaske I figured out how to solve the problem without using depth and point clouds, please ignore all my previous questions. I'm looking forward to the grid sensor for release 2!
Quick question: what is the point of adding the observation sensor.AddObservation(linger - 1f); // 1
compared to just adding linger as an observation?
The value for linger is between 0 and 2, so subtracting 1 normalizes it to the -1/+1 range used for observations.
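In code, it's a plain rescale from the [0, 2] range onto [-1, +1]:

// linger = lingerCount / 100f lies in [0, 2], which implies lingerCount is
// capped at 200 somewhere (my assumption). Subtracting 1 shifts it onto the
// [-1, +1] range the other observations use, e.g. proximity.w * 2f - 1f.
float linger = lingerCount / 100f;   // 0 .. 2
sensor.AddObservation(linger - 1f);  // -1 .. +1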