factorio-learning-environment

removing superfluous actions

Open · kiankyars opened this issue 5 months ago · 27 comments

We need to make a list of the actions which will be removed from the codebase, such as `connect_entities`, because they are not possible to map to human gameplay data. We want to keep only atomic actions, as opposed to actions which themselves contain logic.

kiankyars avatar Jul 07 '25 17:07 kiankyars

If we put an underscore in front of a tool directory (e.g. `_connect_entities`), it will not be visible to the agent.
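A minimal sketch of that convention, assuming tools are discovered by scanning a directory (the paths and function name here are hypothetical, not FLE's actual loader):

```python
from pathlib import Path

def discover_tools(tools_dir: str) -> list[str]:
    """Return the tool names an agent sees, skipping '_'-prefixed directories."""
    return sorted(
        p.name
        for p in Path(tools_dir).iterdir()
        if p.is_dir() and not p.name.startswith("_")
    )

# Renaming tools/connect_entities/ to tools/_connect_entities/ would drop it
# from this list without deleting any code.
```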

JackHopkins avatar Jul 07 '25 17:07 JackHopkins

good to know, let's coordinate tomorrow

kiankyars avatar Jul 08 '25 12:07 kiankyars

I tried sorting the agent actions in the codebase and this is what I have:

core actions (must include)

  • craft_item
  • extract_item
  • harvest_resource
  • insert_item
  • launch_rocket
  • move_to
  • pickup_entity
  • place_entity
  • rotate_entity
  • send_message
  • set_entity_recipe
  • set_research
  • sleep

observation actions (include by default)

  • get_entities
  • get_research_progress
  • inspect_inventory
  • score (broaden to all flows)
  • get_resource_patch (maybe just the nearest one?)

observation actions (want to include but not sure how)

  • get_entity (to reuse entities in a program which are referred to over and over)
  • print (for arbitrary stuff)
  • get_prototype_recipe (how do we inform agents of recipes??)

cognitive/hidden actions (must exclude)

  • can_place_entity
  • connect_entities
  • get_connection_amount
  • nearest
  • nearest_buildable
  • place_entity_next_to
  • shift_entity
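For reference, the split above written out as plain data with a single whitelist check (a sketch only; the set names are mine, not FLE's):

```python
# Sketch: encode the proposed split so one predicate decides what is exposed.
CORE_ACTIONS = {
    "craft_item", "extract_item", "harvest_resource", "insert_item",
    "launch_rocket", "move_to", "pickup_entity", "place_entity",
    "rotate_entity", "send_message", "set_entity_recipe", "set_research",
    "sleep",
}
OBSERVATION_ACTIONS = {
    "get_entities", "get_research_progress", "inspect_inventory",
    "score", "get_resource_patch",
}
EXCLUDED_ACTIONS = {
    "can_place_entity", "connect_entities", "get_connection_amount",
    "nearest", "nearest_buildable", "place_entity_next_to", "shift_entity",
}

def is_exposed(tool_name: str) -> bool:
    return tool_name in CORE_ACTIONS | OBSERVATION_ACTIONS
```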

kantneel avatar Jul 09 '25 10:07 kantneel

Referring to @MortenTobiasNielsen's API proposal: https://github.com/MortenTobiasNielsen/FLE-API-specification/issues/5

The core actions in FLE fully cover the core actions in the API proposal. To leverage human gameplay we will likely want to support the optional actions (set_priority, set_filter, limit_inventory, equip). I looked into shooting/throwing, and this is harder to do when manipulating characters through the modding API. Luckily, I think a lot of people also play without biters, so we can work with only peaceful replays.

kantneel avatar Jul 09 '25 10:07 kantneel

I think the next step is for us to work together on the data collected from @hrshtt's process, to analyze whether there are any gaps and how to convert events into these actions, since there will no doubt be a lot of object dereferencing needed in the events.

kantneel avatar Jul 09 '25 10:07 kantneel

I wanted to think of the actions from the perspective of mutations on the game state.

Game State (S_T)

At any tick T, the full game state can be defined as:

S_T = (𝕄_T, ℰ_T, ℙ_T, Θ_T, ℛ_T, Φ_T, Π_T)

Where each component is:

  • 𝕄_T (Map): Spatial layout, placed entities, tile types.
  • ℰ_T (Entities): All non-resource entities with dynamic properties (position, health, inventory, recipe, circuit state, etc).
  • ℙ_T (Players): Player position, velocity, held item, inventory, mining progress.
  • Θ_T (Tech): Unlocked techs, current research queue & progress.
  • ℛ_T (Resources): Resource entities (ore, oil) and remaining yield (these can probably be represented inside Map but not sure yet).
  • Φ_T (Power): Electric network states (buffer, drain, input/output).
  • Π_T (Pollution): Pollution levels per chunk.
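A rough encoding of this tuple (a sketch only; field types are placeholders rather than FLE's actual types):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class GameState:
    """S_T at tick T; one field per component of the tuple above."""
    map: Any         # M_T: spatial layout, placed entities, tile types
    entities: Any    # E_T: non-resource entities with dynamic properties
    players: Any     # P_T: position, velocity, held item, inventory, mining progress
    tech: Any        # Theta_T: unlocked techs, research queue & progress
    resources: Any   # R_T: resource entities and remaining yield
    power: Any       # Phi_T: electric network states
    pollution: Any   # Pi_T: pollution levels per chunk
```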

Actions (a_T)

All actions are typed as:

a_T = (domain, verb, args)

Each action is a transformation applied to an object in one of the domains:

| Domain   | Verbs (examples)                                                           |
|----------|----------------------------------------------------------------------------|
| map      | place(proto, pos, ori), remove(entity_id), build_ghost(...), cancel_ghost |
| entity   | set_prop(key, value), set_recipe, rotate, set_filter, set_inventory_limit |
| player   | move(pos), set_cursor, transfer_items(src, dst, item, count)              |
| research | enqueue(tech_id), dequeue(tech_id), pause                                 |

No direct verbs for resources, power, or pollution – they evolve as side effects of other verbs.

So rather than a flat action list, actions are defined by the domain they apply to.

Picking up and removing entities, for example, go to map rather than entity.
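The same idea as a sketch (the class and argument names are illustrative, not a proposed API):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

class Domain(Enum):
    MAP = "map"
    ENTITY = "entity"
    PLAYER = "player"
    RESEARCH = "research"
    # no resource/power/pollution domains: those evolve as side effects

@dataclass
class Action:
    """a_T = (domain, verb, args)."""
    domain: Domain
    verb: str
    args: dict[str, Any] = field(default_factory=dict)

# Picking up an entity is a map-domain mutation, not an entity-domain one:
pickup = Action(Domain.MAP, "remove", {"entity_id": 42})
research = Action(Domain.RESEARCH, "enqueue", {"tech_id": "automation"})
```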

hrshtt avatar Jul 09 '25 12:07 hrshtt

I think the state description makes sense. What is the advantage of having actions segmented by domain? If we want to describe a trajectory as a list of action strings, then I would want it to be as easily interpretable for an LLM as possible. I'd default to an action being (verb, args), with the domain being one of the args if necessary (it doesn't seem strictly necessary for any of them except entity).
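Concretely, the flat form argued for here might look like this (the action names and arguments are hypothetical):

```python
# A trajectory as plain code text, one call per action; the domain is folded
# into the function name instead of being a separate field.
trajectory = [
    "move_to(x=10, y=5)",
    "place_entity('stone-furnace', x=10, y=5)",
    "set_entity_recipe(entity_id=7, recipe='iron-plate')",
    "enqueue_research('automation')",
]
```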

kantneel avatar Jul 09 '25 13:07 kantneel

imo keeping domain explicit is a structural prior that:

  1. Reduces effective action entropy: if the true distribution is multi-modal (map ops vs. inventory ops vs. research ops), modelling it as one flat categorical wastes bits.
  2. Aligns with object-centric world models that already boost RL in visual games.
  3. Lets us give crisp error semantics and future-proof extensibility, as we only need to compare in-category interactions of actions.
  4. Is already being utilised in exactly this way for tool calling: https://link.springer.com/article/10.1007/s41019-025-00296-9

hrshtt avatar Jul 09 '25 13:07 hrshtt

Not to state the obvious, but I would do what I suggested a few weeks back. 😅

In this context I don't see the benefit of splitting actions into sub categories.

MortenTobiasNielsen avatar Jul 09 '25 14:07 MortenTobiasNielsen

> Not to state the obvious, but I would do what I suggested a few weeks back. 😅
>
> In this context I don't see the benefit of splitting actions into sub categories.

That is fair, but I would want to couple game state and actions regardless and doing this from first principles gives me more clarity.

The domain could simply be syntactic sugar, if not some external structure.

hrshtt avatar Jul 09 '25 15:07 hrshtt

> if the true distribution is multi-modal (map ops vs. inventory ops vs. research ops), modelling it as one flat categorical wastes bits.

I don't imagine us encoding actions as one-hot categories, since the arguments to functions will be arbitrary strings and numbers. Instead I imagine actions as code text. Like I said, all the categories except entity are sort of redundant and can be wrapped into the name - just have enqueue_research instead of research.enqueue.

> Lets us give crisp error semantics and future-proof extensibility, as we only need to compare in-category interactions of actions.

Not sure what this means. If a function is called with incorrect argument types or values you get the same error semantics. Maybe I don't know how you intend the domains to be used.

> Is already being utilised in exactly this way for tool calling

This sort of makes sense but again I think with properly named functions this effect is really minimal if it even exists. If we're training then it certainly won't matter.

kantneel avatar Jul 09 '25 15:07 kantneel

FYI, here is how OpenAI's Dota 2 agent does it (see the action-space figure from the paper): it's a four-dimensional array, with the first element being the primary action, which maps to core actions in our case; the other three components of the action are shown in the diagram. The delay exists because the agent only takes an action every 4 frames.
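For comparison, that factored action could be written as follows (a sketch; component names are approximate readings of the diagram, not the paper's exact terms):

```python
from typing import NamedTuple

class FactoredAction(NamedTuple):
    """Sketch of the four-dimensional action described above."""
    primary: int  # index into the discrete set of primary actions
    delay: int    # which of the next 4 frames to act on
    param_a: int  # second parameter head, e.g. a target-unit index
    param_b: int  # third parameter head, e.g. a spatial offset bucket
```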

kiankyars avatar Jul 10 '25 13:07 kiankyars

You can only really hope to express things in vectorized form if you know a priori the maximum number of entities. In Dota 2 there are 5 heroes per side and a fixed number of other entities of interest. Even so, they have 1.2k categorical values and 14.5k continuous/boolean values to observe. In Factorio I just don't think this would be possible, given that the number of entities to observe is basically unbounded.


kantneel avatar Jul 11 '25 09:07 kantneel

Yes, that's correct. After reading the whole paper, I now realize it's not 1:1 transferable.

kiankyars avatar Jul 11 '25 11:07 kiankyars

for FLE: the following observation helpers can probably be implemented easily:

  • get_research_progress
  • inspect_inventory
  • score
  • get_prototype_recipe

actions that should be replaced by some broader combination of vision + game state representation:

  • get_resource_patch
  • nearest
  • nearest_buildable
  • print
  • get_entity

IMO:

some annotated version of the map would help here, something with the player position, map directions, and possibly a grid.

maybe a combined state query method:

  1. we add two methods, request_map and get_position (possible signatures sketched below):
     a. request_map: returns a broad annotated image of the nearby playing area
     b. get_position: takes a resource name, water body name, etc.
  2. the LLM would then know what to request based on the nearby entities and resources.
  3. now the LLM has access to most positions on the map.
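Possible signatures for the two proposed methods (purely hypothetical, just to pin the idea down; neither exists in FLE today):

```python
from typing import Optional

def request_map(radius: int = 64) -> "Image":
    """Return an annotated top-down image of the area around the player,
    with player position, map directions, and possibly a grid overlaid."""
    ...

def get_position(name: str) -> Optional["Position"]:
    """Resolve a named feature (resource name, water body name, etc.)
    to a map position, e.g. get_position('iron-ore')."""
    ...
```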

https://github.com/hrshtt/factorio-data-collector/issues/4#issue-3239228486

hrshtt avatar Jul 17 '25 11:07 hrshtt

‘render’ already exists to get a visual representation of the map.

JackHopkins avatar Jul 17 '25 11:07 JackHopkins

then render + a combined querying method should be good enough for traces to also record observations. but the replay and the headless renders would possibly need to look the same

but i still think annotations would be needed for an image to let the LLM know what resource is what

hrshtt avatar Jul 17 '25 11:07 hrshtt

the render tool does have some annotations which are super helpful for a zero-shot agent. See the side-by-side images in the "Vision Agents" section of https://jackhopkins.github.io/factorio-learning-environment/0.2.0.html

With vision fine-tuning we may not strictly need annotations if the multimodal model is trained to understand what furnaces, belts and other factorio entities look like. Jack's current work on visual observations includes making the rendering faithful to factorio by using the real sprites for game entities.

kantneel avatar Jul 17 '25 12:07 kantneel

ahh yes ive seen the render. but im saying that for traces from replays we already have immediate access to screenshots from the game, so we may want to come to an agreement on a usage + representation for observations so i can incorporate it into the existing logging scheme.

hrshtt avatar Jul 17 '25 12:07 hrshtt

I reckon we should standardise around using the new renderer for consistency. What do you think?

JackHopkins avatar Jul 17 '25 12:07 JackHopkins

Yes, we should use the new renderer. We just need to figure out what the center and zoom level should be to line up with the player's view during the replay. Those should be given as arguments to the renderer, and we can also use them to pre- or post-filter the entities sent to the renderer.
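A sketch of that filtering step, assuming roughly 32 px per tile at zoom 1 (the attribute names and screen defaults are illustrative, not FLE's API):

```python
def visible_entities(entities, center_x: float, center_y: float, zoom: float,
                     screen_w: int = 1920, screen_h: int = 1080,
                     tile_px: int = 32) -> list:
    """Pre-filter entities to the player's viewport for a replay tick.

    At zoom z, one tile covers tile_px * z pixels, so the viewport spans
    screen_w / (tile_px * z) tiles horizontally (similarly vertically).
    """
    half_w = screen_w / (tile_px * zoom) / 2
    half_h = screen_h / (tile_px * zoom) / 2
    return [
        e for e in entities
        if abs(e.position.x - center_x) <= half_w
        and abs(e.position.y - center_y) <= half_h
    ]
```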

kantneel avatar Jul 17 '25 12:07 kantneel

that works, i just wish we could use the client for this, as the existing rendering pipeline is so much more reliable.

there is a POC here to convert the client binary into a headless server and try to use the rendering pipeline, so we can have the best of both worlds (free rendering and scalable docker images), unless we have tried it before

hrshtt avatar Jul 17 '25 13:07 hrshtt

yes, that would be ideal for sure. To my knowledge no one on the team has attempted to convert the client binary into a headless server and use the rendering pipeline.

kantneel avatar Jul 17 '25 13:07 kantneel

tried it out, it's not possible to get screenshots without a connected player/client when running the client with --start-server.

hrshtt avatar Jul 17 '25 15:07 hrshtt

we can use a connected client as a screenshot bot, but @JackHopkins's renderer probably scales better with experiments

hrshtt avatar Jul 17 '25 15:07 hrshtt

> we can use a connected client as a screenshot bot, but @JackHopkins's renderer probably scales better with experiments

You're saying it scales better because using a bot would mean one client per container?

kiankyars avatar Jul 17 '25 15:07 kiankyars

> tried it out, it's not possible to get screenshots without a connected player/client when running the client with --start-server.

Yeah this is a known limitation.

Having a renderer that operates headlessly will make it easier to scale.

JackHopkins avatar Jul 17 '25 15:07 JackHopkins