Features for general-sum multi-agent experiments

Open kantneel opened this issue 6 months ago • 10 comments

The features we've discussed so far:

  1. Inter-agent inventory trade. To be implemented with tools:
propose_trade(offered_items: dict, desired_items: dict, agent: Agent) -> TradeOffer
respond_to_trade_offer(trade_offer: TradeOffer, accept: bool)

  2. Pollution tracking. We need to get this information from the environment as another kind of "flow", similar to power and resource throughput, and add support in tasks for constraints/penalties on pollution.

  3. Differential agent production capabilities. The simplest idea is an environmental effect where, when an agent picks up a stack of items, a certain fraction of those items is vaporized, effectively altering the agent's resource production capability. Different agents will have this applied to different entities, which gives them differential capabilities. This is the hardest of the features to implement and could still have some unforeseen challenges.
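A minimal sketch of how the two trade tools from feature 1 could behave, using plain in-memory inventories instead of the game. The `Agent` and `TradeOffer` classes, the `status` field, and the explicit `proposer` argument are assumptions made for illustration (in the signatures above the proposer is presumably the calling agent); this is not FLE's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict

Inventory = Dict[str, int]


@dataclass
class Agent:
    # Hypothetical stand-in for an FLE agent: a name plus an item-count inventory.
    name: str
    inventory: Inventory = field(default_factory=dict)

    def has(self, items: Inventory) -> bool:
        return all(self.inventory.get(k, 0) >= n for k, n in items.items())


@dataclass
class TradeOffer:
    proposer: Agent
    recipient: Agent
    offered_items: Inventory
    desired_items: Inventory
    status: str = "pending"  # pending -> accepted | rejected | failed


def _move(src: Agent, dst: Agent, items: Inventory) -> None:
    # Move item counts from one inventory to the other.
    for name, n in items.items():
        src.inventory[name] -= n
        dst.inventory[name] = dst.inventory.get(name, 0) + n


def propose_trade(offered_items: Inventory, desired_items: Inventory,
                  agent: Agent, proposer: Agent) -> TradeOffer:
    # Creating an offer moves nothing; items only change hands on acceptance.
    return TradeOffer(proposer, agent, dict(offered_items), dict(desired_items))


def respond_to_trade_offer(trade_offer: TradeOffer, accept: bool) -> TradeOffer:
    if trade_offer.status != "pending":
        return trade_offer  # already resolved, nothing to do
    if not accept:
        trade_offer.status = "rejected"
        return trade_offer
    # Check both sides first so the swap is all-or-nothing.
    if not (trade_offer.proposer.has(trade_offer.offered_items)
            and trade_offer.recipient.has(trade_offer.desired_items)):
        trade_offer.status = "failed"
        return trade_offer
    _move(trade_offer.proposer, trade_offer.recipient, trade_offer.offered_items)
    _move(trade_offer.recipient, trade_offer.proposer, trade_offer.desired_items)
    trade_offer.status = "accepted"
    return trade_offer
```

Checking both inventories before moving anything keeps a stale offer from leaving a half-completed swap if either side has spent the items in the meantime.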

kantneel avatar Jun 12 '25 05:06 kantneel

These are interesting feature additions. This is my take on it.

  1. Inter-agent inventory trade. This has a lot of potential for interesting interactions. I think the proposed solution should be relatively achievable. I suggest adding a simple transfer_items function on the Lua side, which can then be used for all sorts of interesting interactions implemented in Python. This will shift a lot of the responsibility from the environment (Factorio) onto the integration, but that might not be a problem.

  2. Pollution tracking. There are a few interesting functions in the Factorio API: LuaSurface exposes get_pollution and get_total_pollution, plus clear_pollution and pollute for resetting or adjusting it. Furthermore, LuaGameScript exposes pollution_statistics, and I guess this should be the first one to try. I do have a feeling, though, that this covers the entire game, which wouldn't serve our purpose if we want pollution per team or per agent, e.g. so that agents can trade CO2 quotas or be taxed in some way.

  3. Differential agent production capabilities. I think some sort of special character prototype would be beneficial here. Where FLE exposes different types of characters which can be defined in the config of the experiments. A special character could be an extractor which is good at building mining infrastructure, and an electrical engineer which is good at power production, a scientist which excels at research, etc.

On the other hand, this could be extremely frustrating if a researcher wants to expose a specific type of (dis)ability in some way, given that they would only be able to use the predefined FLE characters unless they want to dig deep into the weeds of Factorio Lua coding.
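On the per-team or per-agent pollution question in point 2 above, here is a rough sketch of what the Python side could do if the integration can attribute emissions to agents. The `(agent, amount)` event format, the function names, and the quota/tax scheme are all assumptions for illustration; how such events would be derived from Factorio (for example by attributing an entity's emissions to whoever placed it) is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pollution_by_agent(events: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    # Sum hypothetical (agent, pollution_amount) samples into per-agent totals.
    totals: Dict[str, float] = defaultdict(float)
    for agent, amount in events:
        totals[agent] += amount
    return dict(totals)


def pollution_tax(totals: Dict[str, float], quotas: Dict[str, float],
                  tax_per_unit: float) -> Dict[str, float]:
    # Tax only the emissions above each agent's quota; a missing quota counts as 0.
    return {
        agent: max(0.0, emitted - quotas.get(agent, 0.0)) * tax_per_unit
        for agent, emitted in totals.items()
    }
```

With quotas kept in a plain dict, a "CO2 quota" trade would just be moving quota from one agent's entry to another's before the tax is computed.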

MortenTobiasNielsen avatar Jun 12 '25 07:06 MortenTobiasNielsen

> Differential agent production capabilities. I think some sort of special character prototype would be beneficial here. Where FLE exposes different types of characters which can be defined in the config of the experiments. A special character could be an extractor which is good at building mining infrastructure, and an electrical engineer which is good at power production, a scientist which excels at research, etc.

I like this idea, I had also thought of a hierarchy of agents (manager, worker, etc.,) as a research direction.

kiankyars avatar Jun 13 '25 09:06 kiankyars

Yeah, that would also be pretty neat. :)

MortenTobiasNielsen avatar Jun 13 '25 11:06 MortenTobiasNielsen

This description pretty much captures everything of interest, thanks Neel!

Clarification on a potential confusion: differential agent production capabilities are intended to bottleneck growth rate, rather than production throughput, when agents do not trade.

  • You may notice the "destroy on pickup" scheme would not affect intermediates e.g. circuits. Only entities that need to be picked up then placed, e.g. factory entities like belts, smelters, miners, power poles, etc. would be affected by this, which is sufficient for our use case.
  • Why make trade affect growth rate instead of production throughput: in real life, trade affects both, but bottlenecking growth rate is a lighter penalty for not trading while still mattering a lot. You could go either way on this; the reasons for this are numerous but weak.
    • LLMs could struggle more if they were too dependent on trade, or be incapable of rejecting trade when necessary.
    • Aesthetically, Factorio players might find it annoying to see large portions of factories being useless/unbalanced if not trading (or OTOH, trading intermediates is useless if factories are balanced around assuming no trade)
    • It's harder to implement throughput bottlenecking: you need to allow trade to chests rather than inventory so characters don't need to walk around dumping intermediates into buffer chests.

SimpleGeometry avatar Jun 13 '25 17:06 SimpleGeometry

Do you mean that you would like the feature to be limited to a "when something is picked up, some of it might be destroyed" functionality? It would be fairly normal to pick up different types of intermediates, though we could say that the feature doesn't affect those.

It is unclear to me if what you seek to achieve is best covered by this type of feature.

MortenTobiasNielsen avatar Jun 13 '25 18:06 MortenTobiasNielsen

Any implementation of differential production capabilities is good, so it need not be this one if another is easier. It's fine for the feature to apply to intermediates, though we don't need to use that capability. I mainly brought this up in case people had concerns about the destruction not working for intermediates that aren't picked up, which I argue is ok.

SimpleGeometry avatar Jun 13 '25 18:06 SimpleGeometry

I would like to provide you with the best possible feature, and I therefore hope you will spend the time giving an example of the type of interaction you would like to see, e.g. agent 1 does this, then agent two does this to achieve that, etc. I think I understand the feature you request, but it is unclear (to me) whether it will give you the interaction you seek.

MortenTobiasNielsen avatar Jun 13 '25 20:06 MortenTobiasNielsen

I appreciate that! A more concrete example: agent 1 can build miners, smelters, and belts at standard speed, while it costs them 8x more and takes 8x longer to build inserters, assemblers, and labs (or, they are produced as normal, but if they pick up those entities into their inventory from a chest or from a belt, 7/8 get destroyed). Vice versa for agent 2.

Then they are strongly encouraged to trade with one another, though in case they stop trading, existing setups work as usual, and they may still slowly expand in aspects where they are disadvantaged.
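The "7/8 get destroyed on pickup" variant of this example can be sketched as a per-agent table of keep fractions. The agent names, the exact prototype names in the table, and the floor rounding are assumptions for illustration; a real implementation would hook into the game's pickup events on the Lua side.

```python
from fractions import Fraction
from typing import Dict

# Hypothetical capability profiles: the fraction of a picked-up stack that survives.
# Items not listed for an agent are kept in full.
KEEP: Dict[str, Dict[str, Fraction]] = {
    "agent_1": {
        "inserter": Fraction(1, 8),
        "assembling-machine-1": Fraction(1, 8),
        "lab": Fraction(1, 8),
    },
    "agent_2": {
        "electric-mining-drill": Fraction(1, 8),
        "stone-furnace": Fraction(1, 8),
        "transport-belt": Fraction(1, 8),
    },
}


def items_kept_on_pickup(agent: str, item: str, count: int) -> int:
    # Number of items that reach the inventory; the rest are destroyed.
    keep = KEEP.get(agent, {}).get(item, Fraction(1))
    return int(count * keep)  # rounds down for non-negative counts
```

Under these assumptions, agent 1 picking up 16 inserters keeps 2, while 16 transport belts arrive intact. Rounding down means a pickup of fewer than 8 disadvantaged items yields nothing, so a stochastic per-item roll or a carried remainder may be preferable in practice.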

SimpleGeometry avatar Jun 13 '25 20:06 SimpleGeometry

Is this separate from feature 1, or do features 1 and 3 work together, with the objective being that the agents realize trading would be beneficial, so we can then see how they interact when they do?

MortenTobiasNielsen avatar Jun 13 '25 20:06 MortenTobiasNielsen

Yeah, feature 3 (my previous posts) is meant to support feature 1, i.e. incentivize trading. I think either feature 1 or feature 3 on its own is not very useful for a general-sum MARL environment.

SimpleGeometry avatar Jun 13 '25 21:06 SimpleGeometry