3 thoughts on “3-1 Julia Freytag. Cooperative Behaviors in Sequential Social Dilemmas using Multi-Agent Reinforcement Learning”
Colm Connaughton
Thank you, Julia – this is really nice work. I would be interested to hear some more of the technical details at some point, particularly whether you have ways of directly optimising the time average in the RL setting. The time average doesn’t seem to fit neatly into the framework of the Bellman equation.
If I understood correctly, then this is an exploration of a space of finite time and finite ensemble. If learning is based on averages obtained over short times but large ensembles (infinite if it’s really expectation values) then cooperation is not identified as a beneficial behavior. If, on the other hand, learning takes place over a long time in a small ensemble, then the benefit of cooperation is identified, and agents will cooperate.
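The gap described above can be illustrated with a toy process (my own sketch, not from the talk): take a hypothetical multiplicative reward whose per-step multipliers are chosen so that the ensemble expectation grows, yet the time-average growth rate along almost any single long trajectory is below one. A learner judging by short-time, large-ensemble averages would rate the behavior beneficial; a learner following one trajectory for a long time would not.

```python
import numpy as np

# Toy sketch (illustrative only): a multiplicative reward process where the
# ensemble average grows but the time average of a single long trajectory
# shrinks. The multipliers 1.5 / 0.6 with p = 0.5 are hypothetical choices.
rng = np.random.default_rng(0)
up, down, p = 1.5, 0.6, 0.5
steps = 1000

# Ensemble (expectation-value) view: average multiplier per step.
ensemble_growth = p * up + (1 - p) * down          # 1.05 > 1: looks beneficial

# Time-average view: geometric growth rate along one long trajectory.
draws = rng.choice([up, down], size=steps, p=[p, 1 - p])
time_avg_growth = np.exp(np.mean(np.log(draws)))   # ~ sqrt(0.9) ≈ 0.95 < 1

print(f"ensemble growth per step: {ensemble_growth:.3f}")
print(f"time-average growth rate: {time_avg_growth:.3f}")
```

The same payoffs thus look profitable or unprofitable depending purely on whether one averages over copies or over time, which is the non-ergodicity the comment points at.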
It’s great to see this in actual machine learning algorithms, as I was always suspicious of the expectation value operator in the Bellman equation. It seems that this can really mislead the algorithms. It’s easy to imagine serious problems in Monte Carlo simulations.
One technical question: at 8:26 in your presentation in the right panel, I see mostly blue but with vertical stripes of green. Are these stripes sudden global breakdowns of cooperation? Why do they happen? I may well be misinterpreting this figure.
Finally, is any of this published in a paper?
Thanks again!
I enjoyed this talk a lot. Thank you, Julia!
Could you use this framework to learn an optimal behavior for predators in a Lotka-Volterra model (i.e. not over-preying)?