如果我有所动作 它们会有相应的防御措施
So, if I go for a move, they're going to move to defend that.
你想超车时它们会阻拦你吗
They can try and block you as you try to overtake?
是的 它们对我的对策做出了反应
Yeah, they react to my reactions,
所以我必须再次做出反应
so then I have to react again.
与之对抗让游戏变得
And it just makes it so much more fun
更加有趣
to be able to play against that
因为会有无限可能
because there's endless possibilities then.
如你所见 它们移到了右手边
As you can see, they're going to the right-hand side.
现在真的很有趣
It's just really good fun now,
因为我基本上已无路可走 -好吧
cos now I've got nowhere to go, essentially. - Ok!
马丁 我注意到
Now, Martin, I can't help but notice
你是穿袜子
you appear to be playing this game
玩这个游戏的
wearing a pair of socks.
这是玩GT的专业技巧吗
Is that a pro tip for playing Gran Turismo?
这是专业技巧
It is a pro tip.
把鞋脱了 穿着袜子玩
So take your shoes off, play in your socks.
对踏板的感觉会更灵敏
You get a more sensitive feel on the pedal
比赛时的反应会更快
and it allows you to just race that a little bit quicker.
听从专家的建议 孩子们
OK, listen to the experts, kids.
谢谢你 马丁
OK. Thank you, Martin.
考什克 感谢你来为我们展示该游戏
Thank you, Kaushik, for coming and showing us this game.
谢谢
Thank you.
人工智能是如何变得如此优秀的呢
How did the AI get so good?
我们听考什克提到"强化学习"
Well, we heard Kaushik mention reinforcement learning.
这就是考什克
That's what Kaushik
及其团队训练GT索菲的方法
and his team used to train GT Sophy.
那么什么是强化学习 它又是如何工作的
So what is reinforcement learning and how does it work?
为了解释这一点 我需要些帮助
Now, to explain, I need some help,
但我的下一位嘉宾有点紧张
but my next guest is a little bit nervous.
当她进来时 请大家默默鼓掌
So, when she comes on, please, let's just have silent applause.
请保持安静 别吓到她
Please keep it quiet. Don't alarm her.
下面请欢迎 凯特琳和她的狗弗蕾亚
So, please, let's welcome Caitlin and her dog, Freya.
你们好
Hello!
你好 凯特琳
Hello, Caitlin.
你好 弗蕾亚 -你好
Hello, Freya. - Hi.
欢迎来圣诞讲座现场
Welcome to the Christmas lectures.
谢谢
Thank you.
凯特琳 你和弗蕾亚
So, Caitlin, how long have you been
一起工作多久了
working with Freya?
弗蕾亚和我一起工作了大概三年
So, Freya and I have been working together for about three years.
在我受训成为驯犬师时 她的妈妈
Her mum kindly let us work together when I was training
非常好心的让我们一起工作 现在...
to be a puppy school tutor, and now...
...刚开始进行技巧训练
...just started some trick training together.
所以她很爱叫
So she very much likes to use her voice,
我向大家的耳朵道歉
so I apologise for the ears.
好
Ok.
今天你要给我们展示什么 凯特琳
So what are you going to show us today, Caitlin?
你如何训练弗蕾亚吗 -对
Are you going to show us how you train Freya? - Yes!
今天弗蕾亚会给我们展示
So, today, Freya is going to show us
如何从一个人的口袋里
how to pickpocket someone
取出一方手帕
with a handkerchief.
我们要做的是利用正面强化法
What we're going to do is use positive reinforcement
告诉她 取到手帕时
to tell her that when she pulls on the handkerchief,
她就会得到奖励
then she gets a reward for that.
所以我会用一个"口令"
So I use something called a marker.
当我告诉她去取物时 会告诉她"对"
So, when I tell her to get it, I'll tell her yes
然后奖励她的行为
and reward her for what she's done.
那就是正确的做法
That's the correct thing to do.
所以我会让她去取物
So I'm going to tell her to get it.
对 好姑娘 好
Yes! Good girl! Ok.
非常好
Very nice.
首先我要让她对此感兴趣
So first I want to get her interested in this.
对 -好 我觉得她感兴趣了
Yes! - OK, I think she's interested.
很好
Nice!
我的方法就是让物品变得有趣
So the way I do that is just make it interesting like this.
如果她拉走 对
If she pulls on it... Yes!
然后给她奖励
And then give her that reward.
随后我会增加一个信号
And then I want to add a cue to it.
比如...弗蕾亚 去取
So a little... Freya, get it!
对 好姑娘
Yes! Good girl.
再给奖励
And then reward.
就是这个模式 -好
So it's just following that pattern. - Ok.
看起来奖励对弗蕾亚很重要
Now, rewards seem to be really important to Freya.
那是她学习的重要部分吗
Is this a big part of how she learns?
没错 有奖励就重复的正反馈过程
Absolutely. So what gets rewarded gets repeated.
她做了我想让她做的 我就奖励她
So the more I reward her for the thing I want her to do,
她就越可能再去做这件事
the more it's going to happen.
我要做的还有保证
But what I want to make sure of
要让她听口令做动作
is that she's also responding to the cue.
所以她这么拉手帕 我不会给奖励
So, when she pulled it there, I didn't give her the reward
因为我想让她听到口令再行动
because I want her to respond to that to get it.
因为刚才你没给口令
Because you hadn't given her the cue.
没错 是的
Exactly, yeah.
现在最后一步就是口袋取物
And now the last part of this is the pickpocket part,
这一步需要你的协助 -好
which I will need your assistance. - Ok.
可以吗 -可以
If that's OK? - Yes.
我会把这个给你
So I'm going to give you that.
请把手帕搭在你裤子后面口袋
And if you just pop that in your back pocket for me.
弗蕾亚 来 来
Freya, come, come!
好姑娘 坐下 好
Good girl! Sit! Ok.
很好 别动 -为了圣诞讲座我豁出去了
Nice. Stay. - The things I do for the Christmas lectures.
弗蕾亚 去取那个
Freya, get it!
好 做得好
Yes! Good job!
很好
Very nice!
弗蕾亚
Freya
弗蕾亚 你比我的狗聪明多了
Freya, you're a lot smarter than my dog,
我必须得说
I have to say.
非常棒
Well done.
非常感谢 凯特琳 谢谢你 弗蕾亚
OK, thank you so much, Caitlin. Thank you, Freya.
大家要记得 礼貌而克制的掌声
Remember, everybody, polite quiet applause, please.
非常感谢 谢谢你
Thank you so much. Thank you!
来 弗蕾亚 我们走
Come on, Freya! Let's go!
她还不想走
She doesn't want to leave!
弗蕾亚是怎么学习的
So how is Freya learning?
每次完成一项任务 她就会被奖励
She's given a treat every time she does a task well.
在人工智能界 这被称作强化学习
And in AI, we call that reinforcement learning.
上场讲座 我们看到人工智能如何从数据
In the last lecture, we saw how AI learns from data,
从训练数据中学习
from training data.
强化学习也很类似
Now, reinforcement learning is similar,
但训练数据在该情境中以奖励的形式出现
but the training data in this case comes in the form of rewards.
强化学习对《跑车浪漫旅》这类游戏
And reinforcement learning is really good for games
也大有益处
like Gran Turismo
因为大多数游戏都计分
because most games have a score
我们可以用积分作为奖励
and we can use scores as the reward.
当人工智能得一分 就是一个奖励
When the AI scores a point, that's a reward.
人工智能的设计就是学习
And what the AI is designed to do is to learn
如何将其奖励最大化
how to maximise its reward,
将其得分最大化
to maximise its score,
尽可能快地获得尽可能多的奖励
to get as many rewards as possible as quickly as possible,
就像弗蕾亚获得她的奖励那样
just like Freya getting her treats.
我们可以用神经网络来完成学习
And we can use neural networks to learn all that.
这被叫做深度强化学习
That's called deep reinforcement learning.
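The idea of learning to maximise reward can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how GT Sophy was trained: the action names, the `treat_for` reward function and the `train` helper are all hypothetical, and the learner keeps a simple table of value estimates instead of a neural network, so it is plain rather than deep reinforcement learning.

```python
import random


def train(actions, reward_fn, episodes=500, epsilon=0.1, step=0.1, seed=0):
    """Epsilon-greedy learner: estimate how much reward each action earns."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # current reward estimate per action
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try an action at random.
            action = rng.choice(actions)
        else:
            # Exploit: otherwise pick the action believed to pay best.
            action = max(actions, key=lambda a: value[a])
        reward = reward_fn(action)
        # Nudge the estimate for this action toward the reward just received.
        value[action] += step * (reward - value[action])
    return value


def treat_for(action):
    """Hypothetical trainer: a treat only for pulling the handkerchief."""
    return 1.0 if action == "pull_handkerchief" else 0.0


ACTIONS = ["sit", "bark", "pull_handkerchief"]  # hypothetical action set
values = train(ACTIONS, treat_for)
best = max(ACTIONS, key=lambda a: values[a])
print(best, values)
```

Because only pulling the handkerchief ever earns a treat, its estimate is the only one that can rise above zero, so the learner ends up preferring it: what gets rewarded gets repeated.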
但有时强化学习
But sometimes reinforcement learning
不能给出我们期望中
might not give us the outcomes
想要的结果
that we were hoping for,
我们将通过一个游戏展示这一点
and we're going to have a game to illustrate this idea.
我需要一名志愿者
Can I have a volunteer, please?
那位穿绿色衣服 对 请走下来
In the green there? Yeah, you come down.
对 你被选中了
Yes, you've been selected!
请下来 站在这里 你叫什么 -妮维雅
Come on down. Just stand here. What's your name? - Nivea.