Are you a Baloo or a Bagheera? Explore vs Exploit Concept - FutureIQ
4,754 views
Wait, is this logic right? •
Nov 08, 2024
Slog Reference: Baloo vs Bagheera
Description
Are you a Baloo or a Bagheera? Sounds like a very silly question but has a deeper meaning to it. Should you try to rise in your current company or explore other companies? Should you get better at Java programming or pick up Machine Learning? All these decisions are related to the explore-exploit concept. We take up real-life scenarios and explain this dynamic concept. When should you switch from exploring to exploiting? This is answered by the epsilon fixed algorithm. Watch the video till the end to understand this algorithm and see what works best for you.
You might also like:
Embrace Failure: https://youtu.be/OAAYnb9LUsU
Make Brilliant Decisions: https://youtu.be/aQg7dAJWqyk
why you are addicted to shopping: https://youtu.be/lEvzk05XzOg
Click below to buy this book: Million Dollar Weekend by Noah Kagan
https://tapthe.link/milliondollarweekend
Hope you enjoyed FutureIQ by Navin Kabra and Shrikant Joshi. Do hit us up on Twitter:
@ngkabra http://twitter.com/ngkabra
@shrikant https://twitter.com/shrikant
Listen to it on the podcast provider of your choice: https://tapthe.link/FutureIQRSS
Watch other episodes of The FutureIQ podcast: https://www.youtube.com/playlist?list=PLAppTB0r5_TaYueZ0adD42Wiw5X-wTE4v
Chapters:
00:00 Are you Baloo or Bagheera
00:40 What if I am both?
05:41 When is the right time to switch over?
12:36 Restrictions on exploration?
15:03 Psychology of explore-exploit concept
#futureiq
Related Slog Matches
Baloo vs Bagheera
56.00
Date Proximity
Transcript
Are you a Baloo or a Bagheera, like in The Jungle Book? Bagheera the panther was very focused on getting Mowgli to the village, whereas Baloo was much more chilled, wanting to explore and do this and do that, but also get things done along the way. So what it means for you is: in your current company, should you try to rise, or should you explore other companies? Should you get better at what you are doing right now, like Java programming, or should you try to jump to machine learning and AI? All over your life, not just in your career, you have to take this decision, right? Question. Yeah? Aren't we technically both? Like, there are times when I
feel like a Baloo, especially during the weekends, and then there are times when I feel like a Bagheera: I want to get this task done, I want to get this achievement, I want to get this progress in my career. Correct. So in fact, let me ask the question more precisely: when should you be a Baloo and when should you be a Bagheera? And this is a very well-known, well-studied problem called the explore-exploit conflict. Okay. Baloo wants to explore, you know, wants to check out all the different restaurants in town. Bagheera wants to exploit: Bagheera has found one good restaurant and just goes there every time, because Bagheera wants to maximize enjoyment. Right, right. So which
one should you be? That's the question. I would ideally, if I'm really, really hungry, go Bagheera: I'm going to the restaurant, I'm eating, and then I'll think of everything else. But if I have time, I'll go Baloo and, you know, explore restaurants and whatnot. Is there a rule of thumb for when I should do which one? I mean, so you have hit upon the simplest rule of thumb, which is that if you don't have enough time and you already know something, then you should exploit. Just go for it; you don't have time to explore. Whereas when you have lots of time, then first you do exploration. Ah. More broadly, the rule
of thumb could be that in the initial part of any new domain, or any new project, or any new area of expertise, you do exploration, and then later on, once you have a bunch of candidates and you have found a good one, you exploit. For example, if you have ever been in a company innovation seminar, they will talk about how first there is just brainstorming, exploration without judgment, and only then do you start narrowing down the ideas. Okay. But this is simplistic, right? Life is more complex than that. Yeah. More generally, you can say something like: if the cost of exploration is low, then you should explore. Okay. Whereas if the cost of making a mistake is
high, then exploit. Then exploit. You shouldn't be exploring. So with restaurants, it doesn't cost you much to go to a new restaurant if you go to new restaurants regularly. Correct. But if you are like my mom, who goes to a restaurant once a year, then you probably want to exploit. Look, a new restaurant sometimes might not agree well with your system, so that's a cost of exploration, as you said. So the two points here are that, first, depending on the cost, you explore or exploit, and second, this is different for different people and for your stage in life. Right, right. Say you're talking about a degree: if you pick the wrong degree, the cost is
reasonably high, so you can't go around exploring a lot. But, for example, liberal arts colleges in India have started giving you the choice where you can explore without it having a high cost: you spend the first one and a half years exploring before you decide which one branch to exploit. Right, yeah, makes sense, makes sense. Another possibility to think about is that when uncertainty is high, you don't know which one is a good one to exploit, so you have to explore. Because even if you think this is good today, I mean, you say that, well, you know, JavaScript programming is the future, but the world is changing so fast, especially with AI coming in, that the uncertainty might
kill you. You might be stuck in the wrong exploit. So if uncertainty is high, you have to have explore going on at the same time. Right, which is why you constantly keep saying: start exploring ChatGPT right now. Yes, yes. And the third thing is that if the rewards are highly uneven, as in something might give you such a tiny return and another might give you a huge return, then explore makes sense, to have some chance of finding the high-return thing. This is where we did an entire episode on how to think like a VC, where out of 10 different ideas, nine of them fail and the 10th one gives you enough of a return that it
makes up for all the other nine. In a sense, VCs are actually exploring more than exploiting. Absolutely, absolutely. It's just that, you know, when out of their 10 companies one really starts to take off, then they might choose to exploit, in the sense that they might invest in later stages of that company. Usually VCs invest in early-stage companies; that is explore. Exploit is when you go for the IPO and things like that. Right. This entire discussion sort of tells me that there is a point where you switch from explore to exploit. Is that right? No, so it can be more complex than that, but let me come to that. Before that: I've
been talking in terms of cost, but the cost doesn't mean only cost in terms of money. It could be cost in terms of effort, or it could be cost in terms of time. So for example, if you have gone on a vacation, should you go to just the five places that are world famous, or should you try out a whole bunch of back alleys and so on? It depends on whether it's a 5-day vacation or a 3-week vacation, or you're there for 6 months. And it also depends on, like you said, the person himself, because there are people who might actually only explore the back alleys even though they have a 5-day vacation. Is that possible? Possible,
right. So yeah, their cost function is different. Yeah. But also, one thing I do want to point out here, it's a bit of a distraction, is that we did an episode called "Make 10-year plans, don't make one-year plans", which means that if you make a 10-year plan, you have much more room to explore. Correct. So in general I like saying that, but of course you have to exploit, otherwise you will end up being poor and sad. Right. So in those 10-year plans, you essentially have accounted for the cost of time. Yes, yes, absolutely. Now, coming back to your question about what is the right time to switch over: it turns out that this is a
very well-studied problem in mathematics. Okay. It's called the multi-armed bandit problem. Multi-armed bandit? Yes. If you have ever been to a casino, you would have seen a slot machine, right? Yes, you put money in, you pull the arm, and the money goes away. No, well, you're supposed to win money, but largely those are terrible machines; they return very little. So that is why it's called the one-armed bandit: it is just stealing your money. Okay. Now imagine that there are 10 different one-armed bandits standing in front of you, made by different companies. So one of them returns 95% of the money, one returns 87% of the money, one returns 99.8% of the money, and so on, right?
So you want to maximize your returns. You're still losing money, but whatever returns there are, you're trying to maximize. Yeah. Oh, let us just assume that there is one that returns 110% and one that returns 150% also, just for the sake of the maths. Yeah. So now you have these 10 machines in front of you and you want to maximize your return. What do you do? You realize that this is exactly the same problem we've been talking about: you have to first explore, and then when you find the best one, you have to exploit. Correct. And how do you find the best one? To find the best one you have to explore, but it's all
probabilistic, so you still don't know when to stop exploring. Yeah, because just one pull of the lever is not going to give me the reward. So people, as in mathematicians, have come up with a whole number of different algorithms to solve this. The first one, the simplest one, is called epsilon-fixed. Okay. Pick a number epsilon, say 20%. Okay. And then you say that every time you have to choose one option, with 80% probability you will choose to exploit the best you have found so far, and with 20% probability you will explore randomly. Okay. So that is, you have a fixed epsilon. Epsilon is the chance that you
will explore, and throughout your life you have a 20% chance of exploring. Any time you have to make a choice, there is a 20% chance that you will try something new and an 80% chance you will go with the best you have. Okay, but at the beginning of this thing I don't have any clarity on which machine I'm supposed to exploit, right? Yes, you're right, that is one of the weaknesses of this algorithm, because in the beginning you are exploring 100% of the time, since there is nothing to exploit, and only after a time do you build up what you are doing. A different algorithm is called epsilon-decreasing. Okay. You start with 100% epsilon, which means you are
always exploring, then you decrease to 90%, then 80%, then 70%, and so on, or you have some other schedule. For example, there is this person Henri Carlson on Twitter whose epsilon-decreasing strategy is that he breaks up any new project into 30 months. For the first 10 months he is in full explore mode. Okay. For the next 10 months he is two-thirds explore, one-third exploit; for the next 10 months he is two-thirds exploit and one-third explore, and so on. That actually reminds me of another similar thing I heard recently from this guy called Noah Kagan. Now, Noah Kagan is basically the founder of a site called AppSumo.com. Yeah. And he has come out with a book called Million Dollar Weekend. Yeah. In
that he says: take 48 hours to develop an idea, and if in those 48 hours you can find three customers, then you focus on that idea. If you can't find three customers, you move on to the next idea. Correct. So he's basically advising exploring 52 ideas in 52 weekends and then exploiting whichever one of them actually hits. Correct. So that is epsilon-decreasing, which is like you come up with a schedule saying that for this much period I'm going to explore, and after that I'm going to exploit. Going back to the first algorithm, fixed epsilon, that's called epsilon-greedy. Do you know a great example of how it was used? You said 20% epsilon, right? Yes. Oh, are
you talking about Google and Orkut and all of those? Correct. So Google in the early days used to give their employees the choice to spend 20% of their time working on any idea that the employee thought could be interesting to Google's business. This was back when Google had the company motto "Don't be evil"; that company motto doesn't exist anymore. This was called the 20% time or the 20% rule, and for example Gmail came out of one employee's 20% project. Or Orkut, which used to be the social network before Facebook and was quite popular in Brazil and India: that also came out of the 20% project of an employee named Orkut. So these are the two simplest algorithms. There are more complex
algorithms that involve more calculations but can give you even better results, if you want to look up multi-armed bandit on Wikipedia. I have a question. Yeah? All of this assumes that we are able to explore. What if there are restrictions on exploration? What if we can't explore freely? Oh, that's a lovely question, and there is a very interesting problem. This worries me, but go on. It's called the arranged marriage problem. So keep in mind that this is mathematicians we are talking about, and they have restrictions on their imagination. Okay. So imagine there is a sultan, and he's going to get married, and suitors from all over the world, politically
important people, are bringing potential brides, and the sultan talks to each one and decides whether he wants to marry her or not. He has to say yes or no. Okay. The problem is that if he says no, he can't go back and say yes again to the same person. No is a no. Right, I believe this is a reality TV series called The Bachelor, but go on. Sure, yeah. Now here is the problem: here is an explore-exploit conflict without the chance of exploration. I mean, when you are talking to the first person, you don't know what the remaining 99 are going to be like. Absolutely right. So do you say yes or no? And mathematicians have actually studied
this, and they have come up with a number, saying that if you know beforehand the total number of brides, you just reject the first 37%. Okay. 37 comes from the fact that 37 is roughly 100 upon e, where e is Euler's number, 2.718. Okay. But reject the first 37%, and after that say yes to the first one who is better than all the previous ones. Naturally, of course. Naturally, because it's a natural log. Bad joke. I mean, more generally, the point is that when you can't go back and explore freely, set aside roughly 37% of the time or the effort or whatever for exploration, and then after that you switch to exploit. Somebody should
actually go back and, you know, do a study on the episodes and seasons of The Bachelor and see if this 37% kind of holds and works or not. But then again, you had to take the example of marriage, and with [Music] marriage... That is another lovely point you have brought up. Because I meant it as a joke. No, so it is important. So the psychology of the explore-exploit conflict is this: you can regret it either way. Before you have taken the decision, you are stressed: oh, how should I take a decision, what if I make the wrong decision, how do I pick the perfect one, and so on. So you're fully stressed and you are
unhappy. Afterwards, once you have taken the decision, whatever decision you have taken, you keep worrying: oh, should I have done that? If you have chosen to exploit one, you keep wondering, if I had explored, wouldn't it have been better? Whereas if you have chosen to explore, you're like, oh God, look at those guys, they're doing so well. So you're going to regret either way. True. So probably the correct way is to pick a decent algorithm, any one of the algorithms we talked about, either the fixed 20%, or a schedule of reducing it, or the 37% rule. Pick one algorithm, stick to it, and then do not regret your decision. Tell yourself that I picked the correct process
and I have a right only to following the process; I do not have a right to the results thereof. Right, yeah. So whether you are a Baloo or a Bagheera, decide and live with it. More precisely: first of all, be aware that there exist these two choices. Okay. Second, be aware that you don't have to be one or the other; there are these algorithms which let you choose dynamically. And of course, it's different for different areas of your life. And finally, pick one strategy and don't sweat the results. K, right? That's an episode we did very early on in the series on this channel, and we'll put that up for you to check out. Do check it out, and do
let us know your honest thoughts, like you've been letting us know in the comments. Thank you so much. But yeah, Baloo, Bagheera: choose, decide for whatever project you're doing, and go with the flow. This is FutureIQ. I am Baloo, he's Bagheera. Or vice versa: I am Baloo, he's Bagheera. Shri. Navin.
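For readers who want to try the two slot-machine strategies from the episode, here is a minimal Python sketch, not something shown in the video. It assumes 10 machines whose average payouts include the figures quoted in the conversation (95%, 87%, 99.8%, 110%, 150%); the other five payouts, the Gaussian noise on each pull, and the names `PAYOUTS`, `fixed_epsilon`, `decreasing_epsilon` and `play` are all made up for illustration.

```python
import random

# Average payout per unit staked. 0.95, 0.87, 0.998, 1.10 and 1.50 are the
# figures quoted in the episode; the other five are invented to reach 10 machines.
PAYOUTS = [0.95, 0.87, 0.998, 1.10, 1.50, 0.90, 0.92, 0.80, 0.85, 0.97]


def fixed_epsilon(t, rounds):
    """Epsilon-fixed: explore 20% of the time, forever."""
    return 0.2


def decreasing_epsilon(t, rounds):
    """Epsilon-decreasing: 100% explore in the first tenth, then 90%, 80%, ..."""
    return 1.0 - 0.1 * (10 * t // rounds)


def play(epsilon_at, rounds=5000, seed=0):
    """Return (pulls per machine, average return per pull) for one run."""
    rng = random.Random(seed)
    pulls = [0] * len(PAYOUTS)    # how often each machine was tried
    mean = [0.0] * len(PAYOUTS)   # running average return per machine
    total = 0.0
    for t in range(rounds):
        # On the very first pull there is nothing to exploit, so explore.
        if t == 0 or rng.random() < epsilon_at(t, rounds):
            arm = rng.randrange(len(PAYOUTS))                     # explore
        else:
            arm = max(range(len(PAYOUTS)), key=mean.__getitem__)  # exploit
        reward = rng.gauss(PAYOUTS[arm], 0.2)  # noisy payout (assumed model)
        pulls[arm] += 1
        mean[arm] += (reward - mean[arm]) / pulls[arm]
        total += reward
    return pulls, total / rounds


if __name__ == "__main__":
    for name, schedule in [("fixed 20%", fixed_epsilon),
                           ("decreasing", decreasing_epsilon)]:
        pulls, avg = play(schedule)
        print(name, "most-pulled machine:", pulls.index(max(pulls)),
              "average return:", round(avg, 3))
```

Under these assumptions both schedules should end up spending most of their pulls on the 150% machine. The decreasing schedule spends far more of its budget exploring, so treat it as a way to compare schedules rather than as a tuned strategy.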
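The 37% rule from the sultan example can also be checked with a small simulation. This is a sketch under simple assumptions, not something from the episode: candidates arrive in random order, each has a distinct score, a rejected candidate cannot be recalled, and "success" means ending up with the single best one. The function names `pick_with_cutoff` and `success_rate` are hypothetical.

```python
import math
import random


def pick_with_cutoff(scores, cutoff):
    """Reject the first `cutoff` candidates, then accept the first one who
    beats everyone seen so far; if nobody does, settle for the last one."""
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for score in scores[cutoff:]:
        if score > best_seen:
            return score
    return scores[-1]


def success_rate(n=100, cutoff=None, trials=20000, seed=0):
    """Fraction of random orderings in which the rule picks the overall best."""
    if cutoff is None:
        cutoff = round(n / math.e)  # 100 / e is about 36.8, hence "37%"
    rng = random.Random(seed)
    scores = list(range(n))         # distinct scores; n - 1 is the best
    wins = 0
    for _ in range(trials):
        rng.shuffle(scores)
        wins += pick_with_cutoff(scores, cutoff) == n - 1
    return wins / trials


if __name__ == "__main__":
    for cutoff in (0, 10, 37, 60, 90):
        print("reject first", cutoff, "->", round(success_rate(cutoff=cutoff), 3))
```

With 100 candidates, rejecting the first 37 should find the very best one in a bit more than a third of the runs, and moving the cutoff far in either direction should do worse.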