**Disclaimer:** Now that the HiMCM 2020 has officially ended and 2 additional weeks have passed, I believe that now it's safe to discuss the problem. To view 2020 problem B, please access the link below!

__https://www.comap.com/highschool/contests/himcm/2020_HiMCM_Problem_B.pdf__

Also, the full report can be seen in: __https://www.seanyoonbio.com/free-notes__

Besides the awards and competition, this problem was very very interesting in my opinion. Although it seemed like a very classic modelling problem, it was very challenging than expected. Note that this problem is an NP-difficulty problem, so we chose to take one of the **traditional lines of attack for NP-problems,** which are either:

Bash it with a supercomputer

Devise an algorithm that finds the suboptimal solutions

For 1, there won't be much to research, and the spirit of the topic probably wouldn't be bashing with a supercomputer, so we chose to devise an algorithm that finds many suboptimal solutions instead. Because many models are better than one, we chose to employ a **multi-model analysis **that would give us many suboptimal solutions. Among them, we would be able to choose the one that is the best, which would not necessarily be the best of all cases, but still reasonably effective compared to our computing power and time. Until this part was fairly easy and intuitive.

As efficiency is making the most of investments, **the combination of plants that maximized benefit relative to funds required** was considered as the "suboptimal solutions." At first, we didn't have any idea on how to determine this combination. But after some discussion, I thought of the **greedy algorithm**. Maybe by "removing" one plant at a time might give us the answer! By starting with 48 plants, removing each one of them that maximizes our return, we were able to find that a certain combination of 31 plants was the best. But this might not be the optimal solution.

Hence, our second model was the **multi-armed bandit problem**. So what is the multi-armed bandit problem? Say that there is a gambler and *n* slot machines (bandits), all of which the gambler does not know the probability of winning a jackpot. The multi-armed bandit problem basically aims to find the distribution of *x* dollars to *n *bandits to maximize his/her chance of winning.

To this end, in the first round, the gambler begins with equal distribution to all of the slot machines. Then, in the second round, he distributes more money to the ones where the jackpot came out, and allocates less to the slot machines where the jackpot didn't come out. We reiterate this process, and plot the probability distribution curves for the chance of winning for each of the bandits like below.

* Fig 1. the probability of winning for five bandits*

**But how exactly do we obtain these curves? **The answer is the** Beta distribution.** It is defined as

where B is defined as a fraction involving the Gamma function

Then, with these set of probabilities, using basic probabilities, we would be able to distribute the *x* dollars to *n *bandits. I thought the picture below really encapsulates this reinforcement learning method.

(And btw, I was quite amazed by this multi-armed bandit problem, and I will post a separate post to discuss more about it)

**Then, how can this problem be applied to our research topic? **Our intuition was to use each of the "number of plants that we choose" as each of the bandits. We do not know the combination of plants, but we will only consider the number of plants that are chosen. So, choosing one plant is the first bandit, choosing two plants is the second, choosing three is the third, and so on. In total, we would have 46 bandits because we have 46 plants! We then obtained a set of nice "basin" shaped 46 graphs. After 8 trials with 5000 iterations for each, the question became: then, how should we decide the combination?

Then, Eureka! We can apply the multi-armed bandit again, so that **the plants are the bandits** to choose the combination of plants! In essence, we applied it to choosing the number of plants, then the combination of plants. In other words, if choosing 31 plants is the optimum, each of the plants are the bandit. Then, we can choose the 31 bandits with the top 31 chances of winning.

From these, we obtained a total of 9 combinations, (1 from the greedy algorithm+8 from the trials in the second method) The order of when to begin funding each plant was decided later. We used a very simple algorithm of going back and forth 30 years to allocate the plant funding times, but I actually found a very nice resource that addressed the same problem. I regret that I didn't find this 2 weeks earlier.

__http://www2.hawaii.edu/~janst/311_past/311_f18/Notes/Topic-13.html__