Research Blog for Department of Computer Science @ Rensselaer Polytechnic Institute

Optimal agents for sequential decision making problems

Meenal Chhabra, PhD student in RPI/CS


My name is Meenal Chhabra. I am a fourth year Ph.D student in Computer Science working with Professor Sanmay Das. My main area of research is Computational Economics. I work on some of the problems in search, multi-agent systems and online learning.

I have finished my B.Tech (equivalent to BS in US) in Computer Science from IIT Roorkee in 2007. I worked as a software developer for 2 years and then joined RPI to pursue Ph.D. I also worked for SCNARC (Social Cognitive Networks Academic Research Center) under Prof. Boleslaw Szymanski at RPI for couple semesters.

Sequential decision making is a part of our day to day lives. Some decisions happen as a reflex, and we do not even realize that we made a decision. Some of these problems require learning from previous experiments. For instance, if there are multiple routes from an individual’s home to office, then everyday he makes a choice by picking up a route. He may have different objectives while choosing a path, like take less time or use toll free path etc. and choose the path accordingly.
Consider a person looking to buy a used a car. She will most likely investigate the options sequentially till he buys the one. The best naive strategy would be to examine all possible opportunities and choose the best one. However, there is a cost involved in observing each of the opportunity for example, the cost involved in scheduling an appointment to go and test driving the car. The person needs to trade off the potential benefit of continuing the search and finding a possibly more valuable opportunity with the cost of doing so. Also, the car owner might not wait for the interested buyers to look all the cars that she is interested in. If another offer is made, it might be accepted. So waiting poses a risk too.

In my research, I concentrate on designing an optimal strategy for finding the best option in a search. There are many problems in which an agent (i.e. the computer) has to search for the best solution to a problem. For example, consider Watson, the computer program that beat the current champions in the game show Jeopardy. Watson has to search for all possible answers to a question from a vast store of information, texts of English language, etc. before deciding on a single answer.  Searching the whole set of solutions takes too much time and resources. Searching very little will not guarantee that the best solution can be found. So, a strategy is needed to make sure the search spends just the right amount of energy in finding a solution.

Some solutions to this problem suggest that the decision maker should stop searcher as soon as she finds an acceptable solution, one that is exceeds a criteria of value based on a threshold. But, in real world, the exact real value is not easy to assess. Often we see a value that is close to the real value, but may not be exact. We will call this a noisy value correlated with the true value. For example, in the case of buying a used a car, the car may look really nice but may have a bad drive train which might not be visible on a simple test drive. In such a scenario, a marketplace for experts who can provide special services of providing a more accurate value of an opportunity is created. In the used-car example, an individual looking to a buy a used car may consult online services like Carfax or take the car to a mechanic or both to get the more accurate value of the opportunity. However, these services are not free so an individual needs to decide the possible value provided by these services. For example, if an individual is checking a car owned by his friend or relative then probably he already knows the history of the car and can decide without consulting these services.

In my research, I concentrate on the optimal search problem when the searcher agent observes a noisy value correlated with the original value, and can obtain true value of an opportunity for some fee by consulting an expert. The optimal strategy for the searcher is dependent on the search cost and the query fee of an expert. Assuming both the searcher and the expert as rational agents, the optimal query fee for an expert is calculated after characterizing the optimal strategy for the searcher.

Our findings show that under a simple assumption, there is a simple solution to this problem.  Consider any two observed values s1 and s2.  Suppose for all values of s1 and s2 such that s1 > s2, the conditional distribution of the true value given s1, the higher observed value, first-order stochastically dominates the conditional distribution of the true value given s2 (the lower observed value).  This condition ensures that the probability that the actual value is greater than any particular observed value v is monotonically increasing in the true value. In this case, the optimal strategy is threshold based; there exists thresholds tl, tu such that the optimal strategy is given by (V, tl, tu) where:

1)  the searcher should reject all the opportunities with observed value less than tl,
2)  the searcher should accept the opportunity with observed value greater than tu
3) the searcher should query an expert to get the true value if the observed value lies in between tl and tu; if the true value is greater than V, the searcher’s utility from the search process, then the searcher should accept the opportunity other the searcher should resume the search.

The optimal strategy is shown in the figure below.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s


This entry was posted on May 25, 2012 by in Agent Based Systems.
%d bloggers like this: