Educação matemática pela arte
Gusmão, Lucimar Donizete
2013-08-28
Date
2000
Description
Commerce in information goods is one of the earliest emerging applications for intelligent agents in commerce. However, the fundamental characteristics of information goods mean that they can and likely will be offered in widely varying configurations. Participating agents will need to deal with uncertainty about both prices and location in multi-dimensional product space. Thus, studying the behavior of learning agents is central to understanding and designing for agent-based information economies. Since uncertainty will exist on both sides of transactions, and interactions between learning agents that are negotiating and transacting with other learning agents may lead to unexpected dynamics, it is important to study two-sided learning.
We present a simple but powerful model of an information bundling economy with a single producer and multiple consumer agents. We explore the pricing and purchasing behavior of these agents when articles can be bundled. In this initial exploration, we study the dynamics of this economy when consumer agents are uninformed about the distribution of article values. We discover that a reasonable albeit naive consumer learning strategy can lead to disastrous market behavior. We find a simple explanation for this market failure, and develop a simple improvement to the producer agent's strategy that largely ameliorates the problem. But in the process we learn an important lesson: dynamic market interactions when there is substantial uncertainty can lead to pathological outcomes if agents are designed with "reasonable" but not sufficiently adaptive strategies. Thus, in programmed agent environments it may be essential to dramatically increase our understanding of adaptivity and learning if we want to obtain good aggregate outcomes.
We present a simple but powerful model of an information bundling economy with a single producer and multiple consumer agents. We explore the pricing and purchasing behavior of these agents when articles can be bundled. In this initial exploration, we study the dynamics of this economy when consumer agents are uninformed about the distribution of article values. We discover that a reasonable albeit naive consumer learning strategy can lead to disastrous market behavior. We find a simple explanation for this market failure, and develop a simple improvement to the producer agent's strategy that largely ameliorates the problem. But in the process we learn an important lesson: dynamic market interactions when there is substantial uncertainty can lead to pathological outcomes if agents are designed with "reasonable" but not sufficiently adaptive strategies. Thus, in programmed agent environments it may be essential to dramatically increase our understanding of adaptivity and learning if we want to obtain good aggregate outcomes.
Identifier
in Agent-mediated Electronic Commerce, Lecture Notes in Artificial Intelligence. Berlin: Springer-Verlag, 2000.
Language
Show preview
Hide preview
Two-sided learning in an agent economy for
information bundles
Je�rey O. Kephart
1
, Rajarshi Das
1
, and Je�rey K. MacKie-Mason
2
1
IBM Research
PO Box 704, Yorktown Heights, NY 10598
fkephart,rajarshig@watson.ibm.com
2
School of Information and Department of Economics
University of Michigan, Ann Arbor, MI 48109
jmm@umich.edu
Abstract. Commerce in information goods is one of the earliest emerg-
ing applications for intelligent agents in commerce. However, the funda-
mental characteristics of information goods mean that they can and likely
will be o�ered in widely varying con�gurations. Participating agents
will need to deal with uncertainty about both prices and location in
multi-dimensional product space. Thus, studying the behavior of learning
agents is central to understanding and designing for agent-based informa-
tion economies. Since uncertainty will exist on both sides of transactions,
and interactions between learning agents that are negotiating and trans-
acting with other learning agents may lead to unexpected dynamics, it
is important to study two-sided learning.
We present a simple but powerful model of an information bundling
economy with a single producer and multiple consumer agents. We ex-
plore the pricing and purchasing behavior of these agents when articles
can be bundled. In this initial exploration, we study the dynamics of this
economy when consumer agents are uninformed about the distribution of
article values. We discover that a reasonable albeit naive consumer learn-
ing strategy can lead to disastrous market behavior. We �nd a simple
explanation for this market failure, and develop a simple improvement
to the producer agent's strategy that largely ameliorates the problem.
But in the process we learn an important lesson: dynamic market in-
teractions when there is substantial uncertainty can lead to pathological
outcomes if agents are designed with \reasonable" but not su�ciently
adaptive strategies. Thus, in programmed agent environments it may be
essential to dramatically increase our understanding of adaptivity and
learning if we want to obtain good aggregate outcomes.
Keywords: Information economy, information bundling, two-
sided learning, economic agents.
1 Introduction
Extraordinary rates of advance in computation and communication technolo-
gies
1
have fueled the much-noted proliferation of electronic commerce. Within a
few years, we anticipate that software agents will participate in a wide variety of
commercial transactions, and may even become economic players in their own
right [5]. One important domain for agent economies is the production and dis-
tribution of information goods and services, such as news articles, entertainment
and other service reviews, and instructional materials. Digital information goods
are unusually con�gurable, and negotiations over the composition and prices for
bundles are a natural application for software agents. However, to do so will
require that broker agents be able to explore not just prices, but also locations
in a multi-dimensional bundled product space.
Very little is understood about strategic search over the joint information
good price and product space (that is, search for which products to o�er, in
which combinations, at what prices).
2
Agents competing in this space must
learn about the distribution of preferences across a heterogeneous and changing
customer population, must learn about the strategies being followed by their
competitors, and then must optimize their strategies to take into account the
new understanding of customers and competitors. Consumers, on the other hand,
must learn about the quantity, quality and price of bundled items o�ered by the
various producers. Learning and optimization are both di�cult search problems,
and in the market context they are closely related.
In earlier work we considered strategic pricing for bundles when all of the
relevant parameters are known by brokers [3]. We distinguished between condi-
tions in which �rms prefer to o�er comprehensive bundles at a single price and
those in which they prefer to sell items individually. Although ours is the �rst
work focusing on competitive strategies, the bundling and pricing of information
goods has been an active area in recent research [1], [2]. We have also studied
price dynamics when competing brokers search a restricted price and product
space [5, 6, 4]. Price instability, characterized as \price wars", was prevalent.
In this paper we examine a considerably more general problem, although
for this initial foray we maintain some simplifying restrictions. We envision an
agent economy with one information broker agent and many consumer agents.
The consumer agents are heterogeneous: they value each item di�erently, and
these values are drawn from a di�erent distribution for each agent. To set pro�t-
maximizing prices, the broker agent desires, but may not know, the parameters
of the consumers' valuation distributions. The consumers also may not know
their valuation parameters until they gain experience with the broker's o�ered
information goods. Therefore, both sides wish to learn. The broker can set prices
strategically, learning about consumer valuations from their purchasing behavior
at di�erent prices. The consumers can purchase strategically, learning about the
distribution of information good values by sampling.
1
See, e.g., [7].
2
For some recent work on multi-agent search, see, e.g., [14, 13].
This environment is very rich and will permit us to explore many interesting
questions. In this paper we focus on one surprising result: that when consumer
agents follow a plausible but overly naive learning strategy, even if the producer is
fully informed (but also somewhat naive), the economy can continuously degen-
erate with disastrous overall performance. We �nd a simple explanation for the
initial failure, and develop a simple improvement to the producer agent's strategy
that largely ameliorates the problem. But in the process we learn an important
lesson: dynamic market interactions when there is substantial uncertainty can
lead to pathological outcomes if agents are designed with \reasonable" but not
su�ciently adaptive strategies. Thus, in programmed agent environments it may
be essential to dramatically increase our understanding of adaptivity and learn-
ing if we want to obtain good aggregate outcomes.
This paper is the �rst in a series of studies directed towards understanding
and developing robust adaptive learning techniques that are e�ective for indi-
vidual agents and lead to acceptable collective (market) behavior. We are also
extending the model to study the much more common setting in which there is
competition among multiple producers of information goods. Of course, this set-
ting exacerbates the learning problem, since the producers need to learn about
each other's strategies in addition to learning about consumer valuations.
In the next section we discuss the model. We introduce two relevant dis-
tributions: �rst, a parameterized distribution g from which a given consumer's
valuations are drawn, and second, a distribution h describing the population of
consumers | the distribution from which an individual consumer's valuation
distribution parameters are drawn. For tractability, we assume that the broker
is restricted to two-part tari� pricing schemes [9]: it can charge consumer agents
a subscription price to examine the current items, and a per item price for each
item the consumer subsequently purchases.
In a general analysis section, we derive the optimal strategies for fully in-
formed broker and consumer agents as functions of the article value and con-
sumer type distributions g and h. Then in Section 4 we explore speci�c cases of
exploration and exploitation through analysis and simulation. When agents are
underinformed and engage in learning, we consider plausible (but not necessarily
fully optimal) agent strategies, and examine the resulting system outcomes. We
close with a summary of our current results and plans for future work.
2 Model
A single producer periodically (at discrete times t = 0; 1; 2; : : :) generates sets of
N articles. It sets a subscription fee F and a price schedule P = fP (1); : : : ; P (N )g,
where P (k) represents the price it charges for a subset of k 2 f1; : : : ; Ng of the
N articles.
At a given time t, each of M consumers are informed about (F;P ), and they
decide whether to subscribe. Then, each subscribing consumer receives abstracts
of all N articles, and uses them to assess its value w
j
from reading each article,
for j 2 f1; : : : ; Ng. We assume that these values are generated randomly ac-
cording to a distribution g(q
i
;w), where q
i
is the set of parameters that de�ne
the distribution for consumer i. The parameters q
i
that represent consumer i's
valuation distribution are themselves generated randomly prior to time 0 from a
distribution h(r; q). Once the q
i
parameters are generated for consumer i, they
remain �xed for the rest of time (though the agents may not know their true
values for a while, if ever).
After assessing the value of each article, a subscriber decides which articles
to purchase. It does this by choosing a set of articles K to maximize surplus
s = (
P
j2K
w
j
) � P (jKj). Henceforth we assume that the articles have been
sorted by consumer j so that the w
j
are ordered highest to lowest, and thus the
set K consists of the �rst k = jKj articles.
The subscription decision depends on consumer expectations. If a consumer
believes correctly that its valuations are drawn from a distribution with param-
eters q
i
, then its expected surplus from purchasing k articles would be
hs
k
i =
*
k
X
j=1
w
j
+
� P (k): (1)
The consumer can then derive the vector of values p
k
, the probabilities that any
k is the expected surplus maximizing number of articles. Then the consumer's
optimized expected surplus is
hsi =
N
X
k=1
p
k
hs
k
i : (2)
The consumer should subscribe if and only if the expectation hsi exceeds the
subscription fee F .
3
If consumer i does not know its own q
i
for the articles o�ered by this broker,
then values w^
j
are the consumer's beliefs drawn from a distribution g(
^
q
i
;w) after
reviewing the abstracts, where the
^
q
i
are the agent's current best estimate of the
valuation parameters q
i
. When the agent purchases and reads articles, it learns
their true valuesw
j
and can then use this sample information to update its beliefs
^
q
i
about the distribution of article values. Therefore, a good consumer strategy
should take into account the value of learning. For example, when uncertainty
about q
i
is high, the consumer might deliberately subscribe even when its esti-
mated surplus is less than F , simply to experience more articles to improve its
valuation estimates. Or, having subscribed the consumer might purchase more
articles than would maximize expected surplus from current reading.
Turning to the producer's problem, it can choose a subscription fee F and
a price schedule P in each period. These should be chosen to maximize some
3
We assume consumers are risk neutral. A risk averse agent would want to optimize
the expectation of a concave function of surplus, and optimal behavior would depend
on second and possibly higher moments of the induced distribution of surplus, not
just the expected surplus. See, e.g., [8].
function of expected current and future pro�ts, where current pro�t can be
expressed as:
� =
M
X
i=1
�
i
(F + P (k
i
)�C(k
i
)) : (3)
with �
i
= 1 if consumer i chooses to subscribe, and 0 otherwise. The cost of
delivering k
i
articles to consumer i is denoted as C(k
i
).
In performing its maximization, the producer must take into account the
e�ect of P and F on the consumer's subscription decisions, and the e�ect of P
on the distribution of k
i
across the set of subscribers. Higher prices will decrease
the number of subscribers and the expected number of articles purchased, of
course. To compute the optimal subscription fee and price schedule the broker
wants to know the distribution h(r; q) from which the consumers' parameters
were generated, and the consumer strategies for subscribing and purchasing.
Based on its current beliefs about r and consumer strategies, the broker can
simulate a consumer population and its responses to various (F;P ) schedules,
and then pick (F;P ) to maximize a value function. In practice, the broker may
not know the consumer type parameters r, nor the consumer strategies. Thus,
the broker may choose (F;P ) to balance current expected pro�t according to
equation (3), against the increase in expected future pro�t from learning about
consumer preferences by observing their behavior when confronted with varying
(F;P ) combinations.
3 General Analysis
In this section, we �rst analyze the expected surplus and number of articles
purchased per subscription period for a rational, fully-informed consumer with a
given valuation distribution g. Then, we derive an expression for a monopolistic
producer's expected pro�t as a function of its price schedule and the distribution
h of valuation parameters q across the consumer population, assuming that all
consumers are rational and fully-informed about their valuation distribution.
A rational, fully-informed producer would choose its price schedule so as to
maximize its expected pro�t.
Suppose that a given consumer has its valuations w drawn from a probabil-
ity density function g(q;w). (The distribution parameters q may vary from one
consumer to another.) For the sake of simplicity, we assume that the producer
constrains itself to a linear price schedule: P (k) = k�. Then a rational consumer
will purchase k articles, where k is the number of articles with valuations ex-
ceeding the threshold P (k) � P (k � 1) = �. The probability p
k
for exactly k
articles to have valuations w > � is
p
k
=
�
N
k
�
G(q;�)
N�k
(1�G(q;�))
k
(4)
where G(q;w) represents the cumulative distribution function that corresponds
to g(q;w), i.e. it is the probability for an article to be valued at less than w.
From this we can compute the expected number of articles purchased:
hk(q;�)i =
N
X
k=1
k p
k
= N [1� G(q;�)] (5)
where the last equality follows from simple manipulations of binomial coe�-
cients.
The expected surplus hsi can be obtained from Eq. 2, provided that we �rst
compute hs
k
i, the expected surplus given that exactly k articles prove to have
valuations w > �. The conditional probability distribution for a single draw
from g(q;w) given that w > � is
~g(q;w) =
g(q;w)
1�G(q;�)
�(w ��) (6)
where �(x) represents the step function, equal to 1 if x > 0 and 0 otherwise.
The expected valuation for this distribution is
�w(q;�) =
Z
1
�
dww ~g(q;w) (7)
The expected sum of k draws from this conditional distribution is k �w(q;�).
Subtracting the price P (k) = k�, we obtain:
hs
k
i = k( �w(�)��) (8)
Inserting this result into Eq. 2, we obtain:
hs(q;�)i =
N
X
k=1
p
k
k [ �w(q;�)��] = [ �w(q;�)��] hk(q;�)i (9)
= N [ �w(q;�)��] [1� G(q;�)]
Note that the consumer will subscribe if and only if hs(q;�)i is greater than the
subscription fee F .
Now we take the producer's perspective. If we assume (for simplicity) that
the cost of producing and delivering k items is C(k) = k , then the expected
pro�t is
h�(F;�)i =
M
X
i=1
�(hs(q
i
;�)i � F )(F + (�� ) hk(q
i
;�)i) (10)
�
Z
q
dqh(r; q)�(hs(q;�)i � F ) (F + (�� ) hk(q;�)i)
where h(r; q) is the probability distribution for the consumers' q parameters, as
de�ned previously. In e�ect, the last approximation replaces the actual realized
set of consumer distribution parameters fq
i
g with an ensemble average over all
possible realizations of a set of M consumers generated from the distribution
h, and this approximation grows increasingly accurate in the limit of large M .
Note that a producer that knows h can compute this pro�t landscape. A fully
knowledgeable and rational producer would set � and F so as to maximize
h�(F;�)i.
4 Rational and Bounded-Rational Players
In the remainder of the paper, we shall assume simple functional forms for g and
h, and explore what happens when the consumers and/or the producer must
learn these distributions. In particular, we suppose that g is a one-parameter
exponential distribution given by g(�;w) = �e
��w
, and that h(�) is a uniform
distribution in the interval from �
min
to �
max
.
4.1 Fully-informed producer and consumers
As a reference point, we analyze the case where the consumers are fully informed
about their individual values of �, the producer knows the distribution h(�), and
the producer and the consumers act so as to maximize their expected gain.
Integrating g, we obtain the cumulative distribution G(�;w) = 1 � e
��w
.
From Eq. 6 we can compute the conditional distribution
~g(�;w) =
g(�;w)
1�G(�)
�(w ��) (11)
= �e
��(w��)
�(w ��):
The average valuation for this conditional distribution is �w(�;�) = � + �
�1
.
Using Eqs. 5 and 9, we obtain the expected number of purchased articles and
the expected surplus (assuming the consumer subscribes):
hk(�)i = Ne
���
(12)
hs(�)i =
N
�
e
���
:
To compute the producer's expected pro�t as a function of � and F , we can
substitute Eqs. 12 into Eq.10, which yields:
h�(F;�)i =
1
(�
max
� �
min
)
Z
�
0
�
min
d� (F + N (�� )e
���
) (13)
=
(�
0
� �
min
)F +N (1�
�
)
�
e
���
min
� e
���
0
�
(�
max
� �
min
)
;
where �
0
is de�ned as �
0
= min(�
max
; ~�) and ~� is de�ned as the unique solution
to Ne
�~��
= F ~�.
profit 0.4 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.225 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
-0.5
-0.3
-0.1
0.1
0.3
fee
price
profit
Fig. 1. Pro�t landscape �(f;�). h is a uniform distribution with �
min
= 0:5, �
min
= 2:0.
a) Production cost = 0:1. b) Production cost = 0:5.
Eq. 13 can be visualized as a pro�t landscape in which the expected pro�t
is plotted as a function of F and �. It is convenient to de�ne a normalized
expected pro�t � = �=N and a normalized fee f = F=N . Fig. 1 illustrates the
landscape for two di�erent production costs: = 0:1 and = 0:5.
In each such landscape, there are two ridges. The lower ridge, which demar-
cates the boundary beyond which the producer attracts no consumers and thus
makes no pro�t, is de�ned by e
��
min
�
= f�
min
. The upper ridge is described
by a piecewise joining of two nonlinear curves, the simpler one being described
by the relation e
��
max
�
= f�
max
. This portion of the ridge has the following
characteristics:
� it is formed by a discontinuous derivative with respect to f and � (it
changes abruptly from positive to negative)
� for price settings along this ridge, all consumers subscribe
� if the production cost <
crit
, the optimal setting of (f;�) occurs along
this part of the ridge. For the examples of Fig. 1, the critical production
cost
crit
is found to be approximately 0.28
4
, so that Fig. 1a ( = 0:1)
is in this regime, while Fig. 1b ( = 0:5) is not.
The other portion of the ridge is de�ned by a more complex nonlinear relation
between f and�. It is less sharp, resulting from derivatives with respect to f and
� being zero rather than jumping discontinuously from positive to negative.
5
4
It can be shown that, for exponential g and uniform h,
crit
=
�(
crit
)�
min
�
max
+�
min
�
2
max
�
�
min
�
2
max
, which is positive when �
min
> 0.
5
These characteristics only pertain to the case where g is exponential and h is uniform;
we have explored other combinations of functional forms for g and h that yield pro�t
landscapes that are topographically di�erent. For example, if g is exponential and
Figure 2 shows the dependence of the optimal f and � upon the production
cost . There is a discontinuous derivative at =
crit
, due to the switchover
between the two nonlinear curves that de�ne the upper ridge in the landscape.
As one might guess, the pro�t � decreases monotonically with the production
cost. The proportion p of consumers that subscribe is 1 for all �
crit
; for
exceeding this threshold the proportion of subscribers is strictly less than one,
and is given by
~���
min
�
max
��
min
.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0
0.2
0.4
0.6
0.8
1.0
µmin = 0.5 µmax = 2.0
∆
f
pi
p
production cost γ
∆,
f
Optimal ∆ and f vs. γ
Fig. 2. Optimal �, f , �, and p vs. , where h is a uniform distribution with �
min
= 0:5,
�
min
= 2:0.
4.2 Uninformed consumers
For a variety of reasons, a consumer may have to rely on adaptive estimates of
its valuation distribution, g. Suppose for example that consumer i knows that
g is an exponential distribution, but does not know its individual parameter �
i
.
Depending on the consumer's beliefs about the dynamics of the environment, it
might wish to place more or less weight on recent observations. Since the mean of
the distribution g(�;w) = � exp
��w
is 1=�, one reasonable and exible approach
to estimating � is to start with an assumed prior �
0
and after each period of
subscription to update the estimate �^ according to
�^
t+1
= (� �m
t
+ (1� �)(�^
t
)
�1
)
�1
; (14)
where m
t
is the mean of the N valuations received during the subscription
period t, and � is the consumer's \ ightiness" factor. This factor could be set to
h consists of a number of well-separated mass points, the landscape can contain
multiple ridges and peaks.
a constant, or be time-varying; in fact, by setting � = 1=t, the consumer weights
all observations equally.
Suppose that the producer knows the distribution h, and sets f and � to
the \optimal" values given by Fig. 2. (For = 0:1, these values are (f;�) =
(0:30878; 0:24098), at which the \ideal" pro�t � = 0:4137 and the fraction of sub-
scribers p = 1.) Furthermore, suppose that the consumers all start with exactly
the right estimates of their � parameters, but that they update their estimates
in accordance with Eq. 14. Fig. 3 illustrates what happens to consumer pro�ts
and subscription rates as a function of time for two values of �. In both cases,
the proportion p of subscribed consumers continues to diminish inde�nitely, and,
correspondingly, so does the pro�t �.
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
Fig. 3. Pro�t � and proportion of subscribed consumers p vs. time (in subscription
periods) when consumers estimate their � values, and the ightiness parameter is a)
� = 0:1 and b) � = 0:9. Consumer leakage occurs for � = 0:1, and is even more
pronounced for � = 0:9. Pro�ts diminish inde�nitely, although the rate of reduction
slows with time. The market consists of M = 1000 consumers and one seller o�ering
N = 10 articles per subscription period. Dashed horizontal line indicates ideal pro�t
of 0.4137.
Fig. 4 o�ers further insight into consumer leakage. It shows the expected
average pro�t over 200 subscription periods for a wide range of f and � (not
just the optimal value). These are represented as pro�t landscapes for � =
0:1 and � = 0:9. Both landscapes have become peaked at lower values of f .
In other words, if the producer insists on holding to a �xed price schedule, it
will optimize pro�ts by setting the subscription fee lower than it could charge
perfectly informed consumers, thereby encouraging overly pessimistic consumers
to stay in the market longer. It partially compensates by raising � somewhat.
A little thought suggests that, regardless of , �, or other such parameters,
consumer leakage will eventually lead to complete market failure for any �xed
setting of f and �. As consumers continually update their �^, they are in e�ect
conducting a random walk about the correct value of �. If it should ever happen
that the consumer's �^ rises above a value such that hs(�;�)i falls below F , the
profit 0.325 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit
Fig. 4. Pro�t landscape after consumer leakage has occurred. Mean pro�t per article
per consumer as a function of F and �, after 200 subscription periods with F and
� at the values that would be optimal for perfectly informed consumers. The market
consists ofM = 1000 consumers and one seller o�ering N = 10 articles per subscription
period. a) � = 0:1. b) � = 0:9.
consumer will become permanently disenfranchised from the market. In other
words, the value of �^
crit
at which consumers are discouraged from subscribing
(which, from Eq. 12, can be computed from
1
�^
crit
e
��^
crit
�
< f), acts as an ab-
sorbing boundary. In the next subsection, we will discuss how a producer might
use dynamic pricing and optimization to prevent unchecked pro�t erosion.
4.3 Solving the leakage problem
Consumer leakage hurts both the producer and the consumers, and therefore all
players have an incentive to counteract it. Both consumers and producers can do
this by putting more emphasis on exploration as opposed to (pure) exploitation.
Consumers could use a variety of schemes; for example, they could choose to
subscribe at random with a �nite probability even if their expected surplus is less
than the subscription fee, and this probability could diminish monotonically as
the di�erence between these quantities increases. Producers could �ght leakage
by temporarily decreasing prices to resurrect consumers who have mistakenly
disenfranchised themselves, in hopes that with additional samples the consumers
will increase their estimated surplus to levels that can support higher prices. One
might expect that consumers could be enticed back into the market even with
small discounts, since, once their �^ ventures into a realm where subscription
appears to be unpro�table, it is frozen at this just-barely-unpro�table value.
6
6
Economists have studied behavior of this sort in order to explain temporal price
dispersion, such as \sales" and temporary discounts. A mixture of reasons have
been modeled, including attempts to price discriminate between better and worse
Here we focus on the producer's strategy for enticing overly pessimistic con-
sumers back into the market. (This is not to deny that consumers' exploration
strategies merit serious study.) From our study of consumer leakage, it is appar-
ent that the producer's strategy must involve dynamic pricing, and that it must
cope with a pro�t landscape that changes dynamically due to shifts caused by
consumers' ongoing attempts to learn an estimate of g. It also seems most likely
that the pricing strategy would involve stochastic search on this dynamically
changing landscape, rather than following some pre-planned schedule.
The dynamically changing nature of the pro�t landscape in our problem lim-
its the applicability of standard stochastic optimization techniques. For example,
approaches such as simulated annealing implicitly assume that the search is be-
ing conducted on a static landscape as the value of its temperature parameter
is lowered. While more sophisticated approaches that are currently oriented to-
wards static landscapes might be modi�ed to dynamic ones, we focus here on a
very simple iterative random-mutation hill-climbing method (RMHC). At each
iteration of the RMHC method (i.e. following each subscription period), the pro-
ducer tries a new f and a new �. If the new settings result in increased pro�t,
the producer continues to move in the same direction in the f-� landscape in
the next trial. Otherwise, it backtracks to its previous best point and makes
small adjustments to f and �. The step size of the adjustment is chosen from
an exponential distribution with a mean 1=� for both f and �. To take into
account the dynamic nature of the landscape being maximized, the hill climber
has �nite memory and can remember the previous best point visited within the
last T time steps. We have observed that, for overly large values of T , the pro-
ducer can get mired for a long time in regions of the (f;�) landscape that have
become less pro�table due to evolving consumer expectations.
Fig. 5 illustrates how the pro�t � and the subscribed consumer population
p evolve with time for a particular setting of � = 0:5 and T = 20. For these
settings (and for a fairly broad range around them), the RMHC algorithm pre-
vents long-term consumer leakage and pro�t erosion, even when the consumers
are extremely ighty (� = 0:9). Frequent lowering of prices continually pulls
consumers back into the market so they can keep sampling articles. Ironically,
the RMHC's noisy exploration of the pro�t landscape helps to keep that land-
scape relatively stable. While the pro�t never attains the value of 0.41 that it
would have if the consumers were to keep their �^'s �xed at their exact values,
the producer is still able to extract a healthy, reasonably stable pro�t of roughly
0.25. Fig. 6 illustrates the noisy trajectory of f and �.
5 Conclusions
Commerce in information goods is one of the earliest emerging applications for
intelligent agents in commerce. However, the fundamental characteristics of in-
formation goods mean that they can and likely will be o�ered in widely varying
informed customers, and inducing potential customers to bear some search costs to
�nd a better price or product. See [11], [10], and [12].
0 0.25
0.5
0.75
1
0 250 500 750 1000
pr of
it, c
on su
m er
s
iter
0
0.25
0.5
0.75
1
0 250 500 750 1000
(av g)
pro fit,
fe e,
pri ce
iter
profit fee
price profit = 0.4137
Fig. 5. Simulation run with producer using RMHC algorithm with memory of T = 20
subscription periods and step-size parameter � = 0:5. = 0:1, � = 0:9. a) Propor-
tion p of subscribed consumers. b) Moving average of f , � and � (integrated over 20
subscription periods).
0
0.25
0.5
0.75
1
1.25
1.5
0 0.2 0.4 0.6 0.8 1
pr ice
fee
settings
Fig. 6. Trajectory of f and � taken by the RMHC algorithm for the simulation run
shown above.
con�gurations. Participating agents will need to deal with uncertainty about
both prices and location in multi-dimensional product space. Thus, studying
the behavior of learning agents is central to understanding and designing for
agent-based information economies. Since uncertainty will exist on both sides of
transactions, and interactions between learning agents that are negotiating and
transacting with other learning agents may lead to unexpected dynamics, it is
important to study two-sided learning.
We presented a simple but powerful model of an information bundling econ-
omy with a single producer and multiple consumer agents. We then explored
the pricing and purchasing behavior of these agents when articles can be bun-
dled. In this initial exploration, we studied the dynamics of this economy when
consumer agents are uninformed about the distribution of article values. We
discovered that a reasonable albeit naive consumer learning strategy can have a
profound in uence on market behavior | in this case, a strikingly bad in uence.
Our consumer and producer agents were rather naive in our �rst learning
experiments. This could be viewed as a criticism of our modeling. However, es-
pecially early in the development of adaptive agent intelligence, it may well be
that agent-based markets are quite vulnerable to odd behavior and dysfunctional
dynamics of the sort we observed. Our consumer agents did not recognize the
option value of new information, and thus su�ered by not undertaking su�cient
exploration relative to exploitation. Our producer agent did not initially adapt
to the pathological dynamics induced by the consumer agent naivete, and thus
su�ered by relying too con�dently on its \perfect" but static knowledge. Al-
though it was fairly easy for us to see what was going wrong, and to modify
the producer agent in a simple way that ameliorated much of the problem, our
environment is arti�cially simple and static. In more realistic settings it may be
quite di�cult for even relatively intelligent agents to adapt to emergent patholo-
gies. Human markets may not be as susceptible because human behavior is less
rote and more re ective. The lesson for agent design is to search for strategies
that are dynamically robust and adaptive in the face of substantial uncertainty.
We have started to explore how to make our simple mechanism more robust
in realistic settings. For example, the hill climbing algorithm is likely to get stuck
at local optima. For the uniform h there is a single peak in the pro�t landscape,
but that is not at all general. Several powerful optimization techniques, such as
simulated annealing, exist for static landscapes. Extensions to these that handle
dynamically changing, noisy landscapes, may lead to robustly adaptive agent
learning strategies.
We have an active agenda of continuing work on this topic. For example, we
have begun to consider less naive consumer strategies, that balance exploration
against exploitation. We are considering producer strategies that adapt based on
the number of recent subscribers relative to the producer's model of the optimal
number of subscribers. Perhaps most challenging but essential to a more general
understanding of the problem is the extension of our work into an economy
with multiple producers who are underinformed about each other's competitive
strategies as well as about consumer valuations.
In this and earlier work we have found that initial plausible but simple de-
signs of economically-intelligent agents lead to dynamic market interactions that
can be surprising and unsuccessful. The value of intelligent agents in electronic
commerce will depend on the ability to understand the problems of learning
and adaptivity, and to design agents that interact robustly in the presence of
substantial uncertainty about both parameters and the strategies of other un-
derinformed agents.
References
1. Y. Bakos and E. Brynjolfsson. Bundling information goods: Pricing, pro�ts and
e�ciency. In D. Hurley, B. Kahin, and H. Varian, editors, The Economics of Digital
Information Goods. MIT Press, Cambridge, Massachusetts, 1998.
2. J. C. Chuang and M. A. Sirbu. Network delivery of information goods: Opti-
mal pricing of articles and subscriptions. In D. Hurley, B. Kahin, and H. Varian,
editors, The Economics of Digital Information Goods. MIT Press, Cambridge, Mas-
sachusetts, 1998.
3. S. Fay and J. K. MacKie-Mason. Competition between �rms that bundle.
Manuscript, Dept. of Economics, University of Michigan, 1998.
4. J. E. Hanson and J. O. Kephart. Spontaneous specialization in a free-market
economy of agents. In Proceedings of Workshop on Arti�cial Societies and Compu-
tational Markets at the Second International Conference on Autonomous Agents,
May 1998.
5. J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal,
and S. R. White. Dynamics of an information-�ltering economy. In Proceedings
of Second International Workshop on Cooperative Information Agents (CIA-98),
Paris, July 1998.
6. J. O. Kephart, J. E. Hanson, and J. Sairamesh. Price-war dynamics in a free-
market economy of software agents. In Proceedings of Alife VI, June 1998.
7. J. K. MacKie-Mason and H. R. Varian. Some economics of the internet. In
W. Sichel, editor, Networks, Infrastructure and the New Task for Regulation. Uni-
versity of Michigan Press, Ann Arbor, Michigan, 1996.
8. A. Mas-Colell, M. Whinston, and J. Green. Microeconomic Theory. Oxford Uni-
versity Press, 1995.
9. W. Oi. A Disneyland dilemma: Two-part tari�s for a Mickey Mouse monopoly.
The Quarterly Journal of Economics, 85(1):77{96, February 1971.
10. S. Salop and J. Stiglitz. Bargains and ripo�s: A model of monopolistically com-
petitive price dispersion. Review of Economic Studies, 44:493{510, October 1977.
11. Y. Shilony. Mixed pricing in oligopoly. Journal of Economic Theory, 14:373{388,
April 1977.
12. H. R. Varian. A model of sales. American Economic Review, 70:651{659, Septem-
ber 1980.
13. J. M. Vidal and E. H. Durfee. Learning nested agent models in an information
economy. Journal of Experimental and Theoretical Arti�cial Intelligence (special
issue on learning in distributed arti�cial intelligence systems), 10(3):291{308, 1998.
14. J. M. Vidal and E. H. Durfee. The moving target problem in multi-agent learn-
ing. In Proceedings of the Third International Conference on Multi-agent Systems
(ICMAS98), 1998.Two-sided learning in an agent economy for
information bundles
Je�rey O. Kephart
1
, Rajarshi Das
1
, and Je�rey K. MacKie-Mason
2
1
IBM Research
PO Box 704, Yorktown Heights, NY 10598
fkephart,rajarshig@watson.ibm.com
2
School of Information and Department of Economics
University of Michigan, Ann Arbor, MI 48109
jmm@umich.edu
Abstract. Commerce in information goods is one of the earliest emerg-
ing applications for intelligent agents in commerce. However, the funda-
mental characteristics of information goods mean that they can and likely
will be o�ered in widely varying con�gurations. Participating agents
will need to deal with uncertainty about both prices and location in
multi-dimensional product space. Thus, studying the behavior of learning
agents is central to understanding and designing for agent-based informa-
tion economies. Since uncertainty will exist on both sides of transactions,
and interactions between learning agents that are negotiating and trans-
acting with other learning agents may lead to unexpected dynamics, it
is important to study two-sided learning.
We present a simple but powerful model of an information bundling
economy with a single producer and multiple consumer agents. We ex-
plore the pricing and purchasing behavior of these agents when articles
can be bundled. In this initial exploration, we study the dynamics of this
economy when consumer agents are uninformed about the distribution of
article values. We discover that a reasonable albeit naive consumer learn-
ing strategy can lead to disastrous market behavior. We �nd a simple
explanation for this market failure, and develop a simple improvement
to the producer agent's strategy that largely ameliorates the problem.
But in the process we learn an important lesson: dynamic market in-
teractions when there is substantial uncertainty can lead to pathological
outcomes if agents are designed with \reasonable" but not su�ciently
adaptive strategies. Thus, in programmed agent environments it may be
essential to dramatically increase our understanding of adaptivity and
learning if we want to obtain good aggregate outcomes.
Keywords: Information economy, information bundling, two-
sided learning, economic agents.
1 Introduction
Extraordinary rates of advance in computation and communication technolo-
gies
1
have fueled the much-noted proliferation of electronic commerce. Within a
few years, we anticipate that software agents will participate in a wide variety of
commercial transactions, and may even become economic players in their own
right [5]. One important domain for agent economies is the production and dis-
tribution of information goods and services, such as news articles, entertainment
and other service reviews, and instructional materials. Digital information goods
are unusually con�gurable, and negotiations over the composition and prices for
bundles are a natural application for software agents. However, to do so will
require that broker agents be able to explore not just prices, but also locations
in a multi-dimensional bundled product space.
Very little is understood about strategic search over the joint information
good price and product space (that is, search for which products to o�er, in
which combinations, at what prices).
2
Agents competing in this space must
learn about the distribution of preferences across a heterogeneous and changing
customer population, must learn about the strategies being followed by their
competitors, and then must optimize their strategies to take into account the
new understanding of customers and competitors. Consumers, on the other hand,
must learn about the quantity, quality and price of bundled items o�ered by the
various producers. Learning and optimization are both di�cult search problems,
and in the market context they are closely related.
In earlier work we considered strategic pricing for bundles when all of the
relevant parameters are known by brokers [3]. We distinguished between condi-
tions in which �rms prefer to o�er comprehensive bundles at a single price and
those in which they prefer to sell items individually. Although ours is the �rst
work focusing on competitive strategies, the bundling and pricing of information
goods has been an active area in recent research [1], [2]. We have also studied
price dynamics when competing brokers search a restricted price and product
space [5, 6, 4]. Price instability, characterized as \price wars", was prevalent.
In this paper we examine a considerably more general problem, although
for this initial foray we maintain some simplifying restrictions. We envision an
agent economy with one information broker agent and many consumer agents.
The consumer agents are heterogeneous: they value each item di�erently, and
these values are drawn from a di�erent distribution for each agent. To set pro�t-
maximizing prices, the broker agent desires, but may not know, the parameters
of the consumers' valuation distributions. The consumers also may not know
their valuation parameters until they gain experience with the broker's o�ered
information goods. Therefore, both sides wish to learn. The broker can set prices
strategically, learning about consumer valuations from their purchasing behavior
at di�erent prices. The consumers can purchase strategically, learning about the
distribution of information good values by sampling.
1
See, e.g., [7].
2
For some recent work on multi-agent search, see, e.g., [14, 13].
This environment is very rich and will permit us to explore many interesting
questions. In this paper we focus on one surprising result: that when consumer
agents follow a plausible but overly naive learning strategy, even if the producer is
fully informed (but also somewhat naive), the economy can continuously degen-
erate with disastrous overall performance. We �nd a simple explanation for the
initial failure, and develop a simple improvement to the producer agent's strategy
that largely ameliorates the problem. But in the process we learn an important
lesson: dynamic market interactions when there is substantial uncertainty can
lead to pathological outcomes if agents are designed with \reasonable" but not
su�ciently adaptive strategies. Thus, in programmed agent environments it may
be essential to dramatically increase our understanding of adaptivity and learn-
ing if we want to obtain good aggregate outcomes.
This paper is the �rst in a series of studies directed towards understanding
and developing robust adaptive learning techniques that are e�ective for indi-
vidual agents and lead to acceptable collective (market) behavior. We are also
extending the model to study the much more common setting in which there is
competition among multiple producers of information goods. Of course, this set-
ting exacerbates the learning problem, since the producers need to learn about
each other's strategies in addition to learning about consumer valuations.
In the next section we discuss the model. We introduce two relevant dis-
tributions: �rst, a parameterized distribution g from which a given consumer's
valuations are drawn, and second, a distribution h describing the population of
consumers | the distribution from which an individual consumer's valuation
distribution parameters are drawn. For tractability, we assume that the broker
is restricted to two-part tari� pricing schemes [9]: it can charge consumer agents
a subscription price to examine the current items, and a per item price for each
item the consumer subsequently purchases.
In a general analysis section, we derive the optimal strategies for fully in-
formed broker and consumer agents as functions of the article value and con-
sumer type distributions g and h. Then in Section 4 we explore speci�c cases of
exploration and exploitation through analysis and simulation. When agents are
underinformed and engage in learning, we consider plausible (but not necessarily
fully optimal) agent strategies, and examine the resulting system outcomes. We
close with a summary of our current results and plans for future work.
2 Model
A single producer periodically (at discrete times t = 0; 1; 2; : : :) generates sets of
N articles. It sets a subscription fee F and a price schedule P = fP (1); : : : ; P (N )g,
where P (k) represents the price it charges for a subset of k 2 f1; : : : ; Ng of the
N articles.
At a given time t, each of M consumers are informed about (F;P ), and they
decide whether to subscribe. Then, each subscribing consumer receives abstracts
of all N articles, and uses them to assess its value w
j
from reading each article,
for j 2 f1; : : : ; Ng. We assume that these values are generated randomly ac-
cording to a distribution g(q
i
;w), where q
i
is the set of parameters that de�ne
the distribution for consumer i. The parameters q
i
that represent consumer i's
valuation distribution are themselves generated randomly prior to time 0 from a
distribution h(r; q). Once the q
i
parameters are generated for consumer i, they
remain �xed for the rest of time (though the agents may not know their true
values for a while, if ever).
After assessing the value of each article, a subscriber decides which articles
to purchase. It does this by choosing a set of articles K to maximize surplus
s = (
P
j2K
w
j
) � P (jKj). Henceforth we assume that the articles have been
sorted by consumer j so that the w
j
are ordered highest to lowest, and thus the
set K consists of the �rst k = jKj articles.
The subscription decision depends on consumer expectations. If a consumer
believes correctly that its valuations are drawn from a distribution with param-
eters q
i
, then its expected surplus from purchasing k articles would be
hs
k
i =
*
k
X
j=1
w
j
+
� P (k): (1)
The consumer can then derive the vector of values p
k
, the probabilities that any
k is the expected surplus maximizing number of articles. Then the consumer's
optimized expected surplus is
hsi =
N
X
k=1
p
k
hs
k
i : (2)
The consumer should subscribe if and only if the expectation hsi exceeds the
subscription fee F .
3
If consumer i does not know its own q
i
for the articles o�ered by this broker,
then values w^
j
are the consumer's beliefs drawn from a distribution g(
^
q
i
;w) after
reviewing the abstracts, where the
^
q
i
are the agent's current best estimate of the
valuation parameters q
i
. When the agent purchases and reads articles, it learns
their true valuesw
j
and can then use this sample information to update its beliefs
^
q
i
about the distribution of article values. Therefore, a good consumer strategy
should take into account the value of learning. For example, when uncertainty
about q
i
is high, the consumer might deliberately subscribe even when its esti-
mated surplus is less than F , simply to experience more articles to improve its
valuation estimates. Or, having subscribed the consumer might purchase more
articles than would maximize expected surplus from current reading.
Turning to the producer's problem, it can choose a subscription fee F and
a price schedule P in each period. These should be chosen to maximize some
3
We assume consumers are risk neutral. A risk averse agent would want to optimize
the expectation of a concave function of surplus, and optimal behavior would depend
on second and possibly higher moments of the induced distribution of surplus, not
just the expected surplus. See, e.g., [8].
function of expected current and future pro�ts, where current pro�t can be
expressed as:
� =
M
X
i=1
�
i
(F + P (k
i
)�C(k
i
)) : (3)
with �
i
= 1 if consumer i chooses to subscribe, and 0 otherwise. The cost of
delivering k
i
articles to consumer i is denoted as C(k
i
).
In performing its maximization, the producer must take into account the
e�ect of P and F on the consumer's subscription decisions, and the e�ect of P
on the distribution of k
i
across the set of subscribers. Higher prices will decrease
the number of subscribers and the expected number of articles purchased, of
course. To compute the optimal subscription fee and price schedule the broker
wants to know the distribution h(r; q) from which the consumers' parameters
were generated, and the consumer strategies for subscribing and purchasing.
Based on its current beliefs about r and consumer strategies, the broker can
simulate a consumer population and its responses to various (F;P ) schedules,
and then pick (F;P ) to maximize a value function. In practice, the broker may
not know the consumer type parameters r, nor the consumer strategies. Thus,
the broker may choose (F;P ) to balance current expected pro�t according to
equation (3), against the increase in expected future pro�t from learning about
consumer preferences by observing their behavior when confronted with varying
(F;P ) combinations.
3 General Analysis
In this section, we �rst analyze the expected surplus and number of articles
purchased per subscription period for a rational, fully-informed consumer with a
given valuation distribution g. Then, we derive an expression for a monopolistic
producer's expected pro�t as a function of its price schedule and the distribution
h of valuation parameters q across the consumer population, assuming that all
consumers are rational and fully-informed about their valuation distribution.
A rational, fully-informed producer would choose its price schedule so as to
maximize its expected pro�t.
Suppose that a given consumer has its valuations w drawn from a probabil-
ity density function g(q;w). (The distribution parameters q may vary from one
consumer to another.) For the sake of simplicity, we assume that the producer
constrains itself to a linear price schedule: P (k) = k�. Then a rational consumer
will purchase k articles, where k is the number of articles with valuations ex-
ceeding the threshold P (k) � P (k � 1) = �. The probability p
k
for exactly k
articles to have valuations w > � is
p
k
=
�
N
k
�
G(q;�)
N�k
(1�G(q;�))
k
(4)
where G(q;w) represents the cumulative distribution function that corresponds
to g(q;w), i.e. it is the probability for an article to be valued at less than w.
From this we can compute the expected number of articles purchased:
hk(q;�)i =
N
X
k=1
k p
k
= N [1� G(q;�)] (5)
where the last equality follows from simple manipulations of binomial coe�-
cients.
The expected surplus hsi can be obtained from Eq. 2, provided that we �rst
compute hs
k
i, the expected surplus given that exactly k articles prove to have
valuations w > �. The conditional probability distribution for a single draw
from g(q;w) given that w > � is
~g(q;w) =
g(q;w)
1�G(q;�)
�(w ��) (6)
where �(x) represents the step function, equal to 1 if x > 0 and 0 otherwise.
The expected valuation for this distribution is
�w(q;�) =
Z
1
�
dww ~g(q;w) (7)
The expected sum of k draws from this conditional distribution is k �w(q;�).
Subtracting the price P (k) = k�, we obtain:
hs
k
i = k( �w(�)��) (8)
Inserting this result into Eq. 2, we obtain:
hs(q;�)i =
N
X
k=1
p
k
k [ �w(q;�)��] = [ �w(q;�)��] hk(q;�)i (9)
= N [ �w(q;�)��] [1� G(q;�)]
Note that the consumer will subscribe if and only if hs(q;�)i is greater than the
subscription fee F .
Now we take the producer's perspective. If we assume (for simplicity) that
the cost of producing and delivering k items is C(k) = k , then the expected
pro�t is
h�(F;�)i =
M
X
i=1
�(hs(q
i
;�)i � F )(F + (�� ) hk(q
i
;�)i) (10)
�
Z
q
dqh(r; q)�(hs(q;�)i � F ) (F + (�� ) hk(q;�)i)
where h(r; q) is the probability distribution for the consumers' q parameters, as
de�ned previously. In e�ect, the last approximation replaces the actual realized
set of consumer distribution parameters fq
i
g with an ensemble average over all
possible realizations of a set of M consumers generated from the distribution
h, and this approximation grows increasingly accurate in the limit of large M .
Note that a producer that knows h can compute this pro�t landscape. A fully
knowledgeable and rational producer would set � and F so as to maximize
h�(F;�)i.
4 Rational and Bounded-Rational Players
In the remainder of the paper, we shall assume simple functional forms for g and
h, and explore what happens when the consumers and/or the producer must
learn these distributions. In particular, we suppose that g is a one-parameter
exponential distribution given by g(�;w) = �e
��w
, and that h(�) is a uniform
distribution in the interval from �
min
to �
max
.
4.1 Fully-informed producer and consumers
As a reference point, we analyze the case where the consumers are fully informed
about their individual values of �, the producer knows the distribution h(�), and
the producer and the consumers act so as to maximize their expected gain.
Integrating g, we obtain the cumulative distribution G(�;w) = 1 � e
��w
.
From Eq. 6 we can compute the conditional distribution
~g(�;w) =
g(�;w)
1�G(�)
�(w ��) (11)
= �e
��(w��)
�(w ��):
The average valuation for this conditional distribution is �w(�;�) = � + �
�1
.
Using Eqs. 5 and 9, we obtain the expected number of purchased articles and
the expected surplus (assuming the consumer subscribes):
hk(�)i = Ne
���
(12)
hs(�)i =
N
�
e
���
:
To compute the producer's expected pro�t as a function of � and F , we can
substitute Eqs. 12 into Eq.10, which yields:
h�(F;�)i =
1
(�
max
� �
min
)
Z
�
0
�
min
d� (F + N (�� )e
���
) (13)
=
(�
0
� �
min
)F +N (1�
�
)
�
e
���
min
� e
���
0
�
(�
max
� �
min
)
;
where �
0
is de�ned as �
0
= min(�
max
; ~�) and ~� is de�ned as the unique solution
to Ne
�~��
= F ~�.
profit 0.4 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.225 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
-0.5
-0.3
-0.1
0.1
0.3
fee
price
profit
Fig. 1. Pro�t landscape �(f;�). h is a uniform distribution with �
min
= 0:5, �
min
= 2:0.
a) Production cost = 0:1. b) Production cost = 0:5.
Eq. 13 can be visualized as a pro�t landscape in which the expected pro�t
is plotted as a function of F and �. It is convenient to de�ne a normalized
expected pro�t � = �=N and a normalized fee f = F=N . Fig. 1 illustrates the
landscape for two di�erent production costs: = 0:1 and = 0:5.
In each such landscape, there are two ridges. The lower ridge, which demar-
cates the boundary beyond which the producer attracts no consumers and thus
makes no pro�t, is de�ned by e
��
min
�
= f�
min
. The upper ridge is described
by a piecewise joining of two nonlinear curves, the simpler one being described
by the relation e
��
max
�
= f�
max
. This portion of the ridge has the following
characteristics:
� it is formed by a discontinuous derivative with respect to f and � (it
changes abruptly from positive to negative)
� for price settings along this ridge, all consumers subscribe
� if the production cost <
crit
, the optimal setting of (f;�) occurs along
this part of the ridge. For the examples of Fig. 1, the critical production
cost
crit
is found to be approximately 0.28
4
, so that Fig. 1a ( = 0:1)
is in this regime, while Fig. 1b ( = 0:5) is not.
The other portion of the ridge is de�ned by a more complex nonlinear relation
between f and�. It is less sharp, resulting from derivatives with respect to f and
� being zero rather than jumping discontinuously from positive to negative.
5
4
It can be shown that, for exponential g and uniform h,
crit
=
�(
crit
)�
min
�
max
+�
min
�
2
max
�
�
min
�
2
max
, which is positive when �
min
> 0.
5
These characteristics only pertain to the case where g is exponential and h is uniform;
we have explored other combinations of functional forms for g and h that yield pro�t
landscapes that are topographically di�erent. For example, if g is exponential and
Figure 2 shows the dependence of the optimal f and � upon the production
cost . There is a discontinuous derivative at =
crit
, due to the switchover
between the two nonlinear curves that de�ne the upper ridge in the landscape.
As one might guess, the pro�t � decreases monotonically with the production
cost. The proportion p of consumers that subscribe is 1 for all �
crit
; for
exceeding this threshold the proportion of subscribers is strictly less than one,
and is given by
~���
min
�
max
��
min
.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0
0.2
0.4
0.6
0.8
1.0
µmin = 0.5 µmax = 2.0
∆
f
pi
p
production cost γ
∆,
f
Optimal ∆ and f vs. γ
Fig. 2. Optimal �, f , �, and p vs. , where h is a uniform distribution with �
min
= 0:5,
�
min
= 2:0.
4.2 Uninformed consumers
For a variety of reasons, a consumer may have to rely on adaptive estimates of
its valuation distribution, g. Suppose for example that consumer i knows that
g is an exponential distribution, but does not know its individual parameter �
i
.
Depending on the consumer's beliefs about the dynamics of the environment, it
might wish to place more or less weight on recent observations. Since the mean of
the distribution g(�;w) = � exp
��w
is 1=�, one reasonable and exible approach
to estimating � is to start with an assumed prior �
0
and after each period of
subscription to update the estimate �^ according to
�^
t+1
= (� �m
t
+ (1� �)(�^
t
)
�1
)
�1
; (14)
where m
t
is the mean of the N valuations received during the subscription
period t, and � is the consumer's \ ightiness" factor. This factor could be set to
h consists of a number of well-separated mass points, the landscape can contain
multiple ridges and peaks.
a constant, or be time-varying; in fact, by setting � = 1=t, the consumer weights
all observations equally.
Suppose that the producer knows the distribution h, and sets f and � to
the \optimal" values given by Fig. 2. (For = 0:1, these values are (f;�) =
(0:30878; 0:24098), at which the \ideal" pro�t � = 0:4137 and the fraction of sub-
scribers p = 1.) Furthermore, suppose that the consumers all start with exactly
the right estimates of their � parameters, but that they update their estimates
in accordance with Eq. 14. Fig. 3 illustrates what happens to consumer pro�ts
and subscription rates as a function of time for two values of �. In both cases,
the proportion p of subscribed consumers continues to diminish inde�nitely, and,
correspondingly, so does the pro�t �.
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
Fig. 3. Pro�t � and proportion of subscribed consumers p vs. time (in subscription
periods) when consumers estimate their � values, and the ightiness parameter is a)
� = 0:1 and b) � = 0:9. Consumer leakage occurs for � = 0:1, and is even more
pronounced for � = 0:9. Pro�ts diminish inde�nitely, although the rate of reduction
slows with time. The market consists of M = 1000 consumers and one seller o�ering
N = 10 articles per subscription period. Dashed horizontal line indicates ideal pro�t
of 0.4137.
Fig. 4 o�ers further insight into consumer leakage. It shows the expected
average pro�t over 200 subscription periods for a wide range of f and � (not
just the optimal value). These are represented as pro�t landscapes for � =
0:1 and � = 0:9. Both landscapes have become peaked at lower values of f .
In other words, if the producer insists on holding to a �xed price schedule, it
will optimize pro�ts by setting the subscription fee lower than it could charge
perfectly informed consumers, thereby encouraging overly pessimistic consumers
to stay in the market longer. It partially compensates by raising � somewhat.
A little thought suggests that, regardless of , �, or other such parameters,
consumer leakage will eventually lead to complete market failure for any �xed
setting of f and �. As consumers continually update their �^, they are in e�ect
conducting a random walk about the correct value of �. If it should ever happen
that the consumer's �^ rises above a value such that hs(�;�)i falls below F , the
profit 0.325 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit
Fig. 4. Pro�t landscape after consumer leakage has occurred. Mean pro�t per article
per consumer as a function of F and �, after 200 subscription periods with F and
� at the values that would be optimal for perfectly informed consumers. The market
consists ofM = 1000 consumers and one seller o�ering N = 10 articles per subscription
period. a) � = 0:1. b) � = 0:9.
consumer will become permanently disenfranchised from the market. In other
words, the value of �^
crit
at which consumers are discouraged from subscribing
(which, from Eq. 12, can be computed from
1
�^
crit
e
��^
crit
�
< f), acts as an ab-
sorbing boundary. In the next subsection, we will discuss how a producer might
use dynamic pricing and optimization to prevent unchecked pro�t erosion.
4.3 Solving the leakage problem
Consumer leakage hurts both the producer and the consumers, and therefore all
players have an incentive to counteract it. Both consumers and producers can do
this by putting more emphasis on exploration as opposed to (pure) exploitation.
Consumers could use a variety of schemes; for example, they could choose to
subscribe at random with a �nite probability even if their expected surplus is less
than the subscription fee, and this probability could diminish monotonically as
the di�erence between these quantities increases. Producers could �ght leakage
by temporarily decreasing prices to resurrect consumers who have mistakenly
disenfranchised themselves, in hopes that with additional samples the consumers
will increase their estimated surplus to levels that can support higher prices. One
might expect that consumers could be enticed back into the market even with
small discounts, since, once their �^ ventures into a realm where subscription
appears to be unpro�table, it is frozen at this just-barely-unpro�table value.
6
6
Economists have studied behavior of this sort in order to explain temporal price
dispersion, such as \sales" and temporary discounts. A mixture of reasons have
been modeled, including attempts to price discriminate between better and worse
Here we focus on the producer's strategy for enticing overly pessimistic con-
sumers back into the market. (This is not to deny that consumers' exploration
strategies merit serious study.) From our study of consumer leakage, it is appar-
ent that the producer's strategy must involve dynamic pricing, and that it must
cope with a pro�t landscape that changes dynamically due to shifts caused by
consumers' ongoing attempts to learn an estimate of g. It also seems most likely
that the pricing strategy would involve stochastic search on this dynamically
changing landscape, rather than following some pre-planned schedule.
The dynamically changing nature of the pro�t landscape in our problem lim-
its the applicability of standard stochastic optimization techniques. For example,
approaches such as simulated annealing implicitly assume that the search is be-
ing conducted on a static landscape as the value of its temperature parameter
is lowered. While more sophisticated approaches that are currently oriented to-
wards static landscapes might be modi�ed to dynamic ones, we focus here on a
very simple iterative random-mutation hill-climbing method (RMHC). At each
iteration of the RMHC method (i.e. following each subscription period), the pro-
ducer tries a new f and a new �. If the new settings result in increased pro�t,
the producer continues to move in the same direction in the f-� landscape in
the next trial. Otherwise, it backtracks to its previous best point and makes
small adjustments to f and �. The step size of the adjustment is chosen from
an exponential distribution with a mean 1=� for both f and �. To take into
account the dynamic nature of the landscape being maximized, the hill climber
has �nite memory and can remember the previous best point visited within the
last T time steps. We have observed that, for overly large values of T , the pro-
ducer can get mired for a long time in regions of the (f;�) landscape that have
become less pro�table due to evolving consumer expectations.
Fig. 5 illustrates how the pro�t � and the subscribed consumer population
p evolve with time for a particular setting of � = 0:5 and T = 20. For these
settings (and for a fairly broad range around them), the RMHC algorithm pre-
vents long-term consumer leakage and pro�t erosion, even when the consumers
are extremely ighty (� = 0:9). Frequent lowering of prices continually pulls
consumers back into the market so they can keep sampling articles. Ironically,
the RMHC's noisy exploration of the pro�t landscape helps to keep that land-
scape relatively stable. While the pro�t never attains the value of 0.41 that it
would have if the consumers were to keep their �^'s �xed at their exact values,
the producer is still able to extract a healthy, reasonably stable pro�t of roughly
0.25. Fig. 6 illustrates the noisy trajectory of f and �.
5 Conclusions
Commerce in information goods is one of the earliest emerging applications for
intelligent agents in commerce. However, the fundamental characteristics of in-
formation goods mean that they can and likely will be o�ered in widely varying
informed customers, and inducing potential customers to bear some search costs to
�nd a better price or product. See [11], [10], and [12].
0 0.25
0.5
0.75
1
0 250 500 750 1000
pr of
it, c
on su
m er
s
iter
0
0.25
0.5
0.75
1
0 250 500 750 1000
(av g)
pro fit,
fe e,
pri ce
iter
profit fee
price profit = 0.4137
Fig. 5. Simulation run with producer using RMHC algorithm with memory of T = 20
subscription periods and step-size parameter � = 0:5. = 0:1, � = 0:9. a) Propor-
tion p of subscribed consumers. b) Moving average of f , � and � (integrated over 20
subscription periods).
0
0.25
0.5
0.75
1
1.25
1.5
0 0.2 0.4 0.6 0.8 1
pr ice
fee
settings
Fig. 6. Trajectory of f and � taken by the RMHC algorithm for the simulation run
shown above.
con�gurations. Participating agents will need to deal with uncertainty about
both prices and location in multi-dimensional product space. Thus, studying
the behavior of learning agents is central to understanding and designing for
agent-based information economies. Since uncertainty will exist on both sides of
transactions, and interactions between learning agents that are negotiating and
transacting with other learning agents may lead to unexpected dynamics, it is
important to study two-sided learning.
We presented a simple but powerful model of an information bundling econ-
omy with a single producer and multiple consumer agents. We then explored
the pricing and purchasing behavior of these agents when articles can be bun-
dled. In this initial exploration, we studied the dynamics of this economy when
consumer agents are uninformed about the distribution of article values. We
discovered that a reasonable albeit naive consumer learning strategy can have a
profound in uence on market behavior | in this case, a strikingly bad in uence.
Our consumer and producer agents were rather naive in our �rst learning
experiments. This could be viewed as a criticism of our modeling. However, es-
pecially early in the development of adaptive agent intelligence, it may well be
that agent-based markets are quite vulnerable to odd behavior and dysfunctional
dynamics of the sort we observed. Our consumer agents did not recognize the
option value of new information, and thus su�ered by not undertaking su�cient
exploration relative to exploitation. Our producer agent did not initially adapt
to the pathological dynamics induced by the consumer agent naivete, and thus
su�ered by relying too con�dently on its \perfect" but static knowledge. Al-
though it was fairly easy for us to see what was going wrong, and to modify
the producer agent in a simple way that ameliorated much of the problem, our
environment is arti�cially simple and static. In more realistic settings it may be
quite di�cult for even relatively intelligent agents to adapt to emergent patholo-
gies. Human markets may not be as susceptible because human behavior is less
rote and more re ective. The lesson for agent design is to search for strategies
that are dynamically robust and adaptive in the face of substantial uncertainty.
We have started to explore how to make our simple mechanism more robust
in realistic settings. For example, the hill climbing algorithm is likely to get stuck
at local optima. For the uniform h there is a single peak in the pro�t landscape,
but that is not at all general. Several powerful optimization techniques, such as
simulated annealing, exist for static landscapes. Extensions to these that handle
dynamically changing, noisy landscapes, may lead to robustly adaptive agent
learning strategies.
We have an active agenda of continuing work on this topic. For example, we
have begun to consider less naive consumer strategies, that balance exploration
against exploitation. We are considering producer strategies that adapt based on
the number of recent subscribers relative to the producer's model of the optimal
number of subscribers. Perhaps most challenging but essential to a more general
understanding of the problem is the extension of our work into an economy
with multiple producers who are underinformed about each other's competitive
strategies as well as about consumer valuations.
In this and earlier work we have found that initial plausible but simple de-
signs of economically-intelligent agents lead to dynamic market interactions that
can be surprising and unsuccessful. The value of intelligent agents in electronic
commerce will depend on the ability to understand the problems of learning
and adaptivity, and to design agents that interact robustly in the presence of
substantial uncertainty about both parameters and the strategies of other un-
derinformed agents.
References
1. Y. Bakos and E. Brynjolfsson. Bundling information goods: Pricing, pro�ts and
e�ciency. In D. Hurley, B. Kahin, and H. Varian, editors, The Economics of Digital
Information Goods. MIT Press, Cambridge, Massachusetts, 1998.
2. J. C. Chuang and M. A. Sirbu. Network delivery of information goods: Opti-
mal pricing of articles and subscriptions. In D. Hurley, B. Kahin, and H. Varian,
editors, The Economics of Digital Information Goods. MIT Press, Cambridge, Mas-
sachusetts, 1998.
3. S. Fay and J. K. MacKie-Mason. Competition between �rms that bundle.
Manuscript, Dept. of Economics, University of Michigan, 1998.
4. J. E. Hanson and J. O. Kephart. Spontaneous specialization in a free-market
economy of agents. In Proceedings of Workshop on Arti�cial Societies and Compu-
tational Markets at the Second International Conference on Autonomous Agents,
May 1998.
5. J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal,
and S. R. White. Dynamics of an information-�ltering economy. In Proceedings
of Second International Workshop on Cooperative Information Agents (CIA-98),
Paris, July 1998.
6. J. O. Kephart, J. E. Hanson, and J. Sairamesh. Price-war dynamics in a free-
market economy of software agents. In Proceedings of Alife VI, June 1998.
7. J. K. MacKie-Mason and H. R. Varian. Some economics of the internet. In
W. Sichel, editor, Networks, Infrastructure and the New Task for Regulation. Uni-
versity of Michigan Press, Ann Arbor, Michigan, 1996.
8. A. Mas-Colell, M. Whinston, and J. Green. Microeconomic Theory. Oxford Uni-
versity Press, 1995.
9. W. Oi. A Disneyland dilemma: Two-part tari�s for a Mickey Mouse monopoly.
The Quarterly Journal of Economics, 85(1):77{96, February 1971.
10. S. Salop and J. Stiglitz. Bargains and ripo�s: A model of monopolistically com-
petitive price dispersion. Review of Economic Studies, 44:493{510, October 1977.
11. Y. Shilony. Mixed pricing in oligopoly. Journal of Economic Theory, 14:373{388,
April 1977.
12. H. R. Varian. A model of sales. American Economic Review, 70:651{659, Septem-
ber 1980.
13. J. M. Vidal and E. H. Durfee. Learning nested agent models in an information
economy. Journal of Experimental and Theoretical Arti�cial Intelligence (special
issue on learning in distributed arti�cial intelligence systems), 10(3):291{308, 1998.
14. J. M. Vidal and E. H. Durfee. The moving target problem in multi-agent learn-
ing. In Proceedings of the Third International Conference on Multi-agent Systems
(ICMAS98), 1998.Two-sided learning in an agent economy for
information bundles
Je�rey O. Kephart
1
, Rajarshi Das
1
, and Je�rey K. MacKie-Mason
2
1
IBM Research
PO Box 704, Yorktown Heights, NY 10598
fkephart,rajarshig@watson.ibm.com
2
School of Information and Department of Economics
University of Michigan, Ann Arbor, MI 48109
jmm@umich.edu
Abstract. Commerce in information goods is one of the earliest emerg-
ing applications for intelligent agents in commerce. However, the funda-
mental characteristics of information goods mean that they can and likely
will be o�ered in widely varying con�gurations. Participating agents
will need to deal with uncertainty about both prices and location in
multi-dimensional product space. Thus, studying the behavior of learning
agents is central to understanding and designing for agent-based informa-
tion economies. Since uncertainty will exist on both sides of transactions,
and interactions between learning agents that are negotiating and trans-
acting with other learning agents may lead to unexpected dynamics, it
is important to study two-sided learning.
We present a simple but powerful model of an information bundling
economy with a single producer and multiple consumer agents. We ex-
plore the pricing and purchasing behavior of these agents when articles
can be bundled. In this initial exploration, we study the dynamics of this
economy when consumer agents are uninformed about the distribution of
article values. We discover that a reasonable albeit naive consumer learn-
ing strategy can lead to disastrous market behavior. We �nd a simple
explanation for this market failure, and develop a simple improvement
to the producer agent's strategy that largely ameliorates the problem.
But in the process we learn an important lesson: dynamic market in-
teractions when there is substantial uncertainty can lead to pathological
outcomes if agents are designed with \reasonable" but not su�ciently
adaptive strategies. Thus, in programmed agent environments it may be
essential to dramatically increase our understanding of adaptivity and
learning if we want to obtain good aggregate outcomes.
Keywords: Information economy, information bundling, two-
sided learning, economic agents.
1 Introduction
Extraordinary rates of advance in computation and communication technolo-
gies
1
have fueled the much-noted proliferation of electronic commerce. Within a
few years, we anticipate that software agents will participate in a wide variety of
commercial transactions, and may even become economic players in their own
right [5]. One important domain for agent economies is the production and dis-
tribution of information goods and services, such as news articles, entertainment
and other service reviews, and instructional materials. Digital information goods
are unusually con�gurable, and negotiations over the composition and prices for
bundles are a natural application for software agents. However, to do so will
require that broker agents be able to explore not just prices, but also locations
in a multi-dimensional bundled product space.
Very little is understood about strategic search over the joint information
good price and product space (that is, search for which products to o�er, in
which combinations, at what prices).
2
Agents competing in this space must
learn about the distribution of preferences across a heterogeneous and changing
customer population, must learn about the strategies being followed by their
competitors, and then must optimize their strategies to take into account the
new understanding of customers and competitors. Consumers, on the other hand,
must learn about the quantity, quality and price of bundled items o�ered by the
various producers. Learning and optimization are both di�cult search problems,
and in the market context they are closely related.
In earlier work we considered strategic pricing for bundles when all of the
relevant parameters are known by brokers [3]. We distinguished between condi-
tions in which �rms prefer to o�er comprehensive bundles at a single price and
those in which they prefer to sell items individually. Although ours is the �rst
work focusing on competitive strategies, the bundling and pricing of information
goods has been an active area in recent research [1], [2]. We have also studied
price dynamics when competing brokers search a restricted price and product
space [5, 6, 4]. Price instability, characterized as \price wars", was prevalent.
In this paper we examine a considerably more general problem, although
for this initial foray we maintain some simplifying restrictions. We envision an
agent economy with one information broker agent and many consumer agents.
The consumer agents are heterogeneous: they value each item di�erently, and
these values are drawn from a di�erent distribution for each agent. To set pro�t-
maximizing prices, the broker agent desires, but may not know, the parameters
of the consumers' valuation distributions. The consumers also may not know
their valuation parameters until they gain experience with the broker's o�ered
information goods. Therefore, both sides wish to learn. The broker can set prices
strategically, learning about consumer valuations from their purchasing behavior
at di�erent prices. The consumers can purchase strategically, learning about the
distribution of information good values by sampling.
1
See, e.g., [7].
2
For some recent work on multi-agent search, see, e.g., [14, 13].
This environment is very rich and will permit us to explore many interesting
questions. In this paper we focus on one surprising result: that when consumer
agents follow a plausible but overly naive learning strategy, even if the producer is
fully informed (but also somewhat naive), the economy can continuously degen-
erate with disastrous overall performance. We �nd a simple explanation for the
initial failure, and develop a simple improvement to the producer agent's strategy
that largely ameliorates the problem. But in the process we learn an important
lesson: dynamic market interactions when there is substantial uncertainty can
lead to pathological outcomes if agents are designed with \reasonable" but not
su�ciently adaptive strategies. Thus, in programmed agent environments it may
be essential to dramatically increase our understanding of adaptivity and learn-
ing if we want to obtain good aggregate outcomes.
This paper is the �rst in a series of studies directed towards understanding
and developing robust adaptive learning techniques that are e�ective for indi-
vidual agents and lead to acceptable collective (market) behavior. We are also
extending the model to study the much more common setting in which there is
competition among multiple producers of information goods. Of course, this set-
ting exacerbates the learning problem, since the producers need to learn about
each other's strategies in addition to learning about consumer valuations.
In the next section we discuss the model. We introduce two relevant dis-
tributions: �rst, a parameterized distribution g from which a given consumer's
valuations are drawn, and second, a distribution h describing the population of
consumers | the distribution from which an individual consumer's valuation
distribution parameters are drawn. For tractability, we assume that the broker
is restricted to two-part tari� pricing schemes [9]: it can charge consumer agents
a subscription price to examine the current items, and a per item price for each
item the consumer subsequently purchases.
In a general analysis section, we derive the optimal strategies for fully in-
formed broker and consumer agents as functions of the article value and con-
sumer type distributions g and h. Then in Section 4 we explore speci�c cases of
exploration and exploitation through analysis and simulation. When agents are
underinformed and engage in learning, we consider plausible (but not necessarily
fully optimal) agent strategies, and examine the resulting system outcomes. We
close with a summary of our current results and plans for future work.
2 Model
A single producer periodically (at discrete times t = 0; 1; 2; : : :) generates sets of
N articles. It sets a subscription fee F and a price schedule P = fP (1); : : : ; P (N )g,
where P (k) represents the price it charges for a subset of k 2 f1; : : : ; Ng of the
N articles.
At a given time t, each of M consumers are informed about (F;P ), and they
decide whether to subscribe. Then, each subscribing consumer receives abstracts
of all N articles, and uses them to assess its value w
j
from reading each article,
for j 2 f1; : : : ; Ng. We assume that these values are generated randomly ac-
cording to a distribution g(q
i
;w), where q
i
is the set of parameters that de�ne
the distribution for consumer i. The parameters q
i
that represent consumer i's
valuation distribution are themselves generated randomly prior to time 0 from a
distribution h(r; q). Once the q
i
parameters are generated for consumer i, they
remain �xed for the rest of time (though the agents may not know their true
values for a while, if ever).
After assessing the value of each article, a subscriber decides which articles
to purchase. It does this by choosing a set of articles K to maximize surplus
s = (
P
j2K
w
j
) � P (jKj). Henceforth we assume that the articles have been
sorted by consumer j so that the w
j
are ordered highest to lowest, and thus the
set K consists of the �rst k = jKj articles.
The subscription decision depends on consumer expectations. If a consumer
believes correctly that its valuations are drawn from a distribution with param-
eters q
i
, then its expected surplus from purchasing k articles would be
hs
k
i =
*
k
X
j=1
w
j
+
� P (k): (1)
The consumer can then derive the vector of values p
k
, the probabilities that any
k is the expected surplus maximizing number of articles. Then the consumer's
optimized expected surplus is
hsi =
N
X
k=1
p
k
hs
k
i : (2)
The consumer should subscribe if and only if the expectation hsi exceeds the
subscription fee F .
3
If consumer i does not know its own q
i
for the articles o�ered by this broker,
then values w^
j
are the consumer's beliefs drawn from a distribution g(
^
q
i
;w) after
reviewing the abstracts, where the
^
q
i
are the agent's current best estimate of the
valuation parameters q
i
. When the agent purchases and reads articles, it learns
their true valuesw
j
and can then use this sample information to update its beliefs
^
q
i
about the distribution of article values. Therefore, a good consumer strategy
should take into account the value of learning. For example, when uncertainty
about q
i
is high, the consumer might deliberately subscribe even when its esti-
mated surplus is less than F , simply to experience more articles to improve its
valuation estimates. Or, having subscribed the consumer might purchase more
articles than would maximize expected surplus from current reading.
Turning to the producer's problem, it can choose a subscription fee F and
a price schedule P in each period. These should be chosen to maximize some
3
We assume consumers are risk neutral. A risk averse agent would want to optimize
the expectation of a concave function of surplus, and optimal behavior would depend
on second and possibly higher moments of the induced distribution of surplus, not
just the expected surplus. See, e.g., [8].
function of expected current and future pro�ts, where current pro�t can be
expressed as:
� =
M
X
i=1
�
i
(F + P (k
i
)�C(k
i
)) : (3)
with �
i
= 1 if consumer i chooses to subscribe, and 0 otherwise. The cost of
delivering k
i
articles to consumer i is denoted as C(k
i
).
In performing its maximization, the producer must take into account the
e�ect of P and F on the consumer's subscription decisions, and the e�ect of P
on the distribution of k
i
across the set of subscribers. Higher prices will decrease
the number of subscribers and the expected number of articles purchased, of
course. To compute the optimal subscription fee and price schedule the broker
wants to know the distribution h(r; q) from which the consumers' parameters
were generated, and the consumer strategies for subscribing and purchasing.
Based on its current beliefs about r and consumer strategies, the broker can
simulate a consumer population and its responses to various (F;P ) schedules,
and then pick (F;P ) to maximize a value function. In practice, the broker may
not know the consumer type parameters r, nor the consumer strategies. Thus,
the broker may choose (F;P ) to balance current expected pro�t according to
equation (3), against the increase in expected future pro�t from learning about
consumer preferences by observing their behavior when confronted with varying
(F;P ) combinations.
3 General Analysis
In this section, we �rst analyze the expected surplus and number of articles
purchased per subscription period for a rational, fully-informed consumer with a
given valuation distribution g. Then, we derive an expression for a monopolistic
producer's expected pro�t as a function of its price schedule and the distribution
h of valuation parameters q across the consumer population, assuming that all
consumers are rational and fully-informed about their valuation distribution.
A rational, fully-informed producer would choose its price schedule so as to
maximize its expected pro�t.
Suppose that a given consumer has its valuations w drawn from a probabil-
ity density function g(q;w). (The distribution parameters q may vary from one
consumer to another.) For the sake of simplicity, we assume that the producer
constrains itself to a linear price schedule: P (k) = k�. Then a rational consumer
will purchase k articles, where k is the number of articles with valuations ex-
ceeding the threshold P (k) � P (k � 1) = �. The probability p
k
for exactly k
articles to have valuations w > � is
p
k
=
�
N
k
�
G(q;�)
N�k
(1�G(q;�))
k
(4)
where G(q;w) represents the cumulative distribution function that corresponds
to g(q;w), i.e. it is the probability for an article to be valued at less than w.
From this we can compute the expected number of articles purchased:
hk(q;�)i =
N
X
k=1
k p
k
= N [1� G(q;�)] (5)
where the last equality follows from simple manipulations of binomial coe�-
cients.
The expected surplus hsi can be obtained from Eq. 2, provided that we �rst
compute hs
k
i, the expected surplus given that exactly k articles prove to have
valuations w > �. The conditional probability distribution for a single draw
from g(q;w) given that w > � is
~g(q;w) =
g(q;w)
1�G(q;�)
�(w ��) (6)
where �(x) represents the step function, equal to 1 if x > 0 and 0 otherwise.
The expected valuation for this distribution is
�w(q;�) =
Z
1
�
dww ~g(q;w) (7)
The expected sum of k draws from this conditional distribution is k �w(q;�).
Subtracting the price P (k) = k�, we obtain:
hs
k
i = k( �w(�)��) (8)
Inserting this result into Eq. 2, we obtain:
hs(q;�)i =
N
X
k=1
p
k
k [ �w(q;�)��] = [ �w(q;�)��] hk(q;�)i (9)
= N [ �w(q;�)��] [1� G(q;�)]
Note that the consumer will subscribe if and only if hs(q;�)i is greater than the
subscription fee F .
Now we take the producer's perspective. If we assume (for simplicity) that
the cost of producing and delivering k items is C(k) = k , then the expected
pro�t is
h�(F;�)i =
M
X
i=1
�(hs(q
i
;�)i � F )(F + (�� ) hk(q
i
;�)i) (10)
�
Z
q
dqh(r; q)�(hs(q;�)i � F ) (F + (�� ) hk(q;�)i)
where h(r; q) is the probability distribution for the consumers' q parameters, as
de�ned previously. In e�ect, the last approximation replaces the actual realized
set of consumer distribution parameters fq
i
g with an ensemble average over all
possible realizations of a set of M consumers generated from the distribution
h, and this approximation grows increasingly accurate in the limit of large M .
Note that a producer that knows h can compute this pro�t landscape. A fully
knowledgeable and rational producer would set � and F so as to maximize
h�(F;�)i.
4 Rational and Bounded-Rational Players
In the remainder of the paper, we shall assume simple functional forms for g and
h, and explore what happens when the consumers and/or the producer must
learn these distributions. In particular, we suppose that g is a one-parameter
exponential distribution given by g(�;w) = �e
��w
, and that h(�) is a uniform
distribution in the interval from �
min
to �
max
.
4.1 Fully-informed producer and consumers
As a reference point, we analyze the case where the consumers are fully informed
about their individual values of �, the producer knows the distribution h(�), and
the producer and the consumers act so as to maximize their expected gain.
Integrating g, we obtain the cumulative distribution G(�;w) = 1 � e
��w
.
From Eq. 6 we can compute the conditional distribution
~g(�;w) =
g(�;w)
1�G(�)
�(w ��) (11)
= �e
��(w��)
�(w ��):
The average valuation for this conditional distribution is �w(�;�) = � + �
�1
.
Using Eqs. 5 and 9, we obtain the expected number of purchased articles and
the expected surplus (assuming the consumer subscribes):
hk(�)i = Ne
���
(12)
hs(�)i =
N
�
e
���
:
To compute the producer's expected pro�t as a function of � and F , we can
substitute Eqs. 12 into Eq.10, which yields:
h�(F;�)i =
1
(�
max
� �
min
)
Z
�
0
�
min
d� (F + N (�� )e
���
) (13)
=
(�
0
� �
min
)F +N (1�
�
)
�
e
���
min
� e
���
0
�
(�
max
� �
min
)
;
where �
0
is de�ned as �
0
= min(�
max
; ~�) and ~� is de�ned as the unique solution
to Ne
�~��
= F ~�.
profit 0.4 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.225 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
-0.5
-0.3
-0.1
0.1
0.3
fee
price
profit
Fig. 1. Pro�t landscape �(f;�). h is a uniform distribution with �
min
= 0:5, �
min
= 2:0.
a) Production cost = 0:1. b) Production cost = 0:5.
Eq. 13 can be visualized as a pro�t landscape in which the expected pro�t
is plotted as a function of F and �. It is convenient to de�ne a normalized
expected pro�t � = �=N and a normalized fee f = F=N . Fig. 1 illustrates the
landscape for two di�erent production costs: = 0:1 and = 0:5.
In each such landscape, there are two ridges. The lower ridge, which demar-
cates the boundary beyond which the producer attracts no consumers and thus
makes no pro�t, is de�ned by e
��
min
�
= f�
min
. The upper ridge is described
by a piecewise joining of two nonlinear curves, the simpler one being described
by the relation e
��
max
�
= f�
max
. This portion of the ridge has the following
characteristics:
� it is formed by a discontinuous derivative with respect to f and � (it
changes abruptly from positive to negative)
� for price settings along this ridge, all consumers subscribe
� if the production cost <
crit
, the optimal setting of (f;�) occurs along
this part of the ridge. For the examples of Fig. 1, the critical production
cost
crit
is found to be approximately 0.28
4
, so that Fig. 1a ( = 0:1)
is in this regime, while Fig. 1b ( = 0:5) is not.
The other portion of the ridge is de�ned by a more complex nonlinear relation
between f and�. It is less sharp, resulting from derivatives with respect to f and
� being zero rather than jumping discontinuously from positive to negative.
5
4
It can be shown that, for exponential g and uniform h,
crit
=
�(
crit
)�
min
�
max
+�
min
�
2
max
�
�
min
�
2
max
, which is positive when �
min
> 0.
5
These characteristics only pertain to the case where g is exponential and h is uniform;
we have explored other combinations of functional forms for g and h that yield pro�t
landscapes that are topographically di�erent. For example, if g is exponential and
Figure 2 shows the dependence of the optimal f and � upon the production
cost . There is a discontinuous derivative at =
crit
, due to the switchover
between the two nonlinear curves that de�ne the upper ridge in the landscape.
As one might guess, the pro�t � decreases monotonically with the production
cost. The proportion p of consumers that subscribe is 1 for all �
crit
; for
exceeding this threshold the proportion of subscribers is strictly less than one,
and is given by
~���
min
�
max
��
min
.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0
0.2
0.4
0.6
0.8
1.0
µmin = 0.5 µmax = 2.0
∆
f
pi
p
production cost γ
∆,
f
Optimal ∆ and f vs. γ
Fig. 2. Optimal �, f , �, and p vs. , where h is a uniform distribution with �
min
= 0:5,
�
min
= 2:0.
4.2 Uninformed consumers
For a variety of reasons, a consumer may have to rely on adaptive estimates of
its valuation distribution, g. Suppose for example that consumer i knows that
g is an exponential distribution, but does not know its individual parameter �
i
.
Depending on the consumer's beliefs about the dynamics of the environment, it
might wish to place more or less weight on recent observations. Since the mean of
the distribution g(�;w) = � exp
��w
is 1=�, one reasonable and exible approach
to estimating � is to start with an assumed prior �
0
and after each period of
subscription to update the estimate �^ according to
�^
t+1
= (� �m
t
+ (1� �)(�^
t
)
�1
)
�1
; (14)
where m
t
is the mean of the N valuations received during the subscription
period t, and � is the consumer's \ ightiness" factor. This factor could be set to
h consists of a number of well-separated mass points, the landscape can contain
multiple ridges and peaks.
a constant, or be time-varying; in fact, by setting � = 1=t, the consumer weights
all observations equally.
Suppose that the producer knows the distribution h, and sets f and � to
the \optimal" values given by Fig. 2. (For = 0:1, these values are (f;�) =
(0:30878; 0:24098), at which the \ideal" pro�t � = 0:4137 and the fraction of sub-
scribers p = 1.) Furthermore, suppose that the consumers all start with exactly
the right estimates of their � parameters, but that they update their estimates
in accordance with Eq. 14. Fig. 3 illustrates what happens to consumer pro�ts
and subscription rates as a function of time for two values of �. In both cases,
the proportion p of subscribed consumers continues to diminish inde�nitely, and,
correspondingly, so does the pro�t �.
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
Fig. 3. Pro�t � and proportion of subscribed consumers p vs. time (in subscription
periods) when consumers estimate their � values, and the ightiness parameter is a)
� = 0:1 and b) � = 0:9. Consumer leakage occurs for � = 0:1, and is even more
pronounced for � = 0:9. Pro�ts diminish inde�nitely, although the rate of reduction
slows with time. The market consists of M = 1000 consumers and one seller o�ering
N = 10 articles per subscription period. Dashed horizontal line indicates ideal pro�t
of 0.4137.
Fig. 4 o�ers further insight into consumer leakage. It shows the expected
average pro�t over 200 subscription periods for a wide range of f and � (not
just the optimal value). These are represented as pro�t landscapes for � =
0:1 and � = 0:9. Both landscapes have become peaked at lower values of f .
In other words, if the producer insists on holding to a �xed price schedule, it
will optimize pro�ts by setting the subscription fee lower than it could charge
perfectly informed consumers, thereby encouraging overly pessimistic consumers
to stay in the market longer. It partially compensates by raising � somewhat.
A little thought suggests that, regardless of , �, or other such parameters,
consumer leakage will eventually lead to complete market failure for any �xed
setting of f and �. As consumers continually update their �^, they are in e�ect
conducting a random walk about the correct value of �. If it should ever happen
that the consumer's �^ rises above a value such that hs(�;�)i falls below F , the
profit 0.325 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit
Fig. 4. Pro�t landscape after consumer leakage has occurred. Mean pro�t per article
per consumer as a function of F and �, after 200 subscription periods with F and
� at the values that would be optimal for perfectly informed consumers. The market
consists ofM = 1000 consumers and one seller o�ering N = 10 articles per subscription
period. a) � = 0:1. b) � = 0:9.
consumer will become permanently disenfranchised from the market. In other
words, the value of �^
crit
at which consumers are discouraged from subscribing
(which, from Eq. 12, can be computed from
1
�^
crit
e
��^
crit
�
< f), acts as an ab-
sorbing boundary. In the next subsection, we will discuss how a producer might
use dynamic pricing and optimization to prevent unchecked pro�t erosion.
4.3 Solving the leakage problem
Consumer leakage hurts both the producer and the consumers, and therefore all
players have an incentive to counteract it. Both consumers and producers can do
this by putting more emphasis on exploration as opposed to (pure) exploitation.
Consumers could use a variety of schemes; for example, they could choose to
subscribe at random with a �nite probability even if their expected surplus is less
than the subscription fee, and this probability could diminish monotonically as
the di�erence between these quantities increases. Producers could �ght leakage
by temporarily decreasing prices to resurrect consumers who have mistakenly
disenfranchised themselves, in hopes that with additional samples the consumers
will increase their estimated surplus to levels that can support higher prices. One
might expect that consumers could be enticed back into the market even with
small discounts, since, once their �^ ventures into a realm where subscription
appears to be unpro�table, it is frozen at this just-barely-unpro�table value.
6
6
Economists have studied behavior of this sort in order to explain temporal price
dispersion, such as \sales" and temporary discounts. A mixture of reasons have
been modeled, including attempts to price discriminate between better and worse
Here we focus on the producer's strategy for enticing overly pessimistic con-
sumers back into the market. (This is not to deny that consumers' exploration
strategies merit serious study.) From our study of consumer leakage, it is appar-
ent that the producer's strategy must involve dynamic pricing, and that it must
cope with a pro�t landscape that changes dynamically due to shifts caused by
consumers' ongoing attempts to learn an estimate of g. It also seems most likely
that the pricing strategy would involve stochastic search on this dynamically
changing landscape, rather than following some pre-planned schedule.
The dynamically changing nature of the pro�t landscape in our problem lim-
its the applicability of standard stochastic optimization techniques. For example,
approaches such as simulated annealing implicitly assume that the search is be-
ing conducted on a static landscape as the value of its temperature parameter
is lowered. While more sophisticated approaches that are currently oriented to-
wards static landscapes might be modi�ed to dynamic ones, we focus here on a
very simple iterative random-mutation hill-climbing method (RMHC). At each
iteration of the RMHC method (i.e. following each subscription period), the pro-
ducer tries a new f and a new �. If the new settings result in increased pro�t,
the producer continues to move in the same direction in the f-� landscape in
the next trial. Otherwise, it backtracks to its previous best point and makes
small adjustments to f and �. The step size of the adjustment is chosen from
an exponential distribution with a mean 1=� for both f and �. To take into
account the dynamic nature of the landscape being maximized, the hill climber
has �nite memory and can remember the previous best point visited within the
last T time steps. We have observed that, for overly large values of T , the pro-
ducer can get mired for a long time in regions of the (f;�) landscape that have
become less pro�table due to evolving consumer expectations.
Fig. 5 illustrates how the pro�t � and the subscribed consumer population
p evolve with time for a particular setting of � = 0:5 and T = 20. For these
settings (and for a fairly broad range around them), the RMHC algorithm pre-
vents long-term consumer leakage and pro�t erosion, even when the consumers
are extremely ighty (� = 0:9). Frequent lowering of prices continually pulls
consumers back into the market so they can keep sampling articles. Ironically,
the RMHC's noisy exploration of the pro�t landscape helps to keep that land-
scape relatively stable. While the pro�t never attains the value of 0.41 that it
would have if the consumers were to keep their �^'s �xed at their exact values,
the producer is still able to extract a healthy, reasonably stable pro�t of roughly
0.25. Fig. 6 illustrates the noisy trajectory of f and �.
5 Conclusions
Commerce in information goods is one of the earliest emerging applications for
intelligent agents in commerce. However, the fundamental characteristics of in-
formation goods mean that they can and likely will be o�ered in widely varying
informed customers, and inducing potential customers to bear some search costs to
�nd a better price or product. See [11], [10], and [12].
0 0.25
0.5
0.75
1
0 250 500 750 1000
pr of
it, c
on su
m er
s
iter
0
0.25
0.5
0.75
1
0 250 500 750 1000
(av g)
pro fit,
fe e,
pri ce
iter
profit fee
price profit = 0.4137
Fig. 5. Simulation run with producer using RMHC algorithm with memory of T = 20
subscription periods and step-size parameter � = 0:5. = 0:1, � = 0:9. a) Propor-
tion p of subscribed consumers. b) Moving average of f , � and � (integrated over 20
subscription periods).
0
0.25
0.5
0.75
1
1.25
1.5
0 0.2 0.4 0.6 0.8 1
pr ice
fee
settings
Fig. 6. Trajectory of f and � taken by the RMHC algorithm for the simulation run
shown above.
con�gurations. Participating agents will need to deal with uncertainty about
both prices and location in multi-dimensional product space. Thus, studying
the behavior of learning agents is central to understanding and designing for
agent-based information economies. Since uncertainty will exist on both sides of
transactions, and interactions between learning agents that are negotiating and
transacting with other learning agents may lead to unexpected dynamics, it is
important to study two-sided learning.
We presented a simple but powerful model of an information bundling econ-
omy with a single producer and multiple consumer agents. We then explored
the pricing and purchasing behavior of these agents when articles can be bun-
dled. In this initial exploration, we studied the dynamics of this economy when
consumer agents are uninformed about the distribution of article values. We
discovered that a reasonable albeit naive consumer learning strategy can have a
profound in uence on market behavior | in this case, a strikingly bad in uence.
Our consumer and producer agents were rather naive in our �rst learning
experiments. This could be viewed as a criticism of our modeling. However, es-
pecially early in the development of adaptive agent intelligence, it may well be
that agent-based markets are quite vulnerable to odd behavior and dysfunctional
dynamics of the sort we observed. Our consumer agents did not recognize the
option value of new information, and thus su�ered by not undertaking su�cient
exploration relative to exploitation. Our producer agent did not initially adapt
to the pathological dynamics induced by the consumer agent naivete, and thus
su�ered by relying too con�dently on its \perfect" but static knowledge. Al-
though it was fairly easy for us to see what was going wrong, and to modify
the producer agent in a simple way that ameliorated much of the problem, our
environment is arti�cially simple and static. In more realistic settings it may be
quite di�cult for even relatively intelligent agents to adapt to emergent patholo-
gies. Human markets may not be as susceptible because human behavior is less
rote and more re ective. The lesson for agent design is to search for strategies
that are dynamically robust and adaptive in the face of substantial uncertainty.
We have started to explore how to make our simple mechanism more robust
in realistic settings. For example, the hill climbing algorithm is likely to get stuck
at local optima. For the uniform h there is a single peak in the pro�t landscape,
but that is not at all general. Several powerful optimization techniques, such as
simulated annealing, exist for static landscapes. Extensions to these that handle
dynamically changing, noisy landscapes, may lead to robustly adaptive agent
learning strategies.
We have an active agenda of continuing work on this topic. For example, we
have begun to consider less naive consumer strategies, that balance exploration
against exploitation. We are considering producer strategies that adapt based on
the number of recent subscribers relative to the producer's model of the optimal
number of subscribers. Perhaps most challenging but essential to a more general
understanding of the problem is the extension of our work into an economy
with multiple producers who are underinformed about each other's competitive
strategies as well as about consumer valuations.
In this and earlier work we have found that initial plausible but simple de-
signs of economically-intelligent agents lead to dynamic market interactions that
can be surprising and unsuccessful. The value of intelligent agents in electronic
commerce will depend on the ability to understand the problems of learning
and adaptivity, and to design agents that interact robustly in the presence of
substantial uncertainty about both parameters and the strategies of other un-
derinformed agents.
References
1. Y. Bakos and E. Brynjolfsson. Bundling information goods: Pricing, pro�ts and
e�ciency. In D. Hurley, B. Kahin, and H. Varian, editors, The Economics of Digital
Information Goods. MIT Press, Cambridge, Massachusetts, 1998.
2. J. C. Chuang and M. A. Sirbu. Network delivery of information goods: Opti-
mal pricing of articles and subscriptions. In D. Hurley, B. Kahin, and H. Varian,
editors, The Economics of Digital Information Goods. MIT Press, Cambridge, Mas-
sachusetts, 1998.
3. S. Fay and J. K. MacKie-Mason. Competition between �rms that bundle.
Manuscript, Dept. of Economics, University of Michigan, 1998.
4. J. E. Hanson and J. O. Kephart. Spontaneous specialization in a free-market
economy of agents. In Proceedings of Workshop on Arti�cial Societies and Compu-
tational Markets at the Second International Conference on Autonomous Agents,
May 1998.
5. J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal,
and S. R. White. Dynamics of an information-�ltering economy. In Proceedings
of Second International Workshop on Cooperative Information Agents (CIA-98),
Paris, July 1998.
6. J. O. Kephart, J. E. Hanson, and J. Sairamesh. Price-war dynamics in a free-
market economy of software agents. In Proceedings of Alife VI, June 1998.
7. J. K. MacKie-Mason and H. R. Varian. Some economics of the internet. In
W. Sichel, editor, Networks, Infrastructure and the New Task for Regulation. Uni-
versity of Michigan Press, Ann Arbor, Michigan, 1996.
8. A. Mas-Colell, M. Whinston, and J. Green. Microeconomic Theory. Oxford Uni-
versity Press, 1995.
9. W. Oi. A Disneyland dilemma: Two-part tari�s for a Mickey Mouse monopoly.
The Quarterly Journal of Economics, 85(1):77{96, February 1971.
10. S. Salop and J. Stiglitz. Bargains and ripo�s: A model of monopolistically com-
petitive price dispersion. Review of Economic Studies, 44:493{510, October 1977.
11. Y. Shilony. Mixed pricing in oligopoly. Journal of Economic Theory, 14:373{388,
April 1977.
12. H. R. Varian. A model of sales. American Economic Review, 70:651{659, Septem-
ber 1980.
13. J. M. Vidal and E. H. Durfee. Learning nested agent models in an information
economy. Journal of Experimental and Theoretical Arti�cial Intelligence (special
issue on learning in distributed arti�cial intelligence systems), 10(3):291{308, 1998.
14. J. M. Vidal and E. H. Durfee. The moving target problem in multi-agent learn-
ing. In Proceedings of the Third International Conference on Multi-agent Systems
(ICMAS98), 1998.
information bundles
Je�rey O. Kephart
1
, Rajarshi Das
1
, and Je�rey K. MacKie-Mason
2
1
IBM Research
PO Box 704, Yorktown Heights, NY 10598
fkephart,rajarshig@watson.ibm.com
2
School of Information and Department of Economics
University of Michigan, Ann Arbor, MI 48109
jmm@umich.edu
Abstract. Commerce in information goods is one of the earliest emerg-
ing applications for intelligent agents in commerce. However, the funda-
mental characteristics of information goods mean that they can and likely
will be o�ered in widely varying con�gurations. Participating agents
will need to deal with uncertainty about both prices and location in
multi-dimensional product space. Thus, studying the behavior of learning
agents is central to understanding and designing for agent-based informa-
tion economies. Since uncertainty will exist on both sides of transactions,
and interactions between learning agents that are negotiating and trans-
acting with other learning agents may lead to unexpected dynamics, it
is important to study two-sided learning.
We present a simple but powerful model of an information bundling
economy with a single producer and multiple consumer agents. We ex-
plore the pricing and purchasing behavior of these agents when articles
can be bundled. In this initial exploration, we study the dynamics of this
economy when consumer agents are uninformed about the distribution of
article values. We discover that a reasonable albeit naive consumer learn-
ing strategy can lead to disastrous market behavior. We �nd a simple
explanation for this market failure, and develop a simple improvement
to the producer agent's strategy that largely ameliorates the problem.
But in the process we learn an important lesson: dynamic market in-
teractions when there is substantial uncertainty can lead to pathological
outcomes if agents are designed with \reasonable" but not su�ciently
adaptive strategies. Thus, in programmed agent environments it may be
essential to dramatically increase our understanding of adaptivity and
learning if we want to obtain good aggregate outcomes.
Keywords: Information economy, information bundling, two-
sided learning, economic agents.
1 Introduction
Extraordinary rates of advance in computation and communication technolo-
gies
1
have fueled the much-noted proliferation of electronic commerce. Within a
few years, we anticipate that software agents will participate in a wide variety of
commercial transactions, and may even become economic players in their own
right [5]. One important domain for agent economies is the production and dis-
tribution of information goods and services, such as news articles, entertainment
and other service reviews, and instructional materials. Digital information goods
are unusually con�gurable, and negotiations over the composition and prices for
bundles are a natural application for software agents. However, to do so will
require that broker agents be able to explore not just prices, but also locations
in a multi-dimensional bundled product space.
Very little is understood about strategic search over the joint information
good price and product space (that is, search for which products to o�er, in
which combinations, at what prices).
2
Agents competing in this space must
learn about the distribution of preferences across a heterogeneous and changing
customer population, must learn about the strategies being followed by their
competitors, and then must optimize their strategies to take into account the
new understanding of customers and competitors. Consumers, on the other hand,
must learn about the quantity, quality and price of bundled items o�ered by the
various producers. Learning and optimization are both di�cult search problems,
and in the market context they are closely related.
In earlier work we considered strategic pricing for bundles when all of the
relevant parameters are known by brokers [3]. We distinguished between condi-
tions in which �rms prefer to o�er comprehensive bundles at a single price and
those in which they prefer to sell items individually. Although ours is the �rst
work focusing on competitive strategies, the bundling and pricing of information
goods has been an active area in recent research [1], [2]. We have also studied
price dynamics when competing brokers search a restricted price and product
space [5, 6, 4]. Price instability, characterized as \price wars", was prevalent.
In this paper we examine a considerably more general problem, although
for this initial foray we maintain some simplifying restrictions. We envision an
agent economy with one information broker agent and many consumer agents.
The consumer agents are heterogeneous: they value each item di�erently, and
these values are drawn from a di�erent distribution for each agent. To set pro�t-
maximizing prices, the broker agent desires, but may not know, the parameters
of the consumers' valuation distributions. The consumers also may not know
their valuation parameters until they gain experience with the broker's o�ered
information goods. Therefore, both sides wish to learn. The broker can set prices
strategically, learning about consumer valuations from their purchasing behavior
at di�erent prices. The consumers can purchase strategically, learning about the
distribution of information good values by sampling.
1
See, e.g., [7].
2
For some recent work on multi-agent search, see, e.g., [14, 13].
This environment is very rich and will permit us to explore many interesting
questions. In this paper we focus on one surprising result: that when consumer
agents follow a plausible but overly naive learning strategy, even if the producer is
fully informed (but also somewhat naive), the economy can continuously degen-
erate with disastrous overall performance. We �nd a simple explanation for the
initial failure, and develop a simple improvement to the producer agent's strategy
that largely ameliorates the problem. But in the process we learn an important
lesson: dynamic market interactions when there is substantial uncertainty can
lead to pathological outcomes if agents are designed with \reasonable" but not
su�ciently adaptive strategies. Thus, in programmed agent environments it may
be essential to dramatically increase our understanding of adaptivity and learn-
ing if we want to obtain good aggregate outcomes.
This paper is the �rst in a series of studies directed towards understanding
and developing robust adaptive learning techniques that are e�ective for indi-
vidual agents and lead to acceptable collective (market) behavior. We are also
extending the model to study the much more common setting in which there is
competition among multiple producers of information goods. Of course, this set-
ting exacerbates the learning problem, since the producers need to learn about
each other's strategies in addition to learning about consumer valuations.
In the next section we discuss the model. We introduce two relevant dis-
tributions: �rst, a parameterized distribution g from which a given consumer's
valuations are drawn, and second, a distribution h describing the population of
consumers | the distribution from which an individual consumer's valuation
distribution parameters are drawn. For tractability, we assume that the broker
is restricted to two-part tari� pricing schemes [9]: it can charge consumer agents
a subscription price to examine the current items, and a per item price for each
item the consumer subsequently purchases.
In a general analysis section, we derive the optimal strategies for fully in-
formed broker and consumer agents as functions of the article value and con-
sumer type distributions g and h. Then in Section 4 we explore speci�c cases of
exploration and exploitation through analysis and simulation. When agents are
underinformed and engage in learning, we consider plausible (but not necessarily
fully optimal) agent strategies, and examine the resulting system outcomes. We
close with a summary of our current results and plans for future work.
2 Model
A single producer periodically (at discrete times t = 0; 1; 2; : : :) generates sets of
N articles. It sets a subscription fee F and a price schedule P = fP (1); : : : ; P (N )g,
where P (k) represents the price it charges for a subset of k 2 f1; : : : ; Ng of the
N articles.
At a given time t, each of M consumers are informed about (F;P ), and they
decide whether to subscribe. Then, each subscribing consumer receives abstracts
of all N articles, and uses them to assess its value w
j
from reading each article,
for j 2 f1; : : : ; Ng. We assume that these values are generated randomly ac-
cording to a distribution g(q
i
;w), where q
i
is the set of parameters that de�ne
the distribution for consumer i. The parameters q
i
that represent consumer i's
valuation distribution are themselves generated randomly prior to time 0 from a
distribution h(r; q). Once the q
i
parameters are generated for consumer i, they
remain �xed for the rest of time (though the agents may not know their true
values for a while, if ever).
After assessing the value of each article, a subscriber decides which articles
to purchase. It does this by choosing a set of articles K to maximize surplus
s = (
P
j2K
w
j
) � P (jKj). Henceforth we assume that the articles have been
sorted by consumer j so that the w
j
are ordered highest to lowest, and thus the
set K consists of the �rst k = jKj articles.
The subscription decision depends on consumer expectations. If a consumer
believes correctly that its valuations are drawn from a distribution with param-
eters q
i
, then its expected surplus from purchasing k articles would be
hs
k
i =
*
k
X
j=1
w
j
+
� P (k): (1)
The consumer can then derive the vector of values p
k
, the probabilities that any
k is the expected surplus maximizing number of articles. Then the consumer's
optimized expected surplus is
hsi =
N
X
k=1
p
k
hs
k
i : (2)
The consumer should subscribe if and only if the expectation hsi exceeds the
subscription fee F .
3
If consumer i does not know its own q
i
for the articles o�ered by this broker,
then values w^
j
are the consumer's beliefs drawn from a distribution g(
^
q
i
;w) after
reviewing the abstracts, where the
^
q
i
are the agent's current best estimate of the
valuation parameters q
i
. When the agent purchases and reads articles, it learns
their true valuesw
j
and can then use this sample information to update its beliefs
^
q
i
about the distribution of article values. Therefore, a good consumer strategy
should take into account the value of learning. For example, when uncertainty
about q
i
is high, the consumer might deliberately subscribe even when its esti-
mated surplus is less than F , simply to experience more articles to improve its
valuation estimates. Or, having subscribed the consumer might purchase more
articles than would maximize expected surplus from current reading.
Turning to the producer's problem, it can choose a subscription fee F and
a price schedule P in each period. These should be chosen to maximize some
3
We assume consumers are risk neutral. A risk averse agent would want to optimize
the expectation of a concave function of surplus, and optimal behavior would depend
on second and possibly higher moments of the induced distribution of surplus, not
just the expected surplus. See, e.g., [8].
function of expected current and future pro�ts, where current pro�t can be
expressed as:
� =
M
X
i=1
�
i
(F + P (k
i
)�C(k
i
)) : (3)
with �
i
= 1 if consumer i chooses to subscribe, and 0 otherwise. The cost of
delivering k
i
articles to consumer i is denoted as C(k
i
).
In performing its maximization, the producer must take into account the
e�ect of P and F on the consumer's subscription decisions, and the e�ect of P
on the distribution of k
i
across the set of subscribers. Higher prices will decrease
the number of subscribers and the expected number of articles purchased, of
course. To compute the optimal subscription fee and price schedule the broker
wants to know the distribution h(r; q) from which the consumers' parameters
were generated, and the consumer strategies for subscribing and purchasing.
Based on its current beliefs about r and consumer strategies, the broker can
simulate a consumer population and its responses to various (F;P ) schedules,
and then pick (F;P ) to maximize a value function. In practice, the broker may
not know the consumer type parameters r, nor the consumer strategies. Thus,
the broker may choose (F;P ) to balance current expected pro�t according to
equation (3), against the increase in expected future pro�t from learning about
consumer preferences by observing their behavior when confronted with varying
(F;P ) combinations.
3 General Analysis
In this section, we �rst analyze the expected surplus and number of articles
purchased per subscription period for a rational, fully-informed consumer with a
given valuation distribution g. Then, we derive an expression for a monopolistic
producer's expected pro�t as a function of its price schedule and the distribution
h of valuation parameters q across the consumer population, assuming that all
consumers are rational and fully-informed about their valuation distribution.
A rational, fully-informed producer would choose its price schedule so as to
maximize its expected pro�t.
Suppose that a given consumer has its valuations w drawn from a probabil-
ity density function g(q;w). (The distribution parameters q may vary from one
consumer to another.) For the sake of simplicity, we assume that the producer
constrains itself to a linear price schedule: P (k) = k�. Then a rational consumer
will purchase k articles, where k is the number of articles with valuations ex-
ceeding the threshold P (k) � P (k � 1) = �. The probability p
k
for exactly k
articles to have valuations w > � is
p
k
=
�
N
k
�
G(q;�)
N�k
(1�G(q;�))
k
(4)
where G(q;w) represents the cumulative distribution function that corresponds
to g(q;w), i.e. it is the probability for an article to be valued at less than w.
From this we can compute the expected number of articles purchased:
hk(q;�)i =
N
X
k=1
k p
k
= N [1� G(q;�)] (5)
where the last equality follows from simple manipulations of binomial coe�-
cients.
The expected surplus hsi can be obtained from Eq. 2, provided that we �rst
compute hs
k
i, the expected surplus given that exactly k articles prove to have
valuations w > �. The conditional probability distribution for a single draw
from g(q;w) given that w > � is
~g(q;w) =
g(q;w)
1�G(q;�)
�(w ��) (6)
where �(x) represents the step function, equal to 1 if x > 0 and 0 otherwise.
The expected valuation for this distribution is
�w(q;�) =
Z
1
�
dww ~g(q;w) (7)
The expected sum of k draws from this conditional distribution is k �w(q;�).
Subtracting the price P (k) = k�, we obtain:
hs
k
i = k( �w(�)��) (8)
Inserting this result into Eq. 2, we obtain:
hs(q;�)i =
N
X
k=1
p
k
k [ �w(q;�)��] = [ �w(q;�)��] hk(q;�)i (9)
= N [ �w(q;�)��] [1� G(q;�)]
Note that the consumer will subscribe if and only if hs(q;�)i is greater than the
subscription fee F .
Now we take the producer's perspective. If we assume (for simplicity) that
the cost of producing and delivering k items is C(k) = k , then the expected
pro�t is
h�(F;�)i =
M
X
i=1
�(hs(q
i
;�)i � F )(F + (�� ) hk(q
i
;�)i) (10)
�
Z
q
dqh(r; q)�(hs(q;�)i � F ) (F + (�� ) hk(q;�)i)
where h(r; q) is the probability distribution for the consumers' q parameters, as
de�ned previously. In e�ect, the last approximation replaces the actual realized
set of consumer distribution parameters fq
i
g with an ensemble average over all
possible realizations of a set of M consumers generated from the distribution
h, and this approximation grows increasingly accurate in the limit of large M .
Note that a producer that knows h can compute this pro�t landscape. A fully
knowledgeable and rational producer would set � and F so as to maximize
h�(F;�)i.
4 Rational and Bounded-Rational Players
In the remainder of the paper, we shall assume simple functional forms for g and
h, and explore what happens when the consumers and/or the producer must
learn these distributions. In particular, we suppose that g is a one-parameter
exponential distribution given by g(�;w) = �e
��w
, and that h(�) is a uniform
distribution in the interval from �
min
to �
max
.
4.1 Fully-informed producer and consumers
As a reference point, we analyze the case where the consumers are fully informed
about their individual values of �, the producer knows the distribution h(�), and
the producer and the consumers act so as to maximize their expected gain.
Integrating g, we obtain the cumulative distribution G(�;w) = 1 � e
��w
.
From Eq. 6 we can compute the conditional distribution
~g(�;w) =
g(�;w)
1�G(�)
�(w ��) (11)
= �e
��(w��)
�(w ��):
The average valuation for this conditional distribution is �w(�;�) = � + �
�1
.
Using Eqs. 5 and 9, we obtain the expected number of purchased articles and
the expected surplus (assuming the consumer subscribes):
hk(�)i = Ne
���
(12)
hs(�)i =
N
�
e
���
:
To compute the producer's expected pro�t as a function of � and F , we can
substitute Eqs. 12 into Eq.10, which yields:
h�(F;�)i =
1
(�
max
� �
min
)
Z
�
0
�
min
d� (F + N (�� )e
���
) (13)
=
(�
0
� �
min
)F +N (1�
�
)
�
e
���
min
� e
���
0
�
(�
max
� �
min
)
;
where �
0
is de�ned as �
0
= min(�
max
; ~�) and ~� is de�ned as the unique solution
to Ne
�~��
= F ~�.
profit 0.4 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.225 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
-0.5
-0.3
-0.1
0.1
0.3
fee
price
profit
Fig. 1. Pro�t landscape �(f;�). h is a uniform distribution with �
min
= 0:5, �
min
= 2:0.
a) Production cost = 0:1. b) Production cost = 0:5.
Eq. 13 can be visualized as a pro�t landscape in which the expected pro�t
is plotted as a function of F and �. It is convenient to de�ne a normalized
expected pro�t � = �=N and a normalized fee f = F=N . Fig. 1 illustrates the
landscape for two di�erent production costs: = 0:1 and = 0:5.
In each such landscape, there are two ridges. The lower ridge, which demar-
cates the boundary beyond which the producer attracts no consumers and thus
makes no pro�t, is de�ned by e
��
min
�
= f�
min
. The upper ridge is described
by a piecewise joining of two nonlinear curves, the simpler one being described
by the relation e
��
max
�
= f�
max
. This portion of the ridge has the following
characteristics:
� it is formed by a discontinuous derivative with respect to f and � (it
changes abruptly from positive to negative)
� for price settings along this ridge, all consumers subscribe
� if the production cost <
crit
, the optimal setting of (f;�) occurs along
this part of the ridge. For the examples of Fig. 1, the critical production
cost
crit
is found to be approximately 0.28
4
, so that Fig. 1a ( = 0:1)
is in this regime, while Fig. 1b ( = 0:5) is not.
The other portion of the ridge is de�ned by a more complex nonlinear relation
between f and�. It is less sharp, resulting from derivatives with respect to f and
� being zero rather than jumping discontinuously from positive to negative.
5
4
It can be shown that, for exponential g and uniform h,
crit
=
�(
crit
)�
min
�
max
+�
min
�
2
max
�
�
min
�
2
max
, which is positive when �
min
> 0.
5
These characteristics only pertain to the case where g is exponential and h is uniform;
we have explored other combinations of functional forms for g and h that yield pro�t
landscapes that are topographically di�erent. For example, if g is exponential and
Figure 2 shows the dependence of the optimal f and � upon the production
cost . There is a discontinuous derivative at =
crit
, due to the switchover
between the two nonlinear curves that de�ne the upper ridge in the landscape.
As one might guess, the pro�t � decreases monotonically with the production
cost. The proportion p of consumers that subscribe is 1 for all �
crit
; for
exceeding this threshold the proportion of subscribers is strictly less than one,
and is given by
~���
min
�
max
��
min
.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0
0.2
0.4
0.6
0.8
1.0
µmin = 0.5 µmax = 2.0
∆
f
pi
p
production cost γ
∆,
f
Optimal ∆ and f vs. γ
Fig. 2. Optimal �, f , �, and p vs. , where h is a uniform distribution with �
min
= 0:5,
�
min
= 2:0.
4.2 Uninformed consumers
For a variety of reasons, a consumer may have to rely on adaptive estimates of
its valuation distribution, g. Suppose for example that consumer i knows that
g is an exponential distribution, but does not know its individual parameter �
i
.
Depending on the consumer's beliefs about the dynamics of the environment, it
might wish to place more or less weight on recent observations. Since the mean of
the distribution g(�;w) = � exp
��w
is 1=�, one reasonable and exible approach
to estimating � is to start with an assumed prior �
0
and after each period of
subscription to update the estimate �^ according to
�^
t+1
= (� �m
t
+ (1� �)(�^
t
)
�1
)
�1
; (14)
where m
t
is the mean of the N valuations received during the subscription
period t, and � is the consumer's \ ightiness" factor. This factor could be set to
h consists of a number of well-separated mass points, the landscape can contain
multiple ridges and peaks.
a constant, or be time-varying; in fact, by setting � = 1=t, the consumer weights
all observations equally.
Suppose that the producer knows the distribution h, and sets f and � to
the \optimal" values given by Fig. 2. (For = 0:1, these values are (f;�) =
(0:30878; 0:24098), at which the \ideal" pro�t � = 0:4137 and the fraction of sub-
scribers p = 1.) Furthermore, suppose that the consumers all start with exactly
the right estimates of their � parameters, but that they update their estimates
in accordance with Eq. 14. Fig. 3 illustrates what happens to consumer pro�ts
and subscription rates as a function of time for two values of �. In both cases,
the proportion p of subscribed consumers continues to diminish inde�nitely, and,
correspondingly, so does the pro�t �.
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
Fig. 3. Pro�t � and proportion of subscribed consumers p vs. time (in subscription
periods) when consumers estimate their � values, and the ightiness parameter is a)
� = 0:1 and b) � = 0:9. Consumer leakage occurs for � = 0:1, and is even more
pronounced for � = 0:9. Pro�ts diminish inde�nitely, although the rate of reduction
slows with time. The market consists of M = 1000 consumers and one seller o�ering
N = 10 articles per subscription period. Dashed horizontal line indicates ideal pro�t
of 0.4137.
Fig. 4 o�ers further insight into consumer leakage. It shows the expected
average pro�t over 200 subscription periods for a wide range of f and � (not
just the optimal value). These are represented as pro�t landscapes for � =
0:1 and � = 0:9. Both landscapes have become peaked at lower values of f .
In other words, if the producer insists on holding to a �xed price schedule, it
will optimize pro�ts by setting the subscription fee lower than it could charge
perfectly informed consumers, thereby encouraging overly pessimistic consumers
to stay in the market longer. It partially compensates by raising � somewhat.
A little thought suggests that, regardless of , �, or other such parameters,
consumer leakage will eventually lead to complete market failure for any �xed
setting of f and �. As consumers continually update their �^, they are in e�ect
conducting a random walk about the correct value of �. If it should ever happen
that the consumer's �^ rises above a value such that hs(�;�)i falls below F , the
profit 0.325 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit
Fig. 4. Pro�t landscape after consumer leakage has occurred. Mean pro�t per article
per consumer as a function of F and �, after 200 subscription periods with F and
� at the values that would be optimal for perfectly informed consumers. The market
consists ofM = 1000 consumers and one seller o�ering N = 10 articles per subscription
period. a) � = 0:1. b) � = 0:9.
consumer will become permanently disenfranchised from the market. In other
words, the value of �^
crit
at which consumers are discouraged from subscribing
(which, from Eq. 12, can be computed from
1
�^
crit
e
��^
crit
�
< f), acts as an ab-
sorbing boundary. In the next subsection, we will discuss how a producer might
use dynamic pricing and optimization to prevent unchecked pro�t erosion.
4.3 Solving the leakage problem
Consumer leakage hurts both the producer and the consumers, and therefore all
players have an incentive to counteract it. Both consumers and producers can do
this by putting more emphasis on exploration as opposed to (pure) exploitation.
Consumers could use a variety of schemes; for example, they could choose to
subscribe at random with a �nite probability even if their expected surplus is less
than the subscription fee, and this probability could diminish monotonically as
the di�erence between these quantities increases. Producers could �ght leakage
by temporarily decreasing prices to resurrect consumers who have mistakenly
disenfranchised themselves, in hopes that with additional samples the consumers
will increase their estimated surplus to levels that can support higher prices. One
might expect that consumers could be enticed back into the market even with
small discounts, since, once their �^ ventures into a realm where subscription
appears to be unpro�table, it is frozen at this just-barely-unpro�table value.
6
6
Economists have studied behavior of this sort in order to explain temporal price
dispersion, such as \sales" and temporary discounts. A mixture of reasons have
been modeled, including attempts to price discriminate between better and worse
Here we focus on the producer's strategy for enticing overly pessimistic con-
sumers back into the market. (This is not to deny that consumers' exploration
strategies merit serious study.) From our study of consumer leakage, it is appar-
ent that the producer's strategy must involve dynamic pricing, and that it must
cope with a pro�t landscape that changes dynamically due to shifts caused by
consumers' ongoing attempts to learn an estimate of g. It also seems most likely
that the pricing strategy would involve stochastic search on this dynamically
changing landscape, rather than following some pre-planned schedule.
The dynamically changing nature of the pro�t landscape in our problem lim-
its the applicability of standard stochastic optimization techniques. For example,
approaches such as simulated annealing implicitly assume that the search is be-
ing conducted on a static landscape as the value of its temperature parameter
is lowered. While more sophisticated approaches that are currently oriented to-
wards static landscapes might be modi�ed to dynamic ones, we focus here on a
very simple iterative random-mutation hill-climbing method (RMHC). At each
iteration of the RMHC method (i.e. following each subscription period), the pro-
ducer tries a new f and a new �. If the new settings result in increased pro�t,
the producer continues to move in the same direction in the f-� landscape in
the next trial. Otherwise, it backtracks to its previous best point and makes
small adjustments to f and �. The step size of the adjustment is chosen from
an exponential distribution with a mean 1=� for both f and �. To take into
account the dynamic nature of the landscape being maximized, the hill climber
has �nite memory and can remember the previous best point visited within the
last T time steps. We have observed that, for overly large values of T , the pro-
ducer can get mired for a long time in regions of the (f;�) landscape that have
become less pro�table due to evolving consumer expectations.
Fig. 5 illustrates how the pro�t � and the subscribed consumer population
p evolve with time for a particular setting of � = 0:5 and T = 20. For these
settings (and for a fairly broad range around them), the RMHC algorithm pre-
vents long-term consumer leakage and pro�t erosion, even when the consumers
are extremely ighty (� = 0:9). Frequent lowering of prices continually pulls
consumers back into the market so they can keep sampling articles. Ironically,
the RMHC's noisy exploration of the pro�t landscape helps to keep that land-
scape relatively stable. While the pro�t never attains the value of 0.41 that it
would have if the consumers were to keep their �^'s �xed at their exact values,
the producer is still able to extract a healthy, reasonably stable pro�t of roughly
0.25. Fig. 6 illustrates the noisy trajectory of f and �.
5 Conclusions
Commerce in information goods is one of the earliest emerging applications for
intelligent agents in commerce. However, the fundamental characteristics of in-
formation goods mean that they can and likely will be o�ered in widely varying
informed customers, and inducing potential customers to bear some search costs to
�nd a better price or product. See [11], [10], and [12].
0 0.25
0.5
0.75
1
0 250 500 750 1000
pr of
it, c
on su
m er
s
iter
0
0.25
0.5
0.75
1
0 250 500 750 1000
(av g)
pro fit,
fe e,
pri ce
iter
profit fee
price profit = 0.4137
Fig. 5. Simulation run with producer using RMHC algorithm with memory of T = 20
subscription periods and step-size parameter � = 0:5. = 0:1, � = 0:9. a) Propor-
tion p of subscribed consumers. b) Moving average of f , � and � (integrated over 20
subscription periods).
0
0.25
0.5
0.75
1
1.25
1.5
0 0.2 0.4 0.6 0.8 1
pr ice
fee
settings
Fig. 6. Trajectory of f and � taken by the RMHC algorithm for the simulation run
shown above.
con�gurations. Participating agents will need to deal with uncertainty about
both prices and location in multi-dimensional product space. Thus, studying
the behavior of learning agents is central to understanding and designing for
agent-based information economies. Since uncertainty will exist on both sides of
transactions, and interactions between learning agents that are negotiating and
transacting with other learning agents may lead to unexpected dynamics, it is
important to study two-sided learning.
We presented a simple but powerful model of an information bundling econ-
omy with a single producer and multiple consumer agents. We then explored
the pricing and purchasing behavior of these agents when articles can be bun-
dled. In this initial exploration, we studied the dynamics of this economy when
consumer agents are uninformed about the distribution of article values. We
discovered that a reasonable albeit naive consumer learning strategy can have a
profound in uence on market behavior | in this case, a strikingly bad in uence.
Our consumer and producer agents were rather naive in our �rst learning
experiments. This could be viewed as a criticism of our modeling. However, es-
pecially early in the development of adaptive agent intelligence, it may well be
that agent-based markets are quite vulnerable to odd behavior and dysfunctional
dynamics of the sort we observed. Our consumer agents did not recognize the
option value of new information, and thus su�ered by not undertaking su�cient
exploration relative to exploitation. Our producer agent did not initially adapt
to the pathological dynamics induced by the consumer agent naivete, and thus
su�ered by relying too con�dently on its \perfect" but static knowledge. Al-
though it was fairly easy for us to see what was going wrong, and to modify
the producer agent in a simple way that ameliorated much of the problem, our
environment is arti�cially simple and static. In more realistic settings it may be
quite di�cult for even relatively intelligent agents to adapt to emergent patholo-
gies. Human markets may not be as susceptible because human behavior is less
rote and more re ective. The lesson for agent design is to search for strategies
that are dynamically robust and adaptive in the face of substantial uncertainty.
We have started to explore how to make our simple mechanism more robust
in realistic settings. For example, the hill climbing algorithm is likely to get stuck
at local optima. For the uniform h there is a single peak in the pro�t landscape,
but that is not at all general. Several powerful optimization techniques, such as
simulated annealing, exist for static landscapes. Extensions to these that handle
dynamically changing, noisy landscapes, may lead to robustly adaptive agent
learning strategies.
We have an active agenda of continuing work on this topic. For example, we
have begun to consider less naive consumer strategies, that balance exploration
against exploitation. We are considering producer strategies that adapt based on
the number of recent subscribers relative to the producer's model of the optimal
number of subscribers. Perhaps most challenging but essential to a more general
understanding of the problem is the extension of our work into an economy
with multiple producers who are underinformed about each other's competitive
strategies as well as about consumer valuations.
In this and earlier work we have found that initial plausible but simple de-
signs of economically-intelligent agents lead to dynamic market interactions that
can be surprising and unsuccessful. The value of intelligent agents in electronic
commerce will depend on the ability to understand the problems of learning
and adaptivity, and to design agents that interact robustly in the presence of
substantial uncertainty about both parameters and the strategies of other un-
derinformed agents.
References
1. Y. Bakos and E. Brynjolfsson. Bundling information goods: Pricing, pro�ts and
e�ciency. In D. Hurley, B. Kahin, and H. Varian, editors, The Economics of Digital
Information Goods. MIT Press, Cambridge, Massachusetts, 1998.
2. J. C. Chuang and M. A. Sirbu. Network delivery of information goods: Opti-
mal pricing of articles and subscriptions. In D. Hurley, B. Kahin, and H. Varian,
editors, The Economics of Digital Information Goods. MIT Press, Cambridge, Mas-
sachusetts, 1998.
3. S. Fay and J. K. MacKie-Mason. Competition between �rms that bundle.
Manuscript, Dept. of Economics, University of Michigan, 1998.
4. J. E. Hanson and J. O. Kephart. Spontaneous specialization in a free-market
economy of agents. In Proceedings of Workshop on Arti�cial Societies and Compu-
tational Markets at the Second International Conference on Autonomous Agents,
May 1998.
5. J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal,
and S. R. White. Dynamics of an information-�ltering economy. In Proceedings
of Second International Workshop on Cooperative Information Agents (CIA-98),
Paris, July 1998.
6. J. O. Kephart, J. E. Hanson, and J. Sairamesh. Price-war dynamics in a free-
market economy of software agents. In Proceedings of Alife VI, June 1998.
7. J. K. MacKie-Mason and H. R. Varian. Some economics of the internet. In
W. Sichel, editor, Networks, Infrastructure and the New Task for Regulation. Uni-
versity of Michigan Press, Ann Arbor, Michigan, 1996.
8. A. Mas-Colell, M. Whinston, and J. Green. Microeconomic Theory. Oxford Uni-
versity Press, 1995.
9. W. Oi. A Disneyland dilemma: Two-part tari�s for a Mickey Mouse monopoly.
The Quarterly Journal of Economics, 85(1):77{96, February 1971.
10. S. Salop and J. Stiglitz. Bargains and ripo�s: A model of monopolistically com-
petitive price dispersion. Review of Economic Studies, 44:493{510, October 1977.
11. Y. Shilony. Mixed pricing in oligopoly. Journal of Economic Theory, 14:373{388,
April 1977.
12. H. R. Varian. A model of sales. American Economic Review, 70:651{659, Septem-
ber 1980.
13. J. M. Vidal and E. H. Durfee. Learning nested agent models in an information
economy. Journal of Experimental and Theoretical Arti�cial Intelligence (special
issue on learning in distributed arti�cial intelligence systems), 10(3):291{308, 1998.
14. J. M. Vidal and E. H. Durfee. The moving target problem in multi-agent learn-
ing. In Proceedings of the Third International Conference on Multi-agent Systems
(ICMAS98), 1998.Two-sided learning in an agent economy for
information bundles
Je�rey O. Kephart
1
, Rajarshi Das
1
, and Je�rey K. MacKie-Mason
2
1
IBM Research
PO Box 704, Yorktown Heights, NY 10598
fkephart,rajarshig@watson.ibm.com
2
School of Information and Department of Economics
University of Michigan, Ann Arbor, MI 48109
jmm@umich.edu
Abstract. Commerce in information goods is one of the earliest emerg-
ing applications for intelligent agents in commerce. However, the funda-
mental characteristics of information goods mean that they can and likely
will be o�ered in widely varying con�gurations. Participating agents
will need to deal with uncertainty about both prices and location in
multi-dimensional product space. Thus, studying the behavior of learning
agents is central to understanding and designing for agent-based informa-
tion economies. Since uncertainty will exist on both sides of transactions,
and interactions between learning agents that are negotiating and trans-
acting with other learning agents may lead to unexpected dynamics, it
is important to study two-sided learning.
We present a simple but powerful model of an information bundling
economy with a single producer and multiple consumer agents. We ex-
plore the pricing and purchasing behavior of these agents when articles
can be bundled. In this initial exploration, we study the dynamics of this
economy when consumer agents are uninformed about the distribution of
article values. We discover that a reasonable albeit naive consumer learn-
ing strategy can lead to disastrous market behavior. We �nd a simple
explanation for this market failure, and develop a simple improvement
to the producer agent's strategy that largely ameliorates the problem.
But in the process we learn an important lesson: dynamic market in-
teractions when there is substantial uncertainty can lead to pathological
outcomes if agents are designed with \reasonable" but not su�ciently
adaptive strategies. Thus, in programmed agent environments it may be
essential to dramatically increase our understanding of adaptivity and
learning if we want to obtain good aggregate outcomes.
Keywords: Information economy, information bundling, two-
sided learning, economic agents.
1 Introduction
Extraordinary rates of advance in computation and communication technolo-
gies
1
have fueled the much-noted proliferation of electronic commerce. Within a
few years, we anticipate that software agents will participate in a wide variety of
commercial transactions, and may even become economic players in their own
right [5]. One important domain for agent economies is the production and dis-
tribution of information goods and services, such as news articles, entertainment
and other service reviews, and instructional materials. Digital information goods
are unusually con�gurable, and negotiations over the composition and prices for
bundles are a natural application for software agents. However, to do so will
require that broker agents be able to explore not just prices, but also locations
in a multi-dimensional bundled product space.
Very little is understood about strategic search over the joint information
good price and product space (that is, search for which products to o�er, in
which combinations, at what prices).
2
Agents competing in this space must
learn about the distribution of preferences across a heterogeneous and changing
customer population, must learn about the strategies being followed by their
competitors, and then must optimize their strategies to take into account the
new understanding of customers and competitors. Consumers, on the other hand,
must learn about the quantity, quality and price of bundled items o�ered by the
various producers. Learning and optimization are both di�cult search problems,
and in the market context they are closely related.
In earlier work we considered strategic pricing for bundles when all of the
relevant parameters are known by brokers [3]. We distinguished between condi-
tions in which �rms prefer to o�er comprehensive bundles at a single price and
those in which they prefer to sell items individually. Although ours is the �rst
work focusing on competitive strategies, the bundling and pricing of information
goods has been an active area in recent research [1], [2]. We have also studied
price dynamics when competing brokers search a restricted price and product
space [5, 6, 4]. Price instability, characterized as \price wars", was prevalent.
In this paper we examine a considerably more general problem, although
for this initial foray we maintain some simplifying restrictions. We envision an
agent economy with one information broker agent and many consumer agents.
The consumer agents are heterogeneous: they value each item di�erently, and
these values are drawn from a di�erent distribution for each agent. To set pro�t-
maximizing prices, the broker agent desires, but may not know, the parameters
of the consumers' valuation distributions. The consumers also may not know
their valuation parameters until they gain experience with the broker's o�ered
information goods. Therefore, both sides wish to learn. The broker can set prices
strategically, learning about consumer valuations from their purchasing behavior
at di�erent prices. The consumers can purchase strategically, learning about the
distribution of information good values by sampling.
1
See, e.g., [7].
2
For some recent work on multi-agent search, see, e.g., [14, 13].
This environment is very rich and will permit us to explore many interesting
questions. In this paper we focus on one surprising result: that when consumer
agents follow a plausible but overly naive learning strategy, even if the producer is
fully informed (but also somewhat naive), the economy can continuously degen-
erate with disastrous overall performance. We �nd a simple explanation for the
initial failure, and develop a simple improvement to the producer agent's strategy
that largely ameliorates the problem. But in the process we learn an important
lesson: dynamic market interactions when there is substantial uncertainty can
lead to pathological outcomes if agents are designed with \reasonable" but not
su�ciently adaptive strategies. Thus, in programmed agent environments it may
be essential to dramatically increase our understanding of adaptivity and learn-
ing if we want to obtain good aggregate outcomes.
This paper is the �rst in a series of studies directed towards understanding
and developing robust adaptive learning techniques that are e�ective for indi-
vidual agents and lead to acceptable collective (market) behavior. We are also
extending the model to study the much more common setting in which there is
competition among multiple producers of information goods. Of course, this set-
ting exacerbates the learning problem, since the producers need to learn about
each other's strategies in addition to learning about consumer valuations.
In the next section we discuss the model. We introduce two relevant dis-
tributions: �rst, a parameterized distribution g from which a given consumer's
valuations are drawn, and second, a distribution h describing the population of
consumers | the distribution from which an individual consumer's valuation
distribution parameters are drawn. For tractability, we assume that the broker
is restricted to two-part tari� pricing schemes [9]: it can charge consumer agents
a subscription price to examine the current items, and a per item price for each
item the consumer subsequently purchases.
In a general analysis section, we derive the optimal strategies for fully in-
formed broker and consumer agents as functions of the article value and con-
sumer type distributions g and h. Then in Section 4 we explore speci�c cases of
exploration and exploitation through analysis and simulation. When agents are
underinformed and engage in learning, we consider plausible (but not necessarily
fully optimal) agent strategies, and examine the resulting system outcomes. We
close with a summary of our current results and plans for future work.
2 Model
A single producer periodically (at discrete times t = 0; 1; 2; : : :) generates sets of
N articles. It sets a subscription fee F and a price schedule P = fP (1); : : : ; P (N )g,
where P (k) represents the price it charges for a subset of k 2 f1; : : : ; Ng of the
N articles.
At a given time t, each of M consumers are informed about (F;P ), and they
decide whether to subscribe. Then, each subscribing consumer receives abstracts
of all N articles, and uses them to assess its value w
j
from reading each article,
for j 2 f1; : : : ; Ng. We assume that these values are generated randomly ac-
cording to a distribution g(q
i
;w), where q
i
is the set of parameters that de�ne
the distribution for consumer i. The parameters q
i
that represent consumer i's
valuation distribution are themselves generated randomly prior to time 0 from a
distribution h(r; q). Once the q
i
parameters are generated for consumer i, they
remain �xed for the rest of time (though the agents may not know their true
values for a while, if ever).
After assessing the value of each article, a subscriber decides which articles
to purchase. It does this by choosing a set of articles K to maximize surplus
s = (
P
j2K
w
j
) � P (jKj). Henceforth we assume that the articles have been
sorted by consumer j so that the w
j
are ordered highest to lowest, and thus the
set K consists of the �rst k = jKj articles.
The subscription decision depends on consumer expectations. If a consumer
believes correctly that its valuations are drawn from a distribution with param-
eters q
i
, then its expected surplus from purchasing k articles would be
hs
k
i =
*
k
X
j=1
w
j
+
� P (k): (1)
The consumer can then derive the vector of values p
k
, the probabilities that any
k is the expected surplus maximizing number of articles. Then the consumer's
optimized expected surplus is
hsi =
N
X
k=1
p
k
hs
k
i : (2)
The consumer should subscribe if and only if the expectation hsi exceeds the
subscription fee F .
3
If consumer i does not know its own q
i
for the articles o�ered by this broker,
then values w^
j
are the consumer's beliefs drawn from a distribution g(
^
q
i
;w) after
reviewing the abstracts, where the
^
q
i
are the agent's current best estimate of the
valuation parameters q
i
. When the agent purchases and reads articles, it learns
their true valuesw
j
and can then use this sample information to update its beliefs
^
q
i
about the distribution of article values. Therefore, a good consumer strategy
should take into account the value of learning. For example, when uncertainty
about q
i
is high, the consumer might deliberately subscribe even when its esti-
mated surplus is less than F , simply to experience more articles to improve its
valuation estimates. Or, having subscribed the consumer might purchase more
articles than would maximize expected surplus from current reading.
Turning to the producer's problem, it can choose a subscription fee F and
a price schedule P in each period. These should be chosen to maximize some
3
We assume consumers are risk neutral. A risk averse agent would want to optimize
the expectation of a concave function of surplus, and optimal behavior would depend
on second and possibly higher moments of the induced distribution of surplus, not
just the expected surplus. See, e.g., [8].
function of expected current and future pro�ts, where current pro�t can be
expressed as:
� =
M
X
i=1
�
i
(F + P (k
i
)�C(k
i
)) : (3)
with �
i
= 1 if consumer i chooses to subscribe, and 0 otherwise. The cost of
delivering k
i
articles to consumer i is denoted as C(k
i
).
In performing its maximization, the producer must take into account the
e�ect of P and F on the consumer's subscription decisions, and the e�ect of P
on the distribution of k
i
across the set of subscribers. Higher prices will decrease
the number of subscribers and the expected number of articles purchased, of
course. To compute the optimal subscription fee and price schedule the broker
wants to know the distribution h(r; q) from which the consumers' parameters
were generated, and the consumer strategies for subscribing and purchasing.
Based on its current beliefs about r and consumer strategies, the broker can
simulate a consumer population and its responses to various (F;P ) schedules,
and then pick (F;P ) to maximize a value function. In practice, the broker may
not know the consumer type parameters r, nor the consumer strategies. Thus,
the broker may choose (F;P ) to balance current expected pro�t according to
equation (3), against the increase in expected future pro�t from learning about
consumer preferences by observing their behavior when confronted with varying
(F;P ) combinations.
3 General Analysis
In this section, we �rst analyze the expected surplus and number of articles
purchased per subscription period for a rational, fully-informed consumer with a
given valuation distribution g. Then, we derive an expression for a monopolistic
producer's expected pro�t as a function of its price schedule and the distribution
h of valuation parameters q across the consumer population, assuming that all
consumers are rational and fully-informed about their valuation distribution.
A rational, fully-informed producer would choose its price schedule so as to
maximize its expected pro�t.
Suppose that a given consumer has its valuations w drawn from a probabil-
ity density function g(q;w). (The distribution parameters q may vary from one
consumer to another.) For the sake of simplicity, we assume that the producer
constrains itself to a linear price schedule: P (k) = k�. Then a rational consumer
will purchase k articles, where k is the number of articles with valuations ex-
ceeding the threshold P (k) � P (k � 1) = �. The probability p
k
for exactly k
articles to have valuations w > � is
p
k
=
�
N
k
�
G(q;�)
N�k
(1�G(q;�))
k
(4)
where G(q;w) represents the cumulative distribution function that corresponds
to g(q;w), i.e. it is the probability for an article to be valued at less than w.
From this we can compute the expected number of articles purchased:
hk(q;�)i =
N
X
k=1
k p
k
= N [1� G(q;�)] (5)
where the last equality follows from simple manipulations of binomial coe�-
cients.
The expected surplus hsi can be obtained from Eq. 2, provided that we �rst
compute hs
k
i, the expected surplus given that exactly k articles prove to have
valuations w > �. The conditional probability distribution for a single draw
from g(q;w) given that w > � is
~g(q;w) =
g(q;w)
1�G(q;�)
�(w ��) (6)
where �(x) represents the step function, equal to 1 if x > 0 and 0 otherwise.
The expected valuation for this distribution is
�w(q;�) =
Z
1
�
dww ~g(q;w) (7)
The expected sum of k draws from this conditional distribution is k �w(q;�).
Subtracting the price P (k) = k�, we obtain:
hs
k
i = k( �w(�)��) (8)
Inserting this result into Eq. 2, we obtain:
hs(q;�)i =
N
X
k=1
p
k
k [ �w(q;�)��] = [ �w(q;�)��] hk(q;�)i (9)
= N [ �w(q;�)��] [1� G(q;�)]
Note that the consumer will subscribe if and only if hs(q;�)i is greater than the
subscription fee F .
Now we take the producer's perspective. If we assume (for simplicity) that
the cost of producing and delivering k items is C(k) = k , then the expected
pro�t is
h�(F;�)i =
M
X
i=1
�(hs(q
i
;�)i � F )(F + (�� ) hk(q
i
;�)i) (10)
�
Z
q
dqh(r; q)�(hs(q;�)i � F ) (F + (�� ) hk(q;�)i)
where h(r; q) is the probability distribution for the consumers' q parameters, as
de�ned previously. In e�ect, the last approximation replaces the actual realized
set of consumer distribution parameters fq
i
g with an ensemble average over all
possible realizations of a set of M consumers generated from the distribution
h, and this approximation grows increasingly accurate in the limit of large M .
Note that a producer that knows h can compute this pro�t landscape. A fully
knowledgeable and rational producer would set � and F so as to maximize
h�(F;�)i.
4 Rational and Bounded-Rational Players
In the remainder of the paper, we shall assume simple functional forms for g and
h, and explore what happens when the consumers and/or the producer must
learn these distributions. In particular, we suppose that g is a one-parameter
exponential distribution given by g(�;w) = �e
��w
, and that h(�) is a uniform
distribution in the interval from �
min
to �
max
.
4.1 Fully-informed producer and consumers
As a reference point, we analyze the case where the consumers are fully informed
about their individual values of �, the producer knows the distribution h(�), and
the producer and the consumers act so as to maximize their expected gain.
Integrating g, we obtain the cumulative distribution G(�;w) = 1 � e
��w
.
From Eq. 6 we can compute the conditional distribution
~g(�;w) =
g(�;w)
1�G(�)
�(w ��) (11)
= �e
��(w��)
�(w ��):
The average valuation for this conditional distribution is �w(�;�) = � + �
�1
.
Using Eqs. 5 and 9, we obtain the expected number of purchased articles and
the expected surplus (assuming the consumer subscribes):
hk(�)i = Ne
���
(12)
hs(�)i =
N
�
e
���
:
To compute the producer's expected pro�t as a function of � and F , we can
substitute Eqs. 12 into Eq.10, which yields:
h�(F;�)i =
1
(�
max
� �
min
)
Z
�
0
�
min
d� (F + N (�� )e
���
) (13)
=
(�
0
� �
min
)F +N (1�
�
)
�
e
���
min
� e
���
0
�
(�
max
� �
min
)
;
where �
0
is de�ned as �
0
= min(�
max
; ~�) and ~� is de�ned as the unique solution
to Ne
�~��
= F ~�.
profit 0.4 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.225 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
-0.5
-0.3
-0.1
0.1
0.3
fee
price
profit
Fig. 1. Pro�t landscape �(f;�). h is a uniform distribution with �
min
= 0:5, �
min
= 2:0.
a) Production cost = 0:1. b) Production cost = 0:5.
Eq. 13 can be visualized as a pro�t landscape in which the expected pro�t
is plotted as a function of F and �. It is convenient to de�ne a normalized
expected pro�t � = �=N and a normalized fee f = F=N . Fig. 1 illustrates the
landscape for two di�erent production costs: = 0:1 and = 0:5.
In each such landscape, there are two ridges. The lower ridge, which demar-
cates the boundary beyond which the producer attracts no consumers and thus
makes no pro�t, is de�ned by e
��
min
�
= f�
min
. The upper ridge is described
by a piecewise joining of two nonlinear curves, the simpler one being described
by the relation e
��
max
�
= f�
max
. This portion of the ridge has the following
characteristics:
� it is formed by a discontinuous derivative with respect to f and � (it
changes abruptly from positive to negative)
� for price settings along this ridge, all consumers subscribe
� if the production cost <
crit
, the optimal setting of (f;�) occurs along
this part of the ridge. For the examples of Fig. 1, the critical production
cost
crit
is found to be approximately 0.28
4
, so that Fig. 1a ( = 0:1)
is in this regime, while Fig. 1b ( = 0:5) is not.
The other portion of the ridge is de�ned by a more complex nonlinear relation
between f and�. It is less sharp, resulting from derivatives with respect to f and
� being zero rather than jumping discontinuously from positive to negative.
5
4
It can be shown that, for exponential g and uniform h,
crit
=
�(
crit
)�
min
�
max
+�
min
�
2
max
�
�
min
�
2
max
, which is positive when �
min
> 0.
5
These characteristics only pertain to the case where g is exponential and h is uniform;
we have explored other combinations of functional forms for g and h that yield pro�t
landscapes that are topographically di�erent. For example, if g is exponential and
Figure 2 shows the dependence of the optimal f and � upon the production
cost . There is a discontinuous derivative at =
crit
, due to the switchover
between the two nonlinear curves that de�ne the upper ridge in the landscape.
As one might guess, the pro�t � decreases monotonically with the production
cost. The proportion p of consumers that subscribe is 1 for all �
crit
; for
exceeding this threshold the proportion of subscribers is strictly less than one,
and is given by
~���
min
�
max
��
min
.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0
0.2
0.4
0.6
0.8
1.0
µmin = 0.5 µmax = 2.0
∆
f
pi
p
production cost γ
∆,
f
Optimal ∆ and f vs. γ
Fig. 2. Optimal �, f , �, and p vs. , where h is a uniform distribution with �
min
= 0:5,
�
min
= 2:0.
4.2 Uninformed consumers
For a variety of reasons, a consumer may have to rely on adaptive estimates of
its valuation distribution, g. Suppose for example that consumer i knows that
g is an exponential distribution, but does not know its individual parameter �
i
.
Depending on the consumer's beliefs about the dynamics of the environment, it
might wish to place more or less weight on recent observations. Since the mean of
the distribution g(�;w) = � exp
��w
is 1=�, one reasonable and exible approach
to estimating � is to start with an assumed prior �
0
and after each period of
subscription to update the estimate �^ according to
�^
t+1
= (� �m
t
+ (1� �)(�^
t
)
�1
)
�1
; (14)
where m
t
is the mean of the N valuations received during the subscription
period t, and � is the consumer's \ ightiness" factor. This factor could be set to
h consists of a number of well-separated mass points, the landscape can contain
multiple ridges and peaks.
a constant, or be time-varying; in fact, by setting � = 1=t, the consumer weights
all observations equally.
Suppose that the producer knows the distribution h, and sets f and � to
the \optimal" values given by Fig. 2. (For = 0:1, these values are (f;�) =
(0:30878; 0:24098), at which the \ideal" pro�t � = 0:4137 and the fraction of sub-
scribers p = 1.) Furthermore, suppose that the consumers all start with exactly
the right estimates of their � parameters, but that they update their estimates
in accordance with Eq. 14. Fig. 3 illustrates what happens to consumer pro�ts
and subscription rates as a function of time for two values of �. In both cases,
the proportion p of subscribed consumers continues to diminish inde�nitely, and,
correspondingly, so does the pro�t �.
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
Fig. 3. Pro�t � and proportion of subscribed consumers p vs. time (in subscription
periods) when consumers estimate their � values, and the ightiness parameter is a)
� = 0:1 and b) � = 0:9. Consumer leakage occurs for � = 0:1, and is even more
pronounced for � = 0:9. Pro�ts diminish inde�nitely, although the rate of reduction
slows with time. The market consists of M = 1000 consumers and one seller o�ering
N = 10 articles per subscription period. Dashed horizontal line indicates ideal pro�t
of 0.4137.
Fig. 4 o�ers further insight into consumer leakage. It shows the expected
average pro�t over 200 subscription periods for a wide range of f and � (not
just the optimal value). These are represented as pro�t landscapes for � =
0:1 and � = 0:9. Both landscapes have become peaked at lower values of f .
In other words, if the producer insists on holding to a �xed price schedule, it
will optimize pro�ts by setting the subscription fee lower than it could charge
perfectly informed consumers, thereby encouraging overly pessimistic consumers
to stay in the market longer. It partially compensates by raising � somewhat.
A little thought suggests that, regardless of , �, or other such parameters,
consumer leakage will eventually lead to complete market failure for any �xed
setting of f and �. As consumers continually update their �^, they are in e�ect
conducting a random walk about the correct value of �. If it should ever happen
that the consumer's �^ rises above a value such that hs(�;�)i falls below F , the
profit 0.325 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit
Fig. 4. Pro�t landscape after consumer leakage has occurred. Mean pro�t per article
per consumer as a function of F and �, after 200 subscription periods with F and
� at the values that would be optimal for perfectly informed consumers. The market
consists ofM = 1000 consumers and one seller o�ering N = 10 articles per subscription
period. a) � = 0:1. b) � = 0:9.
consumer will become permanently disenfranchised from the market. In other
words, the value of �^
crit
at which consumers are discouraged from subscribing
(which, from Eq. 12, can be computed from
1
�^
crit
e
��^
crit
�
< f), acts as an ab-
sorbing boundary. In the next subsection, we will discuss how a producer might
use dynamic pricing and optimization to prevent unchecked pro�t erosion.
4.3 Solving the leakage problem
Consumer leakage hurts both the producer and the consumers, and therefore all
players have an incentive to counteract it. Both consumers and producers can do
this by putting more emphasis on exploration as opposed to (pure) exploitation.
Consumers could use a variety of schemes; for example, they could choose to
subscribe at random with a �nite probability even if their expected surplus is less
than the subscription fee, and this probability could diminish monotonically as
the di�erence between these quantities increases. Producers could �ght leakage
by temporarily decreasing prices to resurrect consumers who have mistakenly
disenfranchised themselves, in hopes that with additional samples the consumers
will increase their estimated surplus to levels that can support higher prices. One
might expect that consumers could be enticed back into the market even with
small discounts, since, once their �^ ventures into a realm where subscription
appears to be unpro�table, it is frozen at this just-barely-unpro�table value.
6
6
Economists have studied behavior of this sort in order to explain temporal price
dispersion, such as \sales" and temporary discounts. A mixture of reasons have
been modeled, including attempts to price discriminate between better and worse
Here we focus on the producer's strategy for enticing overly pessimistic con-
sumers back into the market. (This is not to deny that consumers' exploration
strategies merit serious study.) From our study of consumer leakage, it is appar-
ent that the producer's strategy must involve dynamic pricing, and that it must
cope with a pro�t landscape that changes dynamically due to shifts caused by
consumers' ongoing attempts to learn an estimate of g. It also seems most likely
that the pricing strategy would involve stochastic search on this dynamically
changing landscape, rather than following some pre-planned schedule.
The dynamically changing nature of the pro�t landscape in our problem lim-
its the applicability of standard stochastic optimization techniques. For example,
approaches such as simulated annealing implicitly assume that the search is be-
ing conducted on a static landscape as the value of its temperature parameter
is lowered. While more sophisticated approaches that are currently oriented to-
wards static landscapes might be modi�ed to dynamic ones, we focus here on a
very simple iterative random-mutation hill-climbing method (RMHC). At each
iteration of the RMHC method (i.e. following each subscription period), the pro-
ducer tries a new f and a new �. If the new settings result in increased pro�t,
the producer continues to move in the same direction in the f-� landscape in
the next trial. Otherwise, it backtracks to its previous best point and makes
small adjustments to f and �. The step size of the adjustment is chosen from
an exponential distribution with a mean 1=� for both f and �. To take into
account the dynamic nature of the landscape being maximized, the hill climber
has �nite memory and can remember the previous best point visited within the
last T time steps. We have observed that, for overly large values of T , the pro-
ducer can get mired for a long time in regions of the (f;�) landscape that have
become less pro�table due to evolving consumer expectations.
Fig. 5 illustrates how the pro�t � and the subscribed consumer population
p evolve with time for a particular setting of � = 0:5 and T = 20. For these
settings (and for a fairly broad range around them), the RMHC algorithm pre-
vents long-term consumer leakage and pro�t erosion, even when the consumers
are extremely ighty (� = 0:9). Frequent lowering of prices continually pulls
consumers back into the market so they can keep sampling articles. Ironically,
the RMHC's noisy exploration of the pro�t landscape helps to keep that land-
scape relatively stable. While the pro�t never attains the value of 0.41 that it
would have if the consumers were to keep their �^'s �xed at their exact values,
the producer is still able to extract a healthy, reasonably stable pro�t of roughly
0.25. Fig. 6 illustrates the noisy trajectory of f and �.
5 Conclusions
Commerce in information goods is one of the earliest emerging applications for
intelligent agents in commerce. However, the fundamental characteristics of in-
formation goods mean that they can and likely will be o�ered in widely varying
informed customers, and inducing potential customers to bear some search costs to
�nd a better price or product. See [11], [10], and [12].
0 0.25
0.5
0.75
1
0 250 500 750 1000
pr of
it, c
on su
m er
s
iter
0
0.25
0.5
0.75
1
0 250 500 750 1000
(av g)
pro fit,
fe e,
pri ce
iter
profit fee
price profit = 0.4137
Fig. 5. Simulation run with producer using RMHC algorithm with memory of T = 20
subscription periods and step-size parameter � = 0:5. = 0:1, � = 0:9. a) Propor-
tion p of subscribed consumers. b) Moving average of f , � and � (integrated over 20
subscription periods).
0
0.25
0.5
0.75
1
1.25
1.5
0 0.2 0.4 0.6 0.8 1
pr ice
fee
settings
Fig. 6. Trajectory of f and � taken by the RMHC algorithm for the simulation run
shown above.
con�gurations. Participating agents will need to deal with uncertainty about
both prices and location in multi-dimensional product space. Thus, studying
the behavior of learning agents is central to understanding and designing for
agent-based information economies. Since uncertainty will exist on both sides of
transactions, and interactions between learning agents that are negotiating and
transacting with other learning agents may lead to unexpected dynamics, it is
important to study two-sided learning.
We presented a simple but powerful model of an information bundling econ-
omy with a single producer and multiple consumer agents. We then explored
the pricing and purchasing behavior of these agents when articles can be bun-
dled. In this initial exploration, we studied the dynamics of this economy when
consumer agents are uninformed about the distribution of article values. We
discovered that a reasonable albeit naive consumer learning strategy can have a
profound in uence on market behavior | in this case, a strikingly bad in uence.
Our consumer and producer agents were rather naive in our �rst learning
experiments. This could be viewed as a criticism of our modeling. However, es-
pecially early in the development of adaptive agent intelligence, it may well be
that agent-based markets are quite vulnerable to odd behavior and dysfunctional
dynamics of the sort we observed. Our consumer agents did not recognize the
option value of new information, and thus su�ered by not undertaking su�cient
exploration relative to exploitation. Our producer agent did not initially adapt
to the pathological dynamics induced by the consumer agent naivete, and thus
su�ered by relying too con�dently on its \perfect" but static knowledge. Al-
though it was fairly easy for us to see what was going wrong, and to modify
the producer agent in a simple way that ameliorated much of the problem, our
environment is arti�cially simple and static. In more realistic settings it may be
quite di�cult for even relatively intelligent agents to adapt to emergent patholo-
gies. Human markets may not be as susceptible because human behavior is less
rote and more re ective. The lesson for agent design is to search for strategies
that are dynamically robust and adaptive in the face of substantial uncertainty.
We have started to explore how to make our simple mechanism more robust
in realistic settings. For example, the hill climbing algorithm is likely to get stuck
at local optima. For the uniform h there is a single peak in the pro�t landscape,
but that is not at all general. Several powerful optimization techniques, such as
simulated annealing, exist for static landscapes. Extensions to these that handle
dynamically changing, noisy landscapes, may lead to robustly adaptive agent
learning strategies.
We have an active agenda of continuing work on this topic. For example, we
have begun to consider less naive consumer strategies, that balance exploration
against exploitation. We are considering producer strategies that adapt based on
the number of recent subscribers relative to the producer's model of the optimal
number of subscribers. Perhaps most challenging but essential to a more general
understanding of the problem is the extension of our work into an economy
with multiple producers who are underinformed about each other's competitive
strategies as well as about consumer valuations.
In this and earlier work we have found that initial plausible but simple de-
signs of economically-intelligent agents lead to dynamic market interactions that
can be surprising and unsuccessful. The value of intelligent agents in electronic
commerce will depend on the ability to understand the problems of learning
and adaptivity, and to design agents that interact robustly in the presence of
substantial uncertainty about both parameters and the strategies of other un-
derinformed agents.
References
1. Y. Bakos and E. Brynjolfsson. Bundling information goods: Pricing, pro�ts and
e�ciency. In D. Hurley, B. Kahin, and H. Varian, editors, The Economics of Digital
Information Goods. MIT Press, Cambridge, Massachusetts, 1998.
2. J. C. Chuang and M. A. Sirbu. Network delivery of information goods: Opti-
mal pricing of articles and subscriptions. In D. Hurley, B. Kahin, and H. Varian,
editors, The Economics of Digital Information Goods. MIT Press, Cambridge, Mas-
sachusetts, 1998.
3. S. Fay and J. K. MacKie-Mason. Competition between �rms that bundle.
Manuscript, Dept. of Economics, University of Michigan, 1998.
4. J. E. Hanson and J. O. Kephart. Spontaneous specialization in a free-market
economy of agents. In Proceedings of Workshop on Arti�cial Societies and Compu-
tational Markets at the Second International Conference on Autonomous Agents,
May 1998.
5. J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal,
and S. R. White. Dynamics of an information-�ltering economy. In Proceedings
of Second International Workshop on Cooperative Information Agents (CIA-98),
Paris, July 1998.
6. J. O. Kephart, J. E. Hanson, and J. Sairamesh. Price-war dynamics in a free-
market economy of software agents. In Proceedings of Alife VI, June 1998.
7. J. K. MacKie-Mason and H. R. Varian. Some economics of the internet. In
W. Sichel, editor, Networks, Infrastructure and the New Task for Regulation. Uni-
versity of Michigan Press, Ann Arbor, Michigan, 1996.
8. A. Mas-Colell, M. Whinston, and J. Green. Microeconomic Theory. Oxford Uni-
versity Press, 1995.
9. W. Oi. A Disneyland dilemma: Two-part tari�s for a Mickey Mouse monopoly.
The Quarterly Journal of Economics, 85(1):77{96, February 1971.
10. S. Salop and J. Stiglitz. Bargains and ripo�s: A model of monopolistically com-
petitive price dispersion. Review of Economic Studies, 44:493{510, October 1977.
11. Y. Shilony. Mixed pricing in oligopoly. Journal of Economic Theory, 14:373{388,
April 1977.
12. H. R. Varian. A model of sales. American Economic Review, 70:651{659, Septem-
ber 1980.
13. J. M. Vidal and E. H. Durfee. Learning nested agent models in an information
economy. Journal of Experimental and Theoretical Arti�cial Intelligence (special
issue on learning in distributed arti�cial intelligence systems), 10(3):291{308, 1998.
14. J. M. Vidal and E. H. Durfee. The moving target problem in multi-agent learn-
ing. In Proceedings of the Third International Conference on Multi-agent Systems
(ICMAS98), 1998.Two-sided learning in an agent economy for
information bundles
Je�rey O. Kephart
1
, Rajarshi Das
1
, and Je�rey K. MacKie-Mason
2
1
IBM Research
PO Box 704, Yorktown Heights, NY 10598
fkephart,rajarshig@watson.ibm.com
2
School of Information and Department of Economics
University of Michigan, Ann Arbor, MI 48109
jmm@umich.edu
Abstract. Commerce in information goods is one of the earliest emerg-
ing applications for intelligent agents in commerce. However, the funda-
mental characteristics of information goods mean that they can and likely
will be o�ered in widely varying con�gurations. Participating agents
will need to deal with uncertainty about both prices and location in
multi-dimensional product space. Thus, studying the behavior of learning
agents is central to understanding and designing for agent-based informa-
tion economies. Since uncertainty will exist on both sides of transactions,
and interactions between learning agents that are negotiating and trans-
acting with other learning agents may lead to unexpected dynamics, it
is important to study two-sided learning.
We present a simple but powerful model of an information bundling
economy with a single producer and multiple consumer agents. We ex-
plore the pricing and purchasing behavior of these agents when articles
can be bundled. In this initial exploration, we study the dynamics of this
economy when consumer agents are uninformed about the distribution of
article values. We discover that a reasonable albeit naive consumer learn-
ing strategy can lead to disastrous market behavior. We �nd a simple
explanation for this market failure, and develop a simple improvement
to the producer agent's strategy that largely ameliorates the problem.
But in the process we learn an important lesson: dynamic market in-
teractions when there is substantial uncertainty can lead to pathological
outcomes if agents are designed with \reasonable" but not su�ciently
adaptive strategies. Thus, in programmed agent environments it may be
essential to dramatically increase our understanding of adaptivity and
learning if we want to obtain good aggregate outcomes.
Keywords: Information economy, information bundling, two-
sided learning, economic agents.
1 Introduction
Extraordinary rates of advance in computation and communication technolo-
gies
1
have fueled the much-noted proliferation of electronic commerce. Within a
few years, we anticipate that software agents will participate in a wide variety of
commercial transactions, and may even become economic players in their own
right [5]. One important domain for agent economies is the production and dis-
tribution of information goods and services, such as news articles, entertainment
and other service reviews, and instructional materials. Digital information goods
are unusually con�gurable, and negotiations over the composition and prices for
bundles are a natural application for software agents. However, to do so will
require that broker agents be able to explore not just prices, but also locations
in a multi-dimensional bundled product space.
Very little is understood about strategic search over the joint information
good price and product space (that is, search for which products to o�er, in
which combinations, at what prices).
2
Agents competing in this space must
learn about the distribution of preferences across a heterogeneous and changing
customer population, must learn about the strategies being followed by their
competitors, and then must optimize their strategies to take into account the
new understanding of customers and competitors. Consumers, on the other hand,
must learn about the quantity, quality and price of bundled items o�ered by the
various producers. Learning and optimization are both di�cult search problems,
and in the market context they are closely related.
In earlier work we considered strategic pricing for bundles when all of the
relevant parameters are known by brokers [3]. We distinguished between condi-
tions in which �rms prefer to o�er comprehensive bundles at a single price and
those in which they prefer to sell items individually. Although ours is the �rst
work focusing on competitive strategies, the bundling and pricing of information
goods has been an active area in recent research [1], [2]. We have also studied
price dynamics when competing brokers search a restricted price and product
space [5, 6, 4]. Price instability, characterized as \price wars", was prevalent.
In this paper we examine a considerably more general problem, although
for this initial foray we maintain some simplifying restrictions. We envision an
agent economy with one information broker agent and many consumer agents.
The consumer agents are heterogeneous: they value each item di�erently, and
these values are drawn from a di�erent distribution for each agent. To set pro�t-
maximizing prices, the broker agent desires, but may not know, the parameters
of the consumers' valuation distributions. The consumers also may not know
their valuation parameters until they gain experience with the broker's o�ered
information goods. Therefore, both sides wish to learn. The broker can set prices
strategically, learning about consumer valuations from their purchasing behavior
at di�erent prices. The consumers can purchase strategically, learning about the
distribution of information good values by sampling.
1
See, e.g., [7].
2
For some recent work on multi-agent search, see, e.g., [14, 13].
This environment is very rich and will permit us to explore many interesting
questions. In this paper we focus on one surprising result: that when consumer
agents follow a plausible but overly naive learning strategy, even if the producer is
fully informed (but also somewhat naive), the economy can continuously degen-
erate with disastrous overall performance. We �nd a simple explanation for the
initial failure, and develop a simple improvement to the producer agent's strategy
that largely ameliorates the problem. But in the process we learn an important
lesson: dynamic market interactions when there is substantial uncertainty can
lead to pathological outcomes if agents are designed with \reasonable" but not
su�ciently adaptive strategies. Thus, in programmed agent environments it may
be essential to dramatically increase our understanding of adaptivity and learn-
ing if we want to obtain good aggregate outcomes.
This paper is the �rst in a series of studies directed towards understanding
and developing robust adaptive learning techniques that are e�ective for indi-
vidual agents and lead to acceptable collective (market) behavior. We are also
extending the model to study the much more common setting in which there is
competition among multiple producers of information goods. Of course, this set-
ting exacerbates the learning problem, since the producers need to learn about
each other's strategies in addition to learning about consumer valuations.
In the next section we discuss the model. We introduce two relevant dis-
tributions: �rst, a parameterized distribution g from which a given consumer's
valuations are drawn, and second, a distribution h describing the population of
consumers | the distribution from which an individual consumer's valuation
distribution parameters are drawn. For tractability, we assume that the broker
is restricted to two-part tari� pricing schemes [9]: it can charge consumer agents
a subscription price to examine the current items, and a per item price for each
item the consumer subsequently purchases.
In a general analysis section, we derive the optimal strategies for fully in-
formed broker and consumer agents as functions of the article value and con-
sumer type distributions g and h. Then in Section 4 we explore speci�c cases of
exploration and exploitation through analysis and simulation. When agents are
underinformed and engage in learning, we consider plausible (but not necessarily
fully optimal) agent strategies, and examine the resulting system outcomes. We
close with a summary of our current results and plans for future work.
2 Model
A single producer periodically (at discrete times t = 0; 1; 2; : : :) generates sets of
N articles. It sets a subscription fee F and a price schedule P = fP (1); : : : ; P (N )g,
where P (k) represents the price it charges for a subset of k 2 f1; : : : ; Ng of the
N articles.
At a given time t, each of M consumers are informed about (F;P ), and they
decide whether to subscribe. Then, each subscribing consumer receives abstracts
of all N articles, and uses them to assess its value w
j
from reading each article,
for j 2 f1; : : : ; Ng. We assume that these values are generated randomly ac-
cording to a distribution g(q
i
;w), where q
i
is the set of parameters that de�ne
the distribution for consumer i. The parameters q
i
that represent consumer i's
valuation distribution are themselves generated randomly prior to time 0 from a
distribution h(r; q). Once the q
i
parameters are generated for consumer i, they
remain �xed for the rest of time (though the agents may not know their true
values for a while, if ever).
After assessing the value of each article, a subscriber decides which articles
to purchase. It does this by choosing a set of articles K to maximize surplus
s = (
P
j2K
w
j
) � P (jKj). Henceforth we assume that the articles have been
sorted by consumer j so that the w
j
are ordered highest to lowest, and thus the
set K consists of the �rst k = jKj articles.
The subscription decision depends on consumer expectations. If a consumer
believes correctly that its valuations are drawn from a distribution with param-
eters q
i
, then its expected surplus from purchasing k articles would be
hs
k
i =
*
k
X
j=1
w
j
+
� P (k): (1)
The consumer can then derive the vector of values p
k
, the probabilities that any
k is the expected surplus maximizing number of articles. Then the consumer's
optimized expected surplus is
hsi =
N
X
k=1
p
k
hs
k
i : (2)
The consumer should subscribe if and only if the expectation hsi exceeds the
subscription fee F .
3
If consumer i does not know its own q
i
for the articles o�ered by this broker,
then values w^
j
are the consumer's beliefs drawn from a distribution g(
^
q
i
;w) after
reviewing the abstracts, where the
^
q
i
are the agent's current best estimate of the
valuation parameters q
i
. When the agent purchases and reads articles, it learns
their true valuesw
j
and can then use this sample information to update its beliefs
^
q
i
about the distribution of article values. Therefore, a good consumer strategy
should take into account the value of learning. For example, when uncertainty
about q
i
is high, the consumer might deliberately subscribe even when its esti-
mated surplus is less than F , simply to experience more articles to improve its
valuation estimates. Or, having subscribed the consumer might purchase more
articles than would maximize expected surplus from current reading.
Turning to the producer's problem, it can choose a subscription fee F and
a price schedule P in each period. These should be chosen to maximize some
3
We assume consumers are risk neutral. A risk averse agent would want to optimize
the expectation of a concave function of surplus, and optimal behavior would depend
on second and possibly higher moments of the induced distribution of surplus, not
just the expected surplus. See, e.g., [8].
function of expected current and future pro�ts, where current pro�t can be
expressed as:
� =
M
X
i=1
�
i
(F + P (k
i
)�C(k
i
)) : (3)
with �
i
= 1 if consumer i chooses to subscribe, and 0 otherwise. The cost of
delivering k
i
articles to consumer i is denoted as C(k
i
).
In performing its maximization, the producer must take into account the
e�ect of P and F on the consumer's subscription decisions, and the e�ect of P
on the distribution of k
i
across the set of subscribers. Higher prices will decrease
the number of subscribers and the expected number of articles purchased, of
course. To compute the optimal subscription fee and price schedule the broker
wants to know the distribution h(r; q) from which the consumers' parameters
were generated, and the consumer strategies for subscribing and purchasing.
Based on its current beliefs about r and consumer strategies, the broker can
simulate a consumer population and its responses to various (F;P ) schedules,
and then pick (F;P ) to maximize a value function. In practice, the broker may
not know the consumer type parameters r, nor the consumer strategies. Thus,
the broker may choose (F;P ) to balance current expected pro�t according to
equation (3), against the increase in expected future pro�t from learning about
consumer preferences by observing their behavior when confronted with varying
(F;P ) combinations.
3 General Analysis
In this section, we �rst analyze the expected surplus and number of articles
purchased per subscription period for a rational, fully-informed consumer with a
given valuation distribution g. Then, we derive an expression for a monopolistic
producer's expected pro�t as a function of its price schedule and the distribution
h of valuation parameters q across the consumer population, assuming that all
consumers are rational and fully-informed about their valuation distribution.
A rational, fully-informed producer would choose its price schedule so as to
maximize its expected pro�t.
Suppose that a given consumer has its valuations w drawn from a probabil-
ity density function g(q;w). (The distribution parameters q may vary from one
consumer to another.) For the sake of simplicity, we assume that the producer
constrains itself to a linear price schedule: P (k) = k�. Then a rational consumer
will purchase k articles, where k is the number of articles with valuations ex-
ceeding the threshold P (k) � P (k � 1) = �. The probability p
k
for exactly k
articles to have valuations w > � is
p
k
=
�
N
k
�
G(q;�)
N�k
(1�G(q;�))
k
(4)
where G(q;w) represents the cumulative distribution function that corresponds
to g(q;w), i.e. it is the probability for an article to be valued at less than w.
From this we can compute the expected number of articles purchased:
hk(q;�)i =
N
X
k=1
k p
k
= N [1� G(q;�)] (5)
where the last equality follows from simple manipulations of binomial coe�-
cients.
The expected surplus hsi can be obtained from Eq. 2, provided that we �rst
compute hs
k
i, the expected surplus given that exactly k articles prove to have
valuations w > �. The conditional probability distribution for a single draw
from g(q;w) given that w > � is
~g(q;w) =
g(q;w)
1�G(q;�)
�(w ��) (6)
where �(x) represents the step function, equal to 1 if x > 0 and 0 otherwise.
The expected valuation for this distribution is
�w(q;�) =
Z
1
�
dww ~g(q;w) (7)
The expected sum of k draws from this conditional distribution is k �w(q;�).
Subtracting the price P (k) = k�, we obtain:
hs
k
i = k( �w(�)��) (8)
Inserting this result into Eq. 2, we obtain:
hs(q;�)i =
N
X
k=1
p
k
k [ �w(q;�)��] = [ �w(q;�)��] hk(q;�)i (9)
= N [ �w(q;�)��] [1� G(q;�)]
Note that the consumer will subscribe if and only if hs(q;�)i is greater than the
subscription fee F .
Now we take the producer's perspective. If we assume (for simplicity) that
the cost of producing and delivering k items is C(k) = k , then the expected
pro�t is
h�(F;�)i =
M
X
i=1
�(hs(q
i
;�)i � F )(F + (�� ) hk(q
i
;�)i) (10)
�
Z
q
dqh(r; q)�(hs(q;�)i � F ) (F + (�� ) hk(q;�)i)
where h(r; q) is the probability distribution for the consumers' q parameters, as
de�ned previously. In e�ect, the last approximation replaces the actual realized
set of consumer distribution parameters fq
i
g with an ensemble average over all
possible realizations of a set of M consumers generated from the distribution
h, and this approximation grows increasingly accurate in the limit of large M .
Note that a producer that knows h can compute this pro�t landscape. A fully
knowledgeable and rational producer would set � and F so as to maximize
h�(F;�)i.
4 Rational and Bounded-Rational Players
In the remainder of the paper, we shall assume simple functional forms for g and
h, and explore what happens when the consumers and/or the producer must
learn these distributions. In particular, we suppose that g is a one-parameter
exponential distribution given by g(�;w) = �e
��w
, and that h(�) is a uniform
distribution in the interval from �
min
to �
max
.
4.1 Fully-informed producer and consumers
As a reference point, we analyze the case where the consumers are fully informed
about their individual values of �, the producer knows the distribution h(�), and
the producer and the consumers act so as to maximize their expected gain.
Integrating g, we obtain the cumulative distribution G(�;w) = 1 � e
��w
.
From Eq. 6 we can compute the conditional distribution
~g(�;w) =
g(�;w)
1�G(�)
�(w ��) (11)
= �e
��(w��)
�(w ��):
The average valuation for this conditional distribution is �w(�;�) = � + �
�1
.
Using Eqs. 5 and 9, we obtain the expected number of purchased articles and
the expected surplus (assuming the consumer subscribes):
hk(�)i = Ne
���
(12)
hs(�)i =
N
�
e
���
:
To compute the producer's expected pro�t as a function of � and F , we can
substitute Eqs. 12 into Eq.10, which yields:
h�(F;�)i =
1
(�
max
� �
min
)
Z
�
0
�
min
d� (F + N (�� )e
���
) (13)
=
(�
0
� �
min
)F +N (1�
�
)
�
e
���
min
� e
���
0
�
(�
max
� �
min
)
;
where �
0
is de�ned as �
0
= min(�
max
; ~�) and ~� is de�ned as the unique solution
to Ne
�~��
= F ~�.
profit 0.4 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.225 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
-0.5
-0.3
-0.1
0.1
0.3
fee
price
profit
Fig. 1. Pro�t landscape �(f;�). h is a uniform distribution with �
min
= 0:5, �
min
= 2:0.
a) Production cost = 0:1. b) Production cost = 0:5.
Eq. 13 can be visualized as a pro�t landscape in which the expected pro�t
is plotted as a function of F and �. It is convenient to de�ne a normalized
expected pro�t � = �=N and a normalized fee f = F=N . Fig. 1 illustrates the
landscape for two di�erent production costs: = 0:1 and = 0:5.
In each such landscape, there are two ridges. The lower ridge, which demar-
cates the boundary beyond which the producer attracts no consumers and thus
makes no pro�t, is de�ned by e
��
min
�
= f�
min
. The upper ridge is described
by a piecewise joining of two nonlinear curves, the simpler one being described
by the relation e
��
max
�
= f�
max
. This portion of the ridge has the following
characteristics:
� it is formed by a discontinuous derivative with respect to f and � (it
changes abruptly from positive to negative)
� for price settings along this ridge, all consumers subscribe
� if the production cost <
crit
, the optimal setting of (f;�) occurs along
this part of the ridge. For the examples of Fig. 1, the critical production
cost
crit
is found to be approximately 0.28
4
, so that Fig. 1a ( = 0:1)
is in this regime, while Fig. 1b ( = 0:5) is not.
The other portion of the ridge is de�ned by a more complex nonlinear relation
between f and�. It is less sharp, resulting from derivatives with respect to f and
� being zero rather than jumping discontinuously from positive to negative.
5
4
It can be shown that, for exponential g and uniform h,
crit
=
�(
crit
)�
min
�
max
+�
min
�
2
max
�
�
min
�
2
max
, which is positive when �
min
> 0.
5
These characteristics only pertain to the case where g is exponential and h is uniform;
we have explored other combinations of functional forms for g and h that yield pro�t
landscapes that are topographically di�erent. For example, if g is exponential and
Figure 2 shows the dependence of the optimal f and � upon the production
cost . There is a discontinuous derivative at =
crit
, due to the switchover
between the two nonlinear curves that de�ne the upper ridge in the landscape.
As one might guess, the pro�t � decreases monotonically with the production
cost. The proportion p of consumers that subscribe is 1 for all �
crit
; for
exceeding this threshold the proportion of subscribers is strictly less than one,
and is given by
~���
min
�
max
��
min
.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0
0.2
0.4
0.6
0.8
1.0
µmin = 0.5 µmax = 2.0
∆
f
pi
p
production cost γ
∆,
f
Optimal ∆ and f vs. γ
Fig. 2. Optimal �, f , �, and p vs. , where h is a uniform distribution with �
min
= 0:5,
�
min
= 2:0.
4.2 Uninformed consumers
For a variety of reasons, a consumer may have to rely on adaptive estimates of
its valuation distribution, g. Suppose for example that consumer i knows that
g is an exponential distribution, but does not know its individual parameter �
i
.
Depending on the consumer's beliefs about the dynamics of the environment, it
might wish to place more or less weight on recent observations. Since the mean of
the distribution g(�;w) = � exp
��w
is 1=�, one reasonable and exible approach
to estimating � is to start with an assumed prior �
0
and after each period of
subscription to update the estimate �^ according to
�^
t+1
= (� �m
t
+ (1� �)(�^
t
)
�1
)
�1
; (14)
where m
t
is the mean of the N valuations received during the subscription
period t, and � is the consumer's \ ightiness" factor. This factor could be set to
h consists of a number of well-separated mass points, the landscape can contain
multiple ridges and peaks.
a constant, or be time-varying; in fact, by setting � = 1=t, the consumer weights
all observations equally.
Suppose that the producer knows the distribution h, and sets f and � to
the \optimal" values given by Fig. 2. (For = 0:1, these values are (f;�) =
(0:30878; 0:24098), at which the \ideal" pro�t � = 0:4137 and the fraction of sub-
scribers p = 1.) Furthermore, suppose that the consumers all start with exactly
the right estimates of their � parameters, but that they update their estimates
in accordance with Eq. 14. Fig. 3 illustrates what happens to consumer pro�ts
and subscription rates as a function of time for two values of �. In both cases,
the proportion p of subscribed consumers continues to diminish inde�nitely, and,
correspondingly, so does the pro�t �.
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
0
0.2
0.4
0.6
0.8
1
0 40 80 120 160 200
Pr of
it, C
on su
m er
iteration
profit consumers
profit = 0.4137
Fig. 3. Pro�t � and proportion of subscribed consumers p vs. time (in subscription
periods) when consumers estimate their � values, and the ightiness parameter is a)
� = 0:1 and b) � = 0:9. Consumer leakage occurs for � = 0:1, and is even more
pronounced for � = 0:9. Pro�ts diminish inde�nitely, although the rate of reduction
slows with time. The market consists of M = 1000 consumers and one seller o�ering
N = 10 articles per subscription period. Dashed horizontal line indicates ideal pro�t
of 0.4137.
Fig. 4 o�ers further insight into consumer leakage. It shows the expected
average pro�t over 200 subscription periods for a wide range of f and � (not
just the optimal value). These are represented as pro�t landscapes for � =
0:1 and � = 0:9. Both landscapes have become peaked at lower values of f .
In other words, if the producer insists on holding to a �xed price schedule, it
will optimize pro�ts by setting the subscription fee lower than it could charge
perfectly informed consumers, thereby encouraging overly pessimistic consumers
to stay in the market longer. It partially compensates by raising � somewhat.
A little thought suggests that, regardless of , �, or other such parameters,
consumer leakage will eventually lead to complete market failure for any �xed
setting of f and �. As consumers continually update their �^, they are in e�ect
conducting a random walk about the correct value of �. If it should ever happen
that the consumer's �^ rises above a value such that hs(�;�)i falls below F , the
profit 0.325 0.3 0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit profit
0.2 0.1
0.005
0 0.25
0.5 0.75
1 0 0.5 1 1.5
2 2.5 3
0
0.2
0.4
fee
price
profit
Fig. 4. Pro�t landscape after consumer leakage has occurred. Mean pro�t per article
per consumer as a function of F and �, after 200 subscription periods with F and
� at the values that would be optimal for perfectly informed consumers. The market
consists ofM = 1000 consumers and one seller o�ering N = 10 articles per subscription
period. a) � = 0:1. b) � = 0:9.
consumer will become permanently disenfranchised from the market. In other
words, the value of �^
crit
at which consumers are discouraged from subscribing
(which, from Eq. 12, can be computed from
1
�^
crit
e
��^
crit
�
< f), acts as an ab-
sorbing boundary. In the next subsection, we will discuss how a producer might
use dynamic pricing and optimization to prevent unchecked pro�t erosion.
4.3 Solving the leakage problem
Consumer leakage hurts both the producer and the consumers, and therefore all
players have an incentive to counteract it. Both consumers and producers can do
this by putting more emphasis on exploration as opposed to (pure) exploitation.
Consumers could use a variety of schemes; for example, they could choose to
subscribe at random with a �nite probability even if their expected surplus is less
than the subscription fee, and this probability could diminish monotonically as
the di�erence between these quantities increases. Producers could �ght leakage
by temporarily decreasing prices to resurrect consumers who have mistakenly
disenfranchised themselves, in hopes that with additional samples the consumers
will increase their estimated surplus to levels that can support higher prices. One
might expect that consumers could be enticed back into the market even with
small discounts, since, once their �^ ventures into a realm where subscription
appears to be unpro�table, it is frozen at this just-barely-unpro�table value.
6
6
Economists have studied behavior of this sort in order to explain temporal price
dispersion, such as \sales" and temporary discounts. A mixture of reasons have
been modeled, including attempts to price discriminate between better and worse
Here we focus on the producer's strategy for enticing overly pessimistic con-
sumers back into the market. (This is not to deny that consumers' exploration
strategies merit serious study.) From our study of consumer leakage, it is appar-
ent that the producer's strategy must involve dynamic pricing, and that it must
cope with a pro�t landscape that changes dynamically due to shifts caused by
consumers' ongoing attempts to learn an estimate of g. It also seems most likely
that the pricing strategy would involve stochastic search on this dynamically
changing landscape, rather than following some pre-planned schedule.
The dynamically changing nature of the pro�t landscape in our problem lim-
its the applicability of standard stochastic optimization techniques. For example,
approaches such as simulated annealing implicitly assume that the search is be-
ing conducted on a static landscape as the value of its temperature parameter
is lowered. While more sophisticated approaches that are currently oriented to-
wards static landscapes might be modi�ed to dynamic ones, we focus here on a
very simple iterative random-mutation hill-climbing method (RMHC). At each
iteration of the RMHC method (i.e. following each subscription period), the pro-
ducer tries a new f and a new �. If the new settings result in increased pro�t,
the producer continues to move in the same direction in the f-� landscape in
the next trial. Otherwise, it backtracks to its previous best point and makes
small adjustments to f and �. The step size of the adjustment is chosen from
an exponential distribution with a mean 1=� for both f and �. To take into
account the dynamic nature of the landscape being maximized, the hill climber
has �nite memory and can remember the previous best point visited within the
last T time steps. We have observed that, for overly large values of T , the pro-
ducer can get mired for a long time in regions of the (f;�) landscape that have
become less pro�table due to evolving consumer expectations.
Fig. 5 illustrates how the pro�t � and the subscribed consumer population
p evolve with time for a particular setting of � = 0:5 and T = 20. For these
settings (and for a fairly broad range around them), the RMHC algorithm pre-
vents long-term consumer leakage and pro�t erosion, even when the consumers
are extremely ighty (� = 0:9). Frequent lowering of prices continually pulls
consumers back into the market so they can keep sampling articles. Ironically,
the RMHC's noisy exploration of the pro�t landscape helps to keep that land-
scape relatively stable. While the pro�t never attains the value of 0.41 that it
would have if the consumers were to keep their �^'s �xed at their exact values,
the producer is still able to extract a healthy, reasonably stable pro�t of roughly
0.25. Fig. 6 illustrates the noisy trajectory of f and �.
5 Conclusions
Commerce in information goods is one of the earliest emerging applications for
intelligent agents in commerce. However, the fundamental characteristics of in-
formation goods mean that they can and likely will be o�ered in widely varying
informed customers, and inducing potential customers to bear some search costs to
�nd a better price or product. See [11], [10], and [12].
0 0.25
0.5
0.75
1
0 250 500 750 1000
pr of
it, c
on su
m er
s
iter
0
0.25
0.5
0.75
1
0 250 500 750 1000
(av g)
pro fit,
fe e,
pri ce
iter
profit fee
price profit = 0.4137
Fig. 5. Simulation run with producer using RMHC algorithm with memory of T = 20
subscription periods and step-size parameter � = 0:5. = 0:1, � = 0:9. a) Propor-
tion p of subscribed consumers. b) Moving average of f , � and � (integrated over 20
subscription periods).
0
0.25
0.5
0.75
1
1.25
1.5
0 0.2 0.4 0.6 0.8 1
pr ice
fee
settings
Fig. 6. Trajectory of f and � taken by the RMHC algorithm for the simulation run
shown above.
con�gurations. Participating agents will need to deal with uncertainty about
both prices and location in multi-dimensional product space. Thus, studying
the behavior of learning agents is central to understanding and designing for
agent-based information economies. Since uncertainty will exist on both sides of
transactions, and interactions between learning agents that are negotiating and
transacting with other learning agents may lead to unexpected dynamics, it is
important to study two-sided learning.
We presented a simple but powerful model of an information bundling econ-
omy with a single producer and multiple consumer agents. We then explored
the pricing and purchasing behavior of these agents when articles can be bun-
dled. In this initial exploration, we studied the dynamics of this economy when
consumer agents are uninformed about the distribution of article values. We
discovered that a reasonable albeit naive consumer learning strategy can have a
profound in uence on market behavior | in this case, a strikingly bad in uence.
Our consumer and producer agents were rather naive in our �rst learning
experiments. This could be viewed as a criticism of our modeling. However, es-
pecially early in the development of adaptive agent intelligence, it may well be
that agent-based markets are quite vulnerable to odd behavior and dysfunctional
dynamics of the sort we observed. Our consumer agents did not recognize the
option value of new information, and thus su�ered by not undertaking su�cient
exploration relative to exploitation. Our producer agent did not initially adapt
to the pathological dynamics induced by the consumer agent naivete, and thus
su�ered by relying too con�dently on its \perfect" but static knowledge. Al-
though it was fairly easy for us to see what was going wrong, and to modify
the producer agent in a simple way that ameliorated much of the problem, our
environment is arti�cially simple and static. In more realistic settings it may be
quite di�cult for even relatively intelligent agents to adapt to emergent patholo-
gies. Human markets may not be as susceptible because human behavior is less
rote and more re ective. The lesson for agent design is to search for strategies
that are dynamically robust and adaptive in the face of substantial uncertainty.
We have started to explore how to make our simple mechanism more robust
in realistic settings. For example, the hill climbing algorithm is likely to get stuck
at local optima. For the uniform h there is a single peak in the pro�t landscape,
but that is not at all general. Several powerful optimization techniques, such as
simulated annealing, exist for static landscapes. Extensions to these that handle
dynamically changing, noisy landscapes, may lead to robustly adaptive agent
learning strategies.
We have an active agenda of continuing work on this topic. For example, we
have begun to consider less naive consumer strategies, that balance exploration
against exploitation. We are considering producer strategies that adapt based on
the number of recent subscribers relative to the producer's model of the optimal
number of subscribers. Perhaps most challenging but essential to a more general
understanding of the problem is the extension of our work into an economy
with multiple producers who are underinformed about each other's competitive
strategies as well as about consumer valuations.
In this and earlier work we have found that initial plausible but simple de-
signs of economically-intelligent agents lead to dynamic market interactions that
can be surprising and unsuccessful. The value of intelligent agents in electronic
commerce will depend on the ability to understand the problems of learning
and adaptivity, and to design agents that interact robustly in the presence of
substantial uncertainty about both parameters and the strategies of other un-
derinformed agents.
References
1. Y. Bakos and E. Brynjolfsson. Bundling information goods: Pricing, pro�ts and
e�ciency. In D. Hurley, B. Kahin, and H. Varian, editors, The Economics of Digital
Information Goods. MIT Press, Cambridge, Massachusetts, 1998.
2. J. C. Chuang and M. A. Sirbu. Network delivery of information goods: Opti-
mal pricing of articles and subscriptions. In D. Hurley, B. Kahin, and H. Varian,
editors, The Economics of Digital Information Goods. MIT Press, Cambridge, Mas-
sachusetts, 1998.
3. S. Fay and J. K. MacKie-Mason. Competition between �rms that bundle.
Manuscript, Dept. of Economics, University of Michigan, 1998.
4. J. E. Hanson and J. O. Kephart. Spontaneous specialization in a free-market
economy of agents. In Proceedings of Workshop on Arti�cial Societies and Compu-
tational Markets at the Second International Conference on Autonomous Agents,
May 1998.
5. J. O. Kephart, J. E. Hanson, D. W. Levine, B. N. Grosof, J. Sairamesh, R. B. Segal,
and S. R. White. Dynamics of an information-�ltering economy. In Proceedings
of Second International Workshop on Cooperative Information Agents (CIA-98),
Paris, July 1998.
6. J. O. Kephart, J. E. Hanson, and J. Sairamesh. Price-war dynamics in a free-
market economy of software agents. In Proceedings of Alife VI, June 1998.
7. J. K. MacKie-Mason and H. R. Varian. Some economics of the internet. In
W. Sichel, editor, Networks, Infrastructure and the New Task for Regulation. Uni-
versity of Michigan Press, Ann Arbor, Michigan, 1996.
8. A. Mas-Colell, M. Whinston, and J. Green. Microeconomic Theory. Oxford Uni-
versity Press, 1995.
9. W. Oi. A Disneyland dilemma: Two-part tari�s for a Mickey Mouse monopoly.
The Quarterly Journal of Economics, 85(1):77{96, February 1971.
10. S. Salop and J. Stiglitz. Bargains and ripo�s: A model of monopolistically com-
petitive price dispersion. Review of Economic Studies, 44:493{510, October 1977.
11. Y. Shilony. Mixed pricing in oligopoly. Journal of Economic Theory, 14:373{388,
April 1977.
12. H. R. Varian. A model of sales. American Economic Review, 70:651{659, Septem-
ber 1980.
13. J. M. Vidal and E. H. Durfee. Learning nested agent models in an information
economy. Journal of Experimental and Theoretical Arti�cial Intelligence (special
issue on learning in distributed arti�cial intelligence systems), 10(3):291{308, 1998.
14. J. M. Vidal and E. H. Durfee. The moving target problem in multi-agent learn-
ing. In Proceedings of the Third International Conference on Multi-agent Systems
(ICMAS98), 1998.
Comments







