This paper considers undiscounted Markov decision problems. For the general multichain case, we obtain necessary and sufficient conditions under which, for each initial state and each final reward vector, the maximal total expected reward over a planning horizon of n epochs, minus n times the long-run average expected reward, converges to a finite limit as n approaches infinity. In addition, we characterize the chain and periodicity structure of the sets of one-step and J-step maximal-gain policies. Finally, we discuss the asymptotic properties of the undiscounted value-iteration method.
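The quantity studied above can be illustrated with a minimal sketch of undiscounted value iteration, v_{n+1}(s) = max_a [r(s,a) + sum_j p(j|s,a) v_n(j)], started from a final reward vector v_0. The two-state MDP, the names `ACTIONS`, `bellman` and `value_iteration`, and the choice to estimate the gain g per state as the last increment v_n - v_{n-1} are all illustrative assumptions, not the paper's construction. The sketch assumes the increments converge, as they do here because the example's optimal policy is unichain and aperiodic.

```python
from typing import List, Tuple

# Hypothetical example MDP (not from the paper). ACTIONS[s] lists the
# (one-step reward, transition-probability row) pairs available in state s.
Action = Tuple[float, List[float]]
ACTIONS: List[List[Action]] = [
    [(1.0, [0.5, 0.5]), (0.0, [0.0, 1.0])],  # state 0: two actions
    [(2.0, [0.2, 0.8])],                     # state 1: a single action
]


def bellman(v: List[float], actions: List[List[Action]]) -> List[float]:
    """One undiscounted step: max over actions of reward + expected v."""
    return [
        max(r + sum(p * vj for p, vj in zip(row, v)) for r, row in acts)
        for acts in actions
    ]


def value_iteration(actions: List[List[Action]], final_reward: List[float],
                    n_epochs: int):
    """Run n_epochs >= 1 steps from v_0 = final_reward.

    Returns (v_n, g, v_n - n*g), where g is estimated per state as the last
    increment v_n - v_{n-1}. This estimate is only meaningful when the
    increments converge, which this sketch assumes rather than checks.
    """
    v_prev = list(final_reward)
    v = bellman(v_prev, actions)
    for _ in range(n_epochs - 1):
        v_prev, v = v, bellman(v, actions)
    g = [a - b for a, b in zip(v, v_prev)]          # per-state gain estimate
    h = [vi - n_epochs * gi for vi, gi in zip(v, g)]  # v_n - n*g
    return v, g, h


if __name__ == "__main__":
    for n in (25, 50, 100):
        _, g, h = value_iteration(ACTIONS, [0.0, 0.0], n)
        print(n, g, h)
```

In this example the maximizing action in state 0 is always action 0, so the gain is 12/7 in both states and v_n - n g settles down geometrically (the second eigenvalue of the optimal transition matrix is 0.3), with the difference between the two states approaching 10/7. In periodic or multichain examples the increments need not settle and the per-state estimate above can mislead; the paper characterizes when the limit exists.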
Schweitzer, Paul, and Awi Federgruen. "The asymptotic behavior of undiscounted value iteration in Markov decision problems." Mathematics of Operations Research 2, no. 4 (November 1977): 360-381.