
by Aijun Zhang

A dissertation submitted in partial fulﬁllment of the requirements for the degree of Doctor of Philosophy (Statistics) in The University of Michigan 2009

Doctoral Committee: Professor Vijayan N. Nair, Co-Chair Agus Sudjianto, Co-Chair, Bank of America Professor Tailen Hsing Associate Professor Jionghua Jin Associate Professor Ji Zhu

© Aijun Zhang 2009

All Rights Reserved

To my elementary school, high school and university teachers


ACKNOWLEDGEMENTS

First of all, I would like to express my gratitude to my advisor, Prof. Vijay Nair, for guiding me through the entire PhD research. I appreciate his inspiration, encouragement and protection during these valuable years at the University of Michigan. I am thankful to Julian Faraway for his encouragement during the first years of my PhD journey. I would also like to thank Ji Zhu, Judy Jin and Tailen Hsing for serving on my doctoral committee and for helpful discussions on this thesis and other research work. I am grateful to Dr. Agus Sudjianto, my co-advisor from Bank of America, for giving me the opportunity to work with him during the summers of 2006 and 2007 and for offering me a full-time position. I appreciate his guidance, active support and many illuminating ideas. I would also like to thank Tony Nobili, Mike Bonn, Ruilong He, Shelly Ennis, Xuejun Zhou, Arun Pinto, and others I first met in 2006 at the Bank. They all persuaded me to jump into the area of credit risk research; I did so a year later and finally came up with this thesis within two more years. I extend my acknowledgement to Prof. Kai-Tai Fang for his consistent encouragement ever since I graduated from his class in Hong Kong five years ago. This thesis would have been impossible without the love and remote support of my parents in China. To them, I am most indebted.


TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER

I. An Introduction to Credit Risk Modeling
   1.1 Two Worlds of Credit Risk
       1.1.1 Credit Spread Puzzle
       1.1.2 Actual Defaults
   1.2 Credit Risk Models
       1.2.1 Structural Approach
       1.2.2 Intensity-based Approach
   1.3 Survival Models
       1.3.1 Parametrizing Default Intensity
       1.3.2 Incorporating Covariates
       1.3.3 Correlating Credit Defaults
   1.4 Scope of the Thesis

II. Adaptive Smoothing Spline
   2.1 Introduction
   2.2 AdaSS: Adaptive Smoothing Spline
       2.2.1 Reproducing Kernels
       2.2.2 Local Penalty Adaptation
   2.3 AdaSS Properties
   2.4 Experimental Results
   2.5 Summary
   2.6 Technical Proofs

III. Vintage Data Analysis
   3.1 Introduction
   3.2 MEV Decomposition Framework
   3.3 Gaussian Process Models
       3.3.1 Covariance Kernels
       3.3.2 Kriging, Spline and Kernel Methods
       3.3.3 MEV Backfitting Algorithm
   3.4 Semiparametric Regression
       3.4.1 Single Segment
       3.4.2 Multiple Segments
   3.5 Applications in Credit Risk Modeling
       3.5.1 Simulation Study
       3.5.2 Corporate Default Rates
       3.5.3 Retail Loan Loss Rates
   3.6 Discussion
   3.7 Technical Proofs

IV. Dual-time Survival Analysis
   4.1 Introduction
   4.2 Nonparametric Methods
       4.2.1 Empirical Hazards
       4.2.2 DtBreslow Estimator
       4.2.3 MEV Decomposition
   4.3 Structural Models
       4.3.1 First-passage-time Parameterization
       4.3.2 Incorporation of Covariate Effects
   4.4 Dual-time Cox Regression
       4.4.1 Dual-time Cox Models
       4.4.2 Partial Likelihood Estimation
       4.4.3 Frailty-type Vintage Effects
   4.5 Applications in Retail Credit Risk Modeling
       4.5.1 Credit Card Portfolios
       4.5.2 Mortgage Competing Risks
   4.6 Summary
   4.7 Supplementary Materials

BIBLIOGRAPHY

LIST OF FIGURES

1.1 Moody's-rated corporate default rates, bond spreads and NBER-dated recessions. Data sources: a) Moody's Baa & Aaa corporate bond yields (http://research.stlouisfed.org/fred2/categories/119); b) Moody's Special Comment on Corporate Default and Recovery Rates, 1920-2008 (http://www.moodys.com/); c) NBER-dated recessions (http://www.nber.org/cycles/).

1.2 Simulated drifted Wiener process, first-passage-time and hazard rate.

1.3 Illustration of retail credit portfolios and vintage diagram.

1.4 Moody's speculative-grade default rates for annual cohorts 1970-2008: projection views in lifetime, calendar and vintage origination time.

1.5 A road map of thesis developments of statistical methods in credit risk modeling.

2.1 Heterogeneous smooth functions: (1) Doppler function simulated with noise, (2) HeaviSine function simulated with noise, (3) Motorcycle-Accident experimental data, and (4) Credit-Risk tweaked sample.

2.2 OrdSS function estimate and its 2nd-order derivative (upon standardization): scaled signal from f(t) = sin(ωt) (upper panel) and f(t) = sin(ωt⁴) (lower panel), ω = 10π, n = 100 and snr = 7. The sin(ωt⁴) signal resembles the Doppler function in Figure 2.1; both have time-varying frequency.

2.3 OrdSS curve fitting with m = 2 (shown by solid lines). The dashed lines represent 95% confidence intervals. In the credit-risk case, the log loss rates are considered as the responses, and the time-dependent weights are specified proportional to the number of replicates.

2.4 Simulation study of Doppler and HeaviSine functions: OrdSS (blue), AdaSS (red) and the heterogeneous truth (light background).

2.5 Non-stationary OrdSS and AdaSS for Motorcycle-Accident data.

2.6 OrdSS, non-stationary OrdSS and AdaSS: performance in comparison.

2.7 AdaSS estimate of maturation curve for Credit-Risk sample: piecewise-constant ρ⁻¹(t) (upper panel) and piecewise-linear ρ⁻¹(t) (lower panel).

3.1 Vintage diagram upon truncation and exemplified prototypes.

3.2 Synthetic vintage data analysis: (top) underlying true marginal effects; (2nd) simulation with noise; (3rd) MEV Backfitting algorithm upon convergence; (bottom) estimation compared to the underlying truth.

3.3 Synthetic vintage data analysis: (top) projection views of fitted values; (bottom) data smoothing on vintage diagram.

3.4 Results of MEV Backfitting algorithm upon convergence (right panel) based on the squared exponential kernels. Shown in the left panel is the GCV selection of smoothing and structural parameters.

3.5 MEV fitted values: Moody's-rated corporate default rates.

3.6 Vintage data analysis of retail loan loss rates: (top) projection views of empirical loss rates in lifetime m, calendar t and vintage origination time v; (bottom) MEV decomposition effects f̂(m), ĝ(t) and ĥ(v) (at log scale).

4.1 Dual-time-to-default: (left) Lexis diagram of sub-sampled simulations; (right) empirical hazard rates in either lifetime or calendar time.

4.2 DtBreslow estimator vs one-way nonparametric estimator using the data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005 vintages only; (bottom) post-2005 vintages only.

4.3 MEV modeling of empirical hazard rates, based on the dual-time-to-default data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005 vintages only; (bottom) post-2005 vintages only.

4.4 Nonparametric analysis of credit card risk: (top) one-way empirical hazards, (middle) DtBreslow estimation, (bottom) MEV decomposition.

4.5 MEV decomposition of credit card risk for low, medium and high FICO buckets: Segment 1 (top panel); Segment 2 (bottom panel).

4.6 Home prices and unemployment rate of California state: 2000-2008.

4.7 MEV decomposition of mortgage hazards: default vs. prepayment.

4.8 Split time-varying covariates into little time segments, illustrated.

LIST OF TABLES

1.1 Moody's speculative-grade default rates. Data source: Moody's special comment (release: February 2009) on corporate default and recovery rates, 1920-2008 (http://www.moodys.com/) and author's calculations.

2.1 AdaSS parameters in Doppler and HeaviSine simulation study.

3.1 Synthetic vintage data analysis: MEV modeling exercise with GCV-selected structural and smoothing parameters.

4.1 Illustrative data format of pooled credit card loans.

4.2 Loan-level covariates considered in mortgage credit risk modeling, where NoteRate could be dynamic and others are static upon origination.

4.3 Maximum partial likelihood estimation of mortgage covariate effects in dual-time Cox regression models.

ABSTRACT

This research deals with some statistical modeling problems that are motivated by credit risk analysis. Credit risk modeling has been the subject of considerable research interest in finance and has recently drawn the attention of statistical researchers. In the first chapter, we provide an up-to-date review of credit risk models and demonstrate their close connection to survival analysis. The first statistical problem considered is the development of an adaptive smoothing spline (AdaSS) for heterogeneously smooth function estimation. Two challenging issues that arise in this context are the evaluation of the reproducing kernel and the determination of the local penalty, for which we derive an explicit solution based on a piecewise type of local adaptation. Our nonparametric AdaSS technique is capable of fitting a diverse set of 'smooth' functions including possible jumps, and it plays a key role in subsequent work in the thesis. The second topic is the development of dual-time analytics for observations involving both lifetime and calendar timescales. It includes "vintage data analysis" (VDA) for continuous types of responses in the third chapter, and "dual-time survival analysis" (DtSA) in the fourth chapter. We propose a maturation-exogenous-vintage (MEV) decomposition strategy in order to understand the risk determinants in terms of self-maturation in lifetime, exogenous influence by macroeconomic conditions, and heterogeneity induced from vintage originations. The intrinsic identification problem is discussed for both VDA and DtSA. Specifically, we consider VDA under Gaussian process models, provide an efficient MEV backfitting algorithm and assess its performance with both simulated and real examples.


DtSA on the Lexis diagram is of particular importance in credit risk modeling, where default events can be triggered by both endogenous and exogenous hazards. We consider nonparametric estimators, first-passage-time parameterization and semiparametric Cox regression. These developments extend the family of models for both credit risk modeling and survival analysis. We demonstrate the application of DtSA to credit card and mortgage risk analysis in retail banking, and shed some light on understanding the ongoing credit crisis.


CHAPTER I

An Introduction to Credit Risk Modeling

Credit risk is a critical area in banking and is of concern to a variety of stakeholders: institutions, consumers and regulators. It has been the subject of considerable research interest in the banking and finance communities, and has recently drawn the attention of statistical researchers. By Wikipedia's definition, "Credit risk is the risk of loss due to a debtor's non-payment of a loan or other line of credit." (Wikipedia.org, as of March 2009)

Central to credit risk is the default event, which occurs if the debtor is unable to meet its legal obligation according to the debt contract. Examples of default events include bond default, corporate bankruptcy, credit card charge-off, and mortgage foreclosure. Other forms of credit risk include repayment delinquency in retail loans, loss severity upon the default event, and unexpected changes of credit rating.

An enormous literature on credit risk has been fostered by both academics in finance and practitioners in industry. There are two parallel worlds based upon a simple dichotomous rule of data availability: (a) the direct measurements of credit performance and (b) the prices observed in the credit market. The data availability leads to two streams of credit risk modeling that have key distinctions.


1.1 Two Worlds of Credit Risk

The two worlds of credit risk can be simply characterized by the type of default probability, one being actual and the other being implied. The former corresponds to direct observations of defaults, also known as the physical default probability in finance. The latter refers to the risk-neutral default probability implied from credit market data, e.g. corporate bond yields. The academic literature on corporate credit risk has been inclined to the study of implied defaults, which is yet a puzzling world.

1.1.1 Credit Spread Puzzle

The credit spread of a defaultable corporate bond is its excess yield over a default-free Treasury bond of the same time to maturity. Consider a zero-coupon corporate bond with unit face value and maturity date T. The yield-to-maturity at t < T is defined by

    Y(t, T) = −log B(t, T) / (T − t),    (1.1)

where B(t, T) is the bond price of the form

    B(t, T) = E[ exp{ −∫_t^T (r(s) + λ(s)) ds } ],    (1.2)

given the independent term structures of the interest rate r(·) and the default rate λ(·). Setting λ(t) ≡ 0 gives the benchmark price B0(t, T) and the yield Y0(t, T) of the Treasury bond. Then, the credit spread can be calculated as

    Spread(t, T) = Y(t, T) − Y0(t, T) = −log( B(t, T)/B0(t, T) ) / (T − t) = −log( 1 − q(t, T) ) / (T − t),    (1.3)

where q(t, T) = 1 − E[ exp{ −∫_t^T λ(s) ds } ] is the conditional default probability P[τ ≤ T | τ > t], and τ denotes the time-to-default, to be detailed in the next section.
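To make (1.1)-(1.3) concrete, here is a minimal numerical sketch (ours, not part of the thesis), assuming flat term structures r(s) ≡ r and λ(s) ≡ λ and zero recovery; the helper names `bond_price` and `credit_spread` are hypothetical. In this special case B/B0 = exp{−λ(T − t)}, so the spread reduces exactly to the constant intensity λ.

```python
import math

def bond_price(r, lam, T):
    """Zero-coupon defaultable bond price (1.2), assuming a constant
    interest rate r, constant default intensity lam, and zero recovery."""
    return math.exp(-(r + lam) * T)

def credit_spread(r, lam, T):
    """Credit spread (1.3): excess yield over the default-free benchmark."""
    B = bond_price(r, lam, T)       # defaultable bond
    B0 = bond_price(r, 0.0, T)      # Treasury benchmark, lam = 0
    return -math.log(B / B0) / T

r, lam, T = 0.05, 0.02, 5.0
q = 1 - math.exp(-lam * T)          # conditional default probability q(0, T)
# with a flat intensity, -log(1 - q)/T recovers lam exactly
assert abs(credit_spread(r, lam, T) - lam) < 1e-12
assert abs(-math.log(1 - q) / T - lam) < 1e-12
```

With deterministic but time-varying r(·) and λ(·), the same calculation goes through with the integrals ∫r and ∫λ in place of rT and λT.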

The credit spread is supposed to co-move with the default rate. For illustration, Figure 1.1 (top panel) plots the Moody's-rated corporate default rates and Baa-Aaa bond spreads from 1920 to 2008. The shaded backgrounds correspond to NBER's latest announcement of recession dates. Most spikes of the default rates and the credit spreads coincide with recessions, but it is clear that the movements of the two time series differ in both level and change. Such a lack of match is the so-called credit spread puzzle in the recent literature of corporate finance: none of the existing structural credit risk models can successfully imply the actual default rates from the market data of credit spreads. The default rates implied from credit spreads mostly overestimate the expected default rates.

One factor that helps explain the gap is liquidity risk: a security cannot be traded quickly enough in the market to prevent a loss. Figure 1.1 (bottom panel) plots, in finer resolution, the Moody's speculative-grade default rates versus the high-yield credit spreads, 1994-2008. It illustrates the phenomenon where the spread changed in response to liquidity-risk events, while the default rates did not react significantly until quarters later; see the liquidity crises triggered in 1998 (Russian/LTCM shock) and 2007 (subprime mortgage meltdown). Besides liquidity risk, there exist other known factors (e.g. tax treatments of corporate bonds vs. government bonds) and unknown factors that leave the implied world of credit risk incomplete. As of today, we lack a thorough understanding of the credit spread puzzle; see e.g. Chen, Lesmond and Wei (2007) and the references therein.

The shaky foundation of default risk implied from market credit spreads, without looking at historical defaults, leads to further questions about credit derivatives, e.g. the Credit Default Swap (CDS) and the Collateralized Debt Obligation (CDO). The 2007-08 collapse of the credit market on Wall Street is partly due to over-complication in "innovating" such complex financial instruments on the one hand, and over-simplification in quantifying their embedded risks on the other.


Figure 1.1: Moody's-rated corporate default rates, bond spreads and NBER-dated recessions. Data sources: a) Moody's Baa & Aaa corporate bond yields (http://research.stlouisfed.org/fred2/categories/119); b) Moody's Special Comment on Corporate Default and Recovery Rates, 1920-2008 (http://www.moodys.com/); c) NBER-dated recessions (http://www.nber.org/cycles/).


1.1.2 Actual Defaults

The other world of credit risk is the study of default probability built bottom-up from the actual credit performance. It includes the popular industry practices of

a) credit rating in corporate finance, by e.g. the three major U.S. rating agencies: Moody's, Standard & Poor's, and Fitch;
b) credit scoring in consumer lending, by e.g. the three major U.S. credit bureaus: Equifax, Experian and TransUnion.

Both credit ratings and scores represent the creditworthiness of individual corporations and consumers. The final evaluations are based on statistical models of the expected default probability, as well as judgement by rating/scoring specialists. Let us describe very briefly some rating and scoring basics related to the thesis.

The letters Aaa and Baa in Figure 1.1 are examples of Moody's rating system, which uses Aaa, Aa, A, Baa, Ba, B, Caa, Ca, C to represent the likelihood of default from the lowest to the highest. The speculative grade in Figure 1.1 refers to Ba and worse ratings. Speculative-grade corporate bonds are sometimes said to be high-yield or junk. FICO, developed by Fair Isaac Corporation, is the best-known consumer credit score and the most widely used by U.S. banks and other credit card or mortgage lenders. It ranges from 300 (very poor) to 850 (best), and is intended to represent the creditworthiness of a borrower such that he or she will repay the debt. For the same borrower, the three major U.S. credit bureaus often report inconsistent FICO scores based on their own proprietary models.

Compared to either the industry practices mentioned above or the academic study of the implied default probability, the academic literature based on actual defaults is much smaller, which we believe is largely due to the limited access academic researchers have to the proprietary internal data of historical defaults. A few academic works will be reviewed later in Section 1.3. In this thesis, we make an attempt to develop statistical methods based on the actual credit performance data. For demonstration, we shall use synthetic or tweaked samples of retail credit portfolios, as well as the public release of corporate default rates by Moody's.

1.2 Credit Risk Models

This section reviews the finance literature of credit risk models, including both structural and intensity-based approaches. Our focus is placed on the probability of default and the hazard rate of the time-to-default.

1.2.1 Structural Approach

In credit risk modeling, the structural approach is also known as the firm-value approach, since a firm's inability to meet its contractual debt is assumed to be determined by its asset value. It was inspired by the 1970s Black-Scholes-Merton methodology for financial option pricing. Two classic structural models are the Merton model (Merton, 1974) and the first-passage-time model (Black and Cox, 1976).

The Merton model assumes that the default event occurs at the maturity date of the debt if the asset value is less than the debt level. Let D be the debt level with maturity date T, and let V(t) be the latent asset value following a geometric Brownian motion

    dV(t) = µV(t)dt + σV(t)dW(t),    (1.4)

with drift µ, volatility σ and the standard Wiener process W(t). Recall that EW(t) = 0 and EW(t)W(s) = min(t, s). Given the initial asset value V(0) > D, by Itô's lemma,

    V(t)/V(0) = exp{ (µ − σ²/2)t + σW(t) } ∼ Lognormal( (µ − σ²/2)t, σ²t ),    (1.5)

from which one may evaluate the default probability P(V(T) ≤ D).
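As a quick numerical illustration of (1.5) (our own sketch; `merton_default_prob` is a hypothetical helper name, not from the thesis), the Merton default probability can be evaluated from the lognormal law and cross-checked against the distance-to-default formulation of (1.6)-(1.8):

```python
import math
from scipy.stats import norm

def merton_default_prob(V0, D, mu, sigma, T):
    """P(V(T) <= D) under the geometric Brownian motion (1.4)-(1.5):
    log V(T) ~ N(log V0 + (mu - sigma^2/2) T, sigma^2 T)."""
    mean = math.log(V0) + (mu - 0.5 * sigma ** 2) * T
    return norm.cdf((math.log(D) - mean) / (sigma * math.sqrt(T)))

V0, D, mu, sigma, T = 2.0, 1.0, 0.05, 0.25, 1.0
p = merton_default_prob(V0, D, mu, sigma, T)

# the same probability via the distance-to-default (1.6)-(1.8) at t = 0
b = (mu - 0.5 * sigma ** 2) / sigma
c = (math.log(V0) - math.log(D)) / sigma
p_dd = norm.cdf(-(c + b * T) / math.sqrt(T))
assert abs(p - p_dd) < 1e-12
```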

The notion of distance-to-default facilitates the computation of the conditional default probability. Given the sample path of asset values up to t, one may first estimate the unknown parameters in (1.4) by the maximum likelihood method. Following Duffie and Singleton (2003), define the distance-to-default X(t) as the number of standard deviations by which log V(t) exceeds log D, i.e.

    X(t) = (log V(t) − log D)/σ.    (1.6)

Clearly, X(t) is a drifted Wiener process of the form

    X(t) = c + bt + W(t),   t ≥ 0,    (1.7)

with b = (µ − σ²/2)/σ and c = (log V(0) − log D)/σ. Then, it is easy to verify that the conditional probability of default at the maturity date T is

    P(V(T) ≤ D | V(t) > D) = P(X(T) ≤ 0 | X(t) > 0) = Φ( −(X(t) + b(T − t)) / √(T − t) ),    (1.8)

where Φ(·) is the cumulative normal distribution function.

The first-passage-time model by Black and Cox (1976) extends the Merton model so that the default event can occur as soon as the asset value reaches a pre-specified debt barrier. Figure 1.2 (left panel) illustrates the first-passage-time of a drifted Wiener process by simulation, where we set the parameters b = −0.02 and c = 8 (s.t. a constant debt barrier). The key difference between the Merton model and the Black-Cox model lies in the green path (the middle realization viewed at T), which is treated as a default in one model but not the other.

Figure 1.2: Simulated drifted Wiener process, first-passage-time and hazard rate.

By (1.7) and (1.8), V(t) hits the debt barrier once the distance-to-default process X(t) hits zero. Given the initial distance-to-default c ≡ X(0) > 0, consider the first-passage-time

    τ = inf{ t ≥ 0 : X(t) ≤ 0 },    (1.9)

where inf ∅ = ∞ as usual. It is well known that τ follows the inverse Gaussian distribution (Schrödinger, 1915; Tweedie, 1957; Chhikara and Folks, 1989) with density

    f(t) = ( c/√(2πt³) ) exp{ −(c + bt)²/(2t) },   t ≥ 0.    (1.10)

See also other types of parametrization in Marshall and Olkin (2007; §13). The survival function S(t) is defined by P(τ > t) for any t ≥ 0 and is given by

    S(t) = Φ( (c + bt)/√t ) − e^{−2bc} Φ( (−c + bt)/√t ).    (1.11)

The hazard rate, or the conditional default rate, is defined as the instantaneous rate of default conditional on survivorship,

    λ(t) = lim_{∆t↓0} (1/∆t) P(t ≤ τ < t + ∆t | τ ≥ t) = f(t)/S(t).    (1.12)

Using the inverse Gaussian density and survival functions, we obtain the form of the first-passage-time hazard rate:

    λ(t; c, b) = [ ( c/√(2πt³) ) exp{ −(c + bt)²/(2t) } ] / [ Φ( (c + bt)/√t ) − e^{−2bc} Φ( (−c + bt)/√t ) ],   c > 0.    (1.13)
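As a numerical sanity check (ours, not part of the thesis), (1.10), (1.11) and (1.13) can be coded directly, and the identity f(t) = −dS(t)/dt behind (1.12) verified by finite differences for several initial distances-to-default, as in Figure 1.2:

```python
import numpy as np
from scipy.stats import norm

def fpt_density(t, c, b):
    """Inverse Gaussian first-passage density (1.10)."""
    return c / np.sqrt(2 * np.pi * t ** 3) * np.exp(-(c + b * t) ** 2 / (2 * t))

def fpt_survival(t, c, b):
    """Survival function (1.11)."""
    return (norm.cdf((c + b * t) / np.sqrt(t))
            - np.exp(-2 * b * c) * norm.cdf((-c + b * t) / np.sqrt(t)))

def fpt_hazard(t, c, b):
    """First-passage-time hazard rate (1.13) = f(t)/S(t)."""
    return fpt_density(t, c, b) / fpt_survival(t, c, b)

b, t, h = -0.02, np.linspace(0.1, 50.0, 500), 1e-5
for c in (4, 6, 8, 10, 12):   # low to high credit quality, as in Figure 1.2
    # f(t) = -dS(t)/dt, checked by a central finite difference
    dS = (fpt_survival(t - h, c, b) - fpt_survival(t + h, c, b)) / (2 * h)
    assert np.allclose(dS, fpt_density(t, c, b), rtol=1e-4, atol=1e-8)
    assert np.all(fpt_hazard(t, c, b) >= 0)
```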

This is one of the most important forms of hazard function in the structural approach to credit risk modeling. Figure 1.2 (right panel) plots λ(t; c, b) for b = −0.02 and c = 4, 6, 8, 10, 12, which resemble the default rates from low to high credit qualities in terms of credit ratings or FICO scores. Both the trend parameter b and the initial distance-to-default parameter c provide insight into the shape of the hazard rate; see the details in Chapter IV or Aalen, Borgan and Gjessing (2008; §10). Modern developments of structural models based on the Merton and Black-Cox models can be found in Bielecki and Rutkowski (2004; §3). Later in Chapter IV, we will discuss the dual-time extension of the first-passage-time parameterization with both endogenous and exogenous hazards, as well as a non-constant default barrier and incomplete information about the structural parameters.

1.2.2 Intensity-based Approach

The intensity-based approach is also called the reduced-form approach, proposed independently by Jarrow and Turnbull (1995) and Madan and Unal (1998). Many follow-up papers can be found in Lando (2004), Bielecki and Rutkowski (2004) and the references therein. Unlike the structural approach, which assumes the default to be completely determined by the asset value subject to a barrier, the default event in the reduced-form approach is governed by an externally specified intensity process that may or may not be related to the asset value. The default is treated as an unexpected event that comes 'by surprise'. This is a practically appealing feature, since in the real world a default event (e.g. the 2001 bankruptcy of Enron Corporation) often happens all of a sudden, without announcement.

The default intensity corresponds to the hazard rate λ(t) = f(t)/S(t) defined in (1.12), and it has roots in statistical reliability and survival analysis of time-to-failure. When S(t) is absolutely continuous with f(t) = d(1 − S(t))/dt, we have

    λ(t) = −dS(t)/(S(t)dt) = −d[log S(t)]/dt,   S(t) = exp{ −∫_0^t λ(s) ds },   t ≥ 0.

In survival analysis, λ(t) is usually assumed to be a deterministic function of time. In credit risk modeling, λ(t) is often treated as stochastic; thus, the default time τ is doubly stochastic. Note that Lando (1998) adopted the term "doubly stochastic Poisson process" (or Cox process), which refers to a counting process with possibly recurrent events. What matters in modeling defaults is only the first jump of the counting process, in which case the default intensity is equivalent to the hazard rate.

In finance, the intensity-based models are mostly term-structure models borrowed from the literature of interest-rate modeling. Below is an incomplete list:

    Vasicek:              dλ(t) = κ(θ − λ(t))dt + σ dW_t
    Cox-Ingersoll-Ross:   dλ(t) = κ(θ − λ(t))dt + σ√λ(t) dW_t    (1.14)
    Affine jump:          dλ(t) = µ(λ(t))dt + σ(λ(t))dW_t + dJ_t

with reference to Vasicek (1977), Cox, Ingersoll and Ross (1985) and Duffie, Pan and Singleton (2000). The last model involves a pure-jump process J_t, and it covers both the mean-reverting Vasicek and CIR models by setting µ(λ(t)) = κ(θ − λ(t)) and σ(λ(t)) = √(σ₀² + σ₁²λ(t)).
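For illustration (our sketch, not the author's method), the CIR intensity in (1.14) can be simulated by a full-truncation Euler scheme, and the survival probability P(τ > T) = E[exp{−∫_0^T λ(s)ds}] estimated by Monte Carlo; the parameter values below are arbitrary:

```python
import numpy as np

def simulate_cir_integral(lam0, kappa, theta, sigma, T, dt, n_paths, rng):
    """Full-truncation Euler scheme for the CIR intensity in (1.14),
    returning Monte Carlo samples of the integral of lambda over [0, T]."""
    lam = np.full(n_paths, lam0)
    integral = np.zeros(n_paths)
    for _ in range(int(T / dt)):
        integral += lam * dt
        shock = sigma * np.sqrt(np.maximum(lam, 0.0) * dt)
        lam = lam + kappa * (theta - lam) * dt + shock * rng.standard_normal(n_paths)
        lam = np.maximum(lam, 0.0)   # keep the intensity non-negative
    return integral

rng = np.random.default_rng(1)
lam0, kappa, theta, sigma, T = 0.02, 0.5, 0.03, 0.05, 5.0
I = simulate_cir_integral(lam0, kappa, theta, sigma, T, 0.01, 20000, rng)
surv = np.exp(-I).mean()   # P(tau > T) = E[exp(-integral of lambda)]
# the mean intensity drifts from 2% toward 3%, so five-year survival
# should land between the two constant-intensity extremes
assert np.exp(-0.03 * T) < surv < np.exp(-0.02 * T)
```

In the affine setting this expectation also admits a closed form, which is precisely why the choices in (1.14) are popular for pricing.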

The term-structure models provide straightforward ways to simulate the future default intensity for the purpose of predicting the conditional default probability. However, they are ad hoc models lacking a fundamental interpretation of the default event. The choices in (1.14) are popular because they yield closed-form pricing formulas for (1.2), while real-world default intensities deserve more flexible and meaningful forms. For instance, the intensity models in (1.14) cannot be used to model the endogenous shapes of the first-passage-time hazards illustrated in Figure 1.2. The dependence of the default intensity on state variables z(t) (e.g. macroeconomic covariates) is usually treated through a multivariate term-structure model for the joint process (λ(t), z(t)). This approach essentially presumes a linear dependence in the diffusion components, e.g. by correlated Wiener processes. In practice, the effect of a state variable on the default intensity can be non-linear in many other ways.

The intensity-based approach also includes duration models for the econometric analysis of actual historical defaults. These correspond to classical survival analysis in statistics, which opens another door for approaching credit risk.

1.3 Survival Models

Credit risk modeling in finance is closely related to survival analysis in statistics, including both the first-passage-time structural models and the duration type of intensity-based models. In a rough sense, default risk models based on actual credit performance data belong exclusively to survival analysis, since the latter by definition is the analysis of time-to-failure data. Here, the failure refers to the default event in either corporate or retail risk exposures; see e.g. Duffie, Saita and Wang (2007) and Deng, Quigley and Van Order (2000). We find that survival models have at least four-fold advantages in approaching credit risk modeling:

1. flexibility in parametrizing the default intensity,
2. flexibility in incorporating various types of covariates,
3. effectiveness in modeling credit portfolios, and
4. being straightforward to enrich and extend the family of credit risk models.

The following materials are organized to make these points one by one.

1.3.1 Parametrizing Default Intensity

In survival analysis, there is a rich set of lifetime distributions for parametrizing the default intensity (i.e. hazard rate) λ(t), including

    Exponential:    λ(t) = α
    Weibull:        λ(t) = αt^{β−1}
    Lognormal:      λ(t) = φ(log t/α) / [ tα Φ(−log t/α) ]                    (1.15)
    Log-logistic:   λ(t) = (β/α)(t/α)^{β−1} / [ 1 + (t/α)^β ],   α, β > 0,

where φ(x) = Φ′(x) is the density function of the standard normal distribution. Another example is the inverse Gaussian type of hazard rate function in (1.13). More examples of parametric models can be found in Lawless (2003) and Marshall and Olkin (2007), among many other texts. They all correspond to distributional assumptions on the default time τ, while the inverse Gaussian distribution has a beautiful first-passage-time interpretation.

The log-location-scale transform can be used to model the hazard rates. Given a latent default time τ0 with baseline hazard rate λ0(t) and survival function S0(t) = exp{ −∫_0^t λ0(s) ds }, one may model the firm-specific default time τi by

    log τi = µi + σi log τ0,   −∞ < µi < ∞,  σi > 0,    (1.16)

with individual risk parameters (µi, σi). Then, the firm-specific survival function takes the form

    Si(t) = S0( (t/e^{µi})^{1/σi} ),    (1.17)

and the firm-specific hazard rate is

    λi(t) = −d log Si(t)/dt = (1/(σi t)) (t/e^{µi})^{1/σi} λ0( (t/e^{µi})^{1/σi} ).    (1.18)

For example, given the Weibull baseline hazard rate λ0 (t) = αtβ−1 , the log-locationscale transformed hazard rate is given by β βµi α σ −1 t i exp − σi σi

λi (t) =

.

(1.19)
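The hazard parametrizations in (1.15) and the log-location-scale identity (1.19) are easy to check numerically. Below is a minimal Python sketch (the function names are our own, not from the thesis); it implements the four hazard rates and verifies, via a central-difference derivative of \(-\log S_i(t)\), that applying the general recipe (1.17)-(1.18) to a Weibull baseline reproduces the closed form (1.19).

```python
import math

# --- the four parametric hazard rates of (1.15) ---
def hz_exponential(t, a):
    return a                                   # constant hazard

def hz_weibull(t, a, b):
    return a * t ** (b - 1)                    # alpha * t^(beta-1)

def hz_lognormal(t, a):
    # phi(log t / a) / (t * a * Phi(-log t / a))
    z = math.log(t) / a
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(-z / math.sqrt(2)))
    return phi / (t * a * Phi)

def hz_loglogistic(t, a, b):
    return (b / a) * (t / a) ** (b - 1) / (1 + (t / a) ** b)

# --- log-location-scale transform (1.16)-(1.19), Weibull baseline ---
def surv_weibull(t, a, b):
    return math.exp(-a * t ** b / b)           # S0(t) = exp(-alpha t^beta / beta)

def surv_i(t, a, b, mu, sigma):
    # (1.17): Si(t) = S0((t / e^mu)^(1/sigma))
    return surv_weibull((t * math.exp(-mu)) ** (1 / sigma), a, b)

def hz_i_closed(t, a, b, mu, sigma):
    # (1.19): (alpha/sigma) * exp(-beta*mu/sigma) * t^(beta/sigma - 1)
    return (a / sigma) * math.exp(-b * mu / sigma) * t ** (b / sigma - 1)

def hz_i_numeric(t, a, b, mu, sigma, h=1e-6):
    # (1.18): lambda_i(t) = -d log Si(t) / dt, via central difference
    return (math.log(surv_i(t - h, a, b, mu, sigma))
            - math.log(surv_i(t + h, a, b, mu, sigma))) / (2 * h)
```

Setting \(\beta = 1\) in the Weibull or log-logistic hazard recovers the simpler families, which gives a quick sanity check on the parametrization.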

1.3.2 Incorporating Covariates

Consider firm-specific (or consumer-specific) covariates \(z_i \in \mathbb{R}^p\) that are either static or time-varying. The dependence of the default intensity on \(z_i\) can be studied by regression models. In the context of survival analysis, there are two popular classes of regression models, namely

1. the multiplicative hazards regression models, and
2. the accelerated failure time (AFT) regression models.

The hazard rate is taken as the modeling basis in the first class of regression models:

\[
\lambda_i(t) = \lambda_0(t)\, r(z_i(t))
\tag{1.20}
\]

where \(\lambda_0(t)\) is a baseline hazard function and \(r(z_i(t))\) is a firm-specific relative-risk multiplier (both must be positive). For example, one may use the inverse Gaussian hazard rate (1.13) or pick one from (1.15) as the baseline \(\lambda_0(t)\). The relative-risk term is often specified by \(r(z_i(t)) = \exp\{\theta^T z_i(t)\}\) with parameter \(\theta \in \mathbb{R}^p\). Then the hazard ratio between any two firms, \(\lambda_i(t)/\lambda_j(t) = \exp\{\theta^T [z_i(t) - z_j(t)]\}\), is constant whenever the covariate difference \(z_i(t) - z_j(t)\) is constant in time, thus ending up with proportional hazard rates. Note that the covariates \(z_i(t)\) are sometimes transformed before entering

the model. For example, an econometric modeler usually takes the difference of the Gross Domestic Product (GDP) index and uses the GDP growth as the entering covariate.

For a survival analyst, the introduction of the multiplicative hazards model (1.20) is incomplete without introducing Cox's proportional hazards (CoxPH) model, which is among the most popular models in modern survival analysis. Rather than using a parametric form of the baseline hazard, Cox (1972, 1975) considered \(\lambda_i(t) = \lambda_0(t)\exp\{\theta^T z_i(t)\}\) with the baseline hazard \(\lambda_0(t)\) left unspecified. Such semiparametric modeling can reduce the estimation bias of the covariate effects due to baseline misspecification. One may use the partial likelihood to estimate \(\theta\), and the nonparametric likelihood to estimate \(\lambda_0(t)\). The estimated baseline \(\hat{\lambda}_0(t)\) can be further smoothed by parametric, term-structure or data-driven techniques.

The second class, the AFT regression models, is based on the aforementioned log-location-scale transform of the default time. Suppose that \(z_i\) is not time-varying. Given \(\tau_0\) with baseline hazard rate \(\lambda_0(t)\), consider (1.16) with the location \(\mu_i = \theta^T z_i\) and the scale \(\sigma_i = 1\) (for simplicity), i.e. \(\log \tau_i = \theta^T z_i + \log \tau_0\). Then it is straightforward to get the survival function and hazard rate by (1.17) and (1.18):

\[
S_i(t) = S_0\big(t \exp\{-\theta^T z_i\}\big), \qquad
\lambda_i(t) = \lambda_0\big(t \exp\{-\theta^T z_i\}\big)\exp\{-\theta^T z_i\}
\tag{1.21}
\]
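To make the partial likelihood idea behind the CoxPH model concrete, here is a small self-contained sketch (a toy of our own, not code from the thesis): it assumes a single static covariate and no tied event times, and maximizes the log partial likelihood by ternary search, exploiting the convexity of its negative in \(\theta\).

```python
import math

def neg_log_partial_likelihood(theta, times, events, z):
    # Negative Cox log partial likelihood, one static covariate, no ties:
    #   -sum_{i: event} [ theta*z_i - log( sum_{j: t_j >= t_i} exp(theta*z_j) ) ]
    ll = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        risk = sum(math.exp(theta * z[j])
                   for j in range(len(times)) if times[j] >= times[i])
        ll += theta * z[i] - math.log(risk)
    return -ll

def fit_cox_1d(times, events, z, lo=-10.0, hi=10.0, iters=200):
    # The negative log partial likelihood is convex in theta, so a simple
    # derivative-free ternary search suffices for this 1-D illustration.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if neg_log_partial_likelihood(m1, times, events, z) < \
           neg_log_partial_likelihood(m2, times, events, z):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```

Note that the baseline \(\lambda_0(t)\) never appears: it cancels from the risk-set ratios, which is exactly the semiparametric appeal of the CoxPH model.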

An interesting phenomenon occurs with the Weibull baseline: by (1.19), we have \(\lambda_i(t) = \lambda_0(t)\exp\{-\beta\theta^T z_i\}\). This is equivalent to using the multiplicative hazards model (1.20), a unique property of the Weibull lifetime distribution. For more details about AFT regression, see e.g. Lawless (2003; §6).

Thus, the covariates \(z_i\) can induce the firm-specific heterogeneity via either a relative-risk multiplier \(\lambda_i(t)/\lambda_0(t) = e^{\theta^T z_i(t)}\) or the default time acceleration \(\tau_i/\tau_0 = e^{\theta^T z_i}\). In situations where certain hidden covariates are not accessible, the unexplained variation also leads to heterogeneity. The frailty models introduced by Vaupel, et al.


(1979) are designed to model such unobserved heterogeneity as a random quantity, say, \(Z \sim \mathrm{Gamma}(\delta^{-1}, \delta)\) with shape \(\delta^{-1}\) and scale \(\delta > 0\), where a \(\mathrm{Gamma}(k, \delta)\) distribution has the density

\[
p(z; k, \delta) = \frac{z^{k-1}\exp\{-z/\delta\}}{\Gamma(k)\,\delta^k}, \qquad z > 0
\tag{1.22}
\]

with mean \(k\delta\) and variance \(k\delta^2\). Then the proportional frailty model extends (1.20) to

\[
\lambda_i(t) = Z \cdot \lambda_0(t)\, r(z_i(t)),
\tag{1.23}
\]

for each single obligor \(i\) subject to the multiplicative frailty effect \(Z\); see Aalen, et al. (2008; §6). On the other hand, frailty models can also be used to characterize the default correlation among multiple obligors, which is the next topic.
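To make (1.22) concrete, the sketch below (our own illustration, not code from the thesis) evaluates the Gamma density numerically and confirms the stated mean \(k\delta\) and variance \(k\delta^2\) by trapezoid-rule integration.

```python
import math

def gamma_frailty_pdf(z, k, delta):
    # Eq. (1.22): p(z; k, delta) = z^(k-1) exp(-z/delta) / (Gamma(k) delta^k)
    return z ** (k - 1) * math.exp(-z / delta) / (math.gamma(k) * delta ** k)

def gamma_moments(k, delta, upper=60.0, n=50000):
    # Trapezoid-rule moments; assumes k >= 1 (density bounded near 0) and
    # that the density is negligible beyond `upper`.
    h = upper / n
    m0 = m1 = m2 = 0.0
    for i in range(1, n):
        z = i * h
        p = gamma_frailty_pdf(z, k, delta)
        m0 += p
        m1 += z * p
        m2 += z * z * p
    m0 *= h; m1 *= h; m2 *= h
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean
```

In the frailty literature, \(Z\) is usually normalized to mean one, i.e. the shape \(\delta^{-1}\) and scale \(\delta\) choice above, so that \(\delta\) alone indexes the amount of unobserved heterogeneity.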

1.3.3 Correlating Credit Defaults

The issue of default correlation embedded in credit portfolios has drawn intense discussion in the recent credit risk literature, in particular alongside today's credit crisis involving problematic basket CDS or CDO financial instruments; see e.g. Bluhm and Overbeck (2007). In the eyes of a fund manager, positive correlation of defaults increases the level of total volatility given the same level of total expectation (cf. Markowitz portfolio theory). In academics, among other works, Das, et al. (2007) performed an empirical analysis of default times for U.S. corporations and provided evidence for the importance of default correlation. The default correlation can be effectively characterized by multivariate survival analysis. In a broad sense, there exist two different approaches:

1. correlating the default intensities through common covariates, and
2. correlating the default times through copulas.


The first approach is better known as the conditionally independent intensity-based approach, in the sense that the default times are independent conditional on the common covariates. Examples of common covariates include market-wide variables, e.g. the GDP growth, the short-term interest rate, and the house-price appreciation index. Here, the default correlation induced by the common covariates is in contrast to the aforementioned default heterogeneity induced by the firm-specific covariates. As announced earlier, the frailty models (1.23) can also be used to capture the dependence of multi-obligor default intensities in the case of hidden common covariates. They are usually rephrased as shared frailty models, to differentiate them from the single-obligor modeling purpose. The frailty effect \(Z\) is usually assumed to be time-invariant. One may refer to Hougaard (2000) for a comprehensive treatment of multivariate survival analysis from a frailty point of view. See also Aalen et al. (2008; §11) for some intriguing discussion of dynamic frailty modeled by diffusion and Lévy processes. In finance, Duffie, et al. (2008) applied the shared frailty effect across firms, as well as a dynamic frailty effect to represent the unobservable macroeconomic covariates.

The copula approach to default correlation takes the default times as the modeling basis. A copula \(C : [0,1]^n \to [0,1]\) is a function used to formulate the multivariate joint distribution based on the marginal distributions, e.g.,

\[
\begin{aligned}
\text{Gaussian:} &\quad C_\Sigma(u_1,\ldots,u_n) = \Phi_\Sigma\big(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_n)\big) \\
\text{Student-t:} &\quad C_{\Sigma,\nu}(u_1,\ldots,u_n) = \Theta_{\Sigma,\nu}\big(\Theta_\nu^{-1}(u_1), \ldots, \Theta_\nu^{-1}(u_n)\big) \\
\text{Archimedean:} &\quad C_\Psi(u_1,\ldots,u_n) = \Psi^{-1}\Big(\sum_{i=1}^{n}\Psi(u_i)\Big)
\end{aligned}
\tag{1.24}
\]

where \(\Sigma \in \mathbb{R}^{n\times n}\), \(\Phi_\Sigma\) denotes the multivariate normal distribution, \(\Theta_{\Sigma,\nu}\) denotes the multivariate Student-t distribution with \(\nu\) degrees of freedom, and \(\Psi\) is the generator of the Archimedean copulas; see Nelsen (2006) for details. When \(\Psi(u) = -\log u\), the Archimedean copula reduces to \(\prod_{i=1}^{n} u_i\) (the independence copula). By Sklar's theorem,


for a multivariate joint distribution there always exists a copula that links the joint distribution to its univariate marginals. Therefore, the joint survival distribution of \((\tau_1, \ldots, \tau_n)\) can be characterized by \(S_{\mathrm{joint}}(t_1,\ldots,t_n) = C(S_1(t_1),\ldots,S_n(t_n))\), upon an appropriate selection of the copula \(C\).

Interestingly enough, the shared frailty models correspond to the Archimedean copula, as observed by Hougaard (2000; §13) and recast by Aalen, et al. (2008; §7). For each \(\tau_i\) with underlying hazard rate \(Z\cdot\lambda_i(t)\), assume \(\lambda_i(t)\) to be deterministic and \(Z\) to be random. Then the marginal survival distributions are found by integrating over the distribution of \(Z\),

\[
S_i(t) = E_Z\Big[\exp\Big\{-Z\int_0^t \lambda_i(s)\,ds\Big\}\Big] = \mathcal{L}(\Lambda_i(t)), \qquad i = 1,\ldots,n
\tag{1.25}
\]

where \(\mathcal{L}(x) = E_Z[\exp\{-xZ\}]\) is the Laplace transform of \(Z\) and \(\Lambda_i(t)\) is the cumulative hazard. Similarly, the joint survival distribution is given by

\[
S_{\mathrm{joint}}(t_1,\ldots,t_n) = \mathcal{L}\Big(\sum_{i=1}^{n} \Lambda_i(t_i)\Big) = \mathcal{L}\Big(\sum_{i=1}^{n} \mathcal{L}^{-1}\big(S_i(t_i)\big)\Big),
\tag{1.26}
\]

where the second equality follows from (1.25) and leads to an Archimedean copula. For example, given the Gamma frailty \(Z \sim \mathrm{Gamma}(\delta^{-1}, \delta)\), we have \(\mathcal{L}(x) = (1+\delta x)^{-1/\delta}\), so

\[
S_{\mathrm{joint}}(t_1,\ldots,t_n) = \Big(1 + \delta\sum_{i=1}^{n} \int_0^{t_i} \lambda_i(s)\,ds\Big)^{-1/\delta}.
\]
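The Gamma-frailty case can be checked by simulation. The sketch below (hypothetical helper names of our own; note that `random.gammavariate` takes shape and scale arguments) draws \(Z \sim \mathrm{Gamma}(\delta^{-1}, \delta)\), generates conditionally independent exponential default times with constant baseline hazards for simplicity, and compares the empirical joint survival probability with the closed form above.

```python
import math
import random

def joint_survival_mc(t1, t2, lam1, lam2, delta, n=100000, seed=7):
    # Z ~ Gamma(shape=1/delta, scale=delta): mean 1, variance delta.
    # Given Z, the two default times are independent exponentials with
    # rates Z*lam1 and Z*lam2 (constant baseline hazards, for simplicity).
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        zf = rng.gammavariate(1.0 / delta, delta)
        if rng.expovariate(zf * lam1) > t1 and rng.expovariate(zf * lam2) > t2:
            hits += 1
    return hits / n

def joint_survival_formula(t1, t2, lam1, lam2, delta):
    # Gamma-frailty (Clayton-type) joint survival derived above:
    #   (1 + delta*(Lam1(t1) + Lam2(t2)))^(-1/delta),  Lam_i(t) = lam_i * t.
    return (1 + delta * (lam1 * t1 + lam2 * t2)) ** (-1.0 / delta)
```

The same Monte Carlo scaffold extends directly to time-varying \(\lambda_i(t)\) by replacing \(\lambda_i t\) with the cumulative hazard \(\Lambda_i(t)\).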

In the practice of modeling credit portfolios, one must first of all check the appropriateness of the assumptions behind either the frailty approach or the copula approach. An obvious counterexample is the recent collapse of the basket CDS and CDO market, for which the rating agencies had abused the over-simplified Gaussian copula. Besides, the correlation risk in these complex financial instruments is highly contingent upon macroeconomic conditions and black-swan events, which are rather dynamic, volatile and unpredictable.
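For intuition about the Gaussian copula in (1.24), here is a small dependency-free numerical sketch for the bivariate case (the quadrature and bisection routines are our own crude illustrations). At \(\rho = 0\) it reduces to the independence copula \(u_1 u_2\), and increasing \(\rho\) raises the joint probability of two "early default" events.

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def norm_ppf(u):
    # Inverse normal CDF by bisection; crude but dependency-free.
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gaussian_copula_2d(u1, u2, rho, n=4000):
    # C_rho(u1, u2) = P(X1 <= q1, X2 <= q2) for standard bivariate normal
    # with correlation |rho| < 1, via the conditioning identity
    #   integral of phi(x) * Phi((q2 - rho*x)/sqrt(1 - rho^2)) over x <= q1.
    q1, q2 = norm_ppf(u1), norm_ppf(u2)
    s = math.sqrt(1 - rho * rho)
    lo = -8.0
    h = (q1 - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0      # trapezoid rule
        phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        total += weight * phi * norm_cdf((q2 - rho * x) / s)
    return total * h
```

A single correlation parameter, of course, is precisely the over-simplification criticized above: it says nothing about tail dependence or regime shifts.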

Generalization. The above survey of survival models is too brief to cover the vast literature of survival analysis that holds potential applications in credit risk modeling. To be selective, we have cited only Hougaard (2000), Lawless (2003) and Aalen, et al. (2008), but there are numerous texts and monographs on survival analysis and reliability. It is straightforward to enrich and extend the family of credit risk models, by either the existing devices in survival analysis or new developments that would benefit both fields. The latter is one of our objectives in this thesis, and we will focus on the dual-time generalization.

1.4 Scope of the Thesis

Both old and new challenges arising in the context of credit risk modeling call for developments of statistical methodologies. Let us first preview the complex data structures in real-world credit risk problems before defining the scope of the thesis.

Vintage time series are among the most popular forms of economic and risk management data; see e.g. the ALFRED digital archive (http://alfred.stlouisfed.org/) hosted by the U.S. Federal Reserve Bank of St. Louis. For risk management in retail banking, consider for instance the revolving exposures of credit cards. Subject to the market share of the card-issuing bank, thousands to hundreds of thousands of new cards are issued monthly, varying by product type (e.g. specialty, reward, co-brand and secured cards), acquisition channel (e.g. banking center, direct mail, telephone and internet), and other distinguishable characteristics (e.g. the geographic region where the card holder resides). By convention, the accounts originated in the same month constitute a monthly vintage (quarterly and yearly vintages can be defined in the same fashion), where "vintage" is a term borrowed from the wine-making industry to denote the shared origination. A vintage time series then refers to the longitudinal observations and performance measurements for the cohort of accounts that share the same origination date and therefore the same age.

Figure 1.3: Illustration of retail credit portfolios and vintage diagram.

Figure 1.3 provides an illustration of retail credit portfolios and the scheme of vintage data collection, where we use only the geographic MSA (U.S. Metropolitan Statistical Area) as a representative segmentation variable. Each segment consists of multiple vintages with different origination dates, and each vintage consists of varying numbers of accounts. The most important feature of vintage data is its dual-time coordinates. Let the calendar time be denoted by \(t\) and each vintage by its origination time \(v\); then the lifetime (i.e. age, or the "month-on-book" for a monthly vintage) is calculated as \(m = t - v\) for \(t \geq v\). Such \((m, t; v)\) tuples lie in the dual-time domain, called the vintage diagram, with the x-axis representing the calendar time \(t\) and the y-axis representing the lifetime \(m\). Unlike a usual 2D space, the vintage diagram consists of unit-slope trajectories, each representing the evolution of a different vintage. On a vintage diagram, all kinds of data can be collected, including continuous, binary or count observations. In particular, when the dual-time

19

observations are the individual lives together with birth and death time points, and when these observations are subject to truncation and censoring, one may draw a line segment for each individual to represent the vintage data graphically. Such a graph is known as the Lexis diagram in demography and survival analysis; see Figure 4.1 based on a dual-time-to-default simulation.

For corporate bond and loan issuers, the observations of default events or default rates share the same vintage data structure. Table 1.1 shows the Moody's speculative-grade default rates for annual cohorts 1970-2008, which are calculated from the cumulative rates released recently by Moody's Global Credit Policy (February 2009). Each cohort is observed through year end of 2008 and truncated at a maximum lifetime of 20 years. Thus, the performance window for the cohorts originated in the latest years becomes shorter and shorter. The default rates in Table 1.1 are aligned in lifetime, but they can also be aligned in calendar time. To compare the cohort performances in dual-time coordinates, the marginal plots in both lifetime and calendar time are given in Figure 1.4, together with side-by-side box-plots for checking the vintage (i.e. cohort) heterogeneity. It is clear that the average default rates gradually decrease in lifetime but are very volatile in calendar time. Note that the calendar dynamics follow closely the NBER-dated recessions shown in Figure 1.1. Besides, there seems to be heterogeneity among vintages. These projection views are only an initial exploration. Further analysis of this vintage data will be performed in Chapter III.

Figure 1.4: Moody's speculative-grade default rates for annual cohorts 1970-2008: projection views in lifetime, calendar and vintage origination time.

The main purpose of the thesis is to develop a dual-time modeling framework based on observations on the vintage diagram, or "dual-time analytics". First of all, it is worth mentioning that the vintage data structure is by no means an exclusive feature of economic and financial data. It also appears in clinical trials with staggered entry, demography, epidemiology, dendrochronology, etc. As an interesting example in dendrochronology, Esper, Cook and Schweingruber (Science, 2002) analyzed long historical tree-ring records in a dual-time manner in order to reconstruct the past temperature variability. We believe that the dual-time analytics developed in the thesis can also be applied to non-financial fields.

The road map in Figure 1.5 outlines the scope of the thesis. In Chapter II, we discuss the adaptive smoothing spline (AdaSS) for heterogeneously smooth function estimation. Two challenging issues are the evaluation of the reproducing kernel and the determination of the local penalty, for which we derive an explicit solution based upon a piecewise type of local adaptation. Four different examples, each having a specific feature of heterogeneous smoothness, are used to demonstrate the new AdaSS technique. Note that the AdaSS may estimate 'smooth' functions including possible jumps, and it plays a key role in the subsequent work in the thesis.

In Chapter III, we develop the vintage data analysis (VDA) for continuous types of dual-time responses (e.g. loss rates). We propose an MEV decomposition framework based on Gaussian process models, where M stands for the maturation curve, E for the exogenous influence and V for the vintage heterogeneity. The intrinsic identification problem is discussed, as is the semiparametric extension of MEV models in the presence of dual-time covariates. Such models are motivated by practical needs in financial risk management. An efficient MEV Backfitting algorithm is provided for model estimation, and its performance is assessed by a simulation study. Then, we apply the MEV risk decomposition strategy to analyze both corporate default rates and retail loan loss rates.

Figure 1.5: A road map of thesis developments of statistical methods in credit risk modeling. (The diagram links Chapter I: Introduction — Credit Risk Modeling, Survival Analysis — to Chapter II: AdaSS (Smoothing Spline, Local Adaptation), Chapter III: VDA (MEV Decomposition, Gaussian Process Models, Semiparametric Regression), and Chapter IV: DtSA (Nonparametric Estimation, Structural Parametrization, Dual-time Cox Regression).)

In Chapter IV, we study the dual-time survival analysis (DtSA) for time-to-default data observed on the Lexis diagram. It is of particular importance in credit risk modeling, where the default events can be triggered by both endogenous and exogenous hazards. We consider (a) nonparametric estimators under one-way, two-way and three-way underlying hazards models; (b) structural parametrization via a first-passage-time triggering system with an endogenous distance-to-default process associated with an exogenous time-transformation; and (c) dual-time semiparametric Cox regression with both endogenous and exogenous baseline hazards and covariate effects, for which the method of partial likelihood estimation is discussed. Also discussed is the random-effect vintage heterogeneity modeled by the shared frailty. Finally, we demonstrate the application of DtSA to credit card and mortgage risk analysis in retail banking, and shed some light on understanding the ongoing credit crisis from a new dual-time analytic perspective.
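The dual-time bookkeeping on the vintage diagram is simple to express in code. The sketch below (our own illustration, with time measured in integer periods) generates the \((m, t; v)\) tuples for a set of vintages observed through calendar time `t_max`, each tracing a unit-slope trajectory.

```python
def vintage_diagram(vintages, t_max):
    # For each origination time v, the vintage is observed at calendar
    # times t = v, v+1, ..., t_max, with lifetime (month-on-book) m = t - v.
    # Returns {v: [(m, t), ...]}, i.e. the unit-slope trajectory of v.
    return {v: [(t - v, t) for t in range(v, t_max + 1)]
            for v in vintages if v <= t_max}
```

Note how the structure automatically encodes the shrinking performance window: vintages originated later contribute fewer dual-time cells, exactly as in the Moody's cohort data of Table 1.1.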


Table 1.1: Moody’s speculative-grade default rates. Data source: Moody’s special comment (release: February 2009) on corporate default and recovery rates, 1920-2008 (http://www.moodys.com/) and author’s calculations.


[Table 1.1 body: for each annual cohort 1970-2008, the speculative-grade default rates (in percent) at lifetimes Year 1 through Year 20, e.g. 8.772 for the 1970 cohort in Year 1; the cohorts originated in recent years have only their first few years observed. The tabular alignment of the columns is not recoverable from this extraction.]

CHAPTER II

Adaptive Smoothing Spline

2.1 Introduction

For observational data, statisticians are generally interested in smoothing rather than interpolation. Let the observations be generated by the signal-plus-noise model

\[
y_i = f(t_i) + \varepsilon(t_i), \qquad i = 1,\ldots,n
\tag{2.1}
\]

where \(f\) is an unknown smooth function supported on the unit interval \([0,1]\), and \(\varepsilon(t_i)\) is the random error. Function estimation from noisy data is a central problem in modern nonparametric statistics, and includes the methods of kernel smoothing [Wand and Jones (1995)], local polynomial smoothing [Fan and Gijbels (1996)] and wavelet shrinkage [Vidakovic (1999)]. In this thesis, we concentrate on the method of smoothing splines, which has an elegant formulation through the mathematical device of the reproducing kernel Hilbert space (RKHS). There is a rich body of literature on smoothing splines since the pioneering work of Kimeldorf and Wahba (1970); see the monographs by Wahba (1990), Green and Silverman (1994) and Gu (2002).

Ordinarily, the smoothing spline is formulated as the following variational problem

\[
\min_{f \in \mathcal{W}_m} \bigg\{ \frac{1}{n}\sum_{i=1}^{n} w(t_i)[y_i - f(t_i)]^2 + \lambda \int_0^1 [f^{(m)}(t)]^2\,dt \bigg\}, \qquad \lambda > 0
\tag{2.2}
\]


where the target function is an element of the \(m\)-th order Sobolev space

\[
\mathcal{W}_m = \big\{f : f, f', \ldots, f^{(m-1)} \text{ absolutely continuous},\ f^{(m)} \in \mathcal{L}_2[0,1]\big\}.
\tag{2.3}
\]

The objective in (2.2) is a sum of two functionals: the mean squared error \(\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} w(t_i)[y_i - f(t_i)]^2\) and the roughness penalty \(\mathrm{PEN} = \int_0^1 [f^{(m)}(t)]^2\,dt\); hence the tuning parameter \(\lambda\) controls the trade-off between fidelity to the data and smoothness of the estimate. Let us call the ordinary smoothing spline (2.2) OrdSS for short.

The weights \(w(t_i)\) in the MSE come from the non-stationary error assumption of (2.1), where \(\varepsilon(t) \sim N(0, \sigma^2/w(t))\). They can also be used for aggregating data with replicates. Suppose the data are generated by (2.1) with stationary error but time-varying sampling frequency \(r_k\) for \(k = 1,\ldots,K\) distinct points; then

\[
\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{K}\sum_{j=1}^{r_k} [y_{kj} - f(t_k)]^2 = \frac{1}{K}\sum_{k=1}^{K} w(t_k)\big[\bar{y}_k - f(t_k)\big]^2 + \text{Const.}
\tag{2.4}
\]

where \(\bar{y}_k \sim N(f(t_k), \sigma^2/r_k)\) for \(k = 1,\ldots,K\) are the pointwise averages. To train the OrdSS, the raw data can be replaced by the \(\bar{y}_k\)'s together with the weights \(w(t_k) \propto r_k\).

The OrdSS assumes that \(f(t)\) is homogeneously smooth (or rough), and performs regularization based on the simple integral of squared derivatives \([f^{(m)}(t)]^2\). It excludes target functions with spatially varying degrees of smoothness, as noted by Wahba (1985) and Nychka (1988). In practice, there are various types of functions with non-homogeneous smoothness, and four such scenarios are illustrated in Figure 2.1. The first two are simulation examples following exactly the same setting as Donoho and Johnstone (1994), adding \(N(0,1)\) noise to \(n = 2048\) equally spaced signal responses that are re-scaled to attain a signal-to-noise ratio of 7. The last two are sampled data from automotive engineering and financial engineering.

where yk ∼ N (f (tk ), σ 2 /rk ) for k = 1, . . . , K are the pointwise averages. To train ¯ OrdSS, the raw data can be replaced by yk s together with the weights w(tk ) ∝ rk . ¯ The OrdSS assumes that f (t) is homogeneously smooth (or rough), then performs regularization based on the simple integral of squared derivatives [f (m) (t)]2 . It excludes the target functions with spatially varying degrees of smoothness, as noted by Wahba (1985) and Nycha (1988). In practice, there are various types of functions of non-homogeneous smoothness, and four such scenarios are illustrated in Figure 2.1. The ﬁrst two are the simulation examples following exactly the same setting as Donoho and Johnstone (1994), by adding the N (0, 1) noise to n = 2048 equally spaced signal responses that are re-scaled to attain signal-to-noise ratio 7. The last two are the sampled data from automotive engineering and ﬁnancial engineering. 1. Doppler function: f (t) = t(1 − t) sin(2π(1 + a)/(t + a)), a = 0.05, t ∈ [0, 1]. It

25

Figure 2.1: Heterogeneous smooth functions: (1) Doppler function simulated with noise, (2) HeaviSine function simulated with noise, (3) Motorcycle-Accident experimental data, and (4) Credit-Risk tweaked sample. has both time-varying magnitudes and time-varying frequencies. 2. HeaviSine function: f (t) = 3 sin 4πt − sgn(t − 0.3) − sgn(0.72 − t), t ∈ [0, 1]. It has a downward jump at t = 0.3 and an upward jump at 0.72. 3. Motorcycle-Accident experiment: a classic data from Silverman (1985). It consists of accelerometer readings taken through time from a simulated motorcycle crash experiment, in which the time points are irregularly spaced and the noises vary over time. In Figure 2.1, a hypothesized smooth shape is overlaid on the observations, and it illustrates the fact that the acceleration stays relatively constant until the crash impact. 4. Credit-Risk tweaked sample: retail loan loss rates for the revolving credit exposures in retail banking. For academic use, business background is removed and 26

the sample is tweaked. The tweaked sample includes monthly vintages booked in 7 years (2000-2006), where a vintage refers to a group of credit accounts originated from the same month. They are all observed up to the end of 2006, so a vintage booked earlier has a longer observation length. In Figure 2.1, the x-axis represents the months-on-book, and the circles are the vintage loss rates. The point-wise averages are plotted as solid dots, which are wiggly on large months-on-book due to decreasing sample sizes. Our purpose is to estimate an underlying shape (illustrated by the solid line), whose degree of smoothness, by assumption, increases upon growth. The ﬁrst three examples have often been discussed in the nonparametric analysis literature, where function estimation with non-homogeneous smoothness has been of interest for decades. A variety of adaptive methods has been proposed, such as variable-bandwidth kernel smoothing (M¨ller and Stadtm¨ller, 1987), multivariate u u adaptive regression splines (Friedman, 1991), adaptive wavelet shrinkage (Donoho and Johnstone, 1994), local-penalty regression spline (Ruppert and Carroll, 2000), as well as the treed Gaussian process modeling (Gramacy and Lee, 2008). In the context of smoothing spline, Luo and Wahba (1997) proposed a hybrid adaptive spline with knot selection, rather than using every sampling point as a knot. More naturally, Wahba (1995), in a discussion, suggested to use the local penalty

\[
\mathrm{PEN} = \int_0^1 \rho(t)[f^{(m)}(t)]^2\,dt, \qquad \text{s.t. } \rho(t) > 0 \text{ and } \int_0^1 \rho(t)\,dt = 1
\tag{2.5}
\]

where the add-on function \(\rho(t)\) adapts to the local behavior of \(f(t)\), so that heavier penalty is imposed on regions of lower curvature. This local penalty functional is the foundation of the whole chapter. Relevant works include Abramovich and Steinberg (1996) and Pintore, Speckman and Holmes (2006).

This chapter is organized as follows. In Section 2.2 we formulate the AdaSS (adaptive smoothing spline) through the local penalty (2.5), derive its solution via RKHS and generalized ridge regression, then discuss how to determine both the smoothing parameter and the local penalty adaptation by the method of cross-validation. Section 2.3 covers some basic properties of the AdaSS. The experimental results for the aforementioned examples are presented in Section 2.4. We summarize the chapter in Section 2.5 and give technical proofs in the last section.

2.2 AdaSS: Adaptive Smoothing Spline

The AdaSS extends the OrdSS (2.2) based on the local penalty (2.5). It is formulated as

\[
\min_{f \in \mathcal{W}_m} \bigg\{ \frac{1}{n}\sum_{i=1}^{n} w(t_i)\big[y_i - f(t_i)\big]^2 + \lambda \int_0^1 \rho(t)\big[f^{(m)}(t)\big]^2\,dt \bigg\}, \qquad \lambda > 0.
\tag{2.6}
\]

The following proposition has its root in Kimeldorf and Wahba (1971); see also Wahba (1990; §1).

Proposition 2.1 (AdaSS). Given a bounded, integrable, positive function \(\rho(t)\), define the r.k.

\[
K(t,s) = \int_0^1 \rho^{-1}(u)\, G_m(t,u)\, G_m(s,u)\,du, \qquad t, s \in [0,1]
\tag{2.7}
\]

where \(G_m(t,u) = \frac{(t-u)_+^{m-1}}{(m-1)!}\). Then the solution of (2.6) can be expressed as

\[
f(t) = \sum_{j=0}^{m-1} \alpha_j \phi_j(t) + \sum_{i=1}^{n} c_i K(t, t_i) \equiv \boldsymbol{\alpha}^T \boldsymbol{\phi}(t) + \mathbf{c}^T \boldsymbol{\xi}_n(t)
\tag{2.8}
\]

where \(\phi_j(t) = t^j/j!\) for \(j = 0,\ldots,m-1\), \(\boldsymbol{\alpha} \in \mathbb{R}^m\), \(\mathbf{c} \in \mathbb{R}^n\).

Since the AdaSS has the finite-dimensional linear expression (2.8), it can be solved by generalized linear regression. Write \(\mathbf{y} = (y_1,\ldots,y_n)^T\), \(\mathbf{B} = (\boldsymbol{\phi}(t_1),\cdots,\boldsymbol{\phi}(t_n))^T\), \(\mathbf{W} = \mathrm{diag}(w(t_1),\ldots,w(t_n))\) and \(\boldsymbol{\Sigma} = [K(t_i,t_j)]_{n\times n}\). Substituting (2.8) into (2.6), we have the regularized least squares problem

\[
\min_{\boldsymbol{\alpha}, \mathbf{c}}\ \bigg\{ \frac{1}{n} \big\|\mathbf{y} - \mathbf{B}\boldsymbol{\alpha} - \boldsymbol{\Sigma}\mathbf{c}\big\|_{\mathbf{W}}^2 + \lambda \|\mathbf{c}\|_{\boldsymbol{\Sigma}}^2 \bigg\},
\tag{2.9}
\]

where \(\|\mathbf{x}\|_{\mathbf{A}}^2 = \mathbf{x}^T \mathbf{A} \mathbf{x}\) for \(\mathbf{A} \geq 0\). Setting the partial derivatives with respect to \(\boldsymbol{\alpha}\) and \(\mathbf{c}\) to zero, we obtain the equations

\[
\mathbf{B}^T\mathbf{W}\big(\mathbf{B}\boldsymbol{\alpha} + \boldsymbol{\Sigma}\mathbf{c}\big) = \mathbf{B}^T\mathbf{W}\mathbf{y}, \qquad
\boldsymbol{\Sigma}\mathbf{W}\big(\mathbf{B}\boldsymbol{\alpha} + (\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{c}\big) = \boldsymbol{\Sigma}\mathbf{W}\mathbf{y},
\]

or

\[
\mathbf{B}\boldsymbol{\alpha} + (\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{c} = \mathbf{y}, \qquad \mathbf{B}^T\mathbf{c} = \mathbf{0}.
\tag{2.10}
\]

Consider the QR-decomposition \(\mathbf{B} = (\mathbf{Q}_1 \,\vdots\, \mathbf{Q}_2)(\mathbf{R}^T \,\vdots\, \mathbf{0}^T)^T = \mathbf{Q}_1\mathbf{R}\) with orthogonal \((\mathbf{Q}_1 \,\vdots\, \mathbf{Q}_2)_{n\times n}\) and upper-triangular \(\mathbf{R}_{m\times m}\), where \(\mathbf{Q}_1\) is \(n\times m\) and \(\mathbf{Q}_2\) is \(n\times(n-m)\). Multiplying both sides of \(\mathbf{y} = \mathbf{B}\boldsymbol{\alpha} + (\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{c}\) by \(\mathbf{Q}_2^T\) and using the fact that \(\mathbf{c} = \mathbf{Q}_2\mathbf{Q}_2^T\mathbf{c}\) (since \(\mathbf{B}^T\mathbf{c} = \mathbf{0}\)), we have \(\mathbf{Q}_2^T\mathbf{y} = \mathbf{Q}_2^T(\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{c} = \mathbf{Q}_2^T(\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{Q}_2\mathbf{Q}_2^T\mathbf{c}\). Then, simple algebra yields

\[
\mathbf{c} = \mathbf{Q}_2\big[\mathbf{Q}_2^T(\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{Q}_2\big]^{-1}\mathbf{Q}_2^T\mathbf{y}, \qquad
\boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{Q}_1^T\big[\mathbf{y} - (\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{c}\big].
\tag{2.11}
\]

The fitted values at the design points are given by \(\hat{\mathbf{y}} = \mathbf{B}\boldsymbol{\alpha} + \boldsymbol{\Sigma}\mathbf{c}\). By (2.10), \(\mathbf{y} = \mathbf{B}\boldsymbol{\alpha} + (\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{c}\), so the residuals are given by

\[
\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = n\lambda\mathbf{W}^{-1}\mathbf{c} \equiv \big(\mathbf{I} - \mathbf{S}_{\lambda,\rho}\big)\mathbf{y}
\tag{2.12}
\]

where \(\mathbf{S}_{\lambda,\rho} = \mathbf{I} - n\lambda\mathbf{W}^{-1}\mathbf{Q}_2\big[\mathbf{Q}_2^T(\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{Q}_2\big]^{-1}\mathbf{Q}_2^T\) is often called the smoothing matrix, in the sense that

\[
\hat{\mathbf{y}} = \mathbf{S}_{\lambda,\rho}\mathbf{y} = \Big(\mathbf{I} - n\lambda\mathbf{W}^{-1}\mathbf{Q}_2\big[\mathbf{Q}_2^T(\boldsymbol{\Sigma} + n\lambda\mathbf{W}^{-1})\mathbf{Q}_2\big]^{-1}\mathbf{Q}_2^T\Big)\mathbf{y}.
\tag{2.13}
\]

We use the notation \(\mathbf{S}_{\lambda,\rho}\) to denote its explicit dependence on both the smoothing parameter \(\lambda\) and the local adaptation \(\rho(\cdot)\), where \(\rho\) is needed for the evaluation of the matrix \(\boldsymbol{\Sigma} = [K(t_i,t_j)]\) based on the r.k. (2.7).

The AdaSS is analogous to the OrdSS in constructing the confidence interval under the Bayesian interpretation of Wahba (1983). At the sampling time points in particular, we may construct the pointwise 95% interval by

\[
\hat{f}(t_i) \pm z_{0.025}\sqrt{\hat{\sigma}^2 w^{-1}(t_i)\, \mathbf{S}_{\lambda,\rho}[i,i]}, \qquad i = 1,\ldots,n
\tag{2.14}
\]

where \(\mathbf{S}_{\lambda,\rho}[i,i]\) represents the \(i\)-th diagonal element of the smoothing matrix. The estimate of the noise variance is given by \(\hat{\sigma}^2 = \|(\mathbf{I} - \mathbf{S}_{\lambda,\rho})\mathbf{y}\|_{\mathbf{W}}^2 / (n - p)\), where \(p = \mathrm{trace}(\mathbf{S}_{\lambda,\rho})\) can be interpreted as the equivalent degrees of freedom.

2.2.1 Reproducing Kernels

The AdaSS depends on the evaluation of the r.k. of the integral form (2.7). For general \(\rho(t)\), we must resort to numerical integration techniques. In what follows we present a closed-form expression for \(K(t,s)\) when \(\rho^{-1}(t)\) is piecewise linear.

Proposition 2.2 (Reproducing Kernel). Let \(\rho^{-1}(t) = a_k + b_k t\), \(t \in [\tau_{k-1}, \tau_k)\), for some fixed \((a_k, b_k)\) and the knots \(\{0 \equiv \tau_0 < \tau_1 < \cdots < \tau_K \equiv 1\}\). Then the AdaSS kernel (2.7) can be explicitly evaluated by

\[
K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} \sum_{j=0}^{m-1} (-1)^j \bigg\{ (a_k + b_k u)\frac{(u-t)^{m-1-j}(u-s)^{m+j}}{(m-1-j)!\,(m+j)!}
- b_k \sum_{l=0}^{m-1-j} (-1)^l \frac{(u-t)^{m-1-j-l}(u-s)^{m+1+j+l}}{(m-1-j-l)!\,(m+1+j+l)!} \bigg\}\bigg|_{u=\tau_{k-1}}^{u=t\wedge s\wedge \tau_k}
\tag{2.15}
\]

where \(\kappa_{t\wedge s} = \min\{k : \tau_k > t\wedge s\}\), \(\kappa_{1\wedge 1} = K\), and \(\{F(u)\}\big|_{u=\tau}^{u=\bar{\tau}} = F(\bar{\tau}) - F(\tau)\).

Proposition 2.2 is general enough to cover many types of r.k.'s. For example,


setting \(b_k \equiv 0\) gives the piecewise-constant results for \(\rho^{-1}(t)\) obtained by Pintore, et al. (2006). For another example, set \(m = 2\), \(K = 1\) and \(\rho^{-1}(t) = a + bt\) for \(t \in [0,1]\). (By (2.5), \(a = b/(e^b - 1)\) and \(b \in \mathbb{R}\).) Then,

\[
K(t,s) = \begin{cases}
\dfrac{a}{6}(3t^2 s - t^3) + \dfrac{b}{12}(2t^3 s - t^4), & \text{if } t \leq s, \\[1ex]
\dfrac{a}{6}(3ts^2 - s^3) + \dfrac{b}{12}(2ts^3 - s^4), & \text{otherwise.}
\end{cases}
\tag{2.16}
\]

When b = 0, (2.16) reduces to the OrdSS kernel for ﬁtting cubic smoothing splines. The derivatives of the r.k. are also of interest. In solving boundary value differential equations, the Green’s function Gm (t, u) has the property that f (t) =

1 0

Gm (t, u)f (m) (u)du for f ∈ Wm with zero f (j) (0), j = 0, . . . , m − 1. The equa-

tion (2.7) is such an example of applying Gm (t, ·) to K(t, s) (s ﬁxed), implying that (aκt + bκt t) (s−t) , if t ≤ s ∂ m K(t, s) (m−1)! −1 = ρ (t)Gm (s, t) = m 0, ∂t otherwise. m−1 (2.17)

The well-deﬁned higher-order derivatives of the kernel can be derived from (2.17), and they should be treated from knot to knot if ρ−1 (t) is piecewise. Lower-order l-th derivatives (l < m) of (2.15) can be obtained by straightforward but tedious calculations. Here, we present only the OrdSS kernels (by setting ρ(t) ≡ 1 in Proposition 2.2) and their derivatives to be used in next section: m−1 K(t, s) = j=0 (−1)j tm−1−j sm+j (−1)m (s − t)2m−1 + I(t ≤ s), (m − 1 − j)!(m + j)! (2m − 1)!

(2.18)

∂ l K(t, s) = ∂tl

m−1−l

j=0

(−1)j tm−1−j−l sm+j (−1)m+l (s − t)2m−1−l + I(t ≤ s).(2.19) (m − 1 − j − l)!(m + j)! (2m − 1 − l)!

Note that the kernel derivative coincides with (2.17) when l = m. Remarks: Numerical methods are inevitable for approximating (2.7) when ρ−1 (t)

31

takes general forms. Rather than directly approximating (2.7), one may approximate ρ−1 (t) ﬁrst by a piecewise function with knot selection, then use the closed-form results derived in Proposition 2.2.
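For concreteness, the closed-form OrdSS kernel (2.18) is simple to implement directly. The following sketch (Python, not part of the thesis) evaluates it for general order $m$ and checks that $m = 2$ recovers the familiar cubic-smoothing-spline kernel $\int_0^{t\wedge s}(t-u)(s-u)\,du = (t\wedge s)^2\,(3(t\vee s) - t\wedge s)/6$:

```python
from math import factorial

def ordss_kernel(t, s, m=2):
    """OrdSS reproducing kernel (2.18) for the penalty int [f^(m)]^2."""
    K = sum((-1)**j * t**(m-1-j) * s**(m+j) / (factorial(m-1-j) * factorial(m+j))
            for j in range(m))
    if t <= s:
        # correction term, active only on the triangle t <= s
        K += (-1)**m * (s - t)**(2*m - 1) / factorial(2*m - 1)
    return K

# m = 2 matches the cubic-spline kernel (t^s)^2 (3(tvs) - (t^s)) / 6
t, s = 0.3, 0.7
cubic = min(t, s)**2 * (3*max(t, s) - min(t, s)) / 6
assert abs(ordss_kernel(t, s, 2) - cubic) < 1e-12
# the kernel is symmetric even though (2.18) is written asymmetrically
assert abs(ordss_kernel(0.8, 0.2, 2) - ordss_kernel(0.2, 0.8, 2)) < 1e-12
```

The two asserts illustrate that the indicator term in (2.18) exactly compensates the asymmetry of the polynomial sum.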

2.2.2 Local Penalty Adaptation

One of the most challenging problems for the AdaSS is the selection of the smoothing parameter $\lambda$ and the local penalty adaptation function $\rho(t)$. We use the method of cross-validation based on the leave-one-out scheme.

Conditional on $(\lambda, \rho(\cdot))$, let $\hat f(t)$ denote the AdaSS trained on the complete sample of size $n$, and $\hat f^{[k]}(t)$ denote the AdaSS trained on the leave-one-out sample $\{(t_i, y_i)\}_{i \neq k}$ for $k = 1,\ldots,n$. Define the score of cross-validation by

$$\mathrm{CV}(\lambda, \rho) = \frac{1}{n}\sum_{i=1}^{n} w(t_i)\big[y_i - \hat f^{[i]}(t_i)\big]^2. \qquad (2.20)$$

For the linear predictor (2.13) with smoothing matrix $S_{\lambda,\rho}$, Craven and Wahba (1979) justified that $\hat f(t_i) - \hat f^{[i]}(t_i) = S_{\lambda,\rho}[i,i]\,\big(y_i - \hat f^{[i]}(t_i)\big)$ for $i = 1,\ldots,n$. It follows that

$$\mathrm{CV}(\lambda, \rho) = \frac{1}{n}\sum_{i=1}^{n} \frac{w(t_i)\big(y_i - \hat f(t_i)\big)^2}{\big(1 - S_{\lambda,\rho}[i,i]\big)^2} = \frac{1}{n}\, y^T W \big(I - \mathrm{diag}(S_{\lambda,\rho})\big)^{-2} (I - S_{\lambda,\rho})^2\, y, \qquad (2.21)$$

where $\mathrm{diag}(S_{\lambda,\rho})$ is the diagonal part of the smoothing matrix. Replacing the elements of $\mathrm{diag}(S_{\lambda,\rho})$ by their average $n^{-1}\mathrm{trace}(S_{\lambda,\rho})$, we obtain the so-called generalized cross-validation score

$$\mathrm{GCV}(\lambda, \rho) = \frac{1}{n}\sum_{i=1}^{n} \frac{w(t_i)\big(y_i - \hat f_\lambda(t_i)\big)^2}{\big(1 - n^{-1}\mathrm{trace}(S_{\lambda,\rho})\big)^2}. \qquad (2.22)$$
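The leave-one-out shortcut behind (2.21) is a property of linear smoothers in general, so it can be illustrated without the spline machinery. The Python sketch below (illustrative, not from the thesis) uses ridge regression as a stand-in linear smoother with unit weights: the $n$-refit leave-one-out residuals agree exactly with the one-pass formula $e_i/(1 - H_{ii})$, and the GCV score needs only the trace of the hat matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 40, 5, 0.1
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.5 * rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # hat ("smoothing") matrix
resid = y - H @ y

# leave-one-out residuals the hard way: n separate refits
loo = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    beta_i = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(p),
                             X[mask].T @ y[mask])
    loo[i] = y[i] - X[i] @ beta_i

# ... agree with the one-pass shortcut e_i / (1 - H_ii), cf. (2.21) with w == 1
assert np.allclose(loo, resid / (1 - np.diag(H)))

cv  = np.mean((resid / (1 - np.diag(H)))**2)              # cf. (2.21)
gcv = np.mean(resid**2) / (1 - np.trace(H) / n)**2        # cf. (2.22)
```

The GCV score is cheaper because it avoids even the diagonal of the hat matrix, needing only its trace.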

For fixed $\rho(\cdot)$, the best tuning parameter $\lambda$ is usually found by minimizing (2.21) or (2.22) on the log scale, i.e. by varying $\log\lambda$ on the real line.

To select $\rho(\cdot)$, consider the rules: (1) choose a smaller value of $\rho(t)$ if $f(\cdot)$ is less smooth at $t$; and (2) quantify the local roughness of $f(\cdot)$ by the squared $m$-th derivative $[f^{(m)}(t)]^2$. If the true $f^{(m)}(t)$ were known, it would be reasonable to determine the shape of $\rho(t)$ by

$$\rho^{-1}(t) \propto \big[f^{(m)}(t)\big]^2. \qquad (2.23)$$

Since the true function $f(t)$ is unknown a priori, so are its derivatives. Let us consider the estimation of the derivatives. In the context of smoothing splines, nonparametric derivative estimation is usually obtained by taking derivatives of the spline estimate. Using that as an initial estimate of local roughness, we may adopt a two-step procedure for the selection of $\rho(t)$, as follows.

Step 1: run the OrdSS with order $\tilde m = m + a$ ($a = 0, 1, 2$) and cross-validated $\lambda$ to obtain the initial function estimate $\tilde f(t) = \sum_{j=0}^{m+a-1} \tilde\alpha_j \phi_j(t) + \sum_{i=1}^{n} \tilde c_i \tilde K(t, t_i)$, where $\tilde K(t,s) = \int_0^1 G_{m+a}(t,u)\, G_{m+a}(s,u)\, du$.

Step 2: calculate $\tilde f^{(m)}(t)$ and use it to determine the shape of $\rho^{-1}(t)$.

In Step 1, with no prior knowledge, we start with the OrdSS under the improper adaptation $\rho(t) \equiv 1$. The extra order $a = 0, 1, 2$ imposed on the penalization of derivatives $\int_0^1 [\tilde f^{(m+a)}(t)]^2\, dt$ serves to control the smoothness of the $m$-th derivative estimate. So, the choice of $a$ depends on how smooth the estimated $\tilde f^{(m)}(t)$ is desired to be.

In Step 2, the $m$-th derivative of $\tilde f(t)$ can be obtained by differentiating the bases,

$$\tilde f^{(m)}(t) = \sum_{j=0}^{a-1} \tilde\alpha_{m+j}\, \phi_j(t) + \sum_{i=1}^{n} \tilde c_i\, \frac{\partial^m \tilde K(t, t_i)}{\partial t^m}, \qquad (2.24)$$

where $\partial^m \tilde K(t, t_i)/\partial t^m$ is calculated according to (2.19) after replacing $m$ by $m + a$. The

goodness of the spline derivative depends largely on the spline estimate itself. Rice and Rosenblatt (1983) studied the large-sample properties of spline derivatives (2.24), which have slower convergence rates than the spline itself. Given a finite sample, we find by simulation study that (2.24) tends to under-smooth, but it may capture the rough shape of the derivatives upon standardization.

Figure 2.2: OrdSS function estimate and its 2nd-order derivative (upon standardization): scaled signal from f(t) = sin(ωt) (upper panel) and f(t) = sin(ωt⁴) (lower panel), ω = 10π, n = 100 and snr = 7. The sin(ωt⁴) signal resembles the Doppler function in Figure 2.1; both have time-varying frequency.

Figure 2.2 demonstrates the estimation of the 2nd-order derivative for $f(t) = \sin(\omega t)$ and $f(t) = \sin(\omega t^4)$ using the OrdSS with $m = 2, 3$. In both sine-wave examples, the order-2 OrdSS could yield rough estimates of the derivative, and the order-3 OrdSS could give smooth estimates but large bias on the boundary. Despite these downside effects, the shape of the derivative can be more or less captured, meeting our needs for estimating $\rho(t)$ below.

In determining the shape of $\rho^{-1}(t)$ from $\tilde f^{(m)}(t)$, we restrict ourselves to the piecewise class of functions, as in Proposition 2.2. By (2.23), we suggest estimating $\rho^{-1}(t)$ piecewise by the method of constrained least squares:

$$\big[\tilde f^{(m)}(t)\big]^{2\gamma} + \varepsilon = a_0\, \rho^{-1}(t), \qquad \rho^{-1}(t) = a_{\kappa_t} + b_{\kappa_t} t, \quad \kappa_t = \min\{k : \tau_k \ge t\}, \qquad (2.25)$$

subject to the constraints (1) positivity: $\rho^{-1}(t) > 0$ for $t \in [0,1]$; (2) continuity: $a_k + b_k \tau_k = a_{k+1} + b_{k+1}\tau_k$ for $k = 1,\ldots,K-1$; and (3) normalization: $a_0 = \int_0^1 \rho(t)\,dt = \sum_{k=1}^{K} \int_{\tau_{k-1}}^{\tau_k} \frac{dt}{a_k + b_k t} > 0$, for given $\gamma \ge 1$ and pre-specified knots $\{0 \equiv \tau_0 < \tau_1 < \cdots < \tau_{K-1} < 1\}$ and $\tau_K = 1$. Clearly, $\rho^{-1}(t)$ is an example of a linear regression spline. If the

continuity condition is relaxed and $b_k \equiv 0$, we obtain the piecewise-constant $\rho^{-1}(t)$.

The power transform $\big[\tilde f^{(m)}(t)\big]^{2\gamma}$, $\gamma \ge 1$, is used to stretch $[\tilde f^{(m)}(t)]^2$, in order to make up for (a) overestimation of $[f^{(m)}(t)]^2$ in slightly oscillating regions and (b) underestimation of $[f^{(m)}(t)]^2$ in highly oscillating regions, since the initial OrdSS is trained under the improper adaptation $\rho(t) \equiv 1$. The lower-right panel of Figure 2.2 is such an example of overestimating the roughness by the OrdSS.

The knots $\{\tau_k\}_{k=1}^{K} \subset [0,1]$ can be specified either regularly or irregularly. In most situations, we may choose equally spaced knots with $K$ to be determined. For the third example in Figure 2.1, which shows flat vs. rugged heterogeneity, one may choose sparse knots for the flat region and dense knots for the rugged region, or choose regular knots such that $\rho^{-1}(t)$ is nearly constant in the flat region. However, for the second example in Figure 2.1, which shows jump-type heterogeneity, one should shrink the size of the interval containing the jump. Given the knots, the parameters $(\lambda, \gamma)$ can be jointly determined by the cross-validation criteria (2.21) or (2.22).

Remarks: The selection of the local adaptation $\rho^{-1}(t)$ is a non-trivial problem, for which we suggest using $\rho^{-1}(t) \propto \big[\tilde f^{(m)}(t)\big]^{2\gamma}$ together with piecewise approximation. This can be viewed as a joint force of Abramovich and Steinberg (1996) and Pintore, et al. (2006), where the former directly executed (2.23) on each sampled $t_i$, and the latter directly imposed the piecewise-constant $\rho^{-1}(t)$ without utilizing the fact (2.23). Note that the approach of Abramovich and Steinberg (1996) may suffer the estimation bias of $[\tilde f^{(m)}(t)]^2$ as discussed above, and the r.k. evaluation by approximation based on a discrete set of $\rho^{-1}(t_i)$ remains another issue. As for Pintore, et al. (2006), the high-dimensional parametrization of $\rho^{-1}(t)$ requires nonlinear optimization whose stability is usually of concern. All these issues can be resolved by our approach, which therefore has both analytical and computational advantages.
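A minimal sketch (Python, not from the thesis) of the continuity-constrained piecewise-linear fit suggested for (2.25): parameterizing $\rho^{-1}$ by its values at the knots via a hat-function basis enforces continuity automatically, positivity is imposed crudely by clipping, and the normalization constraint is omitted for simplicity. The roughness input is a hypothetical stand-in for $[\tilde f^{(m)}(t)]^{2\gamma}$.

```python
import numpy as np

def hat_basis(t, knots):
    """Hat-function basis: column k is the piecewise-linear bump equal to 1 at
    knot k and 0 at all other knots, so any coefficient vector yields a
    continuous piecewise-linear function (continuity built in)."""
    B = np.empty((len(t), len(knots)))
    for k in range(len(knots)):
        e = np.zeros(len(knots)); e[k] = 1.0
        B[:, k] = np.interp(t, knots, e)
    return B

t = np.linspace(0, 1, 200)
roughness = np.sin(6 * t)**2            # stand-in for [f~^(m)(t)]^{2*gamma}
knots = np.linspace(0, 1, 6)

B = hat_basis(t, knots)
coef, *_ = np.linalg.lstsq(B, roughness, rcond=None)
coef = np.maximum(coef, 1e-6)           # crude positivity constraint
rho_inv = B @ coef                      # fitted rho^{-1}(t), continuous by construction

# sanity: at least as good as the best constant fit (constants lie in the
# span of the hat basis, since the hats sum to one)
sse_fit = np.sum((roughness - rho_inv)**2)
sse_const = np.sum((roughness - roughness.mean())**2)
assert sse_fit <= sse_const
```

Relaxing continuity and dropping the slopes reduces this to the piecewise-constant case, exactly as noted in the text.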

2.3 AdaSS Properties

Recall that the OrdSS (2.2) has many interesting statistical properties, e.g. (a) it is a natural polynomial spline of degree $(2m-1)$ with knots placed at $t_1, \ldots, t_n$ (hence, the OrdSS with $m = 2$ is usually referred to as the cubic smoothing spline); (b) it has a Bayesian interpretation through the duality between RKHS and Gaussian stochastic processes (Wahba, 1990, §1.4-1.5); (c) its large-sample properties of consistency and rate of convergence depend on the smoothing parameter (Wahba, 1990, §4 and references therein); (d) it has an equivalent kernel smoother with variable bandwidth (Silverman, 1984). Similar properties are desired for the AdaSS. We leave (c) and (d) to future investigation.

For (a), the AdaSS properties depend on the r.k. (2.7) and the structure of the local adaptation $\rho^{-1}(t)$. Pintore, et al. (2006) investigated the piecewise-constant case of $\rho^{-1}(t)$ and argued that the AdaSS is also a piecewise polynomial spline of degree $(2m-1)$ with knots on $t_1, \ldots, t_n$, while it allows for more flexible lower-order derivatives than the OrdSS. The piecewise-linear case of $\rho^{-1}(t)$ can be analyzed in a similar manner. For the AdaSS (2.8) with the r.k. (2.15),

$$f(t) = \sum_{j=0}^{m-1} \alpha_j \phi_j(t) + \sum_{i:\, t_i \le t} c_i K(t, t_i) + \sum_{i:\, t_i > t} c_i K(t, t_i),$$

with degree $(m-1)$ in the first two terms, and degree $2m$ in the third term (after one degree increase due to the trend of $\rho^{-1}(t)$).


For (b), the AdaSS based on the r.k. (2.7) corresponds to the following zero-mean Gaussian process

$$Z(t) = \int_0^1 \rho^{-1/2}(u)\, G_m(t,u)\, dW(u), \qquad t \in [0,1], \qquad (2.26)$$

where $W(t)$ denotes the standard Wiener process. The covariance kernel is given by $E[Z(t)Z(s)] = K(t,s)$. When $\rho(t) \equiv 1$, the Gaussian process (2.26) is the $(m-1)$-fold integrated Wiener process studied by Shepp (1966). Following Wahba (1990, §1.5), consider a random-effect model $F(t) = \alpha^T\phi(t) + b^{1/2}Z(t)$; then the best linear unbiased estimate of $F(t)$ from the noisy data $Y(t) = F(t) + \varepsilon$ corresponds to the AdaSS solution (2.8), provided that the smoothing parameter $\lambda = \sigma^2/nb$. Moreover, one may set an improper prior on the coefficients $\alpha$, derive the posterior of $F(t)$, and then obtain a Bayesian type of confidence interval estimate; see Wahba (1990, §5). Such confidence intervals on the sampling points are given in (2.14).

The inverse function $\rho^{-1}(t)$ in (2.26) can be viewed as a measurement of the nonstationary volatility (or, variance) of the $m$-th derivative process, such that

$$\frac{d^m Z(t)}{dt^m} = \rho^{-1/2}(t)\, \frac{dW(t)}{dt}. \qquad (2.27)$$

By the duality between Gaussian processes and RKHS, $d^m Z(t)/dt^m$ corresponds to $f^{(m)}(t)$ in (2.6). This provides an alternative reasoning for (2.23), in the sense that $\rho^{-1}(t)$ can be determined locally by the second-order moments of $d^m Z(t)/dt^m$.

To illustrate the connection between the AdaSS and a Gaussian process with nonstationary volatility, consider the case $m = 1$ and piecewise-constant $\rho^{-1}(t) = a_k$, $t \in [\tau_{k-1}, \tau_k)$, for some $a_k > 0$ and pre-specified knots $0 \equiv \tau_0 < \tau_1 < \cdots < \tau_K \equiv 1$. By (2.15), the r.k. is given by

$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} a_k \big(t \wedge s \wedge \tau_k - \tau_{k-1}\big), \qquad t, s \in [0,1]. \qquad (2.28)$$

By (2.26) and (2.27), the corresponding Gaussian process satisfies

$$dZ(t) = \sqrt{a_k}\, dW(t), \qquad \text{for } t \in [\tau_{k-1}, \tau_k),\ k = 1,\ldots,K, \qquad (2.29)$$

where ak is the local volatility from knot to knot. Clearly, Z(t) reduces to the standard Wiener process W (t) when ak ≡ 1. As one of the most appealing features, the construction (2.29) takes into account the jump diﬀusion, which occurs in [τk−1 , τk ) when the interval shrinks and the volatility ak blows up.
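The construction (2.28)-(2.29) can be checked by simulation. The sketch below (Python, illustrative; the knots and volatilities are assumptions, not from the thesis) discretizes $dZ = \sqrt{a_k}\,dW$ and verifies that the empirical variance of $Z(t)$ matches the covariance kernel on the diagonal, $K(t,t) = \sum_k a_k (t\wedge\tau_k - \tau_{k-1})_+$.

```python
import numpy as np

rng = np.random.default_rng(1)
taus = np.array([0.0, 0.4, 0.7, 1.0])   # illustrative knots tau_0 < ... < tau_K
a = np.array([1.0, 9.0, 0.25])          # illustrative local volatilities a_k

n_steps, n_paths = 400, 10000
dt = 1.0 / n_steps
t = (np.arange(n_steps) + 1) * dt
# which piece [tau_{k-1}, tau_k) each step's left endpoint falls in
k = np.searchsorted(taus, t - dt, side='right') - 1

dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
Z = np.cumsum(np.sqrt(a[k]) * dW, axis=1)      # dZ = sqrt(a_k) dW, cf. (2.29)

def K_diag(s):
    """K(s, s) from (2.28): sum_k a_k * (min(s, tau_k) - tau_{k-1})_+"""
    return float(np.sum(a * np.maximum(0.0, np.minimum(s, taus[1:]) - taus[:-1])))

# empirical variance of Z(t) vs. the kernel diagonal at t = 0.4, 0.7, 1.0
for i in (159, 279, 399):
    assert abs(Z[:, i].var() - K_diag(t[i])) / K_diag(t[i]) < 0.05
```

With $a_k \equiv 1$ the simulated process reduces to the standard Wiener process, as noted in the text.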

2.4 Experimental Results

The implementation of the OrdSS and AdaSS differs only in the formation of $\Sigma$ through the r.k. $K(t,s)$. It is most crucial for the AdaSS to select the local adaptation $\rho^{-1}(t)$. Consider the four examples in Figure 2.1 with various scenarios of heterogeneous smoothness. The curve fitting results by the OrdSS with $m = 2$ (i.e. the cubic smoothing spline) are shown in Figure 2.3, as well as the 95% confidence intervals for the last two cases. In what follows, we discuss the need for local adaptation in each case, and compare the performance of the OrdSS and AdaSS.

Simulation Study. The Doppler and HeaviSine functions are often taken as benchmark examples for testing adaptive smoothing methods. Global smoothing by the OrdSS (2.2) may fail to capture the heterogeneous smoothness and, as a consequence, over-smooth the regions where the signal is relatively smooth and under-smooth where the signal is highly oscillating. For the Doppler case in Figure 2.3, the underlying signal increases its smoothness with time, but the OrdSS estimate shows lots of wiggles for $t > 0.5$. In the HeaviSine case, both jumps at $t = 0.3$ and $t = 0.72$ could be captured by the OrdSS, but the smoothness elsewhere is sacrificed as a compromise. The AdaSS fitting results with piecewise-constant $\rho^{-1}(t)$ are shown in Figure 2.4.
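The benchmark signals are easy to reproduce. The sketch below (Python, not from the thesis) generates noisy versions of the standard Donoho-Johnstone Doppler and HeaviSine test functions with $n = 2048$ and snr $= 7$, taking snr to mean the ratio of signal to noise standard deviations (one common convention, assumed here):

```python
import numpy as np

def doppler(t, eps=0.05):
    """Donoho-Johnstone Doppler test function: oscillation slows as t grows."""
    return np.sqrt(t * (1 - t)) * np.sin(2 * np.pi * (1 + eps) / (t + eps))

def heavisine(t):
    """Donoho-Johnstone HeaviSine: a sine wave with jumps at t=0.3 and t=0.72."""
    return 4 * np.sin(4 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

rng = np.random.default_rng(2)
n, snr = 2048, 7
t = np.arange(1, n + 1) / n
for f in (doppler, heavisine):
    signal = f(t)
    sigma = signal.std() / snr        # scale the noise to hit the target snr
    y = signal + rng.normal(scale=sigma, size=n)
    assert abs(signal.std() / sigma - snr) < 1e-9

# jump of size 2 at t = 0.3 (the sine part is continuous there)
assert abs(heavisine(0.3 + 1e-6) - heavisine(0.3 - 1e-6) + 2) < 1e-3
```

These are stand-ins for the exact simulation settings of the thesis, whose tuned parameters appear in Table 2.1.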

Figure 2.3: OrdSS curve fitting with m = 2 (shown by solid lines). The dashed lines represent 95% confidence intervals. In the credit-risk case, the log loss rates are considered as the responses, and the time-dependent weights are specified proportional to the number of replicates.

Initially, we fitted the OrdSS with cross-validated tuning parameter $\lambda$, then estimated the 2nd-order derivatives by (2.24). To estimate the piecewise-constant local adaptation $\rho^{-1}(t)$, we pre-specified 6 equally spaced knots (including 0 and 1) for the Doppler function, and the knots (0, 0.295, 0.305, 0.5, 0.715, 0.725, 1) for the HeaviSine function. (Such knots can be determined graphically from plots of the squared derivative estimates.) Given the knots, the function $\rho^{-1}(t)$ is estimated by the constrained least squares (2.25) conditional on the power-transform parameter $\gamma$, where the best $\gamma^*$ is found jointly with $\lambda$ by minimizing the cross-validation score (2.21). In our implementation, we assumed that both tuning parameters are bounded, with $\log\lambda \in [-40, 10]$ and $\gamma \in [1, 5]$, then performed the bounded nonlinear optimization. Table 2.1 gives the numerical results in both simulation cases, including the cross-validated $(\lambda, \gamma)$, the estimated variance of noise (cf. true $\sigma^2 = 1$), and the equivalent degrees of freedom.

Table 2.1: AdaSS parameters in Doppler and HeaviSine simulation study.

    Simulation    n     snr   λ          γ      σ̂²      dof      CV
    Doppler       2048   7    1.21e-12   2.13   0.9662   164.19   2180.6
    HeaviSine     2048   7    1.86e-10   4.95   1.0175   35.07    2123.9

Figure 2.4: Simulation study of Doppler and HeaviSine functions: OrdSS (blue), AdaSS (red) and the heterogeneous truth (light background).

Figure 2.4 (bottom panel) shows the improvement of the AdaSS over the OrdSS after zooming in. The middle panel shows the data-driven estimates of the adaptation function $\rho(t)$ plotted on the log scale. For the Doppler function, the imposed penalty $\rho(t)$ in the first interval [0, 0.2) is much less than that in the other four intervals, resulting in heterogeneous curve fitting. For the HeaviSine function, low penalty is assigned where the jumps occur, while a relatively uniform high level of penalty is assigned to the other regions. Both simulation examples demonstrate the advantage of using the AdaSS. A similar simulation study of the Doppler function can be found in Pintore, et al. (2006), who used a small sample size (n = 128) and high-dimensional optimization techniques.

Motorcycle-Accident Data. Silverman (1985) originally used the motorcycle-accident experimental data to test the non-stationary OrdSS curve fitting; see the confidence intervals in Figure 2.3. Besides error non-stationarity, smoothness non-homogeneity is another important feature of this data set. To fit such data, a treatment from the heterogeneous-smoothness point of view is more appropriate than using the non-stationary error treatment alone. Recall that the data consist of acceleration measurements (subject to noise) of the head of a motorcycle rider during the course of a simulated crash experiment. Upon the crash occurrence, the head decelerates to about negative 120 gal (a gal is 1 centimeter per second squared). The data indicate clearly that the acceleration rebounds well above its original level before settling back. Let us focus on the period prior to the crash happening at t ≈ 0.24 (after time re-scaling), for which it is reasonable to assume the acceleration stays constant. However, the OrdSS estimate in Figure 2.3 shows a slight bump around t = 0.2, a false discovery.

In the presence of non-stationary errors $N(0, \sigma^2/w(t_i))$ with unknown weights $w(t_i)$, a common approach is to begin with an unweighted OrdSS and estimate $w^{-1}(t_i)$ from the local residual sums of squares; see Silverman (1985). Local moving averaging (e.g. loess) is usually performed; see the upper-left panel in Figure 2.5. Then, the smoothed weight estimates are plugged into (2.2) to run the non-stationary OrdSS. To perform the AdaSS (2.6) with the same weights, we estimated the local adaptation $\rho^{-1}(t)$ from the derivatives of the non-stationary OrdSS. Both results are shown in Figure 2.5. Compared to Figure 2.3, both non-stationary spline models result in tighter 95% confidence intervals than the unweighted OrdSS.

Figure 2.5: Non-stationary OrdSS and AdaSS for Motorcycle-Accident Data.

Figure 2.6: OrdSS, non-stationary OrdSS and AdaSS: performance in comparison.

Figure 2.6 compares the performance of the stationary OrdSS, non-stationary OrdSS and non-stationary AdaSS. After zooming in around t ∈ [0.1, 0.28], it is clear that the AdaSS performs better than both versions of the OrdSS in modeling the constant acceleration before the crash. For the settling-back process after attaining the maximum acceleration at t ≈ 0.55, the AdaSS gives a smoother estimate than the OrdSS. Besides, they differ in estimating the minimum of the acceleration curve at t ≈ 0.36, where the non-stationary OrdSS seems to have underestimated the magnitude of deceleration, while the estimate by the non-stationary AdaSS looks reasonable.

Credit-Risk Data. The last plot in Figure 2.1 represents a tweaked sample in retail credit risk management. It is of our interest to model the growth, or the maturation curve, of the log loss rates in lifetime (month-on-book). Consider the sequence of vintages booked in order from 2000 to 2006. Their monthly loss rates are observed up to December 2006 (right truncation).
Aligning them to the same origin, we have more observations accumulated at small month-on-book and fewer at large month-on-book, i.e., the sample size decreases in lifetime. As a consequence, the simple pointwise or moving averages have increasing variability. Nevertheless, our experience tells us that the maturation curve is expected to have increasing degrees of smoothness once the lifetime exceeds a certain month-on-book. This is also a desired property of the maturation curve for the purpose of extrapolation.

By (2.4), spline smoothing may work on the pointwise averages subject to non-stationary errors $N(0, \sigma^2/w(t_i))$ with weights determined by the number of replicates. The non-stationary OrdSS estimate is shown in Figure 2.3, which however does not smooth out the maturation curve for large month-on-book. The reasons for the instability in the rightmost region could be (a) the OrdSS has an intrinsic boundary bias, especially for small $w(t_i)$, and (b) the raw inputs used for training are contaminated and behave abnormally. The exogenous cause of the contamination in (b) will be explained in Chapter III.

Alternatively, we applied the AdaSS with a judgemental prior on the heterogeneous smoothness of the target function. For demonstration, we purposefully specified $\rho^{-1}(t)$ to take the forms shown in Figure 2.7, including (a) piecewise-constant and (b) piecewise-linear.

Figure 2.7: AdaSS estimate of maturation curve for Credit-Risk sample: piecewise-constant ρ⁻¹(t) (upper panel) and piecewise-linear ρ⁻¹(t) (lower panel).

Both specifications are non-decreasing in lifetime and they span about the same range. The curve fitting results for both $\rho^{-1}(t)$ are nearly identical upon visual inspection. Compared with the OrdSS performance in the large month-on-book region, the AdaSS yields a smoother estimate of the maturation curve, as well as tighter confidence intervals based on the judgemental prior. The AdaSS estimate of the maturation curve, transformed to the original scale, is plotted by the red solid line in Figure 2.1. This credit-risk sample will appear again in Chapter III on vintage data analysis. The really interesting story is yet to unfold.

2.5 Summary

In this chapter we have studied the adaptive smoothing spline for modeling heterogeneously "smooth" functions, including possible jumps. In particular, we studied the AdaSS with piecewise adaptation of the local penalty and derived the closed-form reproducing kernels. To determine the local adaptation function, we proposed a shape estimation procedure based on the function derivatives, together with the method of cross-validation for parameter tuning. Four different examples, each having a specific feature of heterogeneous smoothness, were used to demonstrate the improvement of the AdaSS over the OrdSS. Future work includes the AdaSS properties w.r.t. (c) and (d) listed in Section 2.3. Its connection with a time-warping smoothing spline is also under our investigation. However, it is not clear how the AdaSS can be extended to multivariate smoothing in a tensor-product space, as an analogy of the smoothing spline ANOVA models in Gu (2002). In the next chapter, we will consider the application of the AdaSS in a dual-time domain, where the notion of smoothing spline will be generalized via other Gaussian processes.

2.6 Technical Proofs

Proof of Proposition 2.1: by minor modification of Wahba (1990, §1.2-1.3). By Taylor's theorem with remainder, any function in $W_m$ (2.3) can be expressed by

$$f(t) = \sum_{j=0}^{m-1} \frac{t^j}{j!}\, f^{(j)}(0) + \int_0^1 G_m(t,u)\, f^{(m)}(u)\, du, \qquad (2.30)$$

where $G_m(t,u) = \frac{(t-u)_+^{m-1}}{(m-1)!}$ is the Green's function for solving the boundary value problem: if $f_1^{(m)} = g$ with $f_1 \in B_m = \{f : f^{(k)}(0) = 0, \text{ for } k = 0, 1, \ldots, m-1\}$, then $f_1(t) = \int_0^1 G_m(t,u)\, g(u)\, du$. The remainder functions $f_1(t) = \int_0^1 G_m(t,u)\, f^{(m)}(u)\, du$ lie in the subspace $H = W_m \cap B_m$. Given the bounded integrable function $\rho(t) > 0$ on $t \in [0,1]$, associate $H$ with the inner product

$$\langle f_1, g_1 \rangle_H = \int_0^1 f_1^{(m)}(u)\, g_1^{(m)}(u)\, \rho(u)\, du \qquad (2.31)$$

and the induced square norm $\|f_1\|_H^2 = \int_0^1 \rho(u)\, [f_1^{(m)}(u)]^2\, du$. By similar arguments to Wahba (1990, §1.2), one may verify that $H$ is an RKHS with r.k. (2.7), where the reproducing property

$$\langle f_1(\cdot), K(t, \cdot) \rangle_H = f_1(t), \qquad \forall f_1 \in H, \qquad (2.32)$$

follows from $\dfrac{\partial^m K(t,s)}{\partial t^m} = \rho^{-1}(t)\, G_m(s,t)$.

Then, using the arguments of Wahba (1990, §1.3) proves the finite-dimensional expression (2.8) that minimizes the AdaSS objective functional (2.6). □

Proof of Proposition 2.2: The r.k. with piecewise $\rho^{-1}(t)$ equals

$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} \int_{\tau_{k-1}}^{t\wedge s\wedge \tau_k} (a_k + b_k u)\, \frac{(u-t)^{m-1}}{(m-1)!}\, \frac{(u-s)^{m-1}}{(m-1)!}\, du.$$

For any $\tau_{k-1} \le t_1 \le t_2 < \tau_k$, using successive integration by parts,

$$\int_{t_1}^{t_2} (a_k + b_k u)\, \frac{(u-t)^{m-1}}{(m-1)!}\, \frac{(u-s)^{m-1}}{(m-1)!}\, du = \int_{t_1}^{t_2} (a_k + b_k u)\, \frac{(u-t)^{m-1}}{(m-1)!}\, d\, \frac{(u-s)^m}{m!}$$
$$= \left[(a_k + b_k u)\, \frac{(u-t)^{m-1}}{(m-1)!}\, \frac{(u-s)^m}{m!}\right]_{t_1}^{t_2} - \int_{t_1}^{t_2} (a_k + b_k u)\, \frac{(u-t)^{m-2}}{(m-2)!}\, \frac{(u-s)^m}{m!}\, du - b_k \int_{t_1}^{t_2} \frac{(u-t)^{m-1}}{(m-1)!}\, \frac{(u-s)^m}{m!}\, du = \cdots,$$

so it is straightforward (but tedious) to derive that

$$\int_{t_1}^{t_2} (a_k + b_k u)\, \frac{(u-t)^{m-1}}{(m-1)!}\, \frac{(u-s)^{m-1}}{(m-1)!}\, du = \sum_{j=0}^{m-1} (-1)^j \left[(a_k + b_k u)\, \frac{(u-t)^{m-1-j}}{(m-1-j)!}\, \frac{(u-s)^{m+j}}{(m+j)!}\right]_{t_1}^{t_2} - b_k \sum_{j=0}^{m-1} (-1)^j\, T_j,$$

in which $T_j$ denotes the integral evaluations for $j = 0, 1, \ldots, m-1$:

$$T_j \equiv \int_{t_1}^{t_2} \frac{(u-t)^{m-1-j}}{(m-1-j)!}\, \frac{(u-s)^{m+j}}{(m+j)!}\, du = \left[\sum_{l=0}^{m-1-j} (-1)^l\, \frac{(u-t)^{m-1-j-l}(u-s)^{m+1+j+l}}{(m-1-j-l)!\,(m+1+j+l)!}\right]_{t_1}^{t_2}.$$

Choosing $t_1 = \tau_{k-1}$, $t_2 = t\wedge s\wedge \tau_k$ for $k = 1, \ldots, \kappa_{t\wedge s}$ and adding them all gives the final r.k. evaluation

$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} \sum_{j=0}^{m-1} (-1)^j \left\{ (a_k + b_k u)\,\frac{(u-t)^{m-1-j}(u-s)^{m+j}}{(m-1-j)!\,(m+j)!} - b_k \sum_{l=0}^{m-1-j} (-1)^l\, \frac{(u-t)^{m-1-j-l}(u-s)^{m+1+j+l}}{(m-1-j-l)!\,(m+1+j+l)!} \right\} \Bigg|_{u=\tau_{k-1}}^{u = t\wedge s\wedge \tau_k}. \qquad \Box$$
CHAPTER III

Vintage Data Analysis

3.1 Introduction

The vintage data represents a special class of functional, longitudinal or panel data with dual-time characteristics. It shares the cross-sectional time series feature in that there are $J$ subjects, each observed at calendar time points $\{t_{jl},\ l = 1, \ldots, L_j\}$, where $L_j$ is the total number of observations on the $j$-th subject. Unlike the purely longitudinal setup, each subject $j$ we consider corresponds to a vintage originated at time $v_j$, such that $v_1 < v_2 < \cdots < v_J$, and the elapsed time $m_{jl} = t_{jl} - v_j$ measures the lifetime or age of the subject. To this end, we denote the vintage data by

$$y(m_{jl}, t_{jl};\, v_j), \qquad l = 1, \ldots, L_j, \quad j = 1, \ldots, J, \qquad (3.1)$$

where the dual-time points are usually sampled from regular grids, such that for each vintage $j$, $m_{jl} = 0, 1, 2, \ldots$ and $t_{jl} = v_j, v_j + 1, v_j + 2, \ldots$. We call the grid of such $(m_{jl}, t_{jl}; v_j)$ a vintage diagram. Also observed on the vintage diagram could be the covariate vectors $x(m_{jl}, t_{jl}; v_j) \in \mathbb{R}^p$. In situations where more than one segment of dual-time observations is available, we may denote the (static) segmentation variables by $z_k \in \mathbb{R}^q$, and denote the vintage data by $\{y(m_{kjl}, t_{kjl}; v_{kj}),\ x(m_{kjl}, t_{kjl}; v_{kj})\}$ for segment $k = 1, \ldots, K$. See Figure 1.3 for an illustration of the vintage diagram and

dual-time collection of observations, as well as the hierarchical structure of multi-segment credit portfolios. The vintage diagram could be truncated in either age or time, and several interesting prototypes are illustrated in Figure 3.1. The triangular prototype in the top-left panel is the most typical, as time series data are often subject to systematic truncation from the right (say, observations up to today). Each vintage series evolves in the 45° direction such that, for the same vintage $v_j$, the calendar time $t_j$ and the lifetime $m_j$ move at the same speed along the diagonal; therefore, the vintage diagram corresponds to the hyperplane $\{(t, m, v) : v = t - m\}$ in the 3D space. In practice, there exist other prototypes of vintage data truncation. The rectangular diagram is obtained after truncation in both lifetime and calendar time. The corporate default rates released by Moody's, shown in Table 1.1, are truncated in lifetime to be 20 years maximum. The trapezoidal prototype corresponds to the time-snapshot selection common in credit risk modeling exercises. Besides, the U.S. Treasury rates released for multiple maturity dates serve as an example of the parallelogram prototype. The age, time and vintage dimensions are all of potential interest in financial risk management. The empirical evidence tells us that the age effects, of a self-maturation nature, are often subject to smoothness in variation. The time effects are often dynamic and volatile due to the macroeconomic environment. Meanwhile, the vintage effects are conceived as a forward-looking measure of performance since origination, and they could be treated as random effects upon an appropriate structural assumption. Bearing in mind these practical considerations, we aim to develop a formal statistical framework for vintage data analysis (VDA), by integrating ideas from functional data analysis (FDA), longitudinal data analysis (LDA)¹ and time series analysis (TSA).
To achieve this, we will take a Gaussian process modeling approach, which is flexible enough to cover a diversity of scenarios.

¹ Despite the overlap between FDA and LDA, we try to differentiate them methodologically by tagging FDA with nonparametric smoothing and tagging LDA with random-effect modeling.

Figure 3.1: Vintage diagram upon truncation and exemplified prototypes.

This chapter develops the VDA framework for continuous-type responses (usually, rates). It is organized as follows. In Section 3.2, we discuss the maturation-exogenous-vintage (MEV) decomposition in general. The MEV models based on Gaussian processes are presented in Section 3.3, where we discuss Kriging, smoothing spline and kernel methods. An efficient backfitting algorithm is provided for model estimation. Section 3.4 covers the semiparametric regression given the dual-time covariates and segmentation variables. Computational results are presented in Section 3.5, including a simulation study for assessing the MEV decomposition technique, and real applications in both corporate and retail credit risk cases. We conclude the chapter in Section 3.6 and give technical proofs in the last section.
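The triangular vintage diagram and its hyperplane structure are easy to construct explicitly. The sketch below (Python, with hypothetical sizes, not from the thesis) builds a monthly-booked diagram truncated at calendar time $T$ and verifies that all dual-time points sit on $\{(t, m, v) : v = t - m\}$, so that an intercept plus linear terms in $m$, $t$, $v$ is rank-deficient:

```python
import numpy as np

# Triangular vintage diagram: vintages v = 0..J-1, observed up to calendar
# time T (right truncation), so vintage v is seen at ages m = 0..T-v.
J, T = 8, 10                                   # hypothetical sizes
diagram = [(m, v + m, v) for v in range(J) for m in range(T - v + 1)]

m, t, v = map(np.array, zip(*diagram))
assert np.all(t == m + v)                      # every point satisfies t = v + m

# The diagram lies on the hyperplane {(t, m, v): v = t - m}, so a design
# with intercept and linear terms in m, t, v cannot have full column rank.
X = np.column_stack([np.ones_like(m), m, t, v])
assert np.linalg.matrix_rank(X) == 3           # 4 columns, rank only 3
```

This rank deficiency is exactly the linear dependency that the identification discussion of Section 3.2 must break up.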

3.2 MEV Decomposition Framework

Let $\mathcal{V} = \{v_j : 0 \equiv v_1 < v_2 < \cdots < v_J < \infty\}$ be a discrete set of origination times of $J$ vintages, and define the dual-time domain

$$\Omega = \big\{(m, t; v) : m \ge 0,\ t = v + m,\ v \in \mathcal{V}\big\},$$

where $m$ denotes the lifetime and $t$ denotes the calendar time. Unlike a usual 2D $(m, t)$ space, the dual-time domain $\Omega$ consists of isolated 45° lines $\{(m, t) : t - m = v_j,\ m \ge 0\}$ for each fixed vintage $v_j \in \mathcal{V}$. The triangular vintage diagram in Figure 3.1 is the discrete counterpart of $\Omega$ upon right truncation in calendar time. Let $\Omega$ be the support of the vintage data (3.1), for which we propose the MEV (maturation-exogenous-vintage) decomposition modeling framework

$$\eta(y(m, t; v)) = f(m) + g(t) + h(v) + \varepsilon, \qquad (3.2)$$

where $\eta$ is a pre-specified transform function, $\varepsilon$ is the random error, and the three MEV components have the following practical interpretations:

a) $f(m)$ represents the maturation curve characterizing the endogenous growth;

b) $g(t)$ represents the exogenous influence by macroeconomic conditions;

c) $h(v)$ represents the vintage heterogeneity measuring the origination quality.

For the time being, let $f, g, h$ be open to flexible choices of parametric, nonparametric or stochastic process models. We begin with some general remarks about model identification. Suppose the maturation curve $f(m)$ in (3.2) absorbs the intercept effect, and the constraints $Eg = Eh = 0$ are set as usual:

$$E\int_{t_{\min}}^{t_{\max}} g(t)\, dt = E\int_{\mathcal{V}} h(v)\, dv = 0. \qquad (3.3)$$

The question is, do these constraints suffice to ensure the model identifiability? It turns out that for the dual-time domain $\Omega$, all the MEV decomposition models are subject to a linear dependency, as stated formally in the following lemma.

Lemma 3.1 (Identification). For the MEV models (3.2) defined on the dual-time domain $\Omega$, write $f = f_0 + f_1$ with the linear part $f_0$ and the nonlinear part $f_1$, and similarly write $g = g_0 + g_1$, $h = h_0 + h_1$. Then, one of $f_0, g_0, h_0$ is not identifiable.

The identification (or collinearity) problem in Lemma 3.1 is incurred by the hyperplane nature of the dual-time domain $\Omega$ in the 3D space, $\{(t, m, v) : v = t - m\}$ (see Figure 3.1, top-right panel). We need to break up such linear dependency in order to make the MEV models estimable. A practical way is to perform binning in the age, time or vintage direction. For example, given vintage data of the triangular prototype, one may group the large-$m$ region, the small-$t$ region and the large-$v$ region, as these regions have few observations. A commonly used binning procedure for $h(v)$ is to assume piecewise-constant vintage effects by grouping monthly vintages every quarter or every year.

An alternative strategy for breaking up the linear dependency is to directly remove the trend for one of the $f, g, h$ functions. Indeed, this strategy is simple to implement, provided that $f, g, h$ all can be expanded by certain basis functions. Then, one only needs to fix the linear bases for any two of $f, g, h$. By default, we remove the trend of $h$ and let it measure only the nonlinear heterogeneity. A word of caution is in order in situations where there does exist an underlying trend $h_0(v)$: it would then be absorbed by $f(m)$ and $g(t)$. One may retrieve $h_0(v) = \gamma v$ by simultaneously adjusting $f(m) \to f(m) + \gamma m$ and $g(t) \to g(t) - \gamma t$. Determining the slope $\gamma$ requires extra knowledge about the trend of at least one of $f, g, h$. The trend removal for $h(v)$ is needed only when there is no such extra knowledge. The MEV decomposition with zero-trend vintage effects may be referred to as the ad hoc approach.

In what follows we review several existing approaches related to MEV decomposition. The first example is a conventional age-period-cohort (APC) model in the social sciences of demography and epidemiology. The second example is the generalized additive model (GAM) in nonparametric statistics, where the smoothing spline is used as the scatterplot smoother. The third example is an industry GAMEED approach that may reduce to a sequential type of MEV decomposition. The last example is another industrial DtD technique involving the interactions of MEV components. They all can be viewed as one or another type of vintage data analysis.

The APC Model. Suppose there are $I$ distinct $m$-values and $L$ distinct $t$-values in the vintage data. By (3.2), setting $f(m) = \mu + \sum_{i=1}^{I} \alpha_i I(m = m_i)$, $g(t) = \sum_{l=1}^{L} \beta_l I(t = t_l)$ and $h(v) = \sum_{j=1}^{J} \gamma_j I(v = v_j)$ gives us the APC (age-period-cohort) model

    η(y(m_i, t_l; v_j)) = µ + α_i + β_l + γ_j + ε,    (3.4)

where the vintage effects {γ_j} used to be called the cohort effects; see Mason et al. (1973) and Mason and Wolfinger (2002). Assume Σ_i α_i = Σ_l β_l = Σ_j γ_j = 0, as usual; then one may reformulate the APC model as y = Xθ + ε, where y is the vector consisting of η(y(m_i, t_l; v_j)) and

    X = [1, x_1^(α), ..., x_{I−1}^(α), x_1^(β), ..., x_{L−1}^(β), x_1^(γ), ..., x_{J−1}^(γ)]

with dummy variables. For example, x_i^(α) is defined by x_i^(α)(m) = I(m = m_i) − I(m = m_I) for i = 1, ..., I − 1. Without loss of generality, let us select the factorial levels (m_I, t_L, v_J) to satisfy m_I − t_L + v_J = 0 (after reshuffling the indices of v). The identification problem of the APC model (3.4) follows from Lemma 3.1, since the regression matrix is linearly dependent through

    Σ_{i=1}^{I−1} (m_i − m̄) x_i^(α) − Σ_{l=1}^{L−1} (t_l − t̄) x_l^(β) + Σ_{j=1}^{J−1} (v_j − v̄) x_j^(γ) = m̄ − t̄ + v̄,    (3.5)

where m̄ = Σ_{i=1}^I m_i/I, t̄ = Σ_{l=1}^L t_l/L and v̄ = Σ_{j=1}^J v_j/J. So, the ordinary least

squares (OLS) cannot be used for fitting y = Xθ + ε. This is a key observation by Kupper (1985) that generated long-running discussion and debate on the estimability of the APC model. Among others, Fu (2000) suggested estimating the APC model by ridge regression, whose limiting case leads to the so-called intrinsic estimator θ = (X^T X)^+ X^T y, where A^+ is the Moore-Penrose generalized inverse of A satisfying AA^+A = A. The discrete APC model attempts to identify a separate effect at every grid level and often ends up with unstable estimates. Since the vintage data are always unbalanced


in the m, t or v projection, there would be very limited observations for estimating certain factorial effects in (3.4). The APC model is formulated over-flexibly in the sense that it does not regularize a grid effect by its neighbors. Fu (2008) suggested using a spline model for h(v); however, such spline regularization would not bypass the identification problem unless the spline involves no linear basis.

The Additive Splines. If f, g, h are all assumed to be unspecified smooth functions, the MEV models become generalized additive models (GAM); see Hastie and Tibshirani (1990). For example, using the cubic smoothing spline as the scatterplot smoother, we have the additive splines model formulated by

    arg min_{f,g,h} Σ_{j=1}^J Σ_{l=1}^{L_j} [η(y_jl) − f(m_jl) − g(t_jl) − h(v_j)]^2 + λ_f ∫ [f^(2)(m)]^2 dm + λ_g ∫ [g^(2)(t)]^2 dt + λ_h ∫ [h^(2)(v)]^2 dv,    (3.6)

where λf , λg , λh > 0 are the tuning parameters separately controlling the smoothness of each target function; see (2.2). Usually, the optimization (3.6) is solved by the backﬁtting algorithm that iterates

    f̂ ← S_f [η(y_jl) − ĝ(t_jl) − ĥ(v_j)]_{j,l}
    ĝ ← S_g [η(y_jl) − f̂(m_jl) − ĥ(v_j)]_{j,l}    (centered)    (3.7)
    ĥ ← S_h [η(y_jl) − f̂(m_jl) − ĝ(t_jl)]_{j,l}    (centered)

after initializing ĝ = ĥ = 0. See Chapter II for the formation of the smoothing matrices S_f, S_g, S_h, for which the corresponding smoothing parameters can be selected by the method of generalized cross-validation. However, the linear parts of f, g, h are not identifiable, by Lemma 3.1. To get around the identification problem, we may take


the ad hoc approach to remove the trend of h(v). The trend-removal can be achieved by appending to every iteration of (3.7):

    ĥ(v) ← ĥ(v) − v (Σ_{j=1}^J v_j^2)^{−1} Σ_{j=1}^J v_j ĥ(v_j).
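As a minimal numerical sketch of this update (assuming numpy; the helper name is ours), removing the linear trend of ĥ over the vintages:

```python
import numpy as np

def remove_trend(v, h):
    """Trend removal for the vintage effect:
    h(v) <- h(v) - v * (sum_j v_j^2)^(-1) * sum_j v_j * h(v_j)."""
    gamma = np.dot(v, h) / np.dot(v, v)   # slope of the through-origin linear fit
    return h - gamma * v

v = np.arange(-3.0, 4.0)        # vintage origination times
h = 0.5 * v + np.sin(v)         # nonlinear effect contaminated by a linear trend
h1 = remove_trend(v, h)
```

After the update, the adjusted ĥ carries no linear trend: Σ_j v_j ĥ(v_j) = 0.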

We provide more details on (3.6) to shed light on its further development in the next section. By Proposition 2.1 in Chapter II, each of f, g, h can be expressed by the basis expansion of the form (2.8). Taking into account the identiﬁability of both the intercept and the trend, we may express f, g, h by

    f(m) = µ_0 + µ_1 m + Σ_{i=1}^I α_i K(m, m_i)
    g(t) = µ_2 t + Σ_{l=1}^L β_l K(t, t_l)    (3.8)
    h(v) = Σ_{j=1}^J γ_j K(v, v_j).
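Substituting (3.8) into the penalized criterion (3.6) reduces the optimization to a finite-dimensional regularized least squares in (µ, α, β, γ), solvable by its normal equations. A sketch under stated assumptions — a stand-in squared-exponential kernel and our own function names; a tiny jitter guards against near-singularity:

```python
import numpy as np

def K(x, y, phi=1.0):
    # stand-in kernel; the text allows any valid reproducing kernel here
    return np.exp(-((x - y) / phi) ** 2)

def fit_additive(y, B, Sf, Sg, Sh, Kf, Kg, Kh, lf, lg, lh):
    """Minimize (1/n)||y - B mu - Sf a - Sg b - Sh c||^2 + kernel penalties
    via (C'C + P) theta = C'y, with C = [B, Sf, Sg, Sh] and block-diagonal P."""
    n = len(y)
    C = np.hstack([B, Sf, Sg, Sh])
    P = np.zeros((C.shape[1],) * 2)
    i0 = B.shape[1]
    i1, i2 = i0 + Sf.shape[1], i0 + Sf.shape[1] + Sg.shape[1]
    P[i0:i1, i0:i1] = n * lf * Kf
    P[i1:i2, i1:i2] = n * lg * Kg
    P[i2:, i2:] = n * lh * Kh
    P += 1e-8 * np.eye(P.shape[0])        # jitter for numerical stability
    theta = np.linalg.solve(C.T @ C + P, C.T @ y)
    return theta, C @ theta               # coefficients and fitted values

# toy dual-time data on the hyperplane v = t - m
rng = np.random.default_rng(1)
t = rng.uniform(0, 10, 50); m = rng.uniform(0, 10, 50); v = t - m
y = np.log1p(m) + 0.1 * t + 0.2 * np.sin(v) + 0.05 * rng.standard_normal(50)
mk = tk = np.linspace(0, 10, 5); vk = np.linspace(-10, 10, 5)
B = np.column_stack([np.ones_like(m), m, t])
theta, fitted = fit_additive(y, B, K(m[:, None], mk), K(t[:, None], tk),
                             K(v[:, None], vk), K(mk[:, None], mk),
                             K(tk[:, None], tk), K(vk[:, None], vk),
                             0.01, 0.01, 0.01)
```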

Denote µ = (µ_0, µ_1, µ_2)^T, α = (α_1, ..., α_I)^T, β = (β_1, ..., β_L)^T, and γ = (γ_1, ..., γ_J)^T. We may equivalently write (3.6) as the regularized least squares problem

    min (1/n) ‖y − Bµ − Σ_f α − Σ_g β − Σ_h γ‖^2 + λ_f ‖α‖^2_{Σ̃_f} + λ_g ‖β‖^2_{Σ̃_g} + λ_h ‖γ‖^2_{Σ̃_h},    (3.9)

where y is the vector of n = Σ_{j=1}^J L_j observations {η(y_jl)}, the penalty norm is ‖α‖^2_{Σ̃_f} = α^T Σ̃_f α, and B, Σ_f, Σ̃_f are formed by

    B = [(1, m_jl, t_jl)]_{n×3},    Σ_f = [K(m_jl, m_i)]_{n×I},    Σ̃_f = [K(m_i, m_{i′})]_{I×I},

and Σ_g, Σ̃_g, Σ_h, Σ̃_h are formed similarly.

The GAMEED Approach. The GAMEED refers to an industry approach, namely the generalized additive maturation and exogenous effects decomposition, adopted by Enterprise Quantitative Risk Management, Bank of America, N.A. (Sudjianto, et al., 2006). It has also been

documented as part of the patented work "Risk and Reward Assessment Mechanism" (USPTO: 20090063361). Essentially, it has two practical steps:

1. Estimate the maturation curve f̂(m) and the exogenous effect ĝ(t) from

    η(y(m, t; v)) = f(m; α) + g(t; β) + ε,    Eg = 0,

where f(m; α) can either follow a spline model or take a parametric form supported by prior knowledge, and g(t; β) can be specified as in (3.4) upon necessary time binning while retaining the flexibility in modeling the macroeconomic influence. If necessary, one may further model ĝ(t; β) by a time-series model in order to capture the autocorrelation and possible jumps.

2. Estimate the vintage-specific sensitivities to f̂(m) and ĝ(t) by performing the regression fitting

    η(y(m_jl, t_jl; v_j)) = γ_j + γ_j^(f) f̂(m_jl) + γ_j^(g) ĝ(t_jl) + ε

for each vintage j (upon necessary bucketing). The first step of GAMEED is a reduced type of MEV decomposition that does not suffer the identification problem. The estimates of f(m) and g(t) can be viewed as the common factors to all vintages. In the second step, the vintage effects are measured through not only the main (intercept) effects, but also the interactions with f̂(m) and ĝ(t). Clearly, when the interaction sensitivities γ_j^(f) = γ_j^(g) = 1, the

industry GAMEED approach becomes a sequential type of MEV decomposition. The DtD Technique. The DtD refers to the dual-time dynamics, adopted by Strategic Analytics Inc., for consumer behavior modeling and delinquency forecasting. It was brought to the


public domain by Breeden (2007). Using our notations, the DtD model takes the form

    η(y(m_jl, t_jl; v_j)) = γ_j^(f) f(m_jl) + γ_j^(g) g(t_jl) + ε    (3.10)

and η^{−1}(·) corresponds to the so-called superposition function in Breeden (2007). Clearly, the DtD model involves the interactions between the vintage effects and (f(m), g(t)). Unlike the sequential GAMEED above, Breeden (2007) suggested fitting f(m), g(t) and γ_j^(f), γ_j^(g) iteratively, in a simultaneous manner.

However, the identification problem can equally affect the DtD technique, unless it imposes structural assumptions that break up the trend dependency. To see this, one may apply a Taylor expansion to each nonparametric function in (3.10) and then apply Lemma 3.1 to the linear terms. Unfortunately, this identification problem was not addressed by Breeden (2007), and the stability of the iterative estimation algorithm might be an issue.
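The trend non-identifiability underlying these remarks is easy to verify numerically: on the hyperplane v = t − m, moving a linear trend γ between the three components leaves the fitted values unchanged. A minimal sketch with illustrative component functions of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 48, 200)        # calendar time
m = rng.uniform(0, 60, 200)        # months-on-book (age)
v = t - m                          # vintage origination: the dual-time hyperplane

f = lambda m: np.log1p(m)          # maturation
g = lambda t: 0.1 * t              # exogenous influence
h = lambda v: np.sin(v / 6.0)      # vintage heterogeneity

eta = f(m) + g(t) + h(v)
gamma = 0.37                       # arbitrary trend shift
eta_shift = (f(m) + gamma * m) + (g(t) - gamma * t) + (h(v) + gamma * v)
diff = np.max(np.abs(eta - eta_shift))   # zero up to rounding: gamma*(m - t + v) = 0
```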

3.3 Gaussian Process Models

In the MEV decomposition framework (3.2), the maturation curve f(m), the exogenous influence g(t) and the vintage heterogeneity h(v) are postulated to capture the endogenous, exogenous and origination effects on the dual-time responses. In this section, we propose a general approach to MEV decomposition modeling based on Gaussian processes and provide an efficient algorithm for model estimation.

3.3.1 Covariance Kernels

Gaussian processes are rather flexible in modeling a function Z(x) through a mean function µ(·) and a covariance kernel K(·, ·):

    EZ(x) = µ(x),    E[Z(x) − µ(x)][Z(x′) − µ(x′)] = σ^2 K(x, x′),    (3.11)


where σ^2 is the variance scale. For ease of discussion, we will write µ(x) separately and consider only the zero-mean Gaussian process Z(x). Then, any finite-dimensional collection of random variables Z = (Z(x_1), ..., Z(x_n)) follows a multivariate normal distribution Z ∼ N_n(0, σ^2 Σ) with Σ = [K(x_i, x_j)]_{n×n}.

The covariance kernel K(·, ·) plays a key role in constructing Gaussian processes. It is symmetric, K(x, x′) = K(x′, x), and positive definite: ∫∫ K(x, x′) f(x) f(x′) dx dx′ > 0 for any non-zero f ∈ L^2. The kernel is said to be stationary if K(x, x′) depends only on |x_i − x′_i|, i = 1, ..., d for x ∈ R^d. It is further said to be isotropic or radial if K(x, x′) depends only on the distance ‖x − x′‖. For x ∈ R, a stationary kernel is also isotropic. There is a parsimonious approach to using stationary kernels K(·, ·) to model certain nonstationary Gaussian processes such that Cov[Z(x), Z(x′)] = σ(x)σ(x′)K(x, x′), where the variance scale σ^2(x) requires a separate model of either parametric or nonparametric type. For example, Fan, Huang and Li (2007) used the local polynomial approach to smoothing σ^2(x). Several popular classes of covariance kernels are listed below, including the adaptive smoothing spline (AdaSS) kernels discussed in Chapter II:

    Exponential family:  K(x, x′) = exp{−(|x − x′|/φ)^κ},  0 < κ ≤ 2, φ > 0
    Matérn family:       K(x, x′) = (2^{1−κ}/Γ(κ)) (|x − x′|/φ)^κ K_κ(|x − x′|/φ),  κ, φ > 0    (3.12)
    AdaSS r.k. family:   K(x, x′) = ∫_X φ^{−1}(u) G_κ(x, u) G_κ(x′, u) du,  φ(x) > 0.

More examples of covariance kernels can be found in Stein (1999) and Rasmussen and Williams (2006). Both the exponential and Matérn families consist of stationary kernels, for which φ is the scale parameter and κ can be viewed as the shape parameter. In the Matérn family, K_κ(·) denotes the modified Bessel function of order κ; see Abramowitz and Stegun (1972). Often used are the half-integer orders κ = 1/2, 3/2, 5/2, for which the corresponding Matérn covariance kernels take the following forms:

    K(x, x′) = e^{−|x−x′|/φ},                                      κ = 1/2
    K(x, x′) = e^{−|x−x′|/φ} (1 + |x − x′|/φ),                     κ = 3/2    (3.13)
    K(x, x′) = e^{−|x−x′|/φ} (1 + |x − x′|/φ + |x − x′|^2/(3φ^2)), κ = 5/2.

It is interesting that these half-integer Matérn kernels correspond to autocorrelated Gaussian processes of order κ + 1/2; see Stein (1999, p. 31). In particular, the κ = 1/2 Matérn kernel, which is equivalent to the exponential kernel with κ = 1, is the covariance function of the Ornstein-Uhlenbeck (OU) process. The AdaSS family of reproducing kernels (r.k.'s) is based on the Green's function G_κ(x, u) = (x − u)_+^{κ−1}/(κ − 1)! with order κ and local adaptation function φ(x) > 0.
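Returning to the half-integer Matérn kernels (3.13), a direct implementation is only a few lines (a sketch; the function name is ours):

```python
import numpy as np

def matern(x, y, phi=1.0, kappa=0.5):
    """Half-integer Matérn covariance kernels of (3.13), kappa in {1/2, 3/2, 5/2}."""
    d = np.abs(x - y) / phi
    if kappa == 0.5:                       # Ornstein-Uhlenbeck covariance
        return np.exp(-d)
    if kappa == 1.5:
        return np.exp(-d) * (1.0 + d)
    if kappa == 2.5:
        return np.exp(-d) * (1.0 + d + d ** 2 / 3.0)
    raise ValueError("only kappa = 1/2, 3/2, 5/2 are implemented")
```

All three equal 1 at x = x′ and decay with |x − x′|/φ, with smoother sample paths as κ grows.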

In Chapter 2.2.1, we derived the explicit r.k. expressions with piecewise φ^{−1}(x). They are nonstationary covariance kernels, with the corresponding Gaussian processes discussed in Chapter 2.3. In this chapter, we concentrate on a particular class of AdaSS kernels

    K(x, x′) = Σ_{k=1}^{k*} φ_k (x ∧ x′ ∧ τ_k − τ_{k−1}),  k* = min{k : τ_k > x ∧ x′},    (3.14)

with piecewise-constant adaptation φ^{−1}(x) = φ_k for x ∈ [τ_{k−1}, τ_k) and knots x_min ≡ τ_0 < τ_1 < ... < τ_K ≡ x_max. By (2.29), the Gaussian process with kernel (3.14) generalizes the standard Wiener process with nonstationary volatility, which is useful for modeling the heterogeneous behavior of an MEV component, e.g. g(t).
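A literal implementation of the piecewise-constant kernel (3.14) (a sketch; names ours). With a single segment and φ_1 = 1, it reduces to the Wiener-process covariance K(x, x′) = x ∧ x′:

```python
def adass_kernel(x, y, knots, phis):
    """Piecewise-constant AdaSS kernel (3.14): accumulate phi_k times the overlap
    of the segment [tau_{k-1}, tau_k) with the interval [tau_0, x ^ y]."""
    s = min(x, y)
    out = 0.0
    for k in range(1, len(knots)):
        lo, hi = knots[k - 1], knots[k]
        if s <= lo:
            break
        out += phis[k - 1] * (min(s, hi) - lo)
    return out

w = adass_kernel(2.0, 3.0, [0.0, 5.0], [1.0])   # Wiener case: min(2, 3) = 2.0
```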

3.3.2 Kriging, Spline and Kernel Methods

Using Gaussian processes to model the component functions in (3.2) would give us a class of Gaussian process MEV models. Among others, we consider

    η(y(m, t; v)) = µ(m, t; v) + Z_f(m) + Z_g(t) + Z_h(v) + ε,    (3.15)

where ε ∼ N(0, σ^2) is the random error and Z_f(m), Z_g(t), Z_h(v) are three zero-mean Gaussian processes with covariance kernels σ_f^2 K_f, σ_g^2 K_g, σ_h^2 K_h, respectively. To bypass the identification problem in Lemma 3.1, the overall mean function is postulated to be µ(m, t; v) ∈ H_0 on the hyperplane of the vintage diagram, where

    H_0 ⊆ span{1, m, t} ∩ null{Z_f(m), Z_g(t), Z_h(v)},    (3.16)

among other choices. For example, choosing cubic spline kernels for all of f, g, h allows for H_0 = span{1, m, t}, so that µ(m, t; v) = µ_0 + µ_1 m + µ_2 t. In other situations, the functional space H_0 might shrink to span{1} in order to satisfy (3.16), so that µ(m, t; v) = µ_0. In general, let us denote µ(m, t; v) = µ^T b(m, t; v) for µ ∈ R^d. For simplicity, let us assume that Z_f(m), Z_g(t), Z_h(v) are mutually independent for x ≡ (m, t, v) ∈ R^3. Consider the sum of univariate Gaussian processes

    Z(x) = Z_f(m) + Z_g(t) + Z_h(v),    (3.17)

with mean zero and covariance Cov[Z(x), Z(x′)] = σ^2 K(x, x′). By the independence assumption,

    K(x, x′) = λ_f^{−1} K_f(m, m′) + λ_g^{−1} K_g(t, t′) + λ_h^{−1} K_h(v, v′),    (3.18)

with λ_f = σ^2/σ_f^2, λ_g = σ^2/σ_g^2 and λ_h = σ^2/σ_h^2. Then, given the vintage data (3.1)

with n = Σ_{j=1}^J L_j observations, we may write (3.15) in the vector-matrix form

    y = Bµ + z,    Cov[z] = σ^2 (Σ + I),    (3.19)

where y = [η(y_jl)]_{n×1}, B = [b_jl^T]_{n×d}, z = [Z(x_jl) + ε]_{n×1} and Σ = [K(x_jl, x_{j′l′})]_{n×n}.
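The covariance matrix Σ of (3.19) is assembled directly from the additive kernel (3.18); a sketch (numpy, with stand-in exponential kernels for K_f, K_g, K_h; names ours):

```python
import numpy as np

def mev_cov(m, t, v, lf, lg, lh, phi=1.0):
    """Sigma = [K(x_jl, x_j'l')]: the additive kernel (3.18) with weights 1/lambda."""
    k = lambda a: np.exp(-np.abs(a[:, None] - a[None, :]) / phi)
    return k(m) / lf + k(t) / lg + k(v) / lh

t = np.array([1.0, 2.0, 3.0, 4.0])   # toy dual-time sample
m = np.array([1.0, 1.0, 2.0, 3.0])
v = t - m
Sigma = mev_cov(m, t, v, lf=1.0, lg=1.0, lh=1.0)
# Sigma is symmetric with diagonal 1/lf + 1/lg + 1/lh = 3
```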

By either GLS (generalized least squares) or MLE (maximum likelihood estimation),


µ is estimated to be

    µ = (B^T (Σ + I)^{−1} B)^{−1} B^T (Σ + I)^{−1} y,    (3.20)

where the invertibility of Σ + I is guaranteed by large values of λ_f, λ_g, λ_h, as in the context of ridge regression.

In the Gaussian process models there are additional unknown parameters defining the covariance kernel (3.18), namely (λ, θ) = (λ_f, λ_g, λ_h, θ_f, θ_g, θ_h), as well as the unknown variance parameter σ^2 (a.k.a. the nugget effect). The λ_f, λ_g, λ_h, as the ratios of σ^2 to the variance scales of Z_f, Z_g, Z_h, are usually referred to as the smoothing parameters. The θ_f, θ_g, θ_h denote the structural parameters, e.g. the scale parameter φ and shape parameter κ in the exponential or Matérn family of kernels, or the local adaptation φ(·) in the AdaSS kernel. By (3.19), and after dropping the constant term, the log-likelihood function is given by

    ℓ(µ, λ, θ, σ^2) = −(n/2) log(σ^2) − (1/2) log|Σ + I| − (1/2σ^2) (y − Bµ)^T (Σ + I)^{−1} (y − Bµ),

in which Σ depends on (λ, θ). Whenever (λ, θ) are fixed, the µ estimate is given by (3.20); then one may estimate the variance parameter to be

    σ̂^2 = (1/(n − p)) (y − Bµ)^T (Σ + I)^{−1} (y − Bµ),    (3.21)

where p = rank(B) is plugged in to make σ̂^2 an unbiased estimator. Usually, numerical procedures like the Newton-Raphson algorithm are needed to iteratively estimate (λ, θ), which admit no closed-form solutions; see Fang, Li and Sudjianto (2006, §5). Such a standard maximum likelihood estimation method is computationally intensive. In the next subsection, we discuss how to iteratively estimate the parameters by an efficient backfitting algorithm.
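For fixed kernel parameters, the GLS step (3.20) and the variance estimate (3.21) amount to a few linear solves; a sketch (numpy; the function name is ours):

```python
import numpy as np

def gls_fit(y, B, Sigma):
    """GLS estimate (3.20) of mu and the unbiased variance estimate (3.21)."""
    n = len(y)
    A = Sigma + np.eye(n)
    AiB = np.linalg.solve(A, B)
    mu = np.linalg.solve(B.T @ AiB, AiB.T @ y)
    r = y - B @ mu
    p = np.linalg.matrix_rank(B)
    sigma2 = (r @ np.linalg.solve(A, r)) / (n - p)
    return mu, sigma2

y = np.array([1.0, 2.0, 3.0, 4.0])
mu, s2 = gls_fit(y, np.ones((4, 1)), np.zeros((4, 4)))
# with Sigma = 0 this reduces to OLS: mu = mean(y), s2 = sample variance
```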


MEV Kriging means prediction based on the Gaussian process MEV model (3.15), where the name 'Kriging' follows the geostatistical convention; see Cressie (1993) in spatial statistics. See also Fang, Li and Sudjianto (2006) for the use of Kriging in the context of computer experiments. Kriging can be treated equally well by multivariate normal distribution theory or by a Bayesian approach, since the conditional expectation of a multivariate normal distribution equals the Bayesian posterior mean. The former approach is taken here for making predictions from the noisy observations, for which the Kriging behaves as a smoother rather than an interpolator. Let us denote the prediction of η(ŷ(x)) at x = (m, t, v) by ζ(x) = µ^T b(x) + E[Z_f(m) + Z_g(t) + Z_h(v) + ε | z], where E[Z|z] denotes the conditional expectation of Z given z = y − Bµ. By the property of the multivariate normal distribution,

    ζ̂(x) = µ^T b(x) + ξ_n^T(x) (Σ + I)^{−1} (y − Bµ),    (3.22)

where ξ_n(x) denotes the vector [K(x, x_jl)]_{n×1} evaluated between x and each sampling point.

The Kriging (3.22) is known to be a best linear unbiased predictor. It also has a smoothing spline reformulation that covers the additive cubic smoothing splines model (3.6) as a special case. Formally, we have the following proposition.

Proposition 3.2 (MEV Kriging as Spline Estimator). The MEV Kriging (3.22) can be formulated as the smoothing spline estimator

    arg min_{ζ∈H} Σ_{j=1}^J Σ_{l=1}^{L_j} [η(y_jl) − ζ(x_jl)]^2 + ‖ζ‖^2_{H_1},    (3.23)

where H = H_0 ⊕ H_1 with H_0 ⊆ span{1, m, t}, and H_1 is the reproducing kernel Hilbert space induced by (3.18) such that H_0 ⊥ H_1.
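Numerically, the predictor (3.22) itself is a single linear solve; a sketch with an intercept-only mean b(x) ≡ 1 (function name and toy values are ours):

```python
import numpy as np

def krige(x_new, X, y, mu0, kernel, Sigma):
    """MEV Kriging (3.22) with intercept-only mean:
    zeta(x) = mu0 + xi_n(x)' (Sigma + I)^{-1} (y - mu0)."""
    xi = np.array([kernel(x_new, xj) for xj in X])     # the vector K(x, x_jl)
    w = np.linalg.solve(Sigma + np.eye(len(y)), y - mu0)
    return mu0 + xi @ w

# one-point example worked by hand: xi = [1], Sigma + I = [[2]],
# so zeta = 0 + 1 * (2 - 0) / 2 = 1
pred = krige(0.0, [0.0], np.array([2.0]), 0.0, lambda a, b: 1.0, np.array([[1.0]]))
```

Because of the (Σ + I)^{−1} term, the prediction at a training point shrinks toward the mean rather than interpolating — the smoother behavior noted in the text.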


The spline solution to (3.23) can be written as a ﬁnite-dimensional expression

    ζ(x) = µ^T b(x) + Σ_{j=1}^J Σ_{l=1}^{L_j} c_jl K(x, x_jl),    (3.24)

with coefficients determined by the regularized least squares

    min ‖y − Bµ − Σc‖^2 + ‖c‖^2_Σ.    (3.25)

Furthermore, we have the following simplification.

Proposition 3.3 (Additive Separability). The MEV Kriging (3.22) can be expressed additively as

    ζ(x) = µ^T b(x) + Σ_{i=1}^I α_i K_f(m, m_i) + Σ_{l=1}^L β_l K_g(t, t_l) + Σ_{j=1}^J γ_j K_h(v, v_j),    (3.26)

based on the I distinct m-values, L distinct t-values and J distinct v-values in (3.1). The coefficients can be estimated by

    min ‖y − Bµ − Σ_f α − Σ_g β − Σ_h γ‖^2 + λ_f ‖α‖^2_{Σ̃_f} + λ_g ‖β‖^2_{Σ̃_g} + λ_h ‖γ‖^2_{Σ̃_h},    (3.27)

where the matrix notations are the same as in (3.9), up to mild modifications.

Proposition 3.3 converts the MEV Kriging (3.22) to a kernelized additive model of the form (3.26). It connects to the kernel methods in machine learning, which advocate the use of covariance kernels for mapping the original inputs to some high-dimensional feature space. Based on our previous discussions, such a kernelized feature space corresponds to the family of Gaussian processes, or the RKHS; see e.g. Rasmussen and Williams (2006) for more details.

Proposition 3.3 breaks down the high-dimensional parameter estimation and makes possible the development of a backfitting algorithm that deals with the three univariate Gaussian processes separately. Such a backfitting procedure is more efficient than the standard method of maximum likelihood estimation. Besides, it can easily deal with the collinearity problem among Z_f(m), Z_g(t), Z_h(v) caused by the hyperplane dependency of the vintage diagram.

3.3.3 MEV Backfitting Algorithm

For given (λ, θ), the covariance kernels in (3.18) are defined explicitly, so one may find the optimal solution to (3.27) by setting the partial derivatives to zero and solving the resulting linear system of algebraic equations. There are, however, complications in determining (λ, θ). In this section we propose a backfitting algorithm to minimize the regularized least squares criterion (3.27), together with a generalized cross-validation (GCV) procedure for selecting the smoothing parameters λ_f, λ_g, λ_h as well as the structural parameters θ_f, θ_g, θ_h used for defining the covariance kernels. Here, the GCV criterion follows our discussion in Chapter 2.2.2; see also Golub, Heath and Wahba (1979) in the context of ridge regression.

Denote the complete set of parameters by Ξ = {µ; α, λ_f, θ_f; β, λ_g, θ_g; γ, λ_h, θ_h; σ^2}, where the λ's are the smoothing parameters and the θ's are the structural parameters defining the covariance kernels. Given the vintage data (3.1) with notations y, B as in (3.6):

Step 1: Set the initial value of µ to be the ordinary least squares (OLS) estimate (B^T B)^{−1} B^T y; set the initial values of β, γ to zero.

Step 2a: Given Ξ with (α, λ_f, θ_f) unknown, form the pseudo data ỹ = y − Bµ − Σ_g β − Σ_h γ. For each input (λ_f, θ_f), estimate α by the ridge regression

    α = (Σ_f^T Σ_f + λ_f Σ̃_f)^{−1} Σ_f^T ỹ,    (3.28)

where Σ_f = [K_f(m_jl, m_i; θ_f)]_{n×I} and Σ̃_f = [K_f(m_i, m_{i′}; θ_f)]_{I×I}. Determine the best choice of (λ_f, θ_f) by minimizing

    GCV(λ_f, θ_f) = (1/n) ‖(I − S(λ_f, θ_f)) ỹ‖^2 / [1 − (1/n) Trace(S(λ_f, θ_f))]^2,    (3.29)

where S(λ_f, θ_f) = Σ_f (Σ_f^T Σ_f + λ_f Σ̃_f)^{−1} Σ_f^T.

Step 2b: Run Step 2a with the unknown parameters replaced by (β, λ_g, θ_g) and ỹ = y − Bµ − Σ_f α − Σ_h γ. Form Σ_g, Σ̃_g and S(λ_g, θ_g). Select (λ_g, θ_g) by GCV, then estimate β by ridge regression.

Step 2c: Run Step 2a with the unknown parameters replaced by (γ, λ_h, θ_h) and ỹ = y − Bµ − Σ_f α − Σ_g β. (For the ad hoc approach, remove the trend of ỹ in the vintage direction.) Form Σ_h, Σ̃_h and S(λ_h, θ_h). Select (λ_h, θ_h) by GCV. Estimate γ by ridge regression.

Step 2d: Re-estimate µ by GLS (3.20) for the given (λ_f, λ_g, λ_h, θ_f, θ_g, θ_h).

Step 3: Repeat Steps 2a-2d until convergence, say, when the estimates of µ, α, β, γ change less than a pre-specified tolerance. Obtain σ̂^2 by (3.21).

For future reference, the above iterative procedure is named the MEV Backfitting algorithm. It is efficient and automatically takes into account the selection of both smoothing and structural parameters. After obtaining the parameter estimates, we may reconstruct the vintage data by ŷ = Bµ + Σ_f α + Σ_g β + Σ_h γ. Several remarks on practical implementation follow:

1. In order to capture a diversity of important features, a combination of different covariance kernels can be included in the same MEV model. The exponential, Matérn and AdaSS r.k. families listed in (3.12), (3.13), (3.14) are of primary interest here, whereas other types of kernels can be readily employed. When the underlying function is heterogeneously smooth, e.g. involving jumps, one may choose the AdaSS kernel (3.14).

2. For the ridge regression (3.28), the matrix Σ_f contains row-wise replicates and can be reduced by weighted least squares, in which case the response vector is reduced to the pointwise averages; see (2.4). The evaluation of the GCV criterion (3.29) can be modified accordingly; see (2.22).

3. When the mean function µ(m, t; v) is specified to be the intercept effect, i.e. µ(m, t; v) = µ_0, one may replace µ_0 by the mean of y in Step 1 and Step 2d. In this case, we may write the MEV Kriging as ζ̂(x) = f̂(m) + ĝ(t) + ĥ(v) with

    f̂(m) = µ̂_0 + Σ_{i=1}^I α̂_i K_f(m, m_i),  ĝ(t) = Σ_{l=1}^L β̂_l K_g(t, t_l),  ĥ(v) = Σ_{j=1}^J γ̂_j K_h(v, v_j).

4. In principle, both ĝ(t) and ĥ(v) in Step 2b and Step 2c have mean zero, but in practice they may need a centering adjustment due to machine rounding.
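The core of Step 2a — the ridge estimate (3.28) with GCV selection (3.29) over a candidate grid — can be sketched as follows (numpy; the structural parameter θ_f is held fixed inside a stand-in kernel, and all names are ours):

```python
import numpy as np

def ridge_gcv_step(ytil, Sf, Kf, lambdas):
    """For each candidate lambda: alpha by (3.28), score by GCV (3.29);
    return the (gcv, lambda, alpha) triple minimizing GCV."""
    n = len(ytil)
    best = None
    for lam in lambdas:
        M = np.linalg.solve(Sf.T @ Sf + lam * Kf, Sf.T)   # (Sf'Sf + lam*Kf)^{-1} Sf'
        alpha = M @ ytil
        S = Sf @ M                                        # smoother matrix S(lam)
        r = ytil - S @ ytil
        gcv = (r @ r / n) / (1.0 - np.trace(S) / n) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, alpha)
    return best

m = np.linspace(0.0, 5.0, 12)                  # toy pseudo data over age knots
knots = np.linspace(0.0, 5.0, 5)
ker = lambda a, b: np.exp(-(a - b) ** 2)
Sf = ker(m[:, None], knots[None, :])
Kf = ker(knots[:, None], knots[None, :])
gcv, lam, alpha = ridge_gcv_step(np.sin(m), Sf, Kf, [0.01, 0.1, 1.0])
```

Steps 2b and 2c reuse the same routine with the roles of (Σ_g, Σ̃_g) and (Σ_h, Σ̃_h).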

3.4 Semiparametric Regression

Having discussed the MEV decomposition models based on Gaussian processes, we consider in this section the inclusion of covariates in the following two scenarios:

1. x(m_jl, t_jl; v_j) ∈ R^p (or x_jl in short) for a fixed segment, where x_jl denotes the covariates of the j-th vintage at time t_jl and age m_jl = t_jl − v_j;

2. x(m_kjl, t_kjl; v_kj) ∈ R^p and z_k ∈ R^q for multiple segments that share the same vintage diagram, where z_k represents the static segmentation variables.

Of interest are the semiparametric MEV regression models with both nonparametric f(m), g(t), h(v) and parametric covariate effects. After a brief discussion of the first situation with a single segment, we will spend more time on the second situation with multiple segments. For simplicity, the covariates x_jl or x_kjl are assumed to be deterministic, i.e. no error in variables.


3.4.1 Single Segment

Given a single segment with dual-time observations {y(mjl , tjl ; vj ), x(mjl , tjl ; vj )} for j = 1, . . . , J and l = 1, . . . , Lj , we consider the partial linear model that adds the linear covariate eﬀects to the MEV decomposition (3.2),

    η(y_jl) = f(m_jl) + g(t_jl) + h(v_j) + π^T x_jl + ε
            = µ^T b_jl + π^T x_jl + Z_f(m_jl) + Z_g(t_jl) + Z_h(v_j) + ε,    (3.30)

where xjl ≡ x(mjl , tjl ; vj ), (µ, π) are the parameters for the mean function, and f, g, h are nonparametric and modeled by Gaussian processes. The semiparametric regression model (3.30) corresponds to the linear mixed-eﬀect modeling, and the covariate eﬀects π can be simply estimated in the same way we estimated µ in (3.20). That is, conditional on Σ,

    (µ; π) = (B̃^T (Σ + I)^{−1} B̃)^{−1} B̃^T (Σ + I)^{−1} y,    (3.31)

where B̃ = [B ⋮ X] with X = [x_jl]_{n×p}. This GLS estimator can be incorporated into the MEV Backfitting algorithm established in Section 3.3.3. Only Step 1 and Step 2d need to be modified:

Step 1: Set (µ, π) initially to be the OLS estimate, i.e. (3.31) with zero Σ.

Step 2d: Re-estimate (µ; π) by (3.31) for the given (λ_f, λ_g, λ_h, θ_f, θ_g, θ_h).

Then, the whole set of parameters in (3.30) can be iteratively estimated by the modified backfitting algorithm. A word of caution is in order about multicollinearity, since the orthogonality condition in Proposition 3.2 may not be satisfied between the covariate space span{x} and the Gaussian processes. In this case, instead of starting from the fully nonparametric approach, we may assume the f, g, h components in (3.30) can be approximated by the kernel basis expansion through (3.26), then run backfitting. Fortunately, the smoothing parameters λ_f, λ_g, λ_h may guard against possible multicollinearity and give a shrinkage estimate of the covariate effects π. As a trade-off, part of the Gaussian process components might be over-smoothed.
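A sketch of the augmented GLS step (3.31) with B̃ = [B X] (numpy; the function name is ours):

```python
import numpy as np

def gls_semiparam(y, B, X, Sigma):
    """GLS estimate (3.31) of (mu; pi) in the partial linear model (3.30),
    using the augmented design Bt = [B, X]."""
    Bt = np.hstack([B, X])
    A = Sigma + np.eye(len(y))
    AiB = np.linalg.solve(A, Bt)
    coef = np.linalg.solve(Bt.T @ AiB, AiB.T @ y)
    d = B.shape[1]
    return coef[:d], coef[d:]   # (mu, pi)

# with Sigma = 0 this is OLS: recover intercept 1 and slope 2 exactly
x = np.arange(6.0)
y = 1.0 + 2.0 * x
mu, pi = gls_semiparam(y, np.ones((6, 1)), x[:, None], np.zeros((6, 6)))
```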

3.4.2 Multiple Segments

Suppose we are given K segments of data that share the same vintage diagram,

    {y(m_kjl, t_kjl; v_kj), x(m_kjl, t_kjl; v_kj), z_k},    (3.32)

for k = 1, ..., K, j = 1, ..., J and l = 1, ..., L_j, where x_kjl ∈ R^p and z_k ∈ R^q. Denote by N the total number of observations across the K segments. There are various approaches to modeling the cross-segment responses, e.g.

    a. η(y_kjl) = f(m_kjl) + g(t_kjl) + h(v_kj) + π^T x_kjl + ω^T z_k + ε
    b. η(y_kjl) = f_k(m_kjl) + g_k(t_kjl) + h_k(v_kj) + π_k^T x_kjl + ε    (3.33)
    c. η(y_kjl) = u_f^(k) f(m_kjl) + u_g^(k) g(t_kjl) + u_h^(k) h(v_kj) + π_k^T x_kjl + W(z_k) + ε.

Here, the first approach assumes that the segments all share the same MEV components and covariate effect π, and differ only through ω^T z_k. The second approach takes the other extreme assumption, namely that each segment performs marginally differently in terms of f, g, h and covariate effects, and is therefore equivalent to K separate single-segment models. Clearly, both of these approaches can be treated by the single-segment modeling technique discussed in the last section. Of interest here is the third approach, which allows multiple segments to have different sensitivities u_f^(k), u_g^(k), u_h^(k) ∈ R to the same underlying f(m), g(t), h(v). In other words, the functions f(m), g(t), h(v) represent the common factors across segments, while the coefficients u_f^(k), u_g^(k), u_h^(k) represent the idiosyncratic multipliers. Besides, it allows for the segment-specific add-on effects W(z_k) and segment-specific covariate effects π_k. Thus, model (c) is postulated as a balance between the over-stringent model (a) and the over-flexible model (b).

Different strategies can be applied to model the segment-specific effects W(z_k). Among others, we consider (c1) the linear modeling W(z_k) = ω^T z_k with parameter ω ∈ R^q, and (c2) the Gaussian process modeling

    EW(z_k) = 0,    EW(z_k)W(z_{k′}) = λ_W K_W(z_k, z_{k′}),    (3.34)

for the smoothing parameter λ_W > 0 and some pre-specified family of covariance kernels K_W. For example, we may use the squared exponential covariance kernel in (3.12). The estimation of these two types of segment effects is detailed as follows.

Linear Segment Effects. Given the multi-segment vintage data (3.32), consider

    η(y_kjl) = u_f^(k) f(m_kjl) + u_g^(k) g(t_kjl) + u_h^(k) h(v_kj) + π_k^T x_kjl + ω^T z_k + ε,    (3.35)

with basis approximation

    f(m) = µ_0 + Σ_{i=1}^I α_i K_f(m, m_i),  g(t) = Σ_{l=1}^L β_l K_g(t, t_l),  h(v) = Σ_{j=1}^J γ_j K_h(v, v_j),

through the exponential or Matérn kernels. (The smoothing spline kernels can also be used after appropriate modification of the mean function.) In matrix notation, the model takes the form

    y = u_f ⊙ (Bµ + Σ_f α) + u_g ⊙ Σ_g β + u_h ⊙ Σ_h γ + Xπ + Zω + ε,

where ⊙ denotes the element-wise product, the vectors and matrices are formed by stacking the multi-segment observations, (u_f, u_g, u_h) are the extended vectors of segment-wise multipliers, X has

K × p columns corresponding to the vectorized coefficients π = (π_1; ...; π_K), and Z is as usual. When both the kernel parameters (λ, θ) for K_f, K_g, K_h and the (condensed K-vector) multipliers u_f, u_g, u_h are fixed, the regularized least squares can be used for parameter estimation, with the same penalty terms as in (3.27). Iterative procedures can be used to estimate all the parameters simultaneously, including {µ, α, β, γ, λ, θ} for the underlying f, g, h functions and u_f, u_g, u_h, π, ω across segments. The following algorithm utilizes the MEV Backfitting algorithm as an intermediate routine.

Step 1: Set the initial values of (π, ω) to be the OLS estimates for fitting y = Xπ + Zω + ε based on all N observations across the K segments.

Step 2: Run the MEV Backfitting algorithm on the pseudo data y − Xπ − Zω to obtain the estimates f̂(m), ĝ(t), ĥ(v).

Step 3: Update (u_f, u_g, u_h) and (π, ω) by the OLS estimates for fitting

    y = Fu_f + Gu_g + Hu_h + Xπ + Zω + ε,

where F, G, H are all N × K regression sub-matrices constructed based on f̂(m_kjl), ĝ(t_kjl), ĥ(v_kj), respectively.

Step 4: Repeat Step 2 and Step 3 until convergence.

Note that in Step 2, the common factors f̂(m), ĝ(t) and ĥ(v) are estimated first by assuming u_k = Eu_k = 1 for each of f, g, h. In Step 3, the segment-wise multipliers u_k are updated by regressing over f̂, ĝ and ĥ. One reason for doing so is the concern of stability. Disregarding the segment variability in Step 2 is also reasonable, since f(m), g(t), h(v) are by definition the common factors in the overall sense. Then, including the segment variability in Step 3 improves both the estimation of (π, ω)


and the estimation of f, g, h in the next iteration of Step 2. Such a fitting procedure is practically interesting and computationally tractable.

Spatial Segment Heterogeneity. We now consider modeling the segment effect by a Gaussian process:

    η(y_kjl) = u_f^(k) f(m_kjl) + u_g^(k) g(t_kjl) + u_h^(k) h(v_kj) + W(z_k) + π_k^T x_kjl + ε,    (3.36)

where f, g, h are approximated in the same way as in (3.35) and W(z_k) is assumed to follow (3.34). The orthogonality constraint between W(z) and {f(m), g(t), h(v)} can be justified from a tensor-product space point of view, but orthogonality is not guaranteed between W(z_k) and the span of x_kjl. Despite the orthogonality constraint, let us directly approximate W(z) by the basis expansion W(z) = Σ_{k=1}^K ω_k K_W(z, z_k), associated with an unknown scale parameter (denoted by θ_W). Then, we have the following two iterative stages of model estimation.

Stage 1: For given π_k and u_k ≡ 1, k = 1, ..., K, estimate the common-factor functions f(m), g(t), h(v) and the spatial segment effect W(z_k) in

    η(y_kjl) − π_k^T x_kjl = µ_0 + Σ_{i=1}^I α_i K_f(m_kjl, m_i) + Σ_{l=1}^L β_l K_g(t_kjl, t_l) + Σ_{j=1}^J γ_j K_h(v_kj, v_j) + Σ_{k′=1}^K ω_{k′} K_W(z_k, z_{k′}) + ε

by the regularized least squares

    min ‖y − Bµ − Σ_f α − Σ_g β − Σ_h γ − Σ_W ω‖^2 + λ_f ‖α‖^2_{Σ̃_f} + λ_g ‖β‖^2_{Σ̃_g} + λ_h ‖γ‖^2_{Σ̃_h} + λ_W ‖ω‖^2_{Σ̃_W},    (3.37)

where y is the pseudo response vector of η(y_kjl) − π_k^T x_kjl for all (k, j, l). The regression sub-matrices B, Σ_f, Σ_g, Σ_h, Σ_W are constructed accordingly, all having N rows. The matrices used for regularization are of size I × I, L × L, J × J and K × K, respectively. Both the smoothing parameters λ_f, λ_g, λ_h, λ_W > 0 and the structural parameters θ_f, θ_g, θ_h, θ_W need to be determined.

Stage 2: For given f, g, h, estimate the parameters (u_f, u_g, u_h) and π based on

    y = Fu_f + Gu_g + Hu_h + Xπ + w,    Cov[w] = σ^2 (λ_W Σ_W + I),

where Σ_W = [K_W(z_k, z_{k′})]_{N×N} is obtained by evaluating every pair of observations either within or between segments. The parameter estimates can be obtained by generalized least squares.

Obviously, the regularized least squares in Stage 1 extends the MEV Backfitting algorithm to entertain a fourth kernel component K_W. Let us call the extended algorithm MEVS Backfitting, where the added letter "S" stands for the segment effect. Iterating Stage 1 and Stage 2 until convergence, we obtain the following list of effect estimates (as a summary):

1. Common-factor maturation curve: f̂(m) = µ̂_0 + Σ_{i=1}^I α̂_i K_f(m, m_i)
2. Common-factor exogenous influence: ĝ(t) = Σ_{l=1}^L β̂_l K_g(t, t_l)
3. Common-factor vintage heterogeneity: ĥ(v) = Σ_{j=1}^J γ̂_j K_h(v, v_j)
4. Idiosyncratic multipliers to (f̂, ĝ, ĥ): û_f^(k), û_g^(k), û_h^(k), for k = 1, ..., K
5. Segment-specific covariate effects: π̂_k, for k = 1, ..., K
6. Spatial segment heterogeneity: Ŵ(z) = Σ_{k=1}^K ω̂_k K_W(z, z_k).

3.5 Applications in Credit Risk Modeling

This section presents the applications of the MEV decomposition methodology to the real data of (a) Moody’s speculative-grade corporate default rates shown in Table 1.1, and (b) the tweaked sample of retail loan loss rates discussed in Chapter II. Both examples demonstrate the rising challenges in analyzing credit risk data with dual-time coordinates, to which we make only an initial attempt based on the MEV Backfitting algorithm. Our focus is first on understanding (or reconstructing) the marginal effects, and then on performing data smoothing on the dual-time domain. To test the MEV decomposition modeling, we begin with a simulation study based on synthetic vintage data.

3.5.1 Simulation Study

Let us synthesize the year 2000 to 2008 monthly vintages according to the diagram shown in Figure 1.3. Assume each vintage is originated at the very beginning of the month, and its follow-up performance is observed at each month end. Let the horizontal observation window be left truncated at the beginning of 2005 (time 0) and right truncated at the end of 2008 (time 48). Vertically, all the vintages are observed up to 60 months-on-book. This yields the rectangular diagram shown in Figure 3.1, consisting of vintages j = −60, . . . , −1 originated before 2005 and j = 0, 1, . . . , 47 originated after 2005. For each vintage j with origination time vj, the dual-time responses are simulated by

log(yjl) = f0(mjl) + g0(tjl) + h0(vj) + ε,    ε ∼ N(0, σ²),    (3.38)

where tjl runs from max{vj, 0} to 48 and mjl = tjl − vj. Apart from the noise, the response yjl is the product of three marginal effects, exp(f0(m)) · exp(g0(t)) · exp(h0(v)). In our simulation setting, the underlying truth exp(f0(m)) takes the smooth form of the inverse Gaussian hazard rate function (1.13) with parameters c = 6 and b = −0.02, the underlying exp(g0(t)) is volatile with exponential growth and a jump at t = 23, and the underlying h0(v) is assumed to be a dominating sine

wave within [0, 40] and relatively flat elsewhere. See their plots in Figure 3.2 (top panel). Adding the noise with σ = 0.1, we obtain the synthetic vintage data, whose projection views in lifetime, calendar and vintage origination time are shown in the second panel of Figure 3.2. One may check the pointwise averages (or medians in the vintage box-plots) in each projection view, and compare them with the underlying truths in order to see the contamination effects by the other marginals. This synthetic data can be used to demonstrate the flexibility of the MEV decomposition technique in that a combination of different kernels can be employed. We manually choose the order-3/2 Matérn kernel (3.13) for f(m), the AdaSS r.k. (3.14) for g(t), and the squared exponential kernel (3.12) for h(v), then run the MEV Backfitting algorithm on the log-transformed data. Since our data has very few observations at both ends of the vintage direction, we performed binning for v ≥ 40, as well as binning for v ≤ 0 based on the prior knowledge that pre-2005 vintages can be treated as a single bucket. The MEV Backfitting algorithm then outputs the estimated marginal effects plotted in Figure 3.2 (bottom panel). They match the underlying truth except for slight boundary bias. The GCV-selected structural and smoothing parameters are tabulated in Table 3.1. In fitting g(t) with AdaSS, we have specified the knots τk to handle the jump by diagnostics from the raw marginal plots; for simplicity we have enforced φk to be the same for regions without a jump. It is interesting to check the ordering of the cross-validated smoothing parameters. By (3.18), the smoothing parameters correspond to the reciprocal ratios of their variance scales to the nugget effect σ²; therefore, the smaller the smoothing parameter, the larger the variation of the corresponding Gaussian process. Thus, the ordering of the cross-validated λf > λh > λg confirms the smoothness assumption for f0, g0, h0 in our simulation setting.
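A minimal sketch of generating such synthetic vintage data via (3.38); the marginal functions below are simplified stand-ins (the study's exact f0, g0, h0 of Figure 3.2 are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2009)
sigma = 0.1

# Stand-in marginal effects; the study's exact f0, g0, h0 differ.
f0 = lambda m: -2.0 + 0.03 * m                 # maturation, in lifetime m
g0 = lambda t: 0.01 * t + 0.3 * (t >= 23)      # exogenous, with a jump at t = 23
h0 = lambda v: 0.2 * np.sin(np.pi * v / 20.0)  # vintage heterogeneity

records = []  # (vintage v, month-on-book m, calendar t, response y)
for v in range(-60, 48):                       # monthly vintages
    for t in range(max(v, 0), 49):             # left truncated at 0, right at 48
        m = t - v
        if 0 < m <= 60:                        # observed up to 60 months-on-book
            y = np.exp(f0(m) + g0(t) + h0(v) + sigma * rng.standard_normal())
            records.append((v, m, t, float(y)))
```

The three loops trace exactly the rectangular vintage diagram described above, with t = v + m on every record.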
Combining the three marginal estimates and taking the inverse transform gives the fitted rates; see Figure 3.3 for the projection views. The noise-removed reconstruction can reveal the prominent marginal features. It is clear that the MEV decomposition modeling performs like a smoother on the dual-time domain. One may also compare the level plots of the raw data versus the reconstructed data in Figure 3.3 to see the dual-time smoothing effect on the vintage diagram.

Figure 3.2: Synthetic vintage data analysis: (top) underlying true marginal effects; (2nd) simulation with noise; (3rd) MEV Backfitting algorithm upon convergence; (bottom) estimation compared to the underlying truth.

Table 3.1: Synthetic vintage data analysis: MEV modeling exercise with GCV-selected structural and smoothing parameters.

MEV    Kernel choice           Structural parameter                          Smoothing parameter
f(m)   Matérn (κ = 3/2)        φ = 2.872                                     20.086
g(t)   AdaSS r.k. Eq.(3.14)    τk = 21, 25, 47; φk = 0.288, 8.650, 0.288     1.182
h(v)   Squared exponential     φ = 3.000                                     7.389

Figure 3.3: Synthetic vintage data analysis: (top) projection views of fitted values; (bottom) data smoothing on vintage diagram.

3.5.2 Corporate Default Rates

In Figure 1.4 of Chapter I we previewed the annual corporate default rates (in percentage) of Moody’s speculative-grade cohorts, 1970-2008. Each yearly cohort is regarded as a vintage. For vintages originated earlier than 1988, the default rates are observed up to a maximum of 20 years in lifetime; for vintages originated after 1988, the default rates are observed up to the end of year 2008. Such truncation in both lifetime and calendar time corresponds to the trapezoidal prototype of vintage diagram illustrated in Figure 3.1. Figure 1.4 gives the projection views of the empirical default rates in lifetime, calendar and vintage origination time. By checking the marginal averages or medians plotted on top of each view, it is evident that the exogenous influence of the macroeconomic environment is more fluctuating than both the maturation and vintage effects. However, such marginal plots are contaminated by each other; e.g., the many spikes shown in the lifetime projection actually correspond to exogenous effects. It is our purpose to separate these marginal effects. We performed slight data preprocessing. First, the transform function is pre-specified to be η(y) = log(y/100), where y is the raw percentage measurement and zero responses are treated as missing values (one may also match them to a small positive value). Second, since there is a very limited number of observations for small calendar time, the exogenous effects g(t) for t = 2, 3 are merged into t = 4, and the leftmost observation at t = 1 is removed because of its jumpy behavior. Besides, the vintage effects are merged for large v = 30, . . . , 39.


Figure 3.4: Results of MEV Backﬁtting algorithm upon convergence (right panel) based on the squared exponential kernels. Shown in the left panel is the GCV selection of smoothing and structural parameters.

Figure 3.5: MEV ﬁtted values: Moody’s-rated Corporate Default Rates


To run the MEV Backfitting algorithm, Kf, Kg, Kh are all chosen to be squared exponential covariance kernels, for which the mean function is fixed to be the intercept effect. For each univariate Gaussian process fitting, say Zf(m), the GCV criterion selects the smoothing parameter λf as well as the structural parameter θf (i.e. the scale parameter φ in (3.12)). For demonstration, we used a grid search on the log scale of λ with only 3 different choices of φ = 1, 2, 3 (φ = 3 is set to be the largest scale for stability; otherwise the reciprocal condition number of the ridged regression matrix would vanish). The backfitting results upon convergence are shown in Figure 3.4, where GCV selects the largest scale parameter in each iteration. An interesting result is the GCV-selected smoothing parameter λ, which turns out to be 9.025, 0.368, 6.050 for Zf, Zg, Zh, respectively. By (3.18), this implies that the exogenous influence carries the largest variation, followed by the vintage heterogeneity, while the maturation curve has the least variation. Furthermore, the changes of the estimated exogenous effects in Figure 3.4 follow the NBER recession dates shown in Figure 1.1. The inverse transform η^{-1}(·) maps the fitted values back to the original percentage scale. Figure 3.5 plots the fitted values in both lifetime and calendar views. Compared to the raw plots in Figure 1.4, the MEV modeling retains some of the most interesting features, while smoothing out others.
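The grid search described above can be sketched for a single component using the standard GCV score for a linear smoother, GCV(φ, λ) = n‖(I − A)y‖²/[tr(I − A)]² with A = K(K + λI)^{-1} (a generic illustration, not the thesis code):

```python
import numpy as np

def gcv_kernel_ridge(x, y, phis, lams):
    """Grid-search the kernel scale phi and smoothing parameter lam
    for a squared exponential kernel ridge smoother, minimizing
    GCV = n * ||(I - A) y||^2 / tr(I - A)^2 with A = K (K + lam I)^-1."""
    n = len(y)
    best = (np.inf, None, None)
    for phi in phis:
        K = np.exp(-((x[:, None] - x[None, :]) / phi) ** 2)
        for lam in lams:
            A = K @ np.linalg.inv(K + lam * np.eye(n))
            resid = y - A @ y
            score = n * float(resid @ resid) / (n - np.trace(A)) ** 2
            if score < best[0]:
                best = (score, phi, lam)
    return best  # (gcv score, best phi, best lam)
```

In practice one would scan λ on a log scale, as the text does, and guard against ill-conditioned K + λI at large scales φ.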

3.5.3 Retail Loan Loss Rates

Recall the tweaked credit-risk sample discussed in Chapter II. It was used as a motivating example for the development of the adaptive smoothing spline in fitting the pointwise averages that are supposed to be smooth in lifetime (month-on-book, in this case). However, the raw responses behave rather abnormally at large month-on-book, partly because of the small size of replicates, and partly because of the contamination by the exogenous macroeconomic influence. In this section, the same tweaked sample


is analyzed on the dual-time domain, for which we model not only the maturation curve, but also the exogenous influence and the vintage heterogeneity effects. The tweaked sample of dual-time observations is supported on the triangular vintage diagram illustrated in Figure 3.1, which is most typical in retail risk management. The observations measure the loss rates in retail revolving exposures from January 2000 to December 2006. Figure 3.6 (top panel) shows the projection views in lifetime, calendar and vintage origination time, where the new vintages originated in 2006 are excluded since they have too short a window of performance measurements. Some of the interesting features of such vintage data are listed below:

1. The rates are fixed to be zero for small month-on-book m ≤ 4.
2. There are fewer and fewer observations as the month-on-book m increases.
3. There are fewer and fewer observations as the calendar month t decreases.
4. The rates behave relatively smoothly in m, from the lifetime view.
5. The rates are rather dynamic and volatile in t, from the calendar view.
6. The rates show heterogeneity across vintages, from the vintage view.

To model the loss rates, timescale binning was performed first for large month-on-book m ≥ 72 and small calendar time t ≤ 12. We pre-specified the log transform for the positive responses (upon removal of zero responses). The MEV decomposition model takes the form of

log(yjl) = f(mjl) + g(tjl) + h(vj) + ε,    subject to Eg = Eh = 0.

In this way, the original responses are decomposed multiplicatively as e^{f(m)}, e^{g(t)} and e^{h(v)}, where the exponentiated maturation curve can be regarded as the baseline, together with the exponentiated exogenous and vintage heterogeneity multipliers. In this MEV modeling exercise, the Matérn kernels were chosen to run the MEV Backfitting algorithm, and the GCV criterion was used to select the smoothing and structural parameters. The fitting results are presented in Figure 3.6 (bottom panel).

Figure 3.6: Vintage data analysis of retail loan loss rates: (top) projection views of empirical loss rates in lifetime m, calendar t and vintage origination time v; (bottom) MEV decomposition effects f̂(m), ĝ(t) and ĥ(v) (at log scale).

The figure shows the estimated f̂(m), ĝ(t), ĥ(v) on the log response scale. Each of these estimated functions follows the general dynamic pattern of the corresponding marginal averages in the empirical plots above. The differences in local regions reflect the improvements in each marginal direction after removing the contamination of the other marginals. Compared to both the exogenous and vintage heterogeneity effects, the maturation curve is rather smooth. It is also worth pointing out that the exogenous spike could be captured relatively well by the Matérn kernel.

3.6 Discussion

For the vintage data that emerge in credit risk management, we propose an MEV decomposition framework to estimate the maturation curve, exogenous inﬂuence and


vintage heterogeneity on the dual-time domain. One difficulty associated with the vintage data is the intrinsic identification problem due to their hyperplane nature, which also appears in the conventional cohort analysis under the age-period-cohort model. To regularize the three-way marginal effects, we have studied nonparametric smoothing based on Gaussian processes, and demonstrated its flexibility through the choice of covariance kernels, structural and smoothing parameters. An efficient MEV Backfitting algorithm is provided for model estimation. The new technique is then tested through a simulation study and applied to analyze both examples of corporate default rates and retail loss rates, for which some preliminary results are presented. Beyond our initial attempt made in this chapter, there are other open problems associated with vintage data analysis (VDA) worthy of future investigation.

1. The dual-time model assessment and validation remains an issue. In the MEV Backfitting algorithm we have used the leave-one-out cross-validation for fitting each marginal component function. However, this does not directly perform cross-validation on the dual-time domain, which would be an interesting subject of future study.

2. Model forecasting is often a practical need in financial risk management, as it concerns future uncertainty. To make predictions within the MEV modeling framework, one needs to extrapolate f(m), g(t), h(v) in each marginal direction in order to know their behaviors beyond the training bounds of m, t, v. By Kriging theory, for any x = (m, t; v) on the vintage diagram, the formula (3.22) provides the conditional expectation given the historical performance. Furthermore, it is straightforward to simulate the random paths for a sequence of inputs (x1, . . . , xp) based on the multivariate conditional normal distribution Np(µ^(c), Σ^(c)), where the conditional mean and covariance


are given by

µ^(c) = B*µ + Υ(Σ + I)^{-1}(y − Bµ),
Σ^(c) = Σ* − Υ(Σ + I)^{-1}Υ^T,
with Υ = [K(xi*, xjl)]_{p×n} and Σ* = [K(xi*, xj*)]_{p×p},    (3.39)

using the aggregated kernel given by (3.18) and the notations in (3.19). However, there is a word of caution about predicting the future from the historical training data. In the MEV modeling framework, extrapolating the maturation curve f(m) in lifetime is relatively safer than extrapolating the exogenous and vintage effects g(t) and h(v), since f(m) is of a self-maturation nature, while g(t) can be interfered with by future macroeconomic changes and h(v) is also subject to future changes of vintage origination policy. Before making forecasts in terms of calendar time, an out-of-sample test is recommended.

3. For extrapolation of MEV component functions, a parametric modeling approach is sometimes more tractable than using Gaussian processes. For example, given the prior knowledge that the maturation curve f(m) follows the shape of the inverse Gaussian hazard rate (1.13), like our synthetic case in Section 3.5, one may estimate the parametric f(m) with nonlinear regression techniques. As for the exogenous effect g(t) that is usually dynamic and volatile, one may use parametric time series models; see e.g. Fan and Yao (2003). Such a parametric approach would benefit loss forecasting as the models become more and more validated through experience.

4. The MEV decomposition we have considered so far is simplified to exclude the interaction effects involving two or all of the age m, time t and vintage v arguments. In formulating the models by Gaussian processes, the mutual independence among Zf(m), Zg(t), Zh(v) is also postulated. One of the ideas for taking into account the interaction effects is to cross-correlate Zf(m), Zg(t), Zh(v). Consider


the combined Gaussian process (3.17) with covariance

Cov[Zf(m) + Zg(t) + Zh(v), Zf(m′) + Zg(t′) + Zh(v′)]
= E[Zf(m)Zf(m′)] + E[Zf(m)Zg(t′)] + E[Zf(m)Zh(v′)]
+ E[Zg(t)Zf(m′)] + E[Zg(t)Zg(t′)] + E[Zg(t)Zh(v′)]
+ E[Zh(v)Zf(m′)] + E[Zh(v)Zg(t′)] + E[Zh(v)Zh(v′)],

which reduces to (3.18) when the cross-correlations between Zf(m), Zg(t) and Zh(v) are all zero. In general, it is straightforward to incorporate the cross-correlation structure and follow immediately the MEV Kriging procedure. For example, one of our works in progress is to assume that

E[Zg(t)Zh(v)] = σ²/(λg λh) · Kgh(t, v),    where Kgh(t, v) = ρ I(t ≥ v), ρ ≥ 0,

in order to study the interaction effect between the vintage heterogeneity and the exogenous influence. Results will be presented in a future report. Besides the open problems listed above, the scope of VDA can also be generalized to study non-continuous types of observations. For the examples we considered in Section 3.5, the Moody’s default rates were actually calculated from some raw binary indicators (of default or not), while the loss rates in the synthetic retail data can be viewed as the ratio of loss units over the total units. Thus, if we are given the raw binary observations or unit counts, one may consider generalized MEV models upon the specification of a link function, as an analogy to the generalized linear models (McCullagh and Nelder, 1989). The transform η in (3.2) plays a similar role as such a link function, except that it operates on the rates that are assumed to be directly observed. One more extension of VDA is to analyze the time-to-default data on the vintage


diagram, which is deferred to the next chapter where we will develop the dual-time survival analysis.

3.7 Technical Proofs

Proof of Lemma 3.1: Since m − t + v = 0 on the vintage diagram Ω, at least one of the linear parts f0, g0, h0 is not identifiable, since otherwise f0(m) − m, g0(t) + t, h0(v) − v would always satisfy (3.2).

Proof of Proposition 3.2: By applying the general spline theorem to (3.23), the target function has a finite-dimensional representation ζ(x) = µ0 + µ1 m + µ2 t + c^T ξn, where c^T ξn = ∑_{j=1}^{J} ∑_{l=1}^{Lj} cjl K(x, xjl). Substituting it back into (3.23) and using the reproducing property ⟨K(·, xjl), K(·, xj′l′)⟩_{H1} = K(xjl, xj′l′), we get

min ‖y − Bµ − Σc‖² + c^T Σc

using the notations in (3.19). Taking the partial derivatives w.r.t. µ and c and setting them to zero gives the solution

µ = (B^T(Σ + I)^{-1}B)^{-1} B^T(Σ + I)^{-1} y,
c = (Σ + I)^{-1}(y − Bµ),

which leads ζ(x) to have the same form as the Kriging predictor (3.22).

Proof of Proposition 3.3: By (3.18) and comparing (3.24) and (3.26), it is

not hard to derive the relationships between two sets of kernel coeﬃcients,

αi = λf^{-1} ∑_{j,l} cjl I(mjl = mi),    βl = λg^{-1} ∑_{j,l} cjl I(tjl = tl),    γj = λh^{-1} ∑_{l=1}^{Lj} cjl,    (3.40)

which aggregate the cjl of the replicated kernel bases for every mi, tl, vj, respectively.

Then, using the new vectors of coeﬃcients α = (α1 , . . . , αI )T , β = (β1 , . . . , βL )T and γ = (γ1 , . . . , γJ )T , we have that

Σc = Σf α + Σg β + Σh γ,
c^T Σc = λf α^T Σf α + λg β^T Σg β + λh γ^T Σh γ.

Plugging them into (3.25) leads to (3.27), as asserted.


CHAPTER IV

Dual-time Survival Analysis

4.1 Introduction

Consider in this chapter the default events naturally observed on the dual-time domain, lifetime (or, age) in one dimension and calendar time in the other. Of interest is the time-to-default from multiple origins, such that events happening at the same calendar time may differ in age. Age could be the lifetime of a person in the common sense, and it could also refer to the elapsed time since initiation of a treatment. In credit risk modeling, age refers to the duration of a credit account since its origination (either a corporate bond or a retail loan). Many accounts having the same origin share the same age at any follow-up calendar time, and are grouped as a single vintage. Refer to Chapter I for basics of survival analysis in the lifetime scale, where the notion of hazard rate is also referred to as the default intensity in the credit risk context. As introduced in Chapter 1.4, dual-time observations of survival data subject to truncation and censoring can be graphically represented by the Lexis diagram (Keiding, 1990). To begin with, let us simulate a Lexis diagram of dual-time-to-default observations:

(Vj, Uji, Mji, ∆ji),    i = 1, . . . , nj, j = 1, . . . , J    (4.1)


where Vj denotes the origination time of the jth vintage that consists of nj accounts. For each account, Uji denotes the time of left truncation, Mji denotes the lifetime of event termination, and ∆ji indicates the status of termination such that ∆ji = 1 if the default event occurs prior to getting censored and 0 otherwise. Let us fix the rectangular vintage diagram with age m ∈ [0, 60], calendar time t ∈ [0, 48] and vintage origination Vj = −60, . . . , −1, 0, 1, . . . , 47, the same as the synthetic vintage data in Chapter 3.5. So, the observations are left truncated at time 0. For the censoring mechanism, besides the systematic censoring with mmax = 60 and tmax = 48, an independent random censoring with the constant hazard rate λatt is used to represent the account attrition (i.e., a memoryless exponential lifetime distribution). Denote by λv(m, t) or λj(m, t) the underlying dual-time hazard rate of vintage v = Vj at lifetime m and calendar time t. Then, one may simulate the dual-time-to-default data (4.1) according to:

τji = inf{m : ∫_0^m λj(s, Vj + s) ds ≥ −log X}, X ∼ Unif[0, 1]    (default)
Cji = inf{m : λatt · m ≥ −log X}, X ∼ Unif[0, 1]    (attrition)
Uji = max{Ūji, Vj} − Vj, with Ūji ≡ 0    (left truncation)
Book (Vj, Uji, Mji, ∆ji) if Uji < min{τji, Cji}:
    Mji = min{τji, Cji, mmax, tmax − Vj}    (termination time)
    ∆ji = I(τji ≤ min{Cji, mmax, tmax − Vj})    (default indicator)    (4.2)

for each given Vj = −60, . . . , −1, 0, 1, . . . , 47. For each vintage j, one may book nj origination accounts by running (4.2) repeatedly nj times independently. Let us begin with nj ≡ 1000 and specify the attrition rate λatt = 50bp (a basis point is 0.01%). Among other types of dual-time default intensities, assume the multiplicative form

λv(m, t) = λf(m) λg(t) λh(v) = exp{f(m) + g(t) + h(v)}    (4.3)


Figure 4.1: Dual-time-to-default: (left) Lexis diagram of sub-sampled simulations; (right) empirical hazard rates in either lifetime or calendar time.

where the maturation curve f(m), the exogenous influence g(t) and the unobserved heterogeneity h(v) are interpreted in the sense of Chapter III and are assumed to follow Figure 3.2 (top panel). For simulating the left-truncated survival accounts, we assume zero exogenous influence on their historical hazards. Figure 4.1 (left) is an illustration of the Lexis diagram with one random simulation per vintage, where each asterisk at the end of a line segment denotes a default event and a circle denotes censorship. To see the default behavior in dual-time coordinates, we calculated the empirical hazard rates by (4.14) in lifetime and (4.15) in calendar time, respectively, and the results are plotted in Figure 4.1 (right). Each marginal view of the hazard rates shows clearly the pattern matching our underlying assumption, namely:

1. The endogenous (or maturation) performance in lifetime is relatively smooth;
2. The exogenous performance in calendar time is relatively dynamic and volatile.
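A minimal sketch of booking a single account by the scheme (4.2); the dual-time hazard λj(m, t) is supplied by the caller, and the numerical inversion of the cumulative hazard on a fixed grid is a simplification of this illustration (λatt defaults to the 50bp of the text, i.e. 0.005):

```python
import numpy as np

def book_account(V, lam_j, lam_att=0.005, m_max=60, t_max=48,
                 dm=0.01, rng=np.random.default_rng()):
    """Simulate one account of vintage V per the booking scheme (4.2).
    lam_j(m, t) is the dual-time hazard; the cumulative hazard is
    inverted numerically on a grid of step dm. Returns the booked
    record (V, U, M, Delta), or None if truncated out."""
    X1, X2 = rng.uniform(size=2)
    # default time: first m with integral of lam_j(s, V+s) >= -log(X1)
    grid = np.arange(dm, m_max + t_max + dm, dm)
    cumhaz = np.cumsum([lam_j(m, V + m) * dm for m in grid])
    hit = np.nonzero(cumhaz >= -np.log(X1))[0]
    tau = grid[hit[0]] if hit.size else np.inf      # default time
    C = -np.log(X2) / lam_att                       # attrition (exponential)
    U = max(0.0, V) - V                             # left truncation at time 0
    if U >= min(tau, C):
        return None                                 # not booked
    M = min(tau, C, m_max, t_max - V)               # termination time
    Delta = int(tau <= min(C, m_max, t_max - V))    # default indicator
    return (V, U, M, Delta)
```

Running this nj times per vintage over Vj = −60, . . . , 47 reproduces a data set of the form (4.1).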


This is typically the case in credit risk modeling, where the default risk is greatly affected by macroeconomic conditions. It is our purpose to develop the dual-time survival analysis not only in lifetime, but also simultaneously in calendar time. Although the Lexis diagram has been around since Lexis (1875), the statistical literature on survival analysis has dominantly focused on lifetime only rather than on the dual timescales; see e.g. Andersen, et al. (1993) with nine-to-one chapter coverage. In other words, lifetime has been taken for granted as the basic timescale of survival, with very few exceptions that take either calendar time as the primary scale (e.g. Arjas (1986)) or both timescales (see the survey by Keiding (1990)). There is also discussion on the selection of an appropriate timescale in the context of multiple-timescale reduction; see e.g. Farewell and Cox (1979) about a rotation technique based on a naive Cox proportional hazards (CoxPH) model involving only a linearized time-covariate effect. For such timescale dimension-reduction approaches, see also Oakes (1995) and Duchesne and Lawless (2000), given multiple dimensions that may include usage scales like the mileage of a vehicle. The literature of regression models based on dual-time-to-default data (a.k.a. survival data with staggered entry) has also concentrated on the use of lifetime as the basic timescale together with time-dependent covariates. Mostly used is the CoxPH regression model with an arbitrary baseline hazard function in lifetime. Some weak convergence results can be referred to Sellke and Siegmund (1983) by martingale approximation and Bilias, Gu and Ying (1997) by empirical process theory; see also the related works of Slud (1984) and Gu and Lai (1991) on two-sample tests. However, such one-way baseline model specification is unable to capture the baseline hazard in calendar time.
Efron (2002), in contrast, treats the dual timescales symmetrically under a two-way proportional hazards model

λji (m, t) = λf (m)λg (t) exp{θ T zji (m)}, i = 1, . . . , nj , j = 1, . . . , J.

(4.4)


Note that Efron (2002) considered only the special case of (4.4) with cubic polynomial expansion for log λf and log λg, as well as time-independent covariates. In general, we may allow for arbitrary (non-negative) λf(m) and λg(t), hence we name (4.4) a two-way Cox regression model. In credit risk, there seems to be no formal literature on dual-time-to-default risk modeling. It is our objective in this chapter to develop the dual-time survival analysis (DtSA) for credit risk modeling, including the methods of nonparametric estimation, structural parameterization, intensity-based semiparametric regression, and frailty specification accounting for vintage heterogeneity effects. The aforementioned dual-time Cox model (4.4) will be one of such developments, whereas Efron's approach with polynomial baselines will fail to capture the rather different behaviors of endogenous and exogenous hazards illustrated in Figure 4.1.

This chapter is organized as follows. In Section 4.2 we start with the generic form of the likelihood function based on the dual-time-to-default data (4.1), then consider the one-way, two-way and three-way nonparametric estimators on the Lexis diagram. The simulation data described above will be used to assess the performance of the nonparametric estimators. In Section 4.3 we discuss the dual-time feasibility of structural models based on the first-passage-time triggering system, which is defined by an endogenous distance-to-default process associated with the exogenous time-transformation. Section 4.4 is devoted to the development of dual-time semiparametric Cox regression with both endogenous and exogenous baseline hazards and covariate effects, for which the method of partial likelihood estimation plays a key role. Also discussed is the frailty type of vintage heterogeneity by a random-effect formulation.
In Section 4.5, we demonstrate the applications of both nonparametric and semiparametric dual-time survival analysis to credit card and mortgage risk modeling, based on our real data-analytic experiences in retail banking. We conclude the chapter in Section 4.6 and provide some supplementary materials at the end.


4.2 Nonparametric Methods

Denote by λji(m) ≡ λji(m, t) with t ≡ Vj + m the hazard rates of the lifetime-to-default τji on the Lexis diagram. Given the dual-time-to-default data (4.1) with independent left-truncation and right-censoring, consider the joint likelihood under either the continuous or discrete setting:

∏_{j=1}^{J} ∏_{i=1}^{nj} [ pji(Mji) / Sji(Uji) ]^{∆ji} [ Sji(Mji+) / Sji(Uji) ]^{1−∆ji}    (4.5)

where pji(m) denotes the density and Sji(m) denotes the survival function of τji. On the continuous domain, m+ = m and Sji(m) = e^{−Λji(m)} with the cumulative hazard Λji(m) = ∫_0^m λji(s) ds; then the log-likelihood function has the generic form

ℓ = ∑_{j=1}^{J} ∑_{i=1}^{nj} { ∆ji log λji(Mji) + log Sji(Mji) − log Sji(Uji) }
  = ∑_{j=1}^{J} ∑_{i=1}^{nj} { ∆ji log λji(Mji) − ∫_{Uji}^{Mji} λji(m) dm }.    (4.6)
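For a quick numerical check of the generic log-likelihood (4.6), the integral term can be evaluated on a grid (a sketch of this chapter's formula, not thesis code; for a constant hazard λ the per-account contribution reduces to ∆ log λ − λ(M − U)):

```python
import numpy as np

def dual_time_loglik(records, hazard):
    """Generic log-likelihood (4.6): for each account (V, U, M, Delta),
    add Delta * log(lambda_ji(M)) minus the integral of lambda_ji
    from U to M, evaluated by the trapezoid rule along vintage V."""
    ll = 0.0
    for V, U, M, Delta in records:
        grid = np.linspace(U, M, 201)
        vals = np.array([hazard(m, V + m) for m in grid])  # dual-time hazard
        cumhaz = float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(grid)))
        ll += Delta * np.log(hazard(M, V + M)) - cumhaz
    return ll
```

Here `hazard(m, t)` plays the role of λji(m, t) along each account's lifeline t = Vj + m.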

On the discrete domain of m, Sji(m) = ∏_{s≤m} [1 − λji(s)], and (4.5) can be rewritten accordingly.

Then, when c gradually approaches zero, the hazard

rate would be shaped from (a) monotonic increasing to (b) first-increasing-then-decreasing and finally to (c) monotonic decreasing. See Figure 1.2 for the illustration with fixed b and varying c values.

2. The b-parameter represents the trend of the DD process, and it determines the level of the stabilized hazard rate as m → ∞. In Aalen, et al. (2008; §10), let φm(x) be the conditional density of X(m) given that min_{0≤s≤m} X(s) > 0 and it is said to

in calendar time. We set the constraint E log ψ = 0 on the exogenous effect, in the sense of letting the ‘average’ DD process be captured endogenously. Such a time-transformation approach can be viewed as a natural generalization of the log-location-scale transformation (1.16) with correspondence σi = 1 and ψ(·) = e^{−µi}. It has been used to extend the accelerated failure time (AFT) models with time-varying covariates; see e.g. Lawless (2003; §6.4.3). The exogenous function ψ(t) rescales the original lifetime increment ∆m to ψ(Vj + m)∆m, and therefore transforms the diffusion dXj(m) = bj dm + dWj(m) to

dXj*(m) = bj ψ(Vj + m) dm + √ψ(Vj + m) dWj(m)    (4.29)

w.r.t. the zero default boundary. Alternatively, one may rewrite it as

dXj*(m) = √ψ(Vj + m) dXj(m) − bj [√ψ(Vj + m) − ψ(Vj + m)] dm.    (4.30)

Let us introduce a new process X̃j(m) such that dX̃j(m) = √ψ(Vj + m) dXj(m), and introduce a time-varying default boundary

Dj(m) = bj ∫_0^m [√ψ(Vj + s) − ψ(Vj + s)] ds.

Then, by (4.30), it is clear that the time-to-default τj* = inf{m ≥ 0 : Xj*(m) ≤ 0} = inf{m ≥ 0 : X̃j(m) ≤ Dj(m)}. By the time-transformation (4.28), it is straightforward to obtain the survival function of the first-passage-time τj* by

Sj*(m) = P(Xj*(s) > 0 : s ∈ [0, m]) = P(Xj(s) > 0 : s ∈ [0, Ψj(m)])
       = Sj(Ψj(m)) = Sj(∫_0^m ψ(Vj + s) ds)    (4.31)

where the benchmark survival Sj(·) is given by the denominator of (4.26). Then, the hazard rate of τj* is obtained by

λj*(m) = −d log Sj*(m)/dm = [−d log Sj(Ψj(m))/dΨj(m)] · dΨj(m)/dm = λj(Ψj(m)) Ψj′(m)
       = λj(∫_0^m ψ(Vj + s) ds) ψ(Vj + m)    (4.32)

with λj(·) given in (4.26). From (4.32), the exogenous function ψ(t) not only maps the benchmark hazard rate to the rescaled time Ψj(m), but also scales it by ψ(t). This is the same as the AFT regression model (1.21).

Now consider the parameter estimation of both the vintage-specific structural parameters (cj, bj) and the exogenous time-scaling function ψ(t) from the dual-time-to-default data (4.1). By (4.6), the joint log-likelihood is given by ℓ = ∑_{j=1}^{J} ℓj, with ℓj given by (4.27) based on λj*(m; cj, bj) and Sj*(m; cj, bj). The complication lies in the arbitrary formation of ψ(t) involved in the time-transformation function. Nonparametric techniques can be used if extra properties like smoothness can be imposed on the function ψ(t). In our case the exogenous time-scaling effects are often dynamic and volatile, for which we recommend a piecewise estimation procedure based on the discretized Lexis diagram.

1. Discretize the Lexis diagram with ∆m = ∆t such that t = tmin + l∆t for l =


0, 1, . . . , L, where tmin is set to be the left boundary and tmin + L∆t corresponds to the right boundary. Specify ψ(t) to be piecewise constant:

ψ(t) = ψl, for t − tmin ∈ ((l − 1)∆t, l∆t], l = 1, 2, . . . , L,

and ψ(t) ≡ 1 for t ≤ tmin. Then, the time-transformation for the vintage with origination time Vj can be expressed as

Ψj(m) = ∑_{l=(Vj−tmin)/∆t}^{(Vj+m−tmin)/∆t} ψ(tmin + l∆t) ∆t,    for Vj + m ≥ tmin,

which reduces to tmin − Vj + ∑_{l=1}^{(Vj+m−tmin)/∆t} ψl ∆t in the case of left truncation.

2. For each l = 1, 2, . . ., get the corresponding likelihood contribution:

       ℓ_[l] = Σ_{(j,i)∈G_[l]} ∆_ji [log λ_j(Ψ_j(M_ji)) + log ψ_l] + log S_j(Ψ_j(M_ji)) − log S_j(Ψ_j(U_ji))

   where G_[l] = {(j, i) : V_j + M_ji = t_min + l∆t}, and Ψ_j(U_ji) = max{0, t_min − V_j} if t_min is the systematic left truncation in calendar time.

3. Run nonlinear optimization for estimating (c_j, b_j) and ψ_l iteratively:

   (a) For fixed ψ and each fixed vintage j = 1, . . . , J, obtain the MLE (ĉ_j, b̂_j) by maximizing (4.27) with M_ji replaced by Ψ_j(M_ji).

   (b) For all fixed (ĉ_j, b̂_j), obtain the MLE of (ψ_1, . . . , ψ_L) by maximizing Σ_{l=1}^L ℓ_[l] subject to the constraint Σ_{l=1}^L log ψ_l = 0.

In Step (3.b), one may perform further binning of (ψ_1, . . . , ψ_L) for the purpose of reducing the dimension of the parameter space in constrained optimization. One may also apply multi-resolution wavelet bases to represent (ψ_1, . . . , ψ_L) from a time-frequency perspective. Such ideas will be investigated in our future work.
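For a concrete sense of Step 1, the piecewise-constant time transformation Ψ_j(m) = ∫_0^m ψ(V_j + s) ds can be computed by accumulating ψ over the calendar grid. Below is a minimal Python sketch; the function name and data layout are our own illustration, not part of the thesis:

```python
def time_transform(m, Vj, t_min, dt, psi):
    """Compute Psi_j(m) = integral of psi(Vj + s) over s in [0, m],
    where psi(t) = 1 for t <= t_min and psi(t) = psi[l-1] on the l-th
    calendar interval (t_min + (l-1)*dt, t_min + l*dt]."""
    total, s = 0.0, 0.0
    while s < m - 1e-12:
        t = Vj + s
        if t < t_min:                      # before the left boundary: psi = 1
            step, rate = min(t_min - t, m - s), 1.0
        else:                              # inside the discretized grid
            l = int((t - t_min) // dt)     # 0-based interval index
            step = min(t_min + (l + 1) * dt - t, m - s)
            rate = psi[l]
        total += rate * step
        s += step
    return total
```

With ψ ≡ 1 the transformation is the identity, Ψ_j(m) = m, matching the convention ψ(t) ≡ 1 before left truncation; a larger ψ_l on the grid dilates only the post-t_min portion of the integral.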

4.3.2 Incorporation of Covariate Effects

It is straightforward to extend the above structural parameterization to incorporate the subject-specific covariates. For each account i with origination V_j, let z̃_ji be the static covariates observed upon origination (including the intercept and otherwise centered predictors) and z_ji(m) for m ∈ (U_ji, M_ji] be the dynamic covariates observed since left truncation. Let us specify

    c_ji = exp{η^T z̃_ji}                                                    (4.33)

    Ψ_ji(m) = U_ji − V_j + ∫_{U_ji}^m exp{θ^T z_ji(s)} ds,  m ≥ U_ji         (4.34)

subject to the unknown parameters (η, θ), and suppose the subject-specific DD process

    X*_ji(m) = X_ji(Ψ_ji(m)),   X_ji(m) = c_ji + b_j m + W_ji(m)             (4.35)

with an independent Wiener process W_ji(m), the initial value c_ji and the vintage-specific trend of credit deterioration b_j. Then, the default occurs at τ*_ji = inf{m > 0 : X*_ji(m) ≤ 0}. Note that when the covariates in (4.34) are not time-varying, we may specify Ψ_ji(m) = m exp{θ^T z_ji} and obtain the AFT regression model log τ*_ji = θ^T z_ji + log τ_ji with inverse Gaussian distributed τ_ji.

In the presence of calendar-time state variables x(t) such that z_ji(m) = x(V_j + m) for all (j, i), the time-transformation (4.34) becomes a parametric version of (4.28). By the same arguments as in (4.31) and (4.32), we obtain the hazard rate λ*_ji(m) = λ_ji(Ψ_ji(m)) Ψ′_ji(m), or

    λ*_ji(m) = [ c_ji / √(2πΨ³_ji(m)) · exp{ −(c_ji + b_j Ψ_ji(m))² / (2Ψ_ji(m)) } ]
               / [ Φ( (c_ji + b_j Ψ_ji(m)) / √Ψ_ji(m) ) − e^{−2b_j c_ji} Φ( (−c_ji + b_j Ψ_ji(m)) / √Ψ_ji(m) ) ]
               · Ψ′_ji(m)                                                    (4.36)

and the survival function S*_ji(m) = S_ji(Ψ_ji(m)) given by the denominator in (4.36).
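As a numerical sanity check on the inverse Gaussian first-passage building blocks behind (4.26) and (4.36), the density, survival and hazard (here in benchmark time, i.e. with Ψ taken as the identity) can be coded with only the standard normal CDF via the error function. This is a hedged verification sketch, not thesis code:

```python
import math

def norm_cdf(x):
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ig_density(m, c, b):
    """First-passage density of X(m) = c + b*m + W(m) crossing zero."""
    return c / math.sqrt(2.0 * math.pi * m ** 3) \
        * math.exp(-(c + b * m) ** 2 / (2.0 * m))

def ig_survival(m, c, b):
    """Survival P(tau > m); the denominator of the hazard rate."""
    r = math.sqrt(m)
    return (norm_cdf((c + b * m) / r)
            - math.exp(-2.0 * b * c) * norm_cdf((-c + b * m) / r))

def ig_hazard(m, c, b):
    """Hazard rate lambda(m) = p(m) / S(m)."""
    return ig_density(m, c, b) / ig_survival(m, c, b)
```

One can verify numerically that −d log S(m)/dm reproduces the hazard, which is exactly the identity used to derive (4.32) and (4.36).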

The parameters of interest include (η, θ) and b_j for j = 1, . . . , J, and they can be estimated by MLE. Given the dual-time-to-default data (4.1), the log-likelihood function follows the generic form (4.6). Standard nonlinear optimization programs can be employed, e.g. the "optim" algorithm of R (stats library) and the "fminunc" algorithm of MATLAB (Optimization Toolbox).

Remarks:

1. There is no unique way to incorporate the static covariates in the first-passage-time parameterization. For example, we considered including them through the initial distance-to-default parameter (4.33), while they may also be incorporated into other structural parameters; see Lee and Whitmore (2006). However, there has been limited discussion on the incorporation of dynamic covariates in the first-passage-time approach, for which we make an attempt to use the time-transformation technique (4.28) under the nonparametric setting and (4.34) under the parametric setting.

2. The endogenous hazard rate (4.26) is based on the fixed effects of initial distance-to-default and trend of deterioration, while these structural parameters can be random effects due to the incomplete information available (Giesecke, 2006). One may refer to Aalen, et al. (2008; §10) about the random-effect extension. For example, when the trend parameter b ∼ N(µ, σ²) and the initial value c is fixed, the endogenous hazard rate is given by p(m)/S(m) with

       p(m) = c / √(2π(m³ + σ²m⁴)) · exp{ −(c + µm)² / (2(m + σ²m²)) }
       S(m) = Φ( (c + µm) / √(m + σ²m²) ) − e^{−2µc + 2σ²c²} Φ( (−c + µm − 2σ²cm) / √(m + σ²m²) ).

3. In the first-passage-time parameterization with inverse Gaussian lifetime distribution, the log-likelihood functions (4.27) and (4.6) with the plugged-in (4.26) or (4.36) are rather messy, especially when we attempt to evaluate the derivatives of the log-likelihood function in order to run a gradient search; see the supplementary material. Without the gradient provided, one may rely on crude nonlinear optimization algorithms, and the parameter estimation follows the standard MLE procedure. Our discussion in this section mainly illustrates the feasibility of the dual-time structural approach.

4.4 Dual-time Cox Regression

The semiparametric Cox models are popular for their effectiveness in estimating covariate effects. Suppose that we are given the dual-time-to-default data (4.1) together with the covariates z_ji(m) observed for m ∈ (U_ji, M_ji], which may include the static covariates z_ji(m) ≡ z̃_ji. Using the traditional CoxPH modeling approach, one may either adopt an arbitrary baseline in lifetime

    λ_ji(m) = λ_0(m) exp{θ^T z_ji(m)},                                       (4.37)

or adopt the stratified Cox model with arbitrary vintage-specific baselines

    λ_ji(m) = λ_j0(m) exp{θ^T z_ji(m)}                                       (4.38)

for i = 1, . . . , n_j and j = 1, . . . , J. However, these two Cox models may not be suitable for (4.1) observed on the Lexis diagram, in the following sense:

1. (4.37) is too stringent to capture the variation of vintage-specific baselines;

2. (4.38) is too relaxed to capture the interrelation of vintage-specific baselines.

For example, we have simulated a Lexis diagram in Figure 4.1 whose vintage-specific hazards are interrelated not only in lifetime m but also in calendar time t.

4.4.1 Dual-time Cox Models

The dual-time Cox models considered in this section take the following multiplicative hazards form in general,

λji (m, t) = λ0 (m, t) exp{θ T zji (m)}

(4.39)

where λ0 (m, t) is a bivariate base hazard function on Lexis diagram. Upon the speciﬁcation of λ0 (m, t), the traditional models (4.37) and (4.38) can be included as special cases. In order to balance between (4.37) and (4.38), we assume the MEV decomposition framework of the base hazard λ0 (m, t) = λf (m)λg (t)λh (t − v), i.e. the nonparametric model (4.21). Thus, we restrict ourselves to the dual-time Cox models with MEV baselines and relative-risk multipliers

λji (m, t) = λf (m)λg (t)λh (Vj ) exp{θ T zji (m)} = exp{f (m) + g(t) + h(Vj ) + θ T zji (m)}

(4.40)

where λ_f(m) = exp{f(m)} represents the arbitrary maturation/endogenous baseline, while λ_g(t) = exp{g(t)} and λ_h(v) = exp{h(v)} represent the exogenous multiplier and the vintage heterogeneity, subject to Eg = Eh = 0. Some necessary binning or trend removal needs to be imposed on g, h to ensure the model estimability; see Lemma 3.1.

Given the finite-sample observations (4.1) on the Lexis diagram, one may always find L_m distinct lifetime points m_[1] < · · · < m_[L_m] and L_t distinct calendar time points t_[1] < · · · < t_[L_t] such that there exists at least one default event observed at the corresponding marginal time. Following the least-information assumption of Breslow (1972), let us treat the endogenous baseline hazard λ_f(m) and the exogenous baseline hazard λ_g(t) as pointwise functions of the form (4.17) subject to Σ_{l=1}^{L_t} β_l = 0. For the discrete set of vintage originations, the vintage heterogeneity effects can essentially be regarded as categorical covariate effects. We may either assume the pointwise function of the form h(v) = Σ_{j=1}^J γ_j I(v = V_j) subject to constraints on both mean and trend (i.e. the ad hoc approach), or assume the piecewise-constant functional form upon appropriate binning of vintage originations

    h(V_j) = γ_κ, if j ∈ G_κ, κ = 1, . . . , K                               (4.41)

where the vintage buckets G_κ = {j : V_j ∈ (ν_{κ−1}, ν_κ]} are pre-specified based on V_{j⁻} ≡ ν_0 < · · · < ν_K ≡ V_J (where K < J). Then, the semiparametric model (4.40) is parametrized with dummy variables to be

    λ_ji(m, t) = exp{ Σ_{l=1}^{L_m} α_l I(m = m_[l]) + Σ_{l=2}^{L_t} β_l I(t = t_[l])
                      + Σ_{κ=2}^K γ_κ I(j ∈ G_κ) + θ^T z_ji(m) }             (4.42)

where β_1 ≡ 0 and γ_1 ≡ 0 are excluded from the model due to the identifiability constraints. Plugging it into the joint likelihood (4.6), one may then find the MLE of (α, β, γ, θ). (Note that it is rather straightforward to convert the resulting parameter estimates α̂, β̂ and γ̂ to satisfy the zero-sum constraints for β and γ; the same holds for (4.45) to be studied.)

As the sample size tends to infinity, both the endogenous and exogenous baseline hazards λ_f(m) and λ_g(t) defined on the continuous dual-time scales (m, t) tend to be infinite-dimensional, hence making the above pointwise parameterization intractable. The semiparametric estimation problem in this case is challenging, as the dual-time Cox model (4.40) involves two nonparametric components that need to be profiled out for the purpose of estimating the parametric effects. In what follows, we take an "intermediate approach" to semiparametric estimation via exogenous timescale discretization, to which the theory of partial likelihood applies.


4.4.2 Partial Likelihood Estimation

Consider an intermediate reformulation of the dual-time Cox models (4.40) by exogenous timescale discretization:

    t_l = t_min + l∆t, for l = 0, 1, . . . , L                               (4.43)

for some time increment ∆t bounded away from zero, where t_min corresponds to the left boundary and t_min + L∆t corresponds to the right boundary of the calendar time under consideration. (If the Lexis diagram is discrete itself, ∆t is automatically defined. Irregular time spacing is feasible, too.) On each sub-interval, assume the exogenous baseline λ_g(t) is defined on the L cutting points (and zero otherwise)

    g(t) = Σ_{l=1}^L β_l I(t = t_min + l∆t),                                 (4.44)

subject to Σ_{l=1}^L β_l = 0. Then, consider the one-way CoxPH reformulation of (4.40):

    λ_ji(m) = λ_f(m) exp{ Σ_{l=2}^L β_l I(V_j + m ∈ (t_{l−1}, t_l])
                          + Σ_{κ=2}^K γ_κ I(j ∈ G_κ) + θ^T z_ji(m) }         (4.45)

where λ_f(m) is an arbitrary baseline hazard, g(t) for any t ∈ (t_{l−1}, t_l] in (4.40) is shifted to g(t_l) in (4.44), and the vintage effect is bucketed by (4.41). It is easy to see that if the underlying Lexis diagram is discrete, the reformulated CoxPH model (4.45) is equivalent to the fully parametrized version (4.42).

We consider the partial likelihood estimation of the parameters (β, γ, θ) ≡ ξ. For convenience of discussion, write for each (j, i, m) the (L − 1)-vector of {I(V_j + m ∈ (t_{l−1}, t_l])}_{l=2}^L, the (K − 1)-vector of {I(V_j ∈ (ν_{κ−1}, ν_κ])}_{κ=2}^K, and z_ji(m) jointly as a long vector x_ji(m). Then, by Cox (1972, 1975), the parameter ξ can be estimated by maximizing

the partial likelihood

    PL(ξ) = Π_{j=1}^J Π_{i=1}^{n_j} [ exp{ξ^T x_ji(M_ji)} / Σ_{(j′,i′)∈R_ji} exp{ξ^T x_{j′i′}(M_ji)} ]^{∆_ji}    (4.46)

where R_ji = {(j′, i′) : U_{j′i′} < M_ji ≤ M_{j′i′}} is the at-risk set at each observed M_ji. Given the maximum partial likelihood estimator ξ̂, the endogenous cumulative baseline hazard Λ_f(m) = ∫_0^m λ_f(s) ds can be estimated by the Breslow estimator

    Λ̂_f(m) = Σ_{j=1}^J Σ_{i=1}^{n_j} [ I(m ≥ M_ji) ∆_ji / Σ_{(j′,i′)∈R_ji} exp{ξ̂^T x_{j′i′}(M_ji)} ]    (4.47)

which is a step function with increments at m_[1] < · · · < m_[L_m]. It is well known in the CoxPH context that both estimators ξ̂ and Λ̂_f(·) are consistent and asymptotically normal; see Andersen and Gill (1982), based on counting process martingale theory. Specifically, √N(ξ̂ − ξ) converges to a zero-mean multivariate normal distribution with covariance matrix that can be consistently estimated by {I(ξ̂)/N}^{−1}, where I(ξ) = −∂² log PL(ξ)/∂ξ∂ξ^T. For the nonparametric baseline, √N(Λ̂_f(m) − Λ_f(m)) converges to a zero-mean Gaussian process; see also Andersen, et al. (1993).

The DtBreslow and MEV estimators we have discussed in Section 4.2 can be studied by the theory of partial likelihood above, since the underlying nonparametric models are special cases of (4.45) with θ ≡ 0. Without the account-level covariates, the hazard rates λ_ji(m) roll up to λ_j(m), under which the partial likelihood function (4.46) can be expressed through the counting notations introduced in (4.9),

    PL(g, h) = Π_{j=1}^J Π_m [ exp{g(V_j + m) + h(V_j)} / Σ_{j′=1}^J nrisk_{j′}(m) exp{g(V_{j′} + m) + h(V_{j′})} ]^{nevent_j(m)}    (4.48)

where g(t) is parameterized by (4.44) and h(v) by (4.41). Let λ̂_g(t) = exp{ĝ(t)} and λ̂_h(v) = exp{ĥ(v)} upon maximization of PL(g, h) and centering in the sense of Eĝ = Eĥ = 0. Then, by (4.47), we have the Breslow estimator of the cumulative version of the endogenous baseline hazards

    Λ̂_f(m) = ∫_0^m [ Σ_{j=1}^J nevent_j(s) / Σ_{j=1}^J nrisk_j(s) λ̂_g(V_j + s) λ̂_h(V_j) ] ds.    (4.49)

It is clear that the increments of Λ̂_f(m) correspond to the maturation effect (4.22) of the MEV estimator. They also reduce to (4.18) of the DtBreslow estimator in the absence of vintage heterogeneity.

The implementation of dual-time Cox modeling based on (4.45) can be carried out by maximizing the partial likelihood (4.46) to find θ̂ and using (4.47) to estimate the baseline hazard. Standard software with a survival package or toolbox can be used to perform the parameter estimation. In the case when the coxph procedure in these survival packages (e.g. R:survival) requires time-independent covariates, we may convert (4.1) to an enlarged counting-process format with piecewise-constant approximation of the time-varying covariates x_ji(m); see the supplementary material in the last section of the chapter. For example, we have tested the partial likelihood estimation using coxph of the R:survival package and the simulation data in Figure 4.1, where the raw data (4.1) are converted to the counting-process format upon the same binning preprocessing as in Section 4.2.3. The numerical results of λ̂_g, λ̂_h by maximizing (4.48) and λ̂_f by (4.49) are nearly identical to the plots shown in Figure 4.3.
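To make the estimation target concrete, the log of the partial likelihood (4.46) can be computed directly from left-truncated records by assembling the at-risk set R_ji at each observed default time. The sketch below is our own toy illustration with static covariates, not the thesis implementation (which used R:survival):

```python
import math

def log_partial_likelihood(xi, records):
    """log PL(xi) in the style of (4.46): records are tuples
    (U, M, delta, x) with left-truncation time U, observed time M,
    default indicator delta, and covariate vector x (static here
    for brevity)."""
    def lp(x):                     # linear predictor xi' x
        return sum(a * b for a, b in zip(xi, x))
    ll = 0.0
    for (U, M, delta, x) in records:
        if not delta:
            continue               # only defaults contribute factors
        # at-risk set R_ji = {(j', i') : U' < M <= M'}
        risk = [r for r in records if r[0] < M <= r[1]]
        ll += lp(x) - math.log(sum(math.exp(lp(r[3])) for r in risk))
    return ll
```

Setting ξ = 0 collapses each factor to 1/|R_ji|, which gives a convenient check before handing the function to a generic optimizer.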

4.4.3 Frailty-type Vintage Effects

Recall the frailty models introduced in Section 1.3 and their double roles in characterizing both the effects of unobserved heterogeneity and the dependence of correlated defaults. They provide an alternative approach that models the vintage effects as random block effects. Let Z_κ for κ = 1, . . . , K be a random sample from


some frailty distribution, where Z_κ represents the shared frailty level of the vintage bucket G_κ according to (4.41). Consider the dual-time Cox frailty models

    λ_ji(m, t | Z_κ) = λ_f(m) λ_g(t) Z_κ exp{θ^T z_ji(m)}, if j ∈ G_κ        (4.50)

for i = 1, . . . , n_j and j ∈ G_κ, for κ = 1, . . . , K. We also write λ_ji(m, t | Z_κ) as λ_ji(m | Z_κ) with t ≡ V_j + m. It is assumed that given the frailty Z_κ, the observations for all vintages in G_κ are independent. In this frailty model setting, the between-bucket heterogeneity is captured by realizations of the random variate Z, while the within-bucket dependence is captured by the same realization Z_κ.

Similar to the intermediate approach taken by the partial likelihood (4.46) under the exogenously discretized dual-time model (4.45), let us write ξ̃ = (β, θ) and let x̃_ji(m) join the vectors of {I(V_j + m ∈ (t_{l−1}, t_l])}_{l=2}^L and z_ji(m). Then, we may write (4.50) as λ_ji(m | Z_κ) = λ_f(m) Z_κ exp{ξ̃^T x̃_ji(m)}. The log-likelihood based on the dual-time-to-default data (4.1) is given by

    ℓ = Σ_{j=1}^J Σ_{i=1}^{n_j} ∆_ji [ log λ_f(M_ji) + ξ̃^T x̃_ji(M_ji) ] + Σ_{κ=1}^K log{ (−1)^{d_κ} L^{(d_κ)}(A_κ) }    (4.51)

where L(x) = E_Z[exp{−xZ}] is the Laplace transform of Z, L^{(d)}(x) is its d-th derivative, d_κ = Σ_{j∈G_κ} Σ_{i=1}^{n_j} ∆_ji and A_κ = Σ_{j∈G_κ} Σ_{i=1}^{n_j} ∫_{U_ji}^{M_ji} λ_f(s) exp{ξ̃^T x̃_ji(s)} ds;

see the supplementary material at the end of the chapter.

The gamma distribution Gamma(δ^{−1}, δ) with mean 1 and variance δ is the most frequently used frailty due to its simplicity. It has the Laplace transform L(x) = (1 + δx)^{−1/δ}, whose d-th derivative is given by L^{(d)}(x) = (−δ)^d (1 + δx)^{−1/δ−d} Π_{i=1}^d (1/δ + i − 1). For a fixed δ, the EM (expectation-maximization) algorithm is often used for estimating (λ_f(·), ξ̃); see Nielsen, et al. (1992) or the supplementary material. Alternatively, the gamma frailty models can be estimated by the penalized likelihood approach, treating the second term of (4.51) as a kind of regularization.
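The closed-form derivative of the gamma Laplace transform can be checked against finite differences; the small sketch below is our own verification aid implementing L(x) and L^(d)(x) exactly as stated:

```python
def gamma_laplace(x, delta):
    """Laplace transform L(x) = (1 + delta*x)**(-1/delta) of a
    Gamma(1/delta, delta) frailty with mean 1 and variance delta."""
    return (1.0 + delta * x) ** (-1.0 / delta)

def gamma_laplace_deriv(x, d, delta):
    """d-th derivative: (-delta)^d (1 + delta*x)^(-1/delta - d)
    times the product of (1/delta + i - 1) over i = 1..d."""
    prod = 1.0
    for i in range(1, d + 1):
        prod *= 1.0 / delta + i - 1
    return (-delta) ** d * (1.0 + delta * x) ** (-1.0 / delta - d) * prod
```

These are exactly the quantities entering the frailty terms log{(−1)^{d_κ} L^{(d_κ)}(A_κ)} of (4.51); note that (−1)^d L^{(d)}(x) > 0, so the logarithm is well defined.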

Therneau and Grambsch (2000; §9.6) justified that for each fixed δ, the EM algorithmic solution coincides with the maximizer of the following penalized log-likelihood

    Σ_{j=1}^J Σ_{i=1}^{n_j} ∆_ji [ log λ_f(M_ji) + ξ^T x_ji(M_ji) ] + (1/δ) Σ_{κ=1}^K ( γ_κ − exp{γ_κ} )    (4.52)

where ξ = (β, γ, θ) and x_ji(m) are the same as in (4.46). They also provide programs (in the R:survival package) with a Newton-Raphson iterative solution as an inner loop and selection of δ in an outer loop. Fortunately, these programs can be used to fit the dual-time Cox frailty models (4.50) up to certain modifications.

Remarks:

1. The dual-time Cox regression considered in this section is reduced to standard CoxPH problems upon exogenous timescale discretization, such that the existing theory of partial likelihood and the asymptotic results can be applied. We have not studied the dual-time asymptotic behavior when both lifetime and calendar time are considered to be absolutely continuous. In practice with finite-sample observations, the time-discretization trick works well.

2. The dual-time Cox model (4.40) includes only the endogenous covariates z_ji(m) that vary across individuals. In practice, it is also interesting to see the effects of exogenous covariates z(t) that are invariant across individuals at the same calendar time. Such z(t) might be entertained by the traditional lifetime Cox models (4.37) upon the time shift; however, they are not estimable if included as a log-linear term in (4.40), since exp{θ^T z(t)} would be absorbed by the exogenous baseline λ_g(t). Therefore, in order to investigate the exogenous covariate effects of z(t), we recommend a two-stage procedure, with the first stage fitting the dual-time Cox models (4.40), and the second stage modeling the correlation between z(t) and the estimated λ̂_g(t) based on time series techniques.
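For the second stage of the recommended two-stage procedure, any measure of association between the estimated exogenous multiplier λ̂_g(t) and a macroeconomic series z(t) can serve as a first diagnostic before formal time-series modeling. A minimal Pearson-correlation sketch (illustrative only; the function name is our own):

```python
import math

def pearson(a, b):
    """Sample Pearson correlation between two equal-length series,
    e.g. log lambda_g_hat(t) versus a macroeconomic covariate z(t)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

In practice one would compute this against lagged versions of z(t) as well, since exogenous hazard effects may respond to macroeconomic shocks with delay.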


4.5 Applications in Retail Credit Risk Modeling

Retail credit risk modeling has become of increasing importance in recent years; see, among others, the special issues edited by Berlin and Mester (Journal of Banking and Finance, 2004) and Wallace (Real Estate Economics, 2005). In the quantitative understanding of the risk determinants of retail credit portfolios, there have been large gaps among academic researchers, banks' practitioners and governmental regulators. It turns out that many existing models for corporate credit risk (e.g. the Merton model) are not directly applicable to retail credit. In this section we demonstrate the application of dual-time survival analysis to credit card and mortgage portfolios in retail banking. Based on our real data-analytic experiences, some tweaked samples are considered here, only for the purpose of methodological illustration. Specifically, we employ pooled credit card data to illustrate the dual-time nonparametric methods, and use mortgage loan-level data to illustrate dual-time Cox regression modeling. Before applying these DtSA techniques, we have assessed them under the simulation mechanism (4.2); see Figure 4.2 and Figure 4.3.

4.5.1 Credit Card Portfolios

As we have introduced in Section 1.4, credit card portfolios vary by product type, acquisition channel, geographical region and the like. These attributes may be either treated as endogenous covariates, or used to define multiple segments. Here, we demonstrate credit card risk modeling with both segmentation and covariates. For simplicity, we consider two generic segments and a categorical covariate with three levels, where the categorical variable is defined by the credit score buckets upon loan origination: Low if FICO < 640, Medium if 640 ≤ FICO < 740 and High if FICO ≥ 740. Other variables like the credit line and utilization rate are not considered here.

Table 4.1: Illustrative data format of pooled credit card loans

    SegID  FICO    V       U  M  D  W
    1      Low     200501  0  3  0  988
    1      Low     200501  0  3  1  2
    1      Low     200501  3  4  0  993
    1      Low     200501  3  4  1  5
    1      Low     200501  4  5  0  979
    1      Low     200501  4  5  1  11
    1      Low     200501  .  .  .  .
    1      Low     200502  .  .  .  .
    1      Medium  ...     .  .  .  .
    2      ...     ...     .  .  .  .

    V: vintage. [U, M]: lifetime interval. D: status. W: counts.

In this simple case, one may aggregate the observations at the vintage level for each FICO bucket in each segment. See Table 4.1 for an illustration of the counting format of the input data, where each single vintage at each lifetime interval [U, M] has two records of weight W, namely the number of survived loans (D = 0) and the number of defaulted loans (D = 1). Note that the last column of Table 4.1 also illustrates that in the lifetime interval [3, 4], the 200501 vintage has 3 attrition accounts (i.e. 993 − 979 − 11), which are examples of being randomly censored.

Let us consider the card vintages originated from year 2001 to 2008 with observations (a) left truncated at the beginning of 2005, (b) right censored by the end of 2008, and (c) up censored by the 48 months-on-book maximum. Refer to Figure 1.3 about the vintage diagram of dual-time data collection, which resembles the rectangular Lexis diagram of Figure 4.1. First of all, we explore the empirical hazards of the two hypothetical segments, regardless of the FICO variable. We applied the one-way, two-way and three-way nonparametric methods developed in Section 4.2 to each segment of dual-time-to-default data. Note that given the pooled data with weights W for D = 0, 1, it is rather straightforward to modify


Figure 4.4: Nonparametric analysis of credit card risk: (top) one-way empirical hazards, (middle) DtBreslow estimation, (bottom) MEV decomposition.


the iterative nonparametric estimators in (4.14) – (4.24), and one may also use the weighted partial likelihood estimation based on our discussion in Section 4.4. The numeric results for both segments are shown in Figure 4.4, where we may compare the performances of the two segments as follows.

1. The top panel shows the empirical hazard rates in both lifetime and calendar time. By such one-way estimates, Segment 1 has shown better lifetime performance (i.e. lower hazards) than Segment 2, and it has also shown better calendar-time performance than Segment 2 except for the latest couple of months.

2. The middle panel shows the results of the DtBreslow estimator that simultaneously calibrates the endogenous seasoning λ_f(m) and the exogenous cycle effects λ_g(t) under the two-way hazard model (4.16). Compared to the one-way calibration above, the seasoning effects are similar, but the exogenous cycle effects show a dramatic difference. Conditional on the endogenous baseline, the credit cycle of Segment 2 grows relatively steadily, while Segment 1 has a sharp exogenous trend since t = 30 (middle of 2007) and exceeds Segment 2 since about t = 40.

3. The bottom panel takes into account the vintage heterogeneity, where we consider only the yearly vintages for simplicity. The estimates of λ_f(m) and λ_g(t) are both affected after adding the vintage effects, since the more heterogeneity is explained by the vintage originations, the less variation is attributed to the endogenous and exogenous baselines. Specifically, the vintage performance in both segments has deteriorated from pre-2005 to 2007, and made a turn when entering 2008. Note that Segment 2 has a dramatic improvement of vintage originations in 2008.

Next, consider the three FICO buckets for each segment, where we still take the nonparametric approach for each of the 2 × 3 segments. The MEV decompositions of the empirical hazards are shown in Figure 4.5. It is found that in lifetime the low, medium and high FICO buckets have decreasing levels of endogenous baseline hazard rates, where for Segment 2 the hazard rates are evidently non-proportional. The exogenous cycles of the low, medium and high FICO buckets co-move (more or less) within each segment. As for the vintage heterogeneity, it is interesting to note that for Segment 1 the high FICO vintages keep deteriorating from pre-2005 to 2008, which is a risky signal. The vintage heterogeneity in the other FICO buckets is mostly consistent with the overall pattern in Figure 4.4.
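Working from the pooled counting format of Table 4.1, the one-way empirical hazard in lifetime is simply weighted events over weighted at-risk counts at each month-on-book. A minimal sketch follows (the field layout mirrors Table 4.1; the function name is our own):

```python
from collections import defaultdict

def empirical_lifetime_hazard(pooled):
    """One-way empirical hazard lambda_hat(m) = events(m) / at_risk(m)
    from pooled records (V, U, M, D, W): W loans of vintage V observed
    over the lifetime interval [U, M], defaulting (D=1) or not (D=0)."""
    events, at_risk = defaultdict(float), defaultdict(float)
    for (V, U, M, D, W) in pooled:
        at_risk[M] += W            # all W loans reached month-on-book M
        if D == 1:
            events[M] += W
    return {m: events[m] / at_risk[m] for m in sorted(at_risk)}
```

For the 200501 vintage rows of Table 4.1 this gives λ̂(3) = 2/990 and λ̂(4) = 5/998, and the calendar-time analogue is obtained by keying on V + M instead of M.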

Figure 4.5: MEV decomposition of credit card risk for low, medium and high FICO buckets: Segment 1 (top panel); Segment 2 (bottom panel).

This example mainly illustrates the effectiveness of dual-time nonparametric methods when there are limited or no covariates provided. With the next example of mortgage loans, we shall demonstrate dual-time Cox regression modeling with various covariate effects on competing risks.


4.5.2 Mortgage Competing Risks

The U.S. residential mortgage market is huge. As of March 2008, the first-lien mortgage debt is estimated to be about $10 trillion (for 54.7 million loans) with a distribution of $1.2 trillion subprime mortgages, $8.2 trillion prime and near-prime combined, and the remainder being government guaranteed; see the U.S. Federal Reserve report by Frame, Lehnert and Prescott (2008). In brief, prime mortgages generally target borrowers with good credit histories and normal down payments, while subprime mortgages are made to borrowers with poor credit and high leverage. The rise in mortgage defaults is unprecedented. Mayer, Pence and Sherlund (2009) described the rising defaults in nonprime mortgages in terms of underwriting standards and macroeconomic factors. Quantitatively, Sherlund (2008) considered subprime mortgages by proportional hazards models (4.37) with presumed fixed lifetime baselines (therefore, a parametric setting). One may refer to Gerardi, et al. (2008) and Goodman, et al. (2008) for some further details of the subprime issue. In retrospect, the 2007-08 collapse of the mortgage market has rather complicated causes, including the loose underwriting standards, the increased leverage, the originate-then-securitize incentives (a.k.a. moral hazard), the fall of home prices, and the worsening unemployment rate.

We make an attempt via dual-time survival analysis to understand the risk determinants of mortgage default and prepayment. Considered here is a tweaked sample of mortgage loans originated from year 2001 to 2007 with a Lexis diagram of observations (a) left truncated at the beginning of 2005, (b) right censored by October 2008, and (c) up censored by the 60 months-on-book maximum. These loans resemble the nonconforming mortgages in California, where the regional macroeconomic conditions have become worse and worse since middle 2006; see Figure 4.6 for the plots of 2000-08 home price indices and the unemployment rate, where we use the up-to-date data sources from S&P/Case-Shiller and the U.S. Bureau of Labor Statistics.


Figure 4.6: Home prices and unemployment rate of California state: 2000-2008.

Figure 4.7: MEV decomposition of mortgage hazards: default vs. prepayment


For a mortgage loan with competing risks caused by default and prepayment, the observed event time corresponds to the time to the first event or the censoring time. One may refer to Lawless (2003; §9) for the modeling issues on survival analysis of competing risks. One practically simple approach is to reduce the multiple events to individual survival analyses by treating the alternative events as censored. Then, we may fit two marginal hazard rate models for default and prepayment, respectively. One may refer to Therneau and Grambsch (2000; §8) about robust variance estimation for the marginal Cox models of competing risks.

We begin with nonparametric exploration of the endogenous and exogenous hazards, as well as the vintage heterogeneity effects. Similar to the credit card risk modeling above, the monthly vintages are grouped yearly to be pre-2005, 2005, 2006, 2007 originations. The fitting results of the MEV hazard decomposition are shown in Figure 4.7 for both default and prepayment. Conditional on the estimated endogenous baseline hazards of default or prepayment, we find that

1. The exogenous cycle of the default risk had been low (with exogenous multiplier less than 1) prior to t = 20 (August 2006), then rose sharply up to about 50 times riskier at about t = 43 (July 2008). In contrast, the exogenous cycle of the prepayment risk has shown a modest down-turn trend during the same calendar time period.

2. The vintage heterogeneity plot implies that the 2005-06 originations have a much higher probability of default than the mortgage originations in other years. In contrast, the vintage heterogeneity of prepayment keeps decreasing from pre-2005 to 2007. These findings match the empirical evidence presented lately by Gerardi, et al. (2008).

Compare the estimated exogenous effects on default hazards to the macroeconomic conditions in Figure 4.6. It is obvious that the rise in mortgage defaults follows closely


the fall of home prices (positively correlated) and the worsening unemployment rate (negatively correlated). However, based on our demonstration data, it is not possible to quantify what percentage of the exogenous effects should be attributed to either house prices or unemployment, because (a) the house prices and unemployment in California are themselves highly correlated from 2005 to 2008, and (b) the exogenous effects on default hazards could also be caused by other macroeconomic indicators. One may of course perform multivariate time series analysis in order to study the correlation between exogenous hazards and macroeconomic indicators, which is beyond the scope of the present thesis.

Next, we try to decode the vintage effect of unobserved heterogeneity by adding the loan-level covariates. For demonstration purposes, we consider only the few mortgage covariates listed in Table 4.2, where CLTV stands for the combined loan-to-value leverage ratio, FICO is the same measurement of credit score as in the credit card risk, and the other underwriting covariates upon loan origination can be referred to Goodman, et al. (2008; §1) or Wikipedia. We also include the mortgage note rate, either fixed or floating (i.e., adjustable), to demonstrate the capability of Cox modeling with time-varying covariates. Note that the categorical variable settings are simplified in Table 4.2, and they could be far more complicated in real situations. The mean, standard deviation and range statistics presented for the continuous covariates CLTV, FICO and NoteRate are calculated from our tweaked sample.

The dual-time Cox regression models considered here take the form of (4.4) with both endogenous and exogenous baselines. After incorporating the underwriting and dynamic covariates in Table 4.2, we have

    λ^(q)_ji(m, t) = λ^(q)_f(m) λ^(q)_g(t) exp{ θ^(q)_1 CLTV_ji + θ^(q)_2 FICO_ji + θ^(q)_3 NoteRate_ji(m)
                     + θ^(q)_4 DocFull_ji + θ^(q)_5 IO_ji + θ^(q)_{6.p} Purpose.purchase_ji + θ^(q)_{6.r} Purpose.refi_ji }    (4.53)

Table 4.2: Loan-level covariates considered in mortgage credit risk modeling, where NoteRate could be dynamic and the others are static upon origination.

    CLTV           µ = 78, σ = 13 and range [10, 125]
    FICO           µ = 706, σ = 35 and range [530, 840]
    NoteRate       µ = 6.5, σ = 1.1 and range [1, 11]
    Documentation  full or limited
    IO Indicator   Interest-Only or not
    Loan Purpose   purchase, refi or cash-out refi

Table 4.3: Maximum partial likelihood estimation of mortgage covariate effects in dual-time Cox regression models.

    Covariate   Default   Prepayment
    CLTV          2.841     −0.285
    FICO         −0.500      0.385
    NoteRate      1.944      0.286
    DocFull      −1.432     −0.091
    IO            1.202      0.141
    Purpose.p    −1.284      0.185
    Purpose.r    −0.656     −0.307

where the superscript (q) indicates the competing-risk hazards of default or prepayment, the continuous covariates FICO, CLTV and NoteRate are all scaled to have zero mean and unit variance, and the 3-level Purpose covariate is broken into 2 dummy variables (Purpose.p and Purpose.r against the standard level of cash-out refi). For simplicity, we have assumed log-linear effects of all 3 continuous covariates on both default and prepayment hazards, while in practice one may need to strive to find the appropriate functional forms or break them into multiple buckets. By the partial likelihood estimation discussed in Section 4.4, we obtain the numerical results tabulated in Table 4.3, where the coefficient estimates are all statistically significant based on our demonstration data (some other non-significant covariates were actually screened out and not considered here). From Table 4.3, we find that conditional on the nonparametric dual-time baselines,

1. The default risk is driven up by high CLTV, low FICO, high NoteRate, limited documentation, Interest-Only, and cash-out refi Purpose.


2. The prepayment risk is driven up by low CLTV, high FICO, high NoteRate, limited documentation, Interest-Only, and purchase Purpose.

3. The covariates CLTV, FICO and Purpose reveal the competing nature of the default and prepayment risks. It is reasonable to claim that a homeowner with good credit and low leverage would more likely prepay rather than choose to default.
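Because (4.53) is log-linear in the covariates, each coefficient in Table 4.3 acts multiplicatively on the hazard through exp(θ); for the standardized covariates this is the hazard ratio per one standard deviation (13 CLTV points, 35 FICO points). A minimal sketch (Python is used here purely for illustration; the coefficient values are those of Table 4.3):

```python
import math

# Estimated default-risk coefficients from Table 4.3; the continuous
# covariates (CLTV, FICO, NoteRate) are standardized, so exp(theta) is
# the hazard ratio per one standard deviation.
theta_default = {
    "CLTV": 2.841, "FICO": -0.500, "NoteRate": 1.944,
    "DocFull": -1.432, "IO": 1.202,
    "Purpose.p": -1.284, "Purpose.r": -0.656,
}

hazard_ratio = {k: math.exp(v) for k, v in theta_default.items()}

# One SD higher CLTV (13 points, per Table 4.2) multiplies the conditional
# default hazard by roughly 17; one SD higher FICO (35 points) multiplies
# it by about 0.61.
print(round(hazard_ratio["CLTV"], 1), round(hazard_ratio["FICO"], 2))
```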

4.6 Summary

This chapter has been devoted to survival analysis on the Lexis diagram, with coverage of nonparametric estimators, structural parameterization and semiparametric Cox regression. For the structural approach based on the inverse Gaussian first-passage-time distribution, we have discussed the feasibility of its dual-time extension and the calibration procedure, while its implementation is left to future investigation. Computational results are provided for the other dual-time methods developed in this chapter, including an assessment of the methodology through the simulation study. Finally, we have demonstrated the application of dual-time survival analysis to retail credit risk modeling. Some interesting findings in credit card and mortgage risk analysis are presented, which may shed light on understanding the ongoing credit crisis from a new dual-time perspective.

To end this thesis, we conclude that statistical methods play a key role in credit risk modeling. The main development of this thesis is dual-time analytics, including VDA (vintage data analysis) and DtSA (dual-time survival analysis), which can be applied to model both corporate and retail credit. We have also remarked throughout the chapters on possible extensions and other open problems of interest. So, the end is also a new beginning . . .


4.7 Supplementary Materials

(A) Calibration of Inverse Gaussian Hazard Function

Consider the calibration of the two-parameter inverse Gaussian hazard rate function λ(m; c, b) of the form (4.26) from the n observations (Ui, Mi, ∆i), i = 1, . . . , n, each subject to left truncation and right censoring. Given the parametric likelihood (4.27), the MLE requires nonlinear optimization, which can be carried out by gradient search (e.g., the Newton-Raphson algorithm). Some calculations of the derivatives needed for the gradient search are provided below. Denote by η = (c, b)ᵀ the structural parameters of the inverse Gaussian hazard. The log-likelihood function is given by

$$ \ell(\eta) = \sum_{i=1}^{n}\left\{ \Delta_i \log \lambda(M_i;\eta) - \int_{U_i}^{M_i} \lambda(s;\eta)\,ds \right\} \tag{4.54} $$
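As a concrete sketch of evaluating (4.54) — written in Python for illustration (the thesis references R), and assuming that (4.26) takes the standard first-passage form with survival function S(m; η) = Φ(z1) − e^{−2η1η2}Φ(z2) and density f(m; η) = η1 m^{−3/2} φ(z1), with z1, z2 as defined later in this supplement — note that the cumulative hazard term has the closed form ∫ λ(s; η) ds = log S(Ui; η) − log S(Mi; η):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ig_survival(m, eta1, eta2):
    """Assumed first-passage survival S(m; eta) = Phi(z1) - exp(-2*eta1*eta2)*Phi(z2)."""
    z1 = (eta1 + eta2 * m) / math.sqrt(m)
    z2 = (-eta1 + eta2 * m) / math.sqrt(m)
    return Phi(z1) - math.exp(-2.0 * eta1 * eta2) * Phi(z2)

def ig_hazard(m, eta1, eta2):
    """Hazard lambda = f / S with density f(m; eta) = eta1 * m**(-3/2) * phi(z1)."""
    z1 = (eta1 + eta2 * m) / math.sqrt(m)
    return eta1 * m ** (-1.5) * phi(z1) / ig_survival(m, eta1, eta2)

def log_likelihood(data, eta1, eta2):
    """Equation (4.54) for left-truncated, right-censored records (U, M, Delta);
    the cumulative hazard is evaluated exactly as log S(U) - log S(M)."""
    ll = 0.0
    for U, M, Delta in data:
        if Delta:
            ll += math.log(ig_hazard(M, eta1, eta2))
        log_SU = math.log(ig_survival(U, eta1, eta2)) if U > 0 else 0.0
        ll += math.log(ig_survival(M, eta1, eta2)) - log_SU
    return ll

# Toy data: (left truncation U, observed month M, event indicator Delta)
data = [(0.0, 6.0, 1), (0.5, 12.0, 1), (1.0, 24.0, 0), (2.0, 36.0, 0)]
ll = log_likelihood(data, eta1=1.0, eta2=0.1)
```

In practice one would maximize `log_likelihood` over (η1, η2) with a gradient-based optimizer supplied with the analytic derivatives below.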

Taking the first- and second-order derivatives, we have the score and Hessian functions

$$ \frac{\partial \ell}{\partial \eta} = \sum_{i=1}^{n}\left\{ \Delta_i\,\frac{\lambda'(M_i;\eta)}{\lambda(M_i;\eta)} - \int_{U_i}^{M_i} \lambda'(s;\eta)\,ds \right\} $$

$$ \frac{\partial^2 \ell}{\partial \eta\,\partial \eta^{T}} = \sum_{i=1}^{n}\left\{ \Delta_i\left[ \frac{\lambda''(M_i;\eta)}{\lambda(M_i;\eta)} - \frac{\big(\lambda'(M_i;\eta)\big)^{\otimes 2}}{\lambda^{2}(M_i;\eta)} \right] - \int_{U_i}^{M_i} \lambda''(s;\eta)\,ds \right\} $$

where $\lambda'(m;\eta) = \partial \lambda(m;\eta)/\partial \eta$, $\lambda''(m;\eta) = \partial^{2}\lambda(m;\eta)/\partial \eta\,\partial \eta^{T}$, and $a^{\otimes 2} = a a^{T}$. By supplying the gradients, the optimization algorithms in most standard software packages run faster and more reliably. Note that the integral terms above can be evaluated by numerical integration methods such as the composite trapezoidal rule or Simpson's rule; see, e.g., Press et al. (2007). Alternatively, one may replace the second term of (4.54) by $\log S(M_i;\eta) - \log S(U_i;\eta)$ and take derivatives based on the explicit expressions below. The complications lie in the evaluation of $\partial\lambda(m;\eta)/\partial\eta$ and $\partial^{2}\lambda(m;\eta)/\partial\eta\,\partial\eta^{T}$


based on the inverse Gaussian baseline hazard (4.26); these are messy but are provided below. Denoting

$$ z_1 = \frac{\eta_1 + \eta_2 m}{\sqrt{m}}, \qquad z_2 = \frac{-\eta_1 + \eta_2 m}{\sqrt{m}}, \qquad \phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}, $$

evaluate first the derivatives of the survival function (i.e., the denominator of (4.26)):

$$ \begin{aligned}
\frac{\partial S(m;\eta)}{\partial \eta_1} &= 2\eta_2 e^{-2\eta_1\eta_2}\Phi(z_2) + \frac{1}{\sqrt{m}}\left[\phi(z_1) + e^{-2\eta_1\eta_2}\phi(z_2)\right] \\
\frac{\partial S(m;\eta)}{\partial \eta_2} &= 2\eta_1 e^{-2\eta_1\eta_2}\Phi(z_2) + \sqrt{m}\left[\phi(z_1) - e^{-2\eta_1\eta_2}\phi(z_2)\right] \\
\frac{\partial^2 S(m;\eta)}{\partial \eta_1^2} &= -4\eta_2^{2} e^{-2\eta_1\eta_2}\Phi(z_2) - \frac{4\eta_2}{\sqrt{m}}\, e^{-2\eta_1\eta_2}\phi(z_2) - \frac{1}{m}\left[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\right] \\
\frac{\partial^2 S(m;\eta)}{\partial \eta_1 \partial \eta_2} &= (2 - 4\eta_1\eta_2)\, e^{-2\eta_1\eta_2}\Phi(z_2) - \left[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\right] \\
\frac{\partial^2 S(m;\eta)}{\partial \eta_2^2} &= -4\eta_1^{2} e^{-2\eta_1\eta_2}\Phi(z_2) + 4\eta_1\sqrt{m}\, e^{-2\eta_1\eta_2}\phi(z_2) - m\left[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\right].
\end{aligned} $$

Then, the partial derivatives of λ(m; η) can be evaluated explicitly:

$$ \begin{aligned}
\frac{\partial \lambda(m;\eta)}{\partial \eta_1} &= -\lambda(m;\eta)\left(\frac{\eta_1}{m} + \eta_2 - \frac{1}{\eta_1}\right) - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial S(m;\eta)}{\partial \eta_1} \\
\frac{\partial \lambda(m;\eta)}{\partial \eta_2} &= -\lambda(m;\eta)\,(\eta_1 + \eta_2 m) - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial S(m;\eta)}{\partial \eta_2} \\
\frac{\partial^2 \lambda(m;\eta)}{\partial \eta_1^2} &= -\left(\frac{\eta_1}{m} + \eta_2 - \frac{1}{\eta_1}\right)\frac{\partial \lambda(m;\eta)}{\partial \eta_1} - \left(\frac{1}{\eta_1^2} + \frac{1}{m}\right)\lambda(m;\eta) \\
&\quad - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial^2 S(m;\eta)}{\partial \eta_1^2} - \frac{1}{S(m;\eta)}\frac{\partial \lambda(m;\eta)}{\partial \eta_1}\frac{\partial S(m;\eta)}{\partial \eta_1} + \frac{\lambda(m;\eta)}{S^{2}(m;\eta)}\left(\frac{\partial S(m;\eta)}{\partial \eta_1}\right)^{2} \\
\frac{\partial^2 \lambda(m;\eta)}{\partial \eta_2^2} &= -(\eta_1 + \eta_2 m)\frac{\partial \lambda(m;\eta)}{\partial \eta_2} - \lambda(m;\eta)\,m \\
&\quad - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial^2 S(m;\eta)}{\partial \eta_2^2} - \frac{1}{S(m;\eta)}\frac{\partial \lambda(m;\eta)}{\partial \eta_2}\frac{\partial S(m;\eta)}{\partial \eta_2} + \frac{\lambda(m;\eta)}{S^{2}(m;\eta)}\left(\frac{\partial S(m;\eta)}{\partial \eta_2}\right)^{2} \\
\frac{\partial^2 \lambda(m;\eta)}{\partial \eta_1 \partial \eta_2} &= -(\eta_1 + \eta_2 m)\frac{\partial \lambda(m;\eta)}{\partial \eta_1} - \lambda(m;\eta) \\
&\quad - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial^2 S(m;\eta)}{\partial \eta_1 \partial \eta_2} - \frac{1}{S(m;\eta)}\frac{\partial \lambda(m;\eta)}{\partial \eta_1}\frac{\partial S(m;\eta)}{\partial \eta_2} + \frac{\lambda(m;\eta)}{S^{2}(m;\eta)}\frac{\partial S(m;\eta)}{\partial \eta_1}\frac{\partial S(m;\eta)}{\partial \eta_2}.
\end{aligned} $$
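Hand-derived expressions of this kind are error-prone, so it is prudent to verify them against finite differences. A sketch for ∂S/∂η1 (Python, using the same survival-function form Φ(z1) − e^{−2η1η2}Φ(z2) assumed earlier in this supplement):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def S(m, e1, e2):
    """Assumed survival function Phi(z1) - exp(-2*e1*e2)*Phi(z2)."""
    z1 = (e1 + e2 * m) / math.sqrt(m)
    z2 = (-e1 + e2 * m) / math.sqrt(m)
    return Phi(z1) - math.exp(-2.0 * e1 * e2) * Phi(z2)

def dS_de1(m, e1, e2):
    """Analytic dS/d(eta1) as derived above."""
    z1 = (e1 + e2 * m) / math.sqrt(m)
    z2 = (-e1 + e2 * m) / math.sqrt(m)
    E = math.exp(-2.0 * e1 * e2)
    return 2.0 * e2 * E * Phi(z2) + (phi(z1) + E * phi(z2)) / math.sqrt(m)

# Central finite difference as an independent check of the formula
m, e1, e2, h = 7.0, 1.3, 0.2, 1e-5
numeric = (S(m, e1 + h, e2) - S(m, e1 - h, e2)) / (2.0 * h)
analytic = dS_de1(m, e1, e2)
```

The same check applies, derivative by derivative, to the second-order expressions.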


(B) Implementation of Cox Modeling with Time-varying Covariates

Consider the Cox regression model with time-varying covariates $x(m) \in \mathbb{R}^p$:

$$ \lambda(m; x(m)) = \lambda_0(m)\exp\{\theta^{T}x(m)\} \tag{4.55} $$

where λ0(m) is an arbitrary baseline hazard function and θ is the p-vector of regression coefficients. Provided below is the implementation of maximum likelihood estimation of θ and λ0(m) by recasting the problem as standard CoxPH modeling with time-independent covariates. Let τi be the event time for each account i = 1, . . . , n, subject to left truncation Ui and right censoring Ci. Given the observations

$$ (U_i, M_i, \Delta_i, x_i(m)), \quad m \in [U_i, M_i],\ i = 1, \ldots, n, \tag{4.56} $$

in which $M_i = \min\{\tau_i, C_i\}$ and $\Delta_i = I(\tau_i < C_i)$, the complete likelihood is given by

$$ L = \prod_{i=1}^{n}\left(\frac{f_i(M_i)}{S_i(U_i)}\right)^{\Delta_i}\left(\frac{S_i(M_i+)}{S_i(U_i)}\right)^{1-\Delta_i} = \prod_{i=1}^{n}\,[\lambda_i(M_i)]^{\Delta_i}\, S_i(M_i+)\, S_i^{-1}(U_i). \tag{4.57} $$

Under the dynamic Cox model (4.55), the likelihood function is

$$ L(\theta, \lambda_0) = \prod_{i=1}^{n}\big[\lambda_0(M_i)\exp\{\theta^{T}x_i(M_i)\}\big]^{\Delta_i} \exp\left\{-\int_{U_i}^{M_i}\lambda_0(s)\exp\{\theta^{T}x_i(s)\}\,ds\right\}. \tag{4.58} $$

Taking λ0 as the nuisance parameter, one may estimate θ by the partial likelihood; see Cox (1972, 1975) or Section 4.4. Model estimation with time-varying covariates can be implemented by standard software packages that, by default, handle only time-independent covariates. The trick is to construct little time segments such that xi(m) can be viewed as constant within each segment. See Figure 4.8 for an illustration with the construction


of monthly segments.

[Figure 4.8: Splitting time-varying covariates into little time segments.]

For each account i, partition $m \in [U_i, M_i]$ as

$$ U_i \equiv U_{i,0} < U_{i,1} < \cdots < U_{i,L_i} \equiv M_i $$

such that

$$ x_i(m) = x_{i,l}, \quad m \in (U_{i,l-1}, U_{i,l}]. \tag{4.59} $$

Thus, we have converted the n-sample survival data (4.56) to the following $\sum_{i=1}^{n} L_i$ independent observations, each with constant covariates:

$$ (U_{i,l-1},\, U_{i,l},\, \Delta_{i,l},\, x_{i,l}), \quad l = 1, \ldots, L_i,\ i = 1, \ldots, n, \tag{4.60} $$

where $\Delta_{i,l} = \Delta_i$ for $l = L_i$ and $\Delta_{i,l} = 0$ otherwise. It is straightforward to show that the likelihood (4.57) can be reformulated as

$$ L = \prod_{i=1}^{n}[\lambda_i(M_i)]^{\Delta_i}\prod_{l=1}^{L_i} S_i(U_{i,l})\, S_i^{-1}(U_{i,l-1}) = \prod_{i=1}^{n}\prod_{l=1}^{L_i}\,[\lambda_i(U_{i,l})]^{\Delta_{i,l}}\, S_i(U_{i,l})\, S_i^{-1}(U_{i,l-1}), \tag{4.61} $$

which becomes the likelihood function based on the enlarged data (4.60). Therefore,


under the assumption (4.59), the parameter estimation by maximizing (4.58) can be equivalently handled by maximizing

$$ L(\theta, \lambda_0) = \prod_{i=1}^{n}\prod_{l=1}^{L_i}\big[\lambda_0(U_{i,l})\exp\{\theta^{T}x_{i,l}\}\big]^{\Delta_{i,l}} \exp\left\{-\exp\{\theta^{T}x_{i,l}\}\int_{U_{i,l-1}}^{U_{i,l}}\lambda_0(s)\,ds\right\}. \tag{4.62} $$
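The data-splitting step of (4.59)–(4.60) can be sketched as follows (Python for illustration; field names and the monthly grid are hypothetical). The output layout matches the counting-process form `Surv(start, stop, event)` accepted by R's survival package:

```python
def split_segments(U, M, Delta, x_path, step=1.0):
    """Expand one account's record (U, M, Delta, x(m)) into start/stop
    segments (4.60) with the covariate held constant on each (start, stop]."""
    records = []
    start = U
    while start < M:
        stop = min(start + step, M)        # monthly grid as in Figure 4.8
        event = Delta if stop == M else 0  # event flag only on the last segment
        records.append((start, stop, event, x_path(stop)))
        start = stop
    return records

# Hypothetical floating note rate observed on a monthly grid
note_rate = lambda m: 6.0 + 0.1 * m
rows = split_segments(U=2.0, M=5.5, Delta=1, x_path=note_rate)
# rows contains four segments: (2,3], (3,4], (4,5], (5,5.5], with the
# event recorded only on the final one.
```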

In R, one may perform the MLE directly by applying the CoxPH procedure (the survival package) to the transformed data (4.60); see Therneau and Grambsch (2000) for computational details. Furthermore, one may implement the MLE through logistic regression, provided that the underlying hazard rate function (4.55) admits a linear approximation up to the logit link:

$$ \mathrm{logit}(\lambda_i(m)) \approx \eta^{T}\phi(m) + \theta^{T}x_i(m), \quad i = 1, \ldots, n, \tag{4.63} $$

where $\mathrm{logit}(x) = \log(x/(1-x))$ and $\phi(m)$ is a vector of pre-specified basis functions (e.g., splines). Given the survival observations (4.60) with piecewise-constant covariates, it is easy to show that the discrete version of the joint likelihood formulated in Section 4.2 can be rewritten as

$$ L = \prod_{i=1}^{n}\,\lambda_i(M_i)^{\Delta_i}\,[1-\lambda_i(M_i)]^{1-\Delta_i}\prod_{m\in(U_i, M_i)}[1-\lambda_i(m)] = \prod_{i=1}^{n}\prod_{l=1}^{L_i}\,\lambda_i(U_{i,l})^{\Delta_{i,l}}\,[1-\lambda_i(U_{i,l})]^{1-\Delta_{i,l}}, \tag{4.64} $$

where ∆i,l are the same as in (4.60). This is clearly the likelihood function of the independent binary data (∆i,l, xi,l) for l = 1, . . . , Li and i = 1, . . . , n, where xi,l are constructed by (4.59). Therefore, under the parameterized logistic model (4.63), one may perform the MLE for the parameters (η, θ) by a standard GLM procedure; see, e.g., Faraway (2006).
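The Bernoulli structure of (4.64) can be sketched directly (Python for illustration; the linear basis φ(m) = (1, m) and all parameter values are hypothetical — in R one would simply call `glm(event ~ ., family = binomial)` on the person-period data):

```python
import math

def logit_hazard(m, x, eta, theta):
    """Discrete-time hazard under (4.63) with an assumed basis phi(m) = (1, m);
    eta = (intercept, slope), theta a scalar covariate coefficient."""
    lin = eta[0] + eta[1] * m + theta * x
    return 1.0 / (1.0 + math.exp(-lin))

def discrete_loglik(records, eta, theta):
    """Bernoulli log-likelihood of (4.64) over person-period records
    (m, x, event): each month at risk contributes
    event*log(h) + (1 - event)*log(1 - h)."""
    ll = 0.0
    for m, x, event in records:
        h = logit_hazard(m, x, eta, theta)
        ll += event * math.log(h) + (1 - event) * math.log(1.0 - h)
    return ll

# One account observed over months 1..6 with covariate x = 0.5,
# defaulting in month 6: five non-event rows and one event row.
records = [(m, 0.5, 1 if m == 6 else 0) for m in range(1, 7)]
ll = discrete_loglik(records, eta=(-4.0, 0.1), theta=0.3)
```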


(C) Frailty-type Vintage Effects in Section 4.4.3

The likelihood function. By the independence of $Z_1, \ldots, Z_K$, it is easy to write down the likelihood as

$$ L = \prod_{\kappa=1}^{K} \mathbb{E}_{Z_\kappa}\left[ \prod_{j \in G_\kappa} \prod_{i=1}^{n_j} \big[\lambda_{ji}(M_{ji}\,|\,Z_\kappa)\big]^{\Delta_{ji}} \exp\left\{ -\int_{U_{ji}}^{M_{ji}} \lambda_{ji}(s\,|\,Z_\kappa)\,ds \right\} \right]. \tag{4.65} $$

Substituting $\lambda_{ji}(m\,|\,Z_\kappa) = \tilde{\lambda}_f(m)\, Z_\kappa \exp\{\tilde{\xi}^{T} x_{ji}(m)\}$, we obtain for each $G_\kappa$ the likelihood contribution

$$ L_\kappa = \prod_{j \in G_\kappa} \prod_{i=1}^{n_j} \big[\tilde{\lambda}_f(M_{ji}) \exp\{\tilde{\xi}^{T} x_{ji}(M_{ji})\}\big]^{\Delta_{ji}} \; \mathbb{E}_{Z}\big[ Z^{d_\kappa} \exp\{-Z A_\kappa\} \big] \tag{4.66} $$

in which $d_\kappa = \sum_{j \in G_\kappa}\sum_{i=1}^{n_j} \Delta_{ji}$ and $A_\kappa = \sum_{j \in G_\kappa}\sum_{i=1}^{n_j} \int_{U_{ji}}^{M_{ji}} \tilde{\lambda}_f(s) \exp\{\tilde{\xi}^{T} x_{ji}(s)\}\,ds$.

The expectation term in (4.66) can be derived from the $d_\kappa$-th derivative of the Laplace transform of $Z$, which satisfies $\mathbb{E}_{Z}[Z^{d_\kappa} \exp\{-Z A_\kappa\}] = (-1)^{d_\kappa} \mathcal{L}^{(d_\kappa)}(A_\kappa)$; see, e.g., Aalen et al. (2008, §7). Thus, we obtain the joint likelihood

$$ L = \prod_{\kappa=1}^{K} \left[ \prod_{j \in G_\kappa} \prod_{i=1}^{n_j} \big(\tilde{\lambda}_f(M_{ji}) \exp\{\tilde{\xi}^{T} x_{ji}(M_{ji})\}\big)^{\Delta_{ji}} \right] (-1)^{d_\kappa}\, \mathcal{L}^{(d_\kappa)}(A_\kappa), \tag{4.67} $$

or the log-likelihood function of the form (4.51).

EM algorithm for frailty model estimation.

1. E-step: given fixed values of $(\xi, \lambda_f)$, estimate the individual frailty levels $\{Z_\kappa\}_{\kappa=1}^{K}$ by the conditional expectation

$$ \hat{Z}_\kappa = \mathbb{E}[Z_\kappa \,|\, G_\kappa] = -\frac{\mathcal{L}^{(d_\kappa+1)}(A_\kappa)}{\mathcal{L}^{(d_\kappa)}(A_\kappa)}, \tag{4.68} $$

which corresponds to the empirical Bayes estimate; see Aalen et al. (2008, §7.2.3). For the gamma frailty $\mathrm{Gamma}(\delta^{-1}, \delta)$ with $\mathcal{L}^{(d)}(x) = (-\delta)^{d}(1+\delta x)^{-1/\delta - d} \prod_{i=1}^{d}(1/\delta + i - 1)$, the frailty level estimates are given by

$$ \hat{Z}_\kappa = \frac{1 + \delta d_\kappa}{1 + \delta A_\kappa}, \qquad \kappa = 1, \ldots, K. \tag{4.69} $$

2. M-step: given the frailty levels $\{Z_\kappa\}_{\kappa=1}^{K}$, estimate $(\xi, \lambda_f)$ by partial likelihood maximization and the Breslow estimator, appropriately modified for the fixed frailty multipliers:

$$ PL(\xi) = \prod_{j=1}^{J}\prod_{i=1}^{n_j} \left[ \frac{\exp\{\xi^{T} x_{ji}(M_{ji})\}}{\sum_{(j',i') \in R_{ji}} Z_{\kappa(j')} \exp\{\xi^{T} x_{j'i'}(M_{ji})\}} \right]^{\Delta_{ji}} \tag{4.70} $$

$$ \Lambda_f(m) = \sum_{j=1}^{J}\sum_{i=1}^{n_j} \frac{I(m \ge M_{ji})\,\Delta_{ji}}{\sum_{(j',i') \in R_{ji}} Z_{\kappa(j')} \exp\{\xi^{T} x_{j'i'}(M_{ji})\}}, \tag{4.71} $$

where $\kappa(j)$ indicates the bucket $G_\kappa$ to which $V_j$ belongs. In R, the partial likelihood maximization can be implemented by the standard CoxPH procedure with the given log-frailty predictors supplied as an offset; details omitted. To estimate the frailty parameter $\delta$, one may run the EM algorithm for $\delta$ on a grid to obtain $(\xi^{*}(\delta), \lambda_f^{*}(\cdot\,; \delta))$, and then find the maximizer of the profiled log-likelihood (4.51).
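The gamma-frailty E-step (4.69) is a one-line shrinkage formula; a sketch (Python for illustration), with d_κ and A_κ supplied from the current M-step fit:

```python
def gamma_frailty_estep(d, A, delta):
    """Empirical-Bayes E-step (4.69) for a gamma frailty bucket:
    Z = (1 + delta * d) / (1 + delta * A), where d is the number of
    events and A the accumulated covariate-weighted hazard in the bucket."""
    return (1.0 + delta * d) / (1.0 + delta * A)

# Buckets with more events than their accumulated hazard predicts are
# assigned frailty above 1; as delta -> 0 every bucket shrinks to 1.
z_high = gamma_frailty_estep(d=10, A=4.0, delta=0.5)  # (1+5)/(1+2) = 2.0
z_low = gamma_frailty_estep(d=1, A=4.0, delta=0.5)    # 1.5/3 = 0.5
z_null = gamma_frailty_estep(d=10, A=4.0, delta=0.0)  # no heterogeneity
```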


BIBLIOGRAPHY


[1] Aalen, O., Borgan, O. and Gjessing, H. K. (2008). Survival and Event History Analysis: A Process Point of View. Springer, New York.
[2] Aalen, O. and Gjessing, H. K. (2001). Understanding the shape of the hazard rate: a process point of view (with discussion). Statistical Science, 16, 1–22.
[3] Abramovich, F. and Steinberg, D. M. (1996). Improved inference in nonparametric regression using Lk-smoothing splines. Journal of Statistical Planning and Inference, 49, 327–341.
[4] Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions (10th printing). Dover, New York.
[5] Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer, New York.
[6] Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Annals of Statistics, 10, 1100–1120.
[7] Arjas, E. (1986). Stanford heart transplantation data revisited: a real time approach. In: Modern Statistical Methods in Chronic Disease Epidemiology (Moolgavkar, S. H. and Prentice, R. L., eds.), pp. 65–81. Wiley, New York.
[8] Berlin, M. and Mester, L. J. (eds.) (2004). Special issue on "Retail credit risk management and measurement". Journal of Banking and Finance, 28, 721–899.
[9] Bielecki, T. R. and Rutkowski, M. (2004). Credit Risk: Modeling, Valuation and Hedging. Springer-Verlag, Berlin.
[10] Bilias, Y., Gu, M. and Ying, Z. (1997). Towards a general asymptotic theory for Cox model with staggered entry. Annals of Statistics, 25, 662–682.
[11] Black, F. and Cox, J. C. (1976). Valuing corporate securities: some effects of bond indenture provisions. Journal of Finance, 31, 351–367.
[12] Bluhm, C. and Overbeck, L. (2007). Structured Credit Portfolio Analysis, Baskets and CDOs. Chapman and Hall, Boca Raton.
[13] Breeden, J. L. (2007). Modeling data with multiple time dimensions. Computational Statistics & Data Analysis, 51, 4761–4785.

[14] Breslow, N. E. (1972). Discussion of "Regression models and life-tables" by D. R. Cox. Journal of the Royal Statistical Society: Ser. B, 34, 216–217.
[15] Chen, L., Lesmond, D. A. and Wei, J. (2007). Corporate yield spreads and bond liquidity. Journal of Finance, 62, 119–149.
[16] Chhikara, R. S. and Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Dekker, New York.
[17] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Ser. B, 34, 187–220.
[18] Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276.
[19] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53, 385–407.
[20] Cressie, N. A. (1993). Statistics for Spatial Data. Wiley, New York.
[21] Das, S. R., Duffie, D., Kapadia, N. and Saita, L. (2007). Common failings: how corporate defaults are correlated. Journal of Finance, 62, 93–117.
[22] Deng, Y., Quigley, J. M. and Van Order, R. (2000). Mortgage terminations, heterogeneity and the exercise of mortgage options. Econometrica, 68, 275–307.
[23] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.
[24] Duchesne, T. and Lawless, J. F. (2000). Alternative time scales and failure time models. Lifetime Data Analysis, 6, 157–179.
[25] Duffie, D., Pan, J. and Singleton, K. (2000). Transform analysis and option pricing for affine jump-diffusions. Econometrica, 68, 1343–1376.
[26] Duffie, D. and Singleton, K. J. (2003). Credit Risk: Pricing, Measurement, and Management. Princeton University Press.
[27] Duffie, D., Saita, L. and Wang, K. (2007). Multi-period corporate default prediction with stochastic covariates. Journal of Financial Economics, 83, 635–665.
[28] Duffie, D., Eckner, A., Horel, G. and Saita, L. (2008). Frailty correlated default. Journal of Finance, to appear.
[29] Efron, B. (2002). The two-way proportional hazards model. Journal of the Royal Statistical Society: Ser. B, 64, 899–909.
[30] Esper, J., Cook, E. R. and Schweingruber, F. H. (2002). Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science, 295, 2250–2253.

[31] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, Boca Raton.
[32] Fan, J., Huang, T. and Li, R. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function. Journal of the American Statistical Association, 102, 632–641.
[33] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York.
[34] Fang, K. T., Li, R. and Sudjianto, A. (2006). Design and Modeling for Computer Experiments. Chapman and Hall, Boca Raton.
[35] Faraway, J. J. (2006). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman and Hall, Boca Raton.
[36] Farewell, V. T. and Cox, D. R. (1979). A note on multiple time scales in life testing. Appl. Statist., 28, 73–75.
[37] Frame, S., Lehnert, A. and Prescott, N. (2008). A snapshot of mortgage conditions with an emphasis on subprime mortgage performance. U.S. Federal Reserve, August 2008.
[38] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
[39] Fu, W. J. (2000). Ridge estimator in singular design with applications to age-period-cohort analysis of disease rates. Communications in Statistics – Theory and Method, 29, 263–278.
[40] Fu, W. J. (2008). A smoothing cohort model in age-period-cohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Sociological Methods and Research, 36, 327–361.
[41] Gerardi, K., Sherlund, S. M., Lehnert, A. and Willen, P. (2008). Making sense of the subprime crisis. Brookings Papers on Economic Activity, 2008(2), 69–159.
[42] Giesecke, K. (2006). Default and information. Journal of Economic Dynamics and Control, 30, 2281–2303.
[43] Golub, G. H., Heath, M. and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223.
[44] Goodman, L. S., Li, S., Lucas, D. J., Zimmerman, T. A. and Fabozzi, F. J. (2008). Subprime Mortgage Credit Derivatives. Wiley, Hoboken NJ.
[45] Gramacy, R. B. and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103, 1119–1130.

[46] Green, P. G. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall, London.
[47] Gu, C. (2002). Smoothing Spline ANOVA Models. Springer, New York.
[48] Gu, M. G. and Lai, T. L. (1991). Weak convergence of time-sequential censored rank statistics with applications to sequential testing in clinical trials. Ann. Statist., 19, 1403–1433.
[49] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, New York.
[50] Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer, New York.
[51] Jarrow, R. and Turnbull, S. (1995). Pricing derivatives on financial securities subject to credit risk. Journal of Finance, 50, 53–85.
[52] Keiding, N. (1990). Statistical inference in the Lexis diagram. Phil. Trans. R. Soc. Lond. A, 332, 487–509.
[53] Kimeldorf, G. and Wahba, G. (1970). A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Ann. Math. Stat., 41, 495–502.
[54] Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.
[55] Kupper, L. L., Janis, J. M., Karmous, A. and Greenberg, B. G. (1985). Statistical age-period-cohort analysis: a review and critique. Journal of Chronic Disease, 38, 811–830.
[56] Lando, D. (1998). On Cox processes and credit-risky securities. Review of Derivatives Research, 2, 99–120.
[57] Lando, D. (2004). Credit Risk Modeling: Theory and Applications. Princeton University Press.
[58] Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. Wiley, Hoboken NJ.
[59] Lee, M.-L. T. and Whitmore, G. A. (2006). Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary. Statistical Science, 21, 501–513.
[60] Lexis, W. (1875). Einleitung in die Theorie der Bevölkerungsstatistik. Trübner, Strassburg. Pages 5–7 translated to English by Keyfitz, N. and printed in Mathematical Demography (eds. Smith, D. and Keyfitz, N., 1977), Springer, Berlin.

[61] Luo, Z. and Wahba, G. (1997). Hybrid adaptive splines. Journal of the American Statistical Association, 92, 107–116.
[62] Madan, D. and Unal, H. (1998). Pricing the risks of default. Review of Derivatives Research, 2, 121–160.
[63] Mason, K. O., Mason, W. H., Winsborough, H. H. and Poole, K. (1973). Some methodological issues in cohort analysis of archival data. American Sociological Review, 38, 242–258.
[64] Mason, W. H. and Wolfinger, N. H. (2002). Cohort analysis. In: International Encyclopedia of the Social and Behavioral Sciences, 2189–2194. Elsevier, New York.
[65] Marshall, A. W. and Olkin, I. (2007). Life Distributions: Structure of Nonparametric, Semiparametric, and Parametric Families. Springer.
[66] Mayer, C., Pence, K. and Sherlund, S. M. (2009). The rise in mortgage defaults. Journal of Economic Perspectives, 23, 27–50.
[67] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman and Hall, London.
[68] Merton, R. C. (1974). On the pricing of corporate debt: the risk structure of interest rates. Journal of Finance, 29, 449–470.
[69] Moody's Special Comment: Corporate Default and Recovery Rates, 1920–2008. Data release: February 2009.
[70] Müller, H. G. and Stadtmüller, U. (1987). Variable bandwidth kernel estimators of regression curves. Annals of Statistics, 15, 282–301.
[71] Nelsen, R. B. (2006). An Introduction to Copulas (2nd ed.). Springer, New York.
[72] Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Statist., 19, 25–43.
[73] Nychka, D. W. (1988). Bayesian confidence intervals for a smoothing spline. Journal of the American Statistical Association, 83, 1134–1143.
[74] Oakes, D. (1995). Multiple time scales in survival analysis. Lifetime Data Analysis, 1, 7–18.
[75] Pintore, A., Speckman, P. and Holmes, C. C. (2006). Spatially adaptive smoothing splines. Biometrika, 93, 113–125.
[76] Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press.

[77] Ramlau-Hansen, H. (1983). Smoothing counting process intensities by means of kernel functions. Ann. Statist., 11, 453–466.
[78] Ramsay, J. O. and Li, X. (1998). Curve registration. Journal of the Royal Statistical Society: Ser. B, 60, 351–363.
[79] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press.
[80] Rice, J. and Rosenblatt, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Annals of Statistics, 11, 141–156.
[81] Ruppert, D. and Carroll, R. J. (2000). Spatially-adaptive penalties for spline fitting. Aust. N. Z. J. Statist., 42(2), 205–223.
[82] Schrödinger, E. (1915). Zur Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Physikalische Zeitschrift, 16, 289–295.
[83] Selke, T. and Siegmund, D. (1983). Sequential analysis of the proportional hazards model. Biometrika, 70, 315–326.
[84] Shepp, L. A. (1966). Radon-Nikodym derivatives of Gaussian measures. Annals of Mathematical Statistics, 37, 321–354.
[85] Sherlund, S. M. (2008). The past, present, and future of subprime mortgages. Finance and Economics Discussion Series, 2008-63, U.S. Federal Reserve Board.
[86] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric curve fitting (with discussion). Journal of the Royal Statistical Society: Ser. B, 47, 1–52.
[87] Slud, E. V. (1984). Sequential linear rank tests for two-sample censored survival data. Ann. Statist., 12, 551–571.
[88] Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag, New York.
[89] Sudjianto, A., Li, R. and Zhang, A. (2006). GAMEED documentation. Technical report, Enterprise Quantitative Risk Management, Bank of America, N.A.
[90] Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York.
[91] Tweedie, M. C. K. (1957). Statistical properties of inverse Gaussian distributions, I and II. Annals of Mathematical Statistics, 28, 362–377 and 696–705.
[92] Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.

[93] Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics, 5, 177–188.
[94] Vidakovic, B. (1999). Statistical Modeling by Wavelets. Wiley, New York.
[95] Wahba, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society: Ser. B, 45, 133–150.
[96] Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia.
[97] Wahba, G. (1995). Discussion of "Wavelet shrinkage: asymptopia?" by Donoho et al. Journal of the Royal Statistical Society: Ser. B, 57, 360–361.
[98] Wallace, N. (ed.) (2005). Special issue on "Innovations in mortgage modeling". Real Estate Economics, 33, 587–782.
[99] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman & Hall, London.
[100] Wang, J.-L. (2005). Smoothing hazard rate. In: Armitage, P. and Colton, T. (eds.), Encyclopedia of Biostatistics (2nd ed.), Volume 7, pp. 4986–4997. Wiley.
[101] Wikipedia, The Free Encyclopedia. URL: http://en.wikipedia.org

Free Essay

...University of Florence Faculty of Economy Master’s Degree in Bank, Insurance and Financial Markets Thesis in Applied Statistics for Banks and Insurances Credit Risk Models: Single Firm Default and Contagion Default Analysis Supervisor: P rof essor Fabrizio Cipollini Student: Marco Gambacciani Academic Year 2009/2010 Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Structural Models 1.1 Terminal Default . . . . . . . . . . . . 1.2 First Passage Models . . . . . . . . . . 1.2.1 The Black and Cox’s Model . . 1.2.2 Longstaff and Schwartz’s Model 1.2.3 Leland and Toft’s Model . . . . 1.2.4 Zhou’s Model . . . . . . . . . . 1.2.5 Random Threshold Model . . . 2 5 5 11 11 15 19 24 30 35 36 39 41 45 48 50 51 56 67 76 77 79 79 82 83 84 94 114 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Modelli reduced form 2.1 Approach With An Homogenous Poisson Process . . 2.2 Approach With a Non-Homogenous Poisson Process 2.3 Approach with a Cox’s Process . . . . . . . . . . . . 2.4 Bond and Spread Valuation . . . . . . . . . . . . . . Models For The Correlation Between Defaults 3.1 Bottom-Up Models . . . . . . . . . . . . . 3.1.1 Structural Apporach . . . . . . . . 3.1.2 Intensity Models Approaches . . . 3.1.3 Approaches with...

Words: 33386 - Pages: 134

Free Essay

...scene where they took scene details. I drove the vehicle to Industrial Area Police Station That is all I can state. R.O.C NAME : GEORGE S. OGUTU DATE : 5/7/2012 TIME : 1.30 P.M. SIGNED : ___________________ CREDIT RISK Most financial institutions devote considerable resources to the measurement and management of credit risk. Regulators have for many years required banks to keep capital to reflect the credit risks they are bearing. This capital is in addition to the capital, described in Business Snapshot 20.1, that is required for market risk. Credit risk arises from the possibility that borrowers and counterparties in derivatives transactions may default. This chapter discusses a number of different approaches to estimating the probability that a company will default and explains the key difference between risk-neutral and real-world probabilities of default. It examines the nature of the credit risk in over-the-counter derivatives transactions and discuss the clauses of derivatives dealers write into their contracts to reduce credit risk. It also covers default correlation, Gaussian copula models, and the estimation of credit value at risk. Chapter 23 will discuss credit derivatives and show how ideas introduced in this chapter can be used to value these instruments. CREDIT RATINGS Rating agencies, such as Moody’s S & P, and Fitch, are in the business of providing ratings describing the creditworthiness of corporate bonds. The best rating assigned by Moody’s is Aaa.......

Words: 3863 - Pages: 16

Free Essay

...Commercial Banking The first category of credit risk models are the ones based on the original framework developed by Merton (1974) using the principles of option pricing (Black and Scholes, 1973). * the default process of a company is driven by the value of the company’s assets and the risk of a firm’s default is therefore explicitly linked to the variability of the firm’s asset value. * The basic intuition behind the Merton model is relatively simple: default occurs when the value of a firm’s assets (the market value of the firm) is lower than that of its liabilities. * The payment to the debt holders at the maturity of the debt is therefore the smaller of two quantities: the face value of the debt or the market value of the firm’s assets. * Assuming that the company’s debt is entirely represented by a zero-coupon bond, if the value of the firm at maturity is greater than the face value of the bond, then the bondholder gets back the face value of the bond. * However, if the value of the firm is less than the face value of the bond, the shareholders get nothing and the bondholder gets back the market value of the firm. The payoff at maturity to the bondholder is therefore equivalent to the face value of the bond minus a put option on the value of the firm, with a strike price equal to the face value of the bond and a maturity equal to the maturity of the bond. Following this basic intuition, Merton derived an explicit formula for risky bonds which......

Words: 1011 - Pages: 5

Premium Essay

...Credit risk is the current and prospective risk to earnings or capital arising from an obligor’s failure to meet the terms of any contract with the Bank or otherwise to perform as agreed. Credit risk is found in all activities in which success depends on counterparty, issuer, or borrower performance. It arises any time bank funds are extended, committed, invested, or otherwise exposed through actual or implied contractual agreements, whether reflected on or off the balance sheet. Quantity of Credit Risk Indicators The quantity of credit risk is derived from the absolute amount of credit exposure and the quality of that exposure. How much credit exposure a bank has is a function of: • The level of loans and other credit/credit-equivalent exposures relative to total assets and capital and The extent to which earnings are dependent on loan or other credit/credit-equivalent income sources. • Banks that have higher loans-to-assets and loans-to-equity ratios and that depend heavily on the revenues from credit activities will have a higher quantity of credit risk. To determine the quantity of credit risk, the quantitative and qualitative risk indicators should be reviewed. These indictors can be rapid growth, high past-due levels, exceeds historical norms, or changes in trends or changes in portfolio mix. The following evaluation factors, as appropriate, should be used when assessing the quantity of credit risk. As assessment of low, moderate, or high......

Words: 283 - Pages: 2

Free Essay

...University of Nottingham Credit Risk Management in Major British Banks By Xiuzhu Zhao 2007 A Dissertation presented in part consideration for the degree of “MA Finance and Investment” Acknowledgement I would like to express my special thanks to my supervisor Mrs. Margaret Woods, who has given me strong support and encouragement during the whole research, and I am very appreciate of the expert guidance and inspiration she brought me. I am very grateful to my parents for their love and encouragement during my whole education period. The academic suggestions my father has given help me a lot in designing the dissertation. Last but not least, I would like to thank all my friends especially those in Melton Hall. I will never forget the help they have offered, which raises my confidence in completing this dissertation. i Abstract Credit risk is always treated as the major risk inherent in a bank’s banking and trading activities. And if not well managed, this kind of risk may drag a bank into great trouble or even bankruptcy, which can be proved by various bank failure cases. For banks, managing credit risk is not a simple task since comprehensive considerations and practices are needed for identifying, measuring, controlling and minimizing credit risk. In this dissertation, the credit risk management practices of major British banks are examined through the quantitative research on all Major British Banking Group members and qualitative analysis on the four sample banks....

Words: 23126 - Pages: 93

Premium Essay

...W. Ruland, Spring 2009. Credit Risk. The Issue: Will the loans be paid on time and in full? Who Should Understand Credit Risk? Borrowers. Prospective lenders, including those who sell on account: should they lend? How should they structure the loan? Shareholders. Potential purchasers of products or services. Credit Ratings: Important to the business. A rating influences the cost of borrowing and perhaps the ability to borrow; it influences the market for the firm's securities, given that some investors, such as certain pension funds, may invest in a firm only if the credit rating exceeds a certain level. The firms pay the agencies to be rated. Ratings are for specific debt issues, not the entire corporation. In early 2008, Defaultrisk.com reported that at that time there were 64 rating agencies worldwide. SEC registration: Nationally Recognized Statistical Rating Organizations are credit rating services that meet certain criteria enabling them to register with the SEC. Eight firms qualified when the book was published; that number increased to 10 (as of Fall 2008). Page 4-25 managerial decision case: when businesses take measures to improve credit ratings, what are the potential costs? Evaluating Credit Risk -- the Five C’s: Character, the expected reaction when things get rough; Capacity, the ability to generate future cash flow for loan service; Conditions, the future of the business and the economy; Capital, the balance......

Words: 1601 - Pages: 7

Premium Essay

...thought out. Change in any area, no matter how large or small, should be discussed and should involve the people it will affect at every level. Allow for input from staff, consider that input wisely, and compare it to your written change plan. Often, an idea of change that worked for a competitor doesn’t mean it will work for you. Tools and techniques to help companies transform quickly: 1. Address the “human side” systematically: Any significant transformation creates “people issues.” New leaders will be asked to step up, jobs will be changed, new skills and capabilities must be developed, and employees will be uncertain and resistant. Dealing with these issues on a reactive, case-by-case basis puts speed, morale, and results at risk. A formal approach for managing change — beginning with the leadership team and then engaging key stakeholders and leaders — should be developed early, and adapted often as change moves through the organization. This demands as much data collection and analysis, planning, and implementation discipline as does a redesign of strategy, systems, or processes. The change-management approach should be fully integrated into program design and decision making, both informing and enabling strategic direction. It should be based on a realistic assessment of the organization’s history, readiness, and capacity to change. 2. Begin at the top: Because change is inherently unsettling for people at all levels of an organization,......

Words: 3380 - Pages: 14

Premium Essay

...Counterparty credit risk in portfolio risk management. Prominent financial institution failures reminded market participants that over-the-counter derivatives bring counterparty credit risk. Even as these markets move towards settlement through clearing houses, significant volumes of existing and new transactions remain bilaterally settled, especially as non-standard derivatives may not qualify for central clearing. UBS Delta is providing tools for clients to measure counterparty exposure alongside other investment risks. Prudent risk management of credit portfolios includes measurement and limitation of exposure to individual issuers to manage concentration risk. Investment portfolios will have limits, for example, on the percentage of current value invested in securities issued by “Bank XYZ”. Where over-the-counter (OTC) derivative counterparties are also issuers of securities held, counterparty risk may be incremental to issuer exposure. If a portfolio includes a swap with Bank XYZ as the counterparty, then exposure to Bank XYZ failing on that swap should be considered alongside exposure to it failing on its debt issues. Counterparty credit risk measurement: Counterparty exposure is properly measured not just by the current value of trades with a counterparty, but also by how this value can move as markets move. Where sets of trades with counterparties feature multiple risk drivers/asset classes, modelling the potential exposure becomes a complex problem. Many investment banks......
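The point above, that exposure is measured not just by the current value of trades but by how that value can move, can be sketched with a toy Monte Carlo potential-future-exposure (PFE) calculation. This is a hedged illustration, not the UBS Delta methodology: the zero-drift random-walk model, the per-step volatility and the 12-step horizon are all assumptions chosen for the sketch.

```python
import random
import statistics

# Hedged sketch: potential future exposure (PFE) for one bilateral trade.
# The trade's mark-to-market is modelled as a zero-drift random walk;
# exposure at any time is only the positive part of the value, since a
# negative value means we owe the counterparty, not the reverse.
random.seed(42)
n_paths, n_steps, vol = 10_000, 12, 0.5  # 12 monthly steps, vol per step ($M)

peak_exposures = []
for _ in range(n_paths):
    value, peak = 0.0, 0.0
    for _ in range(n_steps):
        value += random.gauss(0.0, vol)
        peak = max(peak, value, 0.0)  # track the path's peak positive value
    peak_exposures.append(peak)

peak_exposures.sort()
pfe_95 = peak_exposures[int(0.95 * n_paths)]      # 95th-percentile peak exposure
mean_peak = statistics.mean(peak_exposures)        # average peak exposure
print(f"95% PFE: {pfe_95:.2f} $M, mean peak exposure: {mean_peak:.2f} $M")
```

The 95% PFE, not the trade's current (zero) value, is what a counterparty limit would typically be checked against, which is exactly the distinction the excerpt draws.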

Words: 1881 - Pages: 8

Premium Essay

...ABSTRACT Risk management is the application of proactive strategy to plan, lead, organize, and control the wide variety of risks that are woven into the fabric of an organization’s daily and long-term functioning. Like it or not, risk has a say in the achievement of our goals and in the overall success of an organization. The present paper attempts to identify the risks faced by the banking industry and the process of risk management. This paper also examines the different techniques adopted by the banking industry for risk management. To achieve the objectives of the study, data has been collected from secondary sources, i.e., from books, journals and online publications; various risks faced by the banks were identified; the process of risk management was developed; and different risk management techniques were analyzed. Finally, it can be concluded that if a bank takes risk consciously, anticipates adverse changes and hedges accordingly, risk becomes a source of competitive advantage and of efficient management in the banking industry. KEYWORDS: Risk Management, Banking Sector, Credit Risk, Market Risk, Operating Risk, Gap Analysis, Value at Risk (VaR) _____________________________________________________________________________________ INTRODUCTION Risk is defined as anything that can create hindrances in the way of achievement of certain objectives. It can be because of either internal factors or external factors, depending upon the type of risk that......
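Value at Risk (VaR), one of the techniques listed in the keywords above, can be sketched by historical simulation: sort past losses and read off a high percentile. The daily P&L series below is synthetic (a real bank would use its observed P&L history), so the sketch only illustrates the mechanics.

```python
import random

# Hedged sketch of historical-simulation Value at Risk (VaR).
# The daily P&L history is simulated from a standard normal; the true
# 99% VaR of N(0,1) is about 2.33, so the estimate should land nearby.
random.seed(0)
pnl = [random.gauss(0.0, 1.0) for _ in range(1000)]  # daily P&L ($M), synthetic

losses = sorted(-x for x in pnl)            # losses as positive numbers, ascending
var_99 = losses[int(0.99 * len(losses))]    # 99th-percentile daily loss
print(f"1-day 99% VaR: {var_99:.2f} $M")
```

Reading VaR from the empirical loss distribution, rather than assuming a parametric form, is what distinguishes historical simulation from the variance-covariance approach.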

Words: 3810 - Pages: 16

Premium Essay

...Credit Risk Management: Credit risk can be defined as the risk of failure of a customer/counterparty of the bank to meet financial obligations. Another major source of credit risk is concentration risk, which arises when a bank’s credit portfolio tends to be non-diversified, i.e. a large single-borrower exposure or lending exposure to clients sharing similar economic factors (single sub-sector, industry, geographic region, etc.) that would adversely impact the repayment ability of many obligors during any possible economic downturn. To ensure portfolio health, the bank has distributed the overall credit concentration among different segments/industries/trading. For example, branches are primarily responsible for sourcing potential clients and initiating the limit (credit) approval process for review by the Credit Risk Management Division (CRMD); this division (CRMD) ensures the quality of a credit proposal before limit approval; a separate division known as Credit Administration (CAD) monitors the documentation aspects of approved credits; and finally the Legal Recovery Department manages the deteriorating accounts. It is worth mentioning that the bank has its own credit risk management guideline. All other remaining risks in regard to the credit portfolio are addressed by the Risk Management Unit (RMU), whose primary responsibility is to identify and assess the severity of risks and highlight the same to management for strategic decision making. Below is a risk-wise list of a few global model...
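Concentration across segments, as described above, can be summarised in one number with a Herfindahl-Hirschman index (HHI) over exposure shares. HHI is a standard concentration measure, not one the excerpt names, and the segment exposures below are hypothetical.

```python
# Hedged illustration: Herfindahl-Hirschman index (HHI) over segment shares.
# HHI ranges from 1/n (exposure spread evenly over n segments) up to 1.0
# (everything in one segment); the figures below are hypothetical ($M).
exposures = {
    "retail": 300.0,
    "manufacturing": 250.0,
    "real_estate": 250.0,
    "agriculture": 200.0,
}

total = sum(exposures.values())
hhi = sum((v / total) ** 2 for v in exposures.values())
print(f"HHI: {hhi:.3f}")  # 0.255; a perfectly even 4-segment split gives 0.250
```

An HHI drifting upward over time would flag exactly the kind of single-segment concentration the excerpt warns about.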

Words: 571 - Pages: 3

Free Essay

...Basel Committee on Banking Supervision. International Convergence of Capital Measurement and Capital Standards: A Revised Framework, Comprehensive Version. This document is a compilation of the June 2004 Basel II Framework, the elements of the 1988 Accord that were not revised during the Basel II process, the 1996 Amendment to the Capital Accord to Incorporate Market Risks, and the 2005 paper on the Application of Basel II to Trading Activities and the Treatment of Double Default Effects. No new elements have been introduced in this compilation. June 2006. Requests for copies of publications, or for additions/changes to the mailing list, should be sent to: Bank for International Settlements, Press & Communications, CH-4002 Basel, Switzerland. E-mail: publications@bis.org. Fax: +41 61 280 9100 and +41 61 280 8100. © Bank for International Settlements 2006. All rights reserved. Brief excerpts may be reproduced or translated provided the source is stated. ISBN print: 92-9131-720-9. ISBN web: 92-9197-720-9. Contents: Introduction ... 1; Structure of this document ... 6; Part 1: Scope of Application ... 7; I. Introduction.....

Words: 153391 - Pages: 614

Premium Essay

... Credit Risks. Student’s Name. University Affiliation. Credit Risks. What credit risks are faced by retail banking? Retail banks face one major credit risk: the risk that arises if the customers of the bank, as well as any other counterparties associated with the bank, fail to meet their loan repayment obligations, which leads to losses for the bank. Other sources of credit risk include foreign exchange, management of cash, and credit derivatives (Anolli, Beccalli & Giordani 2013). How are credit risks associated with individuals different from those associated with institutions? Customer credit risk is the risk that is inherent when the customers of a financial institution fail to make payment on various credit products such as loans, mortgages, and credit cards, among others. Institutional credit risk, on the other hand, refers to the risk that arises with the corporate services of a company, for example from investment activities. Institutional credit risks are associated with low loss frequency but great severity. What retail banking services does your bank provide to individuals? The bank offers a wide range of banking services which include depository and saving services, insurance services, lending of loans, and the offering of debit as well as credit cards. The bank also offers online banking services to individuals, including online banking and bill payment. What retail banking services......

Words: 359 - Pages: 2

Premium Essay

...Group 11 Final Report: Modeling Credit Risks with Enterprise Miner™. Tanu Aggarwal, Meryl Lo, Zhouzghzu “Jack” Liang, Sallie Smith. April 15, 2016. MIS 6324.503/BUAN 6324.503. Table of Contents: Executive Summary ... 2; 1. Project Motivation ... 3; 2. Data Description ... 4; 3. Data Preprocessing: Variable Exploration, Imputation and Transformation ... 5; 4. Variable Selection ... 6; 5. Descriptive Statistics of Data ... 11; 6. Model Selection ... 12; 7. What is a Regression Model? ... 13; 8. Logistic Regression Model - Stepwise ... 14; 9. Chosen Models ... 15; 10. Analyzing......
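The stepwise logistic regression named in this report's table of contents was built in SAS Enterprise Miner; as a hedged sketch of the underlying model, P(default) = 1 / (1 + e^-(b0 + b1*x)), here is a plain gradient-descent fit on synthetic data. The single predictor, the "true" coefficients, and the fitting recipe are all assumptions for illustration, not the report's actual model.

```python
import math
import random

# Hedged sketch of logistic regression for default prediction, fitted by
# batch gradient descent on synthetic data. True generating coefficients
# are intercept -2, slope 4 (both hypothetical).
random.seed(1)
data = []
for _ in range(500):
    x = random.random()                          # e.g. a debt-to-income ratio
    p_true = 1 / (1 + math.exp(-(4 * x - 2)))    # true default probability
    data.append((x, 1 if random.random() < p_true else 0))

b0 = b1 = 0.0
lr = 1.0
for _ in range(2000):
    g0 = g1 = 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # predicted default probability
        g0 += p - y                              # gradient of log-loss w.r.t. b0
        g1 += (p - y) * x                        # gradient of log-loss w.r.t. b1
    b0 -= lr * g0 / len(data)
    b1 -= lr * g1 / len(data)

# The slope should come out positive: higher ratio, higher default odds.
print(f"fitted intercept {b0:.2f}, slope {b1:.2f}")
```

A stepwise procedure, as in the report, would wrap a fit like this in repeated add/drop decisions over candidate predictors; the sketch shows only the single-variable core.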

Words: 3333 - Pages: 14

Premium Essay

...STANDARDS OF SOUND BUSINESS PRACTICES: CREDIT RISK MANAGEMENT. © 2005 The Bank of Jamaica. All rights reserved. Bank of Jamaica, February 1996. A. PURPOSE This document sets out the minimum policies and procedures that each licensee needs to have in place and apply within its credit risk management programme, and the minimum criteria it should use, to prudently manage and control its credit portfolio and exposure to credit risk. Experience indicates that credit quality goes hand in hand with financial soundness. Deterioration in credit quality is often a sign of problems in an institution. The major risk accompanying a weakening of the credit portfolio is the impairment of capital or liquidity. For most institutions, extending credit comprises the major portion of their business. To a great extent, therefore, the quality of an institution’s credit portfolio determines the risk to depositors. Credit risk management must be conducted within the context of a comprehensive business plan. Although this document focuses on an institution’s responsibility for managing and controlling its credit portfolio and exposure to credit risk, it is not meant to imply that credit risk can be managed in isolation from asset/liability management considerations, such as the need to maintain adequate liquidity, or other risks. B. DEFINITION Credit is the provision of, or a commitment to provide, funds or substitutes for funds (both......

Words: 4318 - Pages: 18

Premium Essay

...CREDIT RISK. In the process of financial intermediation, whose margins become thinner and thinner, banks are exposed to severe competition and hence are compelled to encounter various types of financial and non-financial risks. Risks and uncertainties form an integral part of banking, which by nature entails taking risks. Business grows mainly by taking risk: the greater the risk, the higher the profit, and hence a business unit must strike a trade-off between the two. The essential functions of risk management are to identify, measure and, more importantly, monitor the risk profile of the bank. While non-performing assets are the legacy of the past in the present, a risk management system is pro-active action in the present for the future. Managing risk is nothing but managing change before the risk manages us. While new avenues have opened up for the bank, they have brought with them new risks as well, which the banks will have to handle and overcome. When we use the term “Risk”, we all mean financial risk, or uncertainty of financial loss. If we consider risk in terms of probability of occurrence, we measure risk on a scale with certainty of occurrence at one end and certainty of non-occurrence at the other. Risk is greatest where the probability of occurrence or non-occurrence is equal. As per the Reserve Bank of India guidelines issued in Oct. 1999, there are three major types of risks encountered by the banks, and these are Credit Risk, Market Risk......
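The claim above, that risk is greatest when occurrence and non-occurrence are equally likely, matches the variance of a Bernoulli(p) outcome, p(1-p), which peaks at p = 0.5. A quick numerical check:

```python
# Variance of a Bernoulli(p) outcome is p*(1-p); scan p in steps of 0.1
# to confirm it peaks where occurrence and non-occurrence are equally likely.
variances = {p / 10: (p / 10) * (1 - p / 10) for p in range(11)}
peak = max(variances, key=variances.get)
print(peak, variances[peak])  # 0.5 0.25
```

At either extreme (p near 0 or 1) the outcome is nearly certain and the variance, i.e. the uncertainty of loss, collapses toward zero, which is exactly the scale the passage describes.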

Words: 879 - Pages: 4