Robust Nonparametric Statistical Methods, Second Edition by Thomas P. Hettmansperger and Joseph W. McKean


Author Thomas P. Hettmansperger and Joseph W. McKean
Isbn 9781439809082
Year 2011
Pages 554
Language English
Category mathematics


 

MONOGRAPHS ON STATISTICS AND APPLIED PROBABILITY

General Editors
F. Bunea, V. Isham, N. Keiding, T. Louis, R. L. Smith, and H. Tong

1 Stochastic Population Models in Ecology and Epidemiology M.S. Bartlett (1960)
2 Queues D.R. Cox and W.L. Smith (1961)
3 Monte Carlo Methods J.M. Hammersley and D.C. Handscomb (1964)
4 The Statistical Analysis of Series of Events D.R. Cox and P.A.W. Lewis (1966)
5 Population Genetics W.J. Ewens (1969)
6 Probability, Statistics and Time M.S. Bartlett (1975)
7 Statistical Inference S.D. Silvey (1975)
8 The Analysis of Contingency Tables B.S. Everitt (1977)
9 Multivariate Analysis in Behavioural Research A.E. Maxwell (1977)
10 Stochastic Abundance Models S. Engen (1978)
11 Some Basic Theory for Statistical Inference E.J.G. Pitman (1979)
12 Point Processes D.R. Cox and V. Isham (1980)
13 Identification of Outliers D.M. Hawkins (1980)
14 Optimal Design S.D. Silvey (1980)
15 Finite Mixture Distributions B.S. Everitt and D.J. Hand (1981)
16 Classification A.D. Gordon (1981)
17 Distribution-Free Statistical Methods, 2nd edition J.S. Maritz (1995)
18 Residuals and Influence in Regression R.D. Cook and S. Weisberg (1982)
19 Applications of Queueing Theory, 2nd edition G.F. Newell (1982)
20 Risk Theory, 3rd edition R.E. Beard, T. Pentikäinen and E. Pesonen (1984)
21 Analysis of Survival Data D.R. Cox and D. Oakes (1984)
22 An Introduction to Latent Variable Models B.S. Everitt (1984)
23 Bandit Problems D.A. Berry and B. Fristedt (1985)
24 Stochastic Modelling and Control M.H.A. Davis and R. Vinter (1985)
25 The Statistical Analysis of Composition Data J. Aitchison (1986)
26 Density Estimation for Statistics and Data Analysis B.W. Silverman (1986)
27 Regression Analysis with Applications G.B. Wetherill (1986)
28 Sequential Methods in Statistics, 3rd edition G.B. Wetherill and K.D. Glazebrook (1986)
29 Tensor Methods in Statistics P. McCullagh (1987)
30 Transformation and Weighting in Regression R.J. Carroll and D. Ruppert (1988)
31 Asymptotic Techniques for Use in Statistics O.E. Barndorff-Nielsen and D.R. Cox (1989)
32 Analysis of Binary Data, 2nd edition D.R. Cox and E.J. Snell (1989)
33 Analysis of Infectious Disease Data N.G. Becker (1989)
34 Design and Analysis of Cross-Over Trials B. Jones and M.G. Kenward (1989)
35 Empirical Bayes Methods, 2nd edition J.S. Maritz and T. Lwin (1989)
36 Symmetric Multivariate and Related Distributions K.T. Fang, S. Kotz and K.W. Ng (1990)
37 Generalized Linear Models, 2nd edition P. McCullagh and J.A. Nelder (1989)
38 Cyclic and Computer Generated Designs, 2nd edition J.A. John and E.R. Williams (1995)
39 Analog Estimation Methods in Econometrics C.F. Manski (1988)
40 Subset Selection in Regression A.J. Miller (1990)
41 Analysis of Repeated Measures M.J. Crowder and D.J. Hand (1990)
42 Statistical Reasoning with Imprecise Probabilities P. Walley (1991)
43 Generalized Additive Models T.J. Hastie and R.J. Tibshirani (1990)
44 Inspection Errors for Attributes in Quality Control N.L. Johnson, S. Kotz and X. Wu (1991)
45 The Analysis of Contingency Tables, 2nd edition B.S. Everitt (1992)
46 The Analysis of Quantal Response Data B.J.T. Morgan (1992)
47 Longitudinal Data with Serial Correlation—A State-Space Approach R.H. Jones (1993)
48 Differential Geometry and Statistics M.K. Murray and J.W. Rice (1993)
49 Markov Models and Optimization M.H.A. Davis (1993)
50 Networks and Chaos—Statistical and Probabilistic Aspects O.E. Barndorff-Nielsen, J.L. Jensen and W.S. Kendall (1993)
51 Number-Theoretic Methods in Statistics K.-T. Fang and Y. Wang (1994)
52 Inference and Asymptotics O.E. Barndorff-Nielsen and D.R. Cox (1994)
53 Practical Risk Theory for Actuaries C.D. Daykin, T. Pentikäinen and M. Pesonen (1994)
54 Biplots J.C. Gower and D.J. Hand (1996)
55 Predictive Inference—An Introduction S. Geisser (1993)
56 Model-Free Curve Estimation M.E. Tarter and M.D. Lock (1993)
57 An Introduction to the Bootstrap B. Efron and R.J. Tibshirani (1993)
58 Nonparametric Regression and Generalized Linear Models P.J. Green and B.W. Silverman (1994)
59 Multidimensional Scaling T.F. Cox and M.A.A. Cox (1994)
60 Kernel Smoothing M.P. Wand and M.C. Jones (1995)
61 Statistics for Long Memory Processes J. Beran (1995)
62 Nonlinear Models for Repeated Measurement Data M. Davidian and D.M. Giltinan (1995)
63 Measurement Error in Nonlinear Models R.J. Carroll, D. Ruppert and L.A. Stefanski (1995)
64 Analyzing and Modeling Rank Data J.J. Marden (1995)
65 Time Series Models—In Econometrics, Finance and Other Fields D.R. Cox, D.V. Hinkley and O.E. Barndorff-Nielsen (1996)
66 Local Polynomial Modeling and its Applications J. Fan and I. Gijbels (1996)
67 Multivariate Dependencies—Models, Analysis and Interpretation D.R. Cox and N. Wermuth (1996)
68 Statistical Inference—Based on the Likelihood A. Azzalini (1996)
69 Bayes and Empirical Bayes Methods for Data Analysis B.P. Carlin and T.A. Louis (1996)
70 Hidden Markov and Other Models for Discrete-Valued Time Series I.L. MacDonald and W. Zucchini (1997)
71 Statistical Evidence—A Likelihood Paradigm R. Royall (1997)
72 Analysis of Incomplete Multivariate Data J.L. Schafer (1997)
73 Multivariate Models and Dependence Concepts H. Joe (1997)
74 Theory of Sample Surveys M.E. Thompson (1997)
75 Retrial Queues G. Falin and J.G.C. Templeton (1997)
76 Theory of Dispersion Models B. Jørgensen (1997)
77 Mixed Poisson Processes J. Grandell (1997)
78 Variance Components Estimation—Mixed Models, Methodologies and Applications P.S.R.S. Rao (1997)
79 Bayesian Methods for Finite Population Sampling G. Meeden and M. Ghosh (1997)
80 Stochastic Geometry—Likelihood and computation O.E. Barndorff-Nielsen, W.S. Kendall and M.N.M. van Lieshout (1998)
81 Computer-Assisted Analysis of Mixtures and Applications—Meta-analysis, Disease Mapping and Others D. Böhning (1999)
82 Classification, 2nd edition A.D. Gordon (1999)
83 Semimartingales and their Statistical Inference B.L.S. Prakasa Rao (1999)
84 Statistical Aspects of BSE and vCJD—Models for Epidemics C.A. Donnelly and N.M. Ferguson (1999)
85 Set-Indexed Martingales G. Ivanoff and E. Merzbach (2000)
86 The Theory of the Design of Experiments D.R. Cox and N. Reid (2000)
87 Complex Stochastic Systems O.E. Barndorff-Nielsen, D.R. Cox and C. Klüppelberg (2001)
88 Multidimensional Scaling, 2nd edition T.F. Cox and M.A.A. Cox (2001)
89 Algebraic Statistics—Computational Commutative Algebra in Statistics G. Pistone, E. Riccomagno and H.P. Wynn (2001)
90 Analysis of Time Series Structure—SSA and Related Techniques N. Golyandina, V. Nekrutkin and A.A. Zhigljavsky (2001)
91 Subjective Probability Models for Lifetimes Fabio Spizzichino (2001)
92 Empirical Likelihood Art B. Owen (2001)
93 Statistics in the 21st Century Adrian E. Raftery, Martin A. Tanner, and Martin T. Wells (2001)
94 Accelerated Life Models: Modeling and Statistical Analysis Vilijandas Bagdonavicius and Mikhail Nikulin (2001)
95 Subset Selection in Regression, Second Edition Alan Miller (2002)
96 Topics in Modelling of Clustered Data Marc Aerts, Helena Geys, Geert Molenberghs, and Louise M. Ryan (2002)
97 Components of Variance D.R. Cox and P.J. Solomon (2002)
98 Design and Analysis of Cross-Over Trials, 2nd Edition Byron Jones and Michael G. Kenward (2003)
99 Extreme Values in Finance, Telecommunications, and the Environment Bärbel Finkenstädt and Holger Rootzén (2003)
100 Statistical Inference and Simulation for Spatial Point Processes Jesper Møller and Rasmus Plenge Waagepetersen (2004)
101 Hierarchical Modeling and Analysis for Spatial Data Sudipto Banerjee, Bradley P. Carlin, and Alan E. Gelfand (2004)
102 Diagnostic Checks in Time Series Wai Keung Li (2004)
103 Stereology for Statisticians Adrian Baddeley and Eva B. Vedel Jensen (2004)
104 Gaussian Markov Random Fields: Theory and Applications Håvard Rue and Leonhard Held (2005)
105 Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition Raymond J. Carroll, David Ruppert, Leonard A. Stefanski, and Ciprian M. Crainiceanu (2006)
106 Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood Youngjo Lee, John A. Nelder, and Yudi Pawitan (2006)
107 Statistical Methods for Spatio-Temporal Systems Bärbel Finkenstädt, Leonhard Held, and Valerie Isham (2007)
108 Nonlinear Time Series: Semiparametric and Nonparametric Methods Jiti Gao (2007)
109 Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis Michael J. Daniels and Joseph W. Hogan (2008)
110 Hidden Markov Models for Time Series: An Introduction Using R Walter Zucchini and Iain L. MacDonald (2009)
111 ROC Curves for Continuous Data Wojtek J. Krzanowski and David J. Hand (2009)
112 Antedependence Models for Longitudinal Data Dale L. Zimmerman and Vicente A. Núñez-Antón (2009)
113 Mixed Effects Models for Complex Data Lang Wu (2010)
114 Introduction to Time Series Modeling Genshiro Kitagawa (2010)
115 Expansions and Asymptotics for Statistics Christopher G. Small (2010)
116 Statistical Inference: An Integrated Bayesian/Likelihood Approach Murray Aitkin (2010)
117 Circular and Linear Regression: Fitting Circles and Lines by Least Squares Nikolai Chernov (2010)
118 Simultaneous Inference in Regression Wei Liu (2010)
119 Robust Nonparametric Statistical Methods, Second Edition Thomas P. Hettmansperger and Joseph W. McKean (2011)

Monographs on Statistics and Applied Probability 119

Robust Nonparametric Statistical Methods
Second Edition

Thomas P. Hettmansperger
Penn State University
University Park, Pennsylvania, USA

Joseph W. McKean
Western Michigan University
Kalamazoo, Michigan, USA

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4398-0908-2 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Hettmansperger, Thomas P., 1939-
Robust nonparametric statistical methods / Thomas P. Hettmansperger, Joseph W. McKean. -- 2nd ed.
p. cm. -- (Monographs on statistics and applied probability ; 119)
Summary: “Often referred to as distribution-free methods, nonparametric methods do not rely on assumptions that the data are drawn from a given probability distribution. With an emphasis on Wilcoxon rank methods that enable a unified approach to data analysis, this book presents a unique overview of robust nonparametric statistical methods. Drawing on examples from various disciplines, the relevant R code for these examples, as well as numerous exercises for self-study, the text covers location models, regression models, designed experiments, and multivariate methods. This edition features a new chapter on cluster correlated data”-- Provided by publisher.
Includes bibliographical references and index.
ISBN 978-1-4398-0908-2 (hardback)
1. Nonparametric statistics. 2. Robust statistics. I. McKean, Joseph W., 1944- II. Title. III. Series.
QA278.8.H47 2010
519.5--dc22 2010044858

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Dedication: To Ann and to Marge

Contents

Preface

1 One-Sample Problems
  1.1 Introduction
  1.2 Location Model
  1.3 Geometry and Inference in the Location Model
    1.3.1 Computation
  1.4 Examples
  1.5 Properties of Norm-Based Inference
    1.5.1 Basic Properties of the Power Function $\gamma_S(\theta)$
    1.5.2 Asymptotic Linearity and Pitman Regularity
    1.5.3 Asymptotic Theory and Efficiency Results for $\hat{\theta}$
    1.5.4 Asymptotic Power and Efficiency Results for the Test Based on $S(\theta)$
    1.5.5 Efficiency Results for Confidence Intervals Based on $S(\theta)$
  1.6 Robustness Properties of Norm-Based Inference
    1.6.1 Robustness Properties of $\hat{\theta}$
    1.6.2 Breakdown Properties of Tests
  1.7 Inference and the Wilcoxon Signed-Rank Norm
    1.7.1 Null Distribution Theory of $T(0)$
    1.7.2 Statistical Properties
    1.7.3 Robustness Properties
  1.8 Inference Based on General Signed-Rank Norms
    1.8.1 Null Properties of the Test
    1.8.2 Efficiency and Robustness Properties
  1.9 Ranked Set Sampling
  1.10 $L_1$ Interpolated Confidence Intervals
  1.11 Two-Sample Analysis
  1.12 Exercises

2 Two-Sample Problems
  2.1 Introduction
  2.2 Geometric Motivation
    2.2.1 Least Squares (LS) Analysis
    2.2.2 Mann-Whitney-Wilcoxon (MWW) Analysis
    2.2.3 Computation
  2.3 Examples
  2.4 Inference Based on the Mann-Whitney-Wilcoxon
    2.4.1 Testing
    2.4.2 Confidence Intervals
    2.4.3 Statistical Properties of the Inference Based on the MWW
    2.4.4 Estimation of $\Delta$
    2.4.5 Efficiency Results Based on Confidence Intervals
  2.5 General Rank Scores
    2.5.1 Statistical Methods
    2.5.2 Efficiency Results
    2.5.3 Connection between One- and Two-Sample Scores
  2.6 $L_1$ Analyses
    2.6.1 Analysis Based on the $L_1$ Pseudo-Norm
    2.6.2 Analysis Based on the $L_1$ Norm
  2.7 Robustness Properties
    2.7.1 Breakdown Properties
    2.7.2 Influence Functions
  2.8 Proportional Hazards
    2.8.1 The Log Exponential and the Savage Statistic
    2.8.2 Efficiency Properties
  2.9 Two-Sample Rank Set Sampling (RSS)
  2.10 Two-Sample Scale Problem
    2.10.1 Appropriate Score Functions
    2.10.2 Efficacy of the Traditional F-Test
  2.11 Behrens-Fisher Problem
    2.11.1 Behavior of the Usual MWW Test
    2.11.2 General Rank Tests
    2.11.3 Modified Mathisen’s Test
    2.11.4 Modified MWW Test
    2.11.5 Efficiencies and Discussion
  2.12 Paired Designs
    2.12.1 Behavior under Alternatives
  2.13 Exercises

3 Linear Models
  3.1 Introduction
  3.2 Geometry of Estimation and Tests
    3.2.1 The Geometry of Estimation
    3.2.2 The Geometry of Testing
  3.3 Examples
  3.4 Assumptions for Asymptotic Theory
  3.5 Theory of Rank-Based Estimates
    3.5.1 R Estimators of the Regression Coefficients
    3.5.2 R Estimates of the Intercept
  3.6 Theory of Rank-Based Tests
    3.6.1 Null Theory of Rank-Based Tests
    3.6.2 Theory of Rank-Based Tests under Alternatives
    3.6.3 Further Remarks on the Dispersion Function
  3.7 Implementation of the R Analysis
    3.7.1 Estimates of the Scale Parameter $\tau_\varphi$
    3.7.2 Algorithms for Computing the R Analysis
    3.7.3 An Algorithm for a Linear Search
  3.8 $L_1$ Analysis
  3.9 Diagnostics
    3.9.1 Properties of R Residuals and Model Misspecification
    3.9.2 Standardization of R Residuals
    3.9.3 Measures of Influential Cases
  3.10 Survival Analysis
  3.11 Correlation Model
    3.11.1 Huber’s Condition for the Correlation Model
    3.11.2 Traditional Measure of Association and Its Estimate
    3.11.3 Robust Measure of Association and Its Estimate
    3.11.4 Properties of R Coefficients of Multiple Determination
    3.11.5 Coefficients of Determination for Regression
  3.12 High Breakdown (HBR) Estimates
    3.12.1 Geometry of the HBR Estimates
    3.12.2 Weights
    3.12.3 Asymptotic Normality of $\hat{\beta}_{HBR}$
    3.12.4 Robustness Properties of the HBR Estimates
    3.12.5 Discussion
    3.12.6 Implementation and Examples
    3.12.7 Studentized Residuals
    3.12.8 Example on Curvature Detection
  3.13 Diagnostics for Differentiating between Fits
  3.14 Rank-Based Procedures for Nonlinear Models
    3.14.1 Implementation
  3.15 Exercises

4 Experimental Designs: Fixed Effects
  4.1 Introduction
  4.2 One-way Design
    4.2.1 R Fit of the One-way Design
    4.2.2 Rank-Based Tests of $H_0: \mu_1 = \cdots = \mu_k$
    4.2.3 Tests of General Contrasts
    4.2.4 More on Estimation of Contrasts and Location
    4.2.5 Pseudo-observations
  4.3 Multiple Comparison Procedures
    4.3.1 Discussion
  4.4 Two-way Crossed Factorial
  4.5 Analysis of Covariance
  4.6 Further Examples
  4.7 Rank Transform
    4.7.1 Monte Carlo Study
  4.8 Exercises

5 Models with Dependent Error Structure
  5.1 Introduction
  5.2 General Mixed Models
    5.2.1 Applications
  5.3 Simple Mixed Models
    5.3.1 Variance Component Estimators
    5.3.2 Studentized Residuals
    5.3.3 Example and Simulation Studies
    5.3.4 Simulation Studies of Validity
    5.3.5 Simulation Study of Other Score Functions
  5.4 Arnold Transformations
    5.4.1 R Fit Based on Arnold Transformed Data
  5.5 General Estimating Equations (GEE)
    5.5.1 Asymptotic Theory
    5.5.2 Implementation and a Monte Carlo Study
    5.5.3 Example: Inflammatory Markers
  5.6 Time Series
    5.6.1 Asymptotic Theory
    5.6.2 Wald-Type Inference
    5.6.3 Linear Models with Autoregressive Errors
  5.7 Exercises

6 Multivariate
  6.1 Multivariate Location Model
  6.2 Componentwise Methods
    6.2.1 Estimation
    6.2.2 Testing
    6.2.3 Componentwise Rank Methods
  6.3 Spatial Methods
    6.3.1 Spatial Sign Methods
    6.3.2 Spatial Rank Methods
  6.4 Affine Equivariant and Invariant Methods
    6.4.1 Blumen’s Bivariate Sign Test
    6.4.2 Affine Invariant Sign Tests
    6.4.3 The Oja Criterion Function
    6.4.4 Additional Remarks
  6.5 Robustness of Estimates of Location
    6.5.1 Location and Scale Invariance: Componentwise Methods
    6.5.2 Rotation Invariance: Spatial Methods
    6.5.3 The Spatial Hodges-Lehmann Estimate
    6.5.4 Affine Equivariant Spatial Median
    6.5.5 Affine Equivariant Oja Median
  6.6 Linear Model
    6.6.1 Test for Regression Effect
    6.6.2 The Estimate of the Regression Effect
    6.6.3 Tests of General Hypotheses
  6.7 Experimental Designs
  6.8 Exercises

A Asymptotic Results
  A.1 Central Limit Theorems
  A.2 Simple Linear Rank Statistics
    A.2.1 Null Asymptotic Distribution Theory
    A.2.2 Local Asymptotic Distribution Theory
    A.2.3 Signed-Rank Statistics
  A.3 Rank-Based Analysis of Linear Models
    A.3.1 Convex Functions
    A.3.2 Asymptotic Linearity and Quadraticity
    A.3.3 Asymptotic Distance between $\hat{\beta}$ and $\tilde{\beta}$
    A.3.4 Consistency of the Test Statistic $F_\varphi$
    A.3.5 Proof of Lemma 3.5.1
  A.4 Asymptotic Linearity for the $L_1$ Analysis
  A.5 Influence Functions
    A.5.1 Influence Function for Estimates Based on Signed-Rank Statistics
    A.5.2 Influence Functions for Chapter 3
    A.5.3 Influence Function of $\hat{\beta}_{HBR}$ of Section 3.12.4
  A.6 Asymptotic Theory for Section 3.12.3
  A.7 Asymptotic Theory for Section 3.12.7
  A.8 Asymptotic Theory for Section 3.13

References
Author Index
Index

Preface

Basically, I’m not interested in doing research and I never have been. I’m interested in understanding, which is quite a different thing. And often to understand something you have to work it out yourself because no one else has done it.
David Blackwell describing himself as a “dilettante” in a 1983 interview for Mathematical People, a collection of profiles and interviews.

I don’t believe I can really do without teaching.
The reason is, I have to have something so that when I don’t have any ideas and I’m not getting anywhere I can say to myself, “At least I’m living; at least I’m doing something; I’m making some contribution” - it’s just psychological.
Richard Feynman

Nonparametric inference methods, especially those derived from ranks, have a long and successful history extending back to early work by Frank Wilcoxon in 1945. In the first edition of this monograph we developed rank-based methods from the unifying theme of geometry and continue this approach in the second edition. The least squares norm is replaced by a weighted $L_1$ norm, and the resulting statistical interpretations are similar to those of least squares. This results in rank-based methods or $L_1$ methods depending on the choice of weights. The rank-based methods proceed much like the traditional analysis. Using the norm, models are easily fitted. Diagnostic procedures can then be used to check the quality of fit (model criticism) and to locate outlying points and points of high influence. Upon satisfaction with the fit, rank-based inferential procedures can then be used to conduct the statistical analysis. The advantages of rank-based methods include better power and efficiency at heavy-tailed distributions and robustness against various model violations and pathological data.

In the first edition we extended rank methods from univariate location models to linear models and multivariate models, providing a much more extensive set of tools and methods for data analysis. The second edition provides additional models (including models with dependent error structure and nonlinear models) and methods and extends significantly the possible analyses based on ranks.

In the second edition we have retained the material on one- and two-sample problems (Chapters 1 and 2) along with the basic development of rank methods in the linear model (Chapter 3) and fixed effects experimental designs (Chapter 4). Chapter 5, from the first edition, on high breakdown R estimates has been condensed and moved to Chapter 3. In addition, Chapter 3 now contains a new section on rank procedures for nonlinear models.

Selected topics from the first four chapters provide a basic graduate course in rank-based methods. The methods are fully illustrated and the theory fully developed. The prerequisites are a basic course in mathematical statistics and some background in applied statistics. For a one-semester course, we suggest the first seven sections of Chapter 1, the first four sections of Chapter 2, the first seven sections plus Section 9 in Chapter 3, and the first four sections of Chapter 4, and then a choice of topics depending on interest.

The new Chapter 5 deals with models with dependent error structure. New material on rank methods for mixed models is included along with material on general estimating equations (GEE). Finally, a section on time series has been added. As in the first edition, this new material is illustrated on data sets and R software is made available to the reader.

Chapter 6 in both editions deals with multivariate models. In the second edition we have added new material on the development of affine invariant/equivariant sign methods based on transform-retransform techniques. The new methods are computationally efficient as opposed to the earlier affine invariant/equivariant methods.

The methods developed in the book can be computed using R libraries and functions. These libraries are discussed and illustrated in the relevant sections. Information on several of these packages and functions (including Robnp, ww, and Rfit) can be obtained at the web site http://www.stat.wmich.edu/mckean/index.html.
Hence, we have again expanded significantly the available set of tools and inference methods based on ranks.

We have included the data sets for many of our examples in the book. For others, the reader can obtain the data at the Chapman and Hall web site. See also the site http://www.stat.wmich.edu/mckean/index.html for information on the data sets used in this book.

We are indebted to many of our students and colleagues for valuable discussions, stimulation, and motivation. In particular, the first author would like to express his sincere thanks for many stimulating hours of discussion with Steve Arnold, Bruce Brown, and Hannu Oja while the second author wants to express his sincere thanks for discussions over the years with Ash Abebe, Kim Crimin, Brad Huitema, John Kapenga, John Kloke, Joshua Naranjo, M. Rashid, Jerry Sievers, Jeff Terpstra, and Tom Vidmar. We both would like to express our debt to Simon Sheather, our friend, colleague, and co-author on many papers. We express our thanks to Rob Calver, Sarah Morris, and Michele Dimont of Chapman & Hall/CRC for their assistance in the preparation of this book.

Thomas P. Hettmansperger
Joseph W. McKean

Chapter 1
One-Sample Problems

1.1 Introduction

Traditional statistical procedures are widely used because they offer the user a unified methodology with which to attack a multitude of problems, from simple location problems to highly complex experimental designs. These procedures are based on least squares fitting. Once the problem has been cast into a model then least squares offers the user:

1. a way of fitting the model by minimizing the Euclidean normed distance between the responses and the conjectured model;
2. diagnostic techniques that check the adequacy of the fit of the model, explore the quality of fit, and detect outlying and/or influential cases;
3. inferential procedures, including confidence procedures, tests of hypotheses, and multiple comparison procedures;
4. computational feasibility.

Procedures based on least squares, though, are easily impaired by outlying observations. Indeed, one outlying observation is enough to spoil the least squares fit, its associated diagnostics and inference procedures. Even though traditional inference procedures are exact when the errors in the model follow a normal distribution, they can be quite inefficient when the distribution of the errors has longer tails than the normal distribution.

For simple location problems, nonparametric methods were proposed by Wilcoxon (1945). These methods consist of test statistics based on the ranks of the data and associated estimates and confidence intervals for location parameters. The test statistics are distribution free in the sense that their null distributions do not depend on the distribution of the errors. It was soon realized that these procedures are almost as efficient as the traditional methods when the errors follow a normal distribution and, furthermore, are often much more efficient relative to the traditional methods when the error distributions deviate from normality; see Hodges and Lehmann (1956). These procedures possess both robustness of validity and power. In recent years these nonparametric methods have been extended to linear and nonlinear models. In addition, from the perspective of modern robustness theory, contrary to least squares estimates, these rank-based procedures have bounded influence functions and positive breakdown points.

Often these nonparametric procedures are thought of as disjoint methods that differ from one problem to another. In this text, we intend to show that this is not the case. Instead, these procedures present a unified methodology analogous to the traditional methods.
The four items cited above for the traditional analysis hold for these procedures too. Indeed the only operational difference is that the Euclidean norm is replaced by another norm. There are computational procedures available for the rank-based procedures discussed in this book. We offer the reader a collection of computational functions written in the software language R; see the site http://www.stat.wmich.edu/mckean/. We refer to these computational algorithms as robust nonparametric R algorithms or Robnp. For the chapters on linear models we make use of the set of algorithms ww written by Terpstra and McKean (2005) and the R package Rfit developed by Kloke and McKean (2010). We discuss these functions throughout the text and use them in many of the examples, simulation studies, and exercises. The programming language R (see Ihaka and Gentleman, 1996) is freeware and can run on all (PC, Mac, Linux) platforms. To download the R software and accompanying information, visit the site http://www.r-project.org/. The language R has intrinsic functions for computation of some of the procedures discussed in this and the next chapter.

1.2 Location Model

In this chapter we consider the one-sample location problem. This allows us to explore some useful concepts such as distribution freeness and robustness in a simple setting. We extend many of these concepts to more complicated situations in later chapters. We need to first define a location parameter. For a random variable $X$ we often subscript its distribution function by $X$ to avoid confusion.

Definition 1.2.1. Let $T(H)$ be a function defined on the set of distribution functions. We say $T(H)$ is a location functional if

1. If $G$ is stochastically larger than $F$ ($G(x) \le F(x)$ for all $x$), then $T(G) \ge T(F)$;
2. $T(H_{aX+b}) = aT(H_X) + b$, $a > 0$;
3. $T(H_{-X}) = -T(H_X)$.

Then, we call $\theta = T(H)$ a location parameter of $H$.
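The second and third conditions of the definition can be checked numerically on a sample. The following is a minimal sketch in Python, not part of the book's R software. The helper names `is_location_equivariant` and `is_sign_equivariant` are hypothetical, and the sample median from Python's `statistics` module (which averages the two middle values when n is even) stands in for a functional T applied to the empirical distribution.

```python
from statistics import median


def is_location_equivariant(stat, xs, a, b, tol=1e-12):
    """Check condition 2 on one sample: stat(a*x + b) == a*stat(x) + b, a > 0."""
    if a <= 0:
        raise ValueError("condition 2 is stated for a > 0")
    lhs = stat([a * x + b for x in xs])
    rhs = a * stat(xs) + b
    return abs(lhs - rhs) < tol


def is_sign_equivariant(stat, xs, tol=1e-12):
    """Check condition 3 on one sample: stat(-x) == -stat(x)."""
    lhs = stat([-x for x in xs])
    rhs = -stat(xs)
    return abs(lhs - rhs) < tol


# Illustrative data only (made up for this sketch).
sample = [1.5, -0.25, 3.0, 7.75, 2.0, 0.5]

print(is_location_equivariant(median, sample, a=2.0, b=-1.0))
print(is_sign_equivariant(median, sample))
# The sample minimum satisfies condition 2 but fails condition 3.
print(is_location_equivariant(min, sample, a=2.0, b=-1.0))
print(is_sign_equivariant(min, sample))
```

A check on one sample can only refute a property, not prove it. Here it suggests that the median behaves as a location functional, while the minimum fails the third condition because the minimum of the negated sample is the negative of the maximum.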
Note that if $X$ has location parameter $\theta$ it follows from the second item in the above definition that the random variable $e = X - \theta$ has location parameter 0. Suppose $X_1, \ldots, X_n$ is a random sample having the common distribution function $H(x)$ and $\theta = T(H)$ is a location parameter of interest. We express this by saying that $X_i$ follows the statistical location model,

$$X_i = \theta + e_i, \quad i = 1, \ldots, n, \qquad (1.2.1)$$

where $e_1, \ldots, e_n$ are independent and identically distributed random variables with distribution function $F(x)$ and density function $f(x)$ and location $T(F) = 0$. It follows that $H(x) = F(x - \theta)$ and that $T(H) = \theta$. We next discuss three examples of location parameters that we use throughout this chapter. Other location parameters are discussed in Section 1.8. See Bickel and Lehmann (1975) for additional discussion of location functionals.

Example 1.2.1 (The Median Location Functional). First define the inverse of the cdf $H(x)$ by $H^{-1}(u) = \inf\{x : H(x) \ge u\}$. Generally we suppose that $H(x)$ is strictly increasing on its support and this eliminates ambiguities on the selection of the parameter. Now define $\theta_1 = T_1(H) = H^{-1}(1/2)$. This is the median functional. Note that if $G(x) \le F(x)$ for all $x$, then $G^{-1}(u) \ge F^{-1}(u)$ for all $u$; and, in particular, $G^{-1}(1/2) \ge F^{-1}(1/2)$. Hence, $T_1(H)$ satisfies the first condition for a location functional. Next let $H^*(x) = P(aX + b \le x) = H[a^{-1}(x - b)]$. Then it follows at once that $H^{*-1}(u) = aH^{-1}(u) + b$ and the second condition is satisfied. The third condition follows with an argument similar to the one for the second condition.

Example 1.2.2 (The Mean Location Functional). For the mean functional let $\theta_2 = T_2(H) = \int x \, dH(x)$, when the mean exists. Note that $\int x \, dH(x) = \int H^{-1}(u) \, du$. Now if $G(x) \le F(x)$ for all $x$, then $x \le G^{-1}(F(x))$. Let $x = F^{-1}(u)$ and we have $F^{-1}(u) \le G^{-1}(F(F^{-1}(u))) \le G^{-1}(u)$. Hence, $T_2(G) = \int G^{-1}(u) \, du \ge \int F^{-1}(u) \, du = T_2(F)$ and the first condition is satisfied.
The other two conditions follow easily from the definition of the integral.

Example 1.2.3 (The Pseudo-Median Location Functional). Assume that $X_1$ and $X_2$ are independent and identically distributed (iid) with distribution function $H(x)$. Let $Y = (X_1 + X_2)/2$. Then $Y$ has distribution function $H^*(y) = P(Y \le y) = \int H(2y - x) h(x) \, dx$. Let $\theta_3 = T_3(H) = H^{*-1}(1/2)$. To show that $T_3$ is a location functional, suppose $G(x) \le F(x)$ for all $x$. Then

$$
\begin{aligned}
G^*(y) &= \int G(2y - x) g(x) \, dx = \int \left[ \int_{-\infty}^{2y-x} g(t) \, dt \right] g(x) \, dx \\
&\le \int \left[ \int_{-\infty}^{2y-x} f(t) \, dt \right] g(x) \, dx \\
&= \int \left[ \int_{-\infty}^{2y-t} g(x) \, dx \right] f(t) \, dt \\
&\le \int \left[ \int_{-\infty}^{2y-t} f(x) \, dx \right] f(t) \, dt = F^*(y);
\end{aligned}
$$

hence, as in Example 1.2.1, it follows that $G^{*-1}(u) \ge F^{*-1}(u)$ and, hence, that $T_3(G) \ge T_3(F)$.

For the second property, let $W = aX + b$ where $X$ has distribution function $H$ and $a > 0$. Then $W$ has distribution function $F_W(t) = H((t - b)/a)$. Then by the change of variable $z = (x - b)/a$, we have

$$
F_W^*(y) = \int H\left( \frac{2y - x - b}{a} \right) \frac{1}{a} h\left( \frac{x - b}{a} \right) dx = \int H\left( 2\,\frac{y - b}{a} - z \right) h(z) \, dz .
$$

Thus the defining equation for $T_3(F_W)$ is

$$
\frac{1}{2} = \int H\left( 2\,\frac{T_3(F_W) - b}{a} - z \right) h(z) \, dz ,
$$

which is satisfied for $T_3(F_W) = aT_3(H) + b$.

For the third property, let $V = -X$ where $X$ has distribution function $H$. Then $V$ has distribution function $F_V(t) = 1 - H(-t)$. Hence, by the change in variable $z = -x$,

$$
F_V^*(y) = \int (1 - H(-2y + x)) h(-x) \, dx = \int (1 - H(-2y - z)) h(z) \, dz .
$$

Because the defining equation of $T_3(F_V)$ can be written as

$$
\frac{1}{2} = \int H(2(-T_3(F_V)) - z) h(z) \, dz ,
$$

it follows that $T_3(F_V) = -T_3(H)$. Therefore, $T_3$ is a location functional. It has been called the pseudo-median by Hoyland (1965) and is more appropriate for symmetric distributions.

The next theorem characterizes all the location functionals for a symmetric distribution.
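The three functionals in Examples 1.2.1 to 1.2.3 have natural sample analogues, and a small sketch shows how each reacts to one outlying observation. The code below is an illustration in Python rather than the book's R software. The function names `walsh_averages` and `pseudo_median` are hypothetical, the data are made up, and the sample pseudo-median is taken here to be the median of the pairwise averages (X_i + X_j)/2 with i ≤ j, which is one common convention (the Hodges-Lehmann estimate).

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def walsh_averages(xs):
    """All pairwise averages (x_i + x_j) / 2 with i <= j (includes i == j)."""
    return [(u + v) / 2 for u, v in combinations_with_replacement(xs, 2)]


def pseudo_median(xs):
    """Sample analogue of the pseudo-median: the median of the Walsh averages."""
    return median(walsh_averages(xs))


# Illustrative data only: the second sample replaces 5.0 by an outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
spoiled = [1.0, 2.0, 3.0, 4.0, 50.0]

for name, stat in [("mean", mean), ("median", median),
                   ("pseudo-median", pseudo_median)]:
    print(f"{name:>13}: clean = {stat(clean)}, with outlier = {stat(spoiled)}")
```

On these data the mean moves from 3 to 12 when the outlier is introduced, while the median and the pseudo-median both stay at 3. The quadratic number of Walsh averages makes this direct approach suitable only for small samples.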

Book Description: This book gives an excellent treatment of modern rank-based methods with a special attention to their practical application to data. … a welcome highly up-to-date and very readable contribution to the field. It will certainly become a standard reference for nonparametric and robust methods. I recommend the book as an important textbook for research libraries. The book will soon find its place on the shelves and the tables of many kind of researchers and will serve as a graduate course textbook.
