Privacy-Preserving Data Mining: Models and Algorithms, edited by Charu C. Aggarwal and Philip S. Yu


Author: Charu C. Aggarwal
ISBN: 9780387709918
File size: 5.3 MB
Year: 2008
Pages: 514
Language: English
File format: PDF
Category: Security




Privacy-Preserving Data Mining: Models and Algorithms

ADVANCES IN DATABASE SYSTEMS, Volume 34
Series Editors: Ahmed K. Elmagarmid (Purdue University, West Lafayette, IN 47907) and Amit P. Sheth (Wright State University, Dayton, Ohio 45435)

Other books in the series:
- SEQUENCE DATA MINING, Guozhu Dong, Jian Pei; ISBN: 978-0-387-69936-3
- DATA STREAMS: Models and Algorithms, edited by Charu C. Aggarwal; ISBN: 978-0-387-28759-1
- SIMILARITY SEARCH: The Metric Space Approach, P. Zezula, G. Amato, V. Dohnal, M. Batko; ISBN: 0-387-29146-6
- STREAM DATA MANAGEMENT, Nauman Chaudhry, Kevin Shaw, Mahdi Abdelguerfi; ISBN: 0-387-24393-3
- FUZZY DATABASE MODELING WITH XML, Zongmin Ma; ISBN: 0-387-24248-1
- MINING SEQUENTIAL PATTERNS FROM LARGE DATA SETS, Wei Wang and Jiong Yang; ISBN: 0-387-24246-5
- ADVANCED SIGNATURE INDEXING FOR MULTIMEDIA AND WEB APPLICATIONS, Yannis Manolopoulos, Alexandros Nanopoulos, Eleni Tousidou; ISBN: 1-4020-7425-5
- ADVANCES IN DIGITAL GOVERNMENT: Technology, Human Factors, and Policy, edited by William J. McIver, Jr. and Ahmed K. Elmagarmid; ISBN: 1-4020-7067-5
- INFORMATION AND DATABASE QUALITY, Mario Piattini, Coral Calero and Marcela Genero; ISBN: 0-7923-7599-8
- DATA QUALITY, Richard Y. Wang, Mostapha Ziad, Yang W. Lee; ISBN: 0-7923-7215-8
- THE FRACTAL STRUCTURE OF DATA REFERENCE: Applications to the Memory Hierarchy, Bruce McNutt; ISBN: 0-7923-7945-4
- SEMANTIC MODELS FOR MULTIMEDIA DATABASE SEARCHING AND BROWSING, Shu-Ching Chen, R.L. Kashyap, and Arif Ghafoor; ISBN: 0-7923-7888-1
- INFORMATION BROKERING ACROSS HETEROGENEOUS DIGITAL DATA: A Metadata-based Approach, Vipul Kashyap, Amit Sheth; ISBN: 0-7923-7883-0
- DATA DISSEMINATION IN WIRELESS COMPUTING ENVIRONMENTS, Kian-Lee Tan and Beng Chin Ooi; ISBN: 0-7923-7866-0
- MIDDLEWARE NETWORKS: Concept, Design and Deployment of Internet Infrastructure, Michah Lerner, George Vanecek, Nino Vidovic, Dado Vrsalovic; ISBN: 0-7923-7840-7
- ADVANCED DATABASE INDEXING, Yannis Manolopoulos, Yannis Theodoridis, Vassilis J. Tsotras; ISBN: 0-7923-7716-8
- MULTILEVEL SECURE TRANSACTION PROCESSING, Vijay Atluri, Sushil Jajodia, Binto George; ISBN: 0-7923-7702-8
- FUZZY LOGIC IN DATA MODELING, Guoqing Chen; ISBN: 0-7923-8253-6
- PRIVACY-PRESERVING DATA MINING: Models and Algorithms, edited by Charu C. Aggarwal and Philip S. Yu; ISBN: 0-387-70991-8

Edited by:
Charu C. Aggarwal, IBM Thomas J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA ([email protected])
Philip S. Yu, Department of Computer Science, University of Illinois at Chicago, 854 South Morgan Street, Chicago, IL 60607-7053, USA ([email protected])

ISBN 978-0-387-70991-8; e-ISBN 978-0-387-70992-5; DOI 10.1007/978-0-387-70992-5
Library of Congress Control Number: 2007943463

© 2008 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper.

Preface

In recent years, advances in hardware technology have led to an increase in the capability to store and record personal data about consumers and individuals. This has led to concerns that personal data may be misused for a variety of purposes. To alleviate these concerns, a number of techniques have recently been proposed for performing data mining tasks in a privacy-preserving way. These techniques draw on a wide array of related topics, such as data mining, cryptography and information hiding, and the material in this book is drawn from these different topics so as to provide a good overview of the important results in the field. While a large number of research papers are now available in this field, many of the topics have been studied by different communities with different styles. At this stage, it becomes important to organize the topics in such a way that the relative importance of the different research areas is recognized. Furthermore, the field of privacy-preserving data mining has been explored independently by the cryptography, database and statistical disclosure control communities; in some cases the parallel lines of work are quite similar, but the communities are not sufficiently integrated to provide a broader perspective. This book contains chapters from researchers of all three communities, and therefore tries to provide a balanced perspective of the work done in the field. The book is structured as an edited volume from prominent researchers in the field. Each chapter contains a survey of the key research content on its topic, together with future directions of research. Emphasis has been placed on making each chapter self-contained.
While the chapters are written by different researchers, the topics and content are organized so as to present the most important models, algorithms, and applications in the privacy field in a structured and concise way. In addition, care has been taken to draw chapters from researchers working in different areas, in order to provide different points of view. Given the lack of structurally organized information on the topic of privacy, the book provides insights that are not easily accessible otherwise. A few chapters in the book are not surveys, since the corresponding topics fall into the emerging category and enough material is not yet available to create a survey; in such cases, individual results have been included to give a flavor of the emerging research in the field. It is expected that the book will be of great help to researchers and graduate students interested in the topic. While the privacy field clearly falls into the emerging category because of its recency, it is now beginning to reach a point of maturation and popularity where the development of an overview book on the topic becomes both possible and necessary. It is hoped that this book will serve as a reference for students, researchers and practitioners, both in introducing the topic of privacy-preserving data mining and in understanding the practical and algorithmic aspects of the area.

Contents

Preface

1. An Introduction to Privacy-Preserving Data Mining (Charu C. Aggarwal and Philip S. Yu)
   1.1 Introduction
   1.2 Privacy-Preserving Data Mining Algorithms
   1.3 Conclusions and Summary
   References

2. A General Survey of Privacy-Preserving Data Mining Models and Algorithms (Charu C. Aggarwal and Philip S. Yu)
   2.1 Introduction
   2.2 The Randomization Method (2.2.1 Privacy Quantification; 2.2.2 Adversarial Attacks on Randomization; 2.2.3 Randomization Methods for Data Streams; 2.2.4 Multiplicative Perturbations; 2.2.5 Data Swapping)
   2.3 Group Based Anonymization (2.3.1 The k-Anonymity Framework; 2.3.2 Personalized Privacy-Preservation; 2.3.3 Utility Based Privacy Preservation; 2.3.4 Sequential Releases; 2.3.5 The l-diversity Method; 2.3.6 The t-closeness Model; 2.3.7 Models for Text, Binary and String Data)
   2.4 Distributed Privacy-Preserving Data Mining (2.4.1 Distributed Algorithms over Horizontally Partitioned Data Sets; 2.4.2 Distributed Algorithms over Vertically Partitioned Data; 2.4.3 Distributed Algorithms for k-Anonymity)
   2.5 Privacy-Preservation of Application Results (2.5.1 Association Rule Hiding; 2.5.2 Downgrading Classifier Effectiveness; 2.5.3 Query Auditing and Inference Control)
   2.6 Limitations of Privacy: The Curse of Dimensionality
   2.7 Applications of Privacy-Preserving Data Mining (2.7.1 Medical Databases: The Scrub and Datafly Systems; 2.7.2 Bioterrorism Applications; 2.7.3 Homeland Security Applications; 2.7.4 Genomic Privacy)
   2.8 Summary
   References

3. A Survey of Inference Control Methods for Privacy-Preserving Data Mining (Josep Domingo-Ferrer)
   3.1 Introduction
   3.2 A Classification of Microdata Protection Methods
   3.3 Perturbative Masking Methods (3.3.1 Additive Noise; 3.3.2 Microaggregation; 3.3.3 Data Swapping and Rank Swapping; 3.3.4 Rounding; 3.3.5 Resampling; 3.3.6 PRAM; 3.3.7 MASSC)
   3.4 Non-perturbative Masking Methods (3.4.1 Sampling; 3.4.2 Global Recoding; 3.4.3 Top and Bottom Coding; 3.4.4 Local Suppression)
   3.5 Synthetic Microdata Generation (3.5.1 Synthetic Data by Multiple Imputation; 3.5.2 Synthetic Data by Bootstrap; 3.5.3 Synthetic Data by Latin Hypercube Sampling; 3.5.4 Partially Synthetic Data by Cholesky Decomposition; 3.5.5 Other Partially Synthetic and Hybrid Microdata Approaches; 3.5.6 Pros and Cons of Synthetic Microdata)
   3.6 Trading off Information Loss and Disclosure Risk (3.6.1 Score Construction; 3.6.2 R-U Maps; 3.6.3 k-anonymity)
   3.7 Conclusions and Research Directions
   References

4. Measures of Anonymity (Suresh Venkatasubramanian)
   4.1 Introduction (4.1.1 What is Privacy?; 4.1.2 Data Anonymization Methods; 4.1.3 A Classification of Methods)
   4.2 Statistical Measures of Anonymity (4.2.1 Query Restriction; 4.2.2 Anonymity via Variance; 4.2.3 Anonymity via Multiplicity)
   4.3 Probabilistic Measures of Anonymity (4.3.1 Measures Based on Random Perturbation; 4.3.2 Measures Based on Generalization; 4.3.3 Utility vs. Privacy)
   4.4 Computational Measures of Anonymity (4.4.1 Anonymity via Isolation)
   4.5 Conclusions and New Directions (4.5.1 New Directions)
   References

5. k-Anonymous Data Mining: A Survey (V. Ciriani, S. De Capitani di Vimercati, S. Foresti, and P. Samarati)
   5.1 Introduction
   5.2 k-Anonymity
   5.3 Algorithms for Enforcing k-Anonymity
   5.4 k-Anonymity Threats from Data Mining (5.4.1 Association Rules; 5.4.2 Classification Mining)
   5.5 k-Anonymity in Data Mining
   5.6 Anonymize-and-Mine
   5.7 Mine-and-Anonymize (5.7.1 Enforcing k-Anonymity on Association Rules; 5.7.2 Enforcing k-Anonymity on Decision Trees)
   5.8 Conclusions
   Acknowledgments; References

6. A Survey of Randomization Methods for Privacy-Preserving Data Mining (Charu C. Aggarwal and Philip S. Yu)
   6.1 Introduction
   6.2 Reconstruction Methods for Randomization (6.2.1 The Bayes Reconstruction Method; 6.2.2 The EM Reconstruction Method; 6.2.3 Utility and Optimality of Randomization Models)
   6.3 Applications of Randomization (6.3.1 Privacy-Preserving Classification with Randomization; 6.3.2 Privacy-Preserving OLAP; 6.3.3 Collaborative Filtering)
   6.4 The Privacy-Information Loss Tradeoff
   6.5 Vulnerabilities of the Randomization Method
   6.6 Randomization of Time Series Data Streams
   6.7 Multiplicative Noise for Randomization (6.7.1 Vulnerabilities of Multiplicative Randomization; 6.7.2 Sketch Based Randomization)
   6.8 Conclusions and Summary
   References

7. A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining (Keke Chen and Ling Liu)
   7.1 Introduction (7.1.1 Data Privacy vs. Data Utility; 7.1.2 Outline)
   7.2 Definition of Multiplicative Perturbation (7.2.1 Notations; 7.2.2 Rotation Perturbation; 7.2.3 Projection Perturbation; 7.2.4 Sketch-based Approach; 7.2.5 Geometric Perturbation)
   7.3 Transformation Invariant Data Mining Models (7.3.1 Definition of Transformation Invariant Models; 7.3.2 Transformation-Invariant Classification Models; 7.3.3 Transformation-Invariant Clustering Models)
   7.4 Privacy Evaluation for Multiplicative Perturbation (7.4.1 A Conceptual Multidimensional Privacy Evaluation Model; 7.4.2 Variance of Difference as Column Privacy Metric; 7.4.3 Incorporating Attack Evaluation; 7.4.4 Other Metrics)
   7.5 Attack Resilient Multiplicative Perturbations (7.5.1 Naive Estimation to Rotation Perturbation; 7.5.2 ICA-Based Attacks; 7.5.3 Distance-Inference Attacks; 7.5.4 Attacks with More Prior Knowledge; 7.5.5 Finding Attack-Resilient Perturbations)
   7.6 Conclusion
   Acknowledgment; References

8. A Survey of Quantification of Privacy Preserving Data Mining Algorithms (Elisa Bertino, Dan Lin and Wei Jiang)
   8.1 Introduction
   8.2 Metrics for Quantifying Privacy Level (8.2.1 Data Privacy; 8.2.2 Result Privacy)
   8.3 Metrics for Quantifying Hiding Failure
   8.4 Metrics for Quantifying Data Quality (8.4.1 Quality of the Data Resulting from the PPDM Process; 8.4.2 Quality of the Data Mining Results)
   8.5 Complexity Metrics
   8.6 How to Select a Proper Metric
   8.7 Conclusion and Research Directions
   References

9. A Survey of Utility-based Privacy-Preserving Data Transformation Methods (Ming Hua and Jian Pei)
   9.1 Introduction (9.1.1 What is Utility-based Privacy Preservation?)
   9.2 Types of Utility-based Privacy Preservation Methods (9.2.1 Privacy Models; 9.2.2 Utility Measures; 9.2.3 Summary of the Utility-Based Privacy Preserving Methods)
   9.3 Utility-Based Anonymization Using Local Recoding (9.3.1 Global Recoding and Local Recoding; 9.3.2 Utility Measure; 9.3.3 Anonymization Methods; 9.3.4 Summary and Discussion)
   9.4 The Utility-based Privacy Preserving Methods in Classification Problems (9.4.1 The Top-Down Specialization Method; 9.4.2 The Progressive Disclosure Algorithm; 9.4.3 Summary and Discussion)
   9.5 Anonymized Marginal: Injecting Utility into Anonymized Data Sets (9.5.1 Anonymized Marginal; 9.5.2 Utility Measure; 9.5.3 Injecting Utility Using Anonymized Marginals; 9.5.4 Summary and Discussion)
   9.6 Summary
   Acknowledgments; References

10. Mining Association Rules under Privacy Constraints (Jayant R. Haritsa)
   10.1 Introduction
   10.2 Problem Framework (10.2.1 Database Model; 10.2.2 Mining Objective; 10.2.3 Privacy Mechanisms; 10.2.4 Privacy Metric; 10.2.5 Accuracy Metric)
   10.3 Evolution of the Literature
   10.4 The FRAPP Framework (10.4.1 Reconstruction Model; 10.4.2 Estimation Error; 10.4.3 Randomizing the Perturbation Matrix; 10.4.4 Efficient Perturbation; 10.4.5 Integration with Association Rule Mining)
   10.5 Sample Results
   10.6 Closing Remarks
   Acknowledgments; References

11. A Survey of Association Rule Hiding Methods for Privacy (Vassilios S. Verykios and Aris Gkoulalas-Divanis)
   11.1 Introduction
   11.2 Terminology and Preliminaries
   11.3 Taxonomy of Association Rule Hiding Algorithms
   11.4 Classes of Association Rule Algorithms (11.4.1 Heuristic Approaches; 11.4.2 Border-based Approaches; 11.4.3 Exact Approaches)
   11.5 Other Hiding Approaches
   11.6 Metrics and Performance Analysis
   11.7 Discussion and Future Trends
   11.8 Conclusions
   References

12. A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries (Stephen E. Fienberg and Aleksandra B. Slavkovic)
   12.1 Introduction
   12.2 The Statistical Approach to Privacy Protection
   12.3 Datamining Algorithms, Association Rules, and Disclosure Limitation
   12.4 Estimation and Disclosure Limitation for Multi-way Contingency Tables
   12.5 Two Illustrative Examples (12.5.1 Example 1: Data from a Randomized Clinical Trial; 12.5.2 Example 2: Data from the 1993 U.S. Current Population Survey)
   12.6 Conclusions
   Acknowledgments; References

13. A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data (Murat Kantarcioglu)
   13.1 Introduction
   13.2 Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining
   13.3 Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining
   13.4 Privacy-Preserving Distributed Data Mining on Horizontally Partitioned Data
   13.5 Comparison to Vertically Partitioned Data Model
   13.6 Extension to Malicious Parties
   13.7 Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining
   13.8 Privacy Issues Related to Data Mining Results
   13.9 Conclusion
   References

14. A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data (Jaideep Vaidya)
   14.1 Introduction
   14.2 Classification (14.2.1 Naïve Bayes Classification; 14.2.2 Bayesian Network Structure Learning; 14.2.3 Decision Tree Classification)
   14.3 Clustering
   14.4 Association Rule Mining
   14.5 Outlier Detection (14.5.1 Algorithm; 14.5.2 Security Analysis; 14.5.3 Computation and Communication Analysis)
   14.6 Challenges and Research Directions
   References

15. A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods (Kun Liu, Chris Giannella, and Hillol Kargupta)
   15.1 Introduction
   15.2 Definitions and Notation
   15.3 Attacking Additive Data Perturbation (15.3.1 Eigen-Analysis and PCA Preliminaries; 15.3.2 Spectral Filtering; 15.3.3 SVD Filtering; 15.3.4 PCA Filtering; 15.3.5 MAP Estimation Attack; 15.3.6 Distribution Analysis Attack; 15.3.7 Summary)
   15.4 Attacking Matrix Multiplicative Data Perturbation (15.4.1 Known I/O Attacks; 15.4.2 Known Sample Attack; 15.4.3 Other Attacks Based on ICA; 15.4.4 Summary)
   15.5 Attacking k-Anonymization
   15.6 Conclusion
   Acknowledgments; References

16. Private Data Analysis via Output Perturbation (Kobbi Nissim)
   16.1 Introduction
   16.2 The Abstract Model: Statistical Databases, Queries, and Sanitizers
   16.3 Privacy (16.3.1 Interpreting the Privacy Definition)
   16.4 The Basic Technique: Calibrating Noise to Sensitivity (16.4.1 Applications: Functions with Low Global Sensitivity)
   16.5 Constructing Sanitizers for Complex Functionalities (16.5.1 k-Means Clustering; 16.5.2 SVD and PCA; 16.5.3 Learning in the Statistical Queries Model)
   16.6 Beyond the Basics (16.6.1 Instance Based Noise and Smooth Sensitivity; 16.6.2 The Sample-Aggregate Framework; 16.6.3 A General Sanitization Mechanism)
   16.7 Related Work and Bibliographic Notes
   Acknowledgments; References

17. A Survey of Query Auditing Techniques for Data Privacy (Shubha U. Nabar, Krishnaram Kenthapadi, Nina Mishra and Rajeev Motwani)
   17.1 Introduction
   17.2 Auditing Aggregate Queries (17.2.1 Offline Auditing; 17.2.2 Online Auditing)
   17.3 Auditing Select-Project-Join Queries
   17.4 Challenges in Auditing
   17.5 Reading
   References

18. Privacy and the Dimensionality Curse (Charu C. Aggarwal)
   18.1 Introduction
   18.2 The Dimensionality Curse and the k-anonymity Method
   18.3 The Dimensionality Curse and Condensation
   18.4 The Dimensionality Curse and the Randomization Method (18.4.1 Effects of Public Information; 18.4.2 Effects of High Dimensionality; 18.4.3 Gaussian Perturbing Distribution; 18.4.4 Uniform Perturbing Distribution)
   18.5 The Dimensionality Curse and l-diversity
   18.6 Conclusions and Research Directions
   References

19. Personalized Privacy Preservation (Yufei Tao and Xiaokui Xiao)
   19.1 Introduction
   19.2 Formalization of Personalized Anonymity (19.2.1 Personal Privacy Requirements; 19.2.2 Generalization)
   19.3 Combinatorial Process of Privacy Attack (19.3.1 Primary Case; 19.3.2 Non-primary Case)
   19.4 Theoretical Foundation (19.4.1 Notations and Basic Properties; 19.4.2 Derivation of the Breach Probability)
   19.5 Generalization Algorithm (19.5.1 The Greedy Framework; 19.5.2 Optimal SA-generalization)
   19.6 Alternative Forms of Personalized Privacy Preservation (19.6.1 Extension of k-anonymity; 19.6.2 Personalization in Location Privacy Protection)
   19.7 Summary and Future Work
   References

20. Privacy-Preserving Data Stream Classification (Yabo Xu, Ke Wang, Ada Wai-Chee Fu, Rong She, and Jian Pei)
   20.1 Introduction (20.1.1 Motivating Example; 20.1.2 Contributions and Paper Outline)
   20.2 Related Works
   20.3 Problem Statement (20.3.1 Secure Join Stream Classification; 20.3.2 Naive Bayesian Classifiers)
   20.4 Our Approach (20.4.1 Initialization; 20.4.2 Bottom-Up Propagation; 20.4.3 Top-Down Propagation; 20.4.4 Using NBC; 20.4.5 Algorithm Analysis)
   20.5 Empirical Studies (20.5.1 Real-life Datasets; 20.5.2 Synthetic Datasets; 20.5.3 Discussion)
   20.6 Conclusions
   References

Index

List of Figures

5.1 Simplified representation of a private table
5.2 An example of domain and value generalization hierarchies
5.3 Classification of k-anonymity techniques [11]
5.4 Generalization hierarchy for QI = {Marital status, Sex}
5.5 Index assignment to attributes Marital status and Sex
5.6 An example of set enumeration tree over set I = {1, 2, 3} of indexes
5.7 Sub-hierarchies computed by Incognito for the table in Figure 5.1
5.8 Spatial representation (a) and possible partitioning (b)-(d) of the table in Figure 5.1
5.9 An example of decision tree
5.10 Different approaches for combining k-anonymity and data mining
5.11 An example of top-down anonymization for the private table in Figure 5.1
5.12 Frequent itemsets extracted from the table in Figure 5.1
5.13 An example of binary table
5.14 Itemsets extracted from the table in Figure 5.13(b)
5.15 Itemsets with support at least equal to 40 (a) and corresponding anonymized itemsets (b)
5.16 3-anonymous version of the tree of Figure 5.9
5.17 Suppression of occurrences in non-leaf nodes in the tree in Figure 5.9
5.18 Table inferred from the decision tree in Figure 5.17
5.19 11-anonymous version of the tree in Figure 5.17
5.20 Table inferred from the decision tree in Figure 5.19
6.1 Illustration of the Information Loss Metric
7.1 Using known points and distance relationship to infer the rotation matrix
9.1 A taxonomy tree on categorical attribute Education
9.2 A taxonomy tree on continuous attribute Age
9.3 Interactive graph
9.4 A decomposition
10.1 CENSUS (γ = 19)
10.2 Perturbation Matrix Condition Numbers (γ = 19)
13.1 Relationship between Secure Sub-protocols and Privacy-Preserving Distributed Data Mining on Horizontally Partitioned Data
14.1 Two-dimensional problem that cannot be decomposed into two one-dimensional problems
15.1 Wigner's semi-circle law: a histogram of the eigenvalues of (A + Aᵀ)/(2√(2p)) for a large, randomly generated A
17.1 Skeleton of a simulatable private randomized auditor
18.1 Some Examples of Generalization for 2-Anonymity
18.2 Upper Bound of 2-anonymity Probability in a Non-Empty Grid Cell
18.3 Fraction of Data Points Preserving 2-Anonymity with Data Dimensionality (Gaussian Clusters)
18.4 Minimum Information Loss for 2-Anonymity (Gaussian Clusters)
18.5 Randomization Level with Increasing Dimensionality, Perturbation level = 8·σ (UniDis)
19.1 Microdata and generalization
19.2 The taxonomy of attribute Disease
19.3 A possible result of our generalization scheme
19.4 The voter registration list
19.5 Algorithm for computing personalized generalization
19.6 Algorithm for finding the optimal SA-generalization
19.7 Personalized k-anonymous generalization
20.1 Related streams / tables
20.2 The join stream
20.3 Example with 3 streams at initialization
20.4 After bottom-up propagations
20.5 After top-down propagations
20.6 UK road accident data (2001)
20.7 Classifier accuracy
20.8 Time per input tuple
20.9 Classifier accuracy vs. window size
20.10 Classifier accuracy vs. concept drifting interval
20.11 Time per input tuple vs. window size
20.12 Time per input tuple vs. blow-up ratio
20.13 Time per input tuple vs. number of streams

List of Tables

3.1 Perturbative methods vs. data types. "X" denotes applicable and "(X)" denotes applicable with some adaptation
3.2 Example of rank swapping. Left, original file; right, rank-swapped file
3.3 Non-perturbative methods vs. data types
9.1 The original table
9.2 A 2-anonymized table with better utility
9.3 A 2-anonymized table with poorer utility
9.4 Summary of utility-based privacy preserving methods
9.5 3-anonymous table by global recoding
9.6 3-anonymous table by local recoding
9.7 The original table
9.8 The anonymized table
9.9 The original table
9.10 The suppressed table
9.11 The original table
9.12 The anonymized table
9.13 Age Marginal
9.14 (Education, AnnualIncome) Marginal
10.1 CENSUS Dataset
10.2 Frequent Itemsets for sup_min = 0.02
12.1 Results of clinical trial for the effectiveness of an analgesic drug
12.2 Second panel has LP relaxation bounds, and third panel has sharp IP bounds, for cell entries in Table 12.1 given [R|CST] conditional probability values
12.3 Sharp upper and lower bounds for cell entries in Table 12.1 given the [CSR] margin, and LP relaxation bounds given [R|CS] conditional probability values
12.4 Description of variables in CPS data extract

Book Description

Advances in hardware technology have increased the capability to store and record personal data, and this has raised concerns that personal data may be abused. This book presents a number of techniques for performing data mining tasks in a privacy-preserving way. It is an edited volume containing surveys by distinguished researchers in the privacy field; each survey covers the key research content of a particular topic in privacy as well as future research directions. The book is designed for researchers, professors, and advanced-level students in computer science, but is also suitable for practitioners in industry.
