Author | Charu C. Aggarwal | |

Isbn | 9780387709918 | |

File size | 5.3MB | |

Year | 2008 | |

Pages | 514 | |

Language | English | |

File format | ||

Category | security |

Privacy-Preserving
Data Mining
Models and Algorithms
ADVANCES IN DATABASE SYSTEMS
Volume 34
Series Editors
Ahmed K. Elmagarmid
Amit P. Sheth
Purdue University
West Lafayette, IN 47907
Wright State University
Dayton, Ohio 45435
Other books in the Series:
SEQUENCE DATA MINING, Guozhu Dong, Jian Pei; ISBN: 978-0-387-69936-3
DATA STREAMS: Models and Algorithms, edited by Charu C. Aggarwal; ISBN: 978-0-387-28759-1
SIMILARITY SEARCH: The Metric Space Approach, P. Zezula, G. Amato, V. Dohnal, M. Batko;
ISBN: 0-387-29146-6
STREAM DATA MANAGEMENT, Nauman Chaudhry, Kevin Shaw, Mahdi Abdelguerfi; ISBN:
0-387-24393-3
FUZZY DATABASE MODELING WITH XML, Zongmin Ma; ISBN: 0-387- 24248-1
MINING SEQUENTIAL PATTERNS FROM LARGE DATA SETS, Wei Wang and Jiong Yang;
ISBN: 0-387-24246-5
ADVANCED SIGNATURE INDEXING FOR MULTIMEDIA AND WEB APPLICATIONS, Yannis Manolopoulos, Alexandros Nanopoulos, Eleni Tousidou; ISBN: 1-4020-7425-5
ADVANCES IN DIGITAL GOVERNMENT: Technology, Human Factors, and Policy, edited by
William J. McIver, Jr. and Ahmed K. Elmagarmid; ISBN: 1-4020-7067-5
INFORMATION AND DATABASE QUALITY, Mario Piattini, Coral Calero and Marcela Genero;
ISBN: 0-7923-7599-8
DATA QUALITY, Richard Y. Wang, Mostapha Ziad, Yang W. Lee: ISBN: 0-7923-7215-8
THE FRACTAL STRUCTURE OF DATA REFERENCE: Applications to the Memory Hierarchy,
Bruce McNutt; ISBN: 0-7923-7945-4
SEMANTIC MODELS FOR MULTIMEDIA DATABASE SEARCHING AND BROWSING, ShuChing Chen, R.L. Kashyap, and Arif Ghafoor; ISBN: 0-7923-7888-1
INFORMATION BROKERING ACROSS HETEROGENEOUS DIGITAL DATA: A Metadatabased Approach, Vipul Kashyap, Amit Sheth; ISBN: 0-7923-7883-0
DATA DISSEMINATION IN WIRELESS COMPUTING ENVIRONMENTS, Kian-Lee Tan and
Beng Chin Ooi; ISBN: 0-7923-7866-0
MIDDLEWARE NETWORKS: Concept, Design and Deployment of Internet Infrastructure,
Michah Lerner, George Vanecek, Nino Vidovic, Dad Vrsalovic; ISBN: 0-7923-7840-7
ADVANCED DATABASE INDEXING, Yannis Manolopoulos, Yannis Theodoridis, Vassilis J. Tsotras;
ISBN: 0-7923-7716-8
MULTILEVEL SECURE TRANSACTION PROCESSING, Vijay Atluri, Sushil Jajodia, Binto
George ISBN: 0-7923-7702-8
FUZZY LOGIC IN DATA MODELING, Guoqing Chen ISBN: 0-7923-8253-6
PRIVACY-PRESERVING DATA MINING: Models and Algorithms, edited by Charu C. Aggarwal
and Philip S. Yu; ISBN: 0-387-70991-8
Privacy-Preserving
Data Mining
Models and Algorithms
Edited by
Charu C. Aggarwal
IBM T.J. Watson Research Center, USA
and
Philip S. Yu
University of Illinois at Chicago, USA
ABC
Editors:
Charu C. Aggarwal
IBM Thomas J. Watson Research Center
19 Skyline Drive
Hawthorne NY 10532
[email protected]
Series Editors
Ahmed K. Elmagarmid
Purdue University
West Lafayette, IN 47907
Philip S. Yu
Department of Computer Science
University of Illinois at Chicago
854 South Morgan Street
Chicago, IL 60607-7053
[email protected]
Amit P. Sheth
Wright State University
Dayton, Ohio 45435
ISBN 978-0-387-70991-8
e-ISBN 978-0-387-70992-5
DOI 10.1007/978-0-387-70992-5
Library of Congress Control Number: 2007943463
c 2008 Springer Science+Business Media, LLC.
°
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY
10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.
Printed on acid-free paper
9 8 7 6 5 4 3 2 1
springer.com
Preface
In recent years, advances in hardware technology have lead to an increase
in the capability to store and record personal data about consumers and individuals. This has lead to concerns that the personal data may be misused for
a variety of purposes. In order to alleviate these concerns, a number of techniques have recently been proposed in order to perform the data mining tasks in
a privacy-preserving way. These techniques for performing privacy-preserving
data mining are drawn from a wide array of related topics such as data mining,
cryptography and information hiding. The material in this book is designed
to be drawn from the different topics so as to provide a good overview of the
important topics in the ﬁeld.
While a large number of research papers are now available in this ﬁeld, many
of the topics have been studied by different communities with different styles.
At this stage, it becomes important to organize the topics in such a way that
the relative importance of different research areas is recognized. Furthermore,
the ﬁeld of privacy-preserving data mining has been explored independently
by the cryptography, database and statistical disclosure control communities.
In some cases, the parallel lines of work are quite similar, but the communities
are not sufﬁciently integrated for the provision of a broader perspective. This
book will contain chapters from researchers of all three communities and will
therefore try to provide a balanced perspective of the work done in this ﬁeld.
This book will be structured as an edited book from prominent researchers
in the ﬁeld. Each chapter will contain a survey which contains the key research
content on the topic, and the future directions of research in the ﬁeld. Emphasis
will be placed on making each chapter self-sufﬁcient. While the chapters will
be written by different researchers, the topics and content is organized in such
a way so as to present the most important models, algorithms, and applications
in the privacy ﬁeld in a structured and concise way. In addition, attention is
paid in drawing chapters from researchers working in different areas in order
to provide different points of view. Given the lack of structurally organized information on the topic of privacy, the book will provide insights which are not
easily accessible otherwise. A few chapters in the book are not surveys, since
the corresponding topics fall in the emerging category, and enough material is
vi
Preface
not available to create a survey. In such cases, the individual results have been
included to give a ﬂavor of the emerging research in the ﬁeld. It is expected
that the book will be a great help to researchers and graduate students interested in the topic. While the privacy ﬁeld clearly falls in the emerging category
because of its recency, it is now beginning to reach a maturation and popularity
point, where the development of an overview book on the topic becomes both
possible and necessary. It is hoped that this book will provide a reference to
students, researchers and practitioners in both introducing the topic of privacypreserving data mining and understanding the practical and algorithmic aspects
of the area.
Contents
Preface
v
List of Figures
xvii
List of Tables
xxi
1
An Introduction to Privacy-Preserving Data Mining
Charu C. Aggarwal, Philip S. Yu
1.1. Introduction
1.2. Privacy-Preserving Data Mining Algorithms
1.3. Conclusions and Summary
References
2
A General Survey of Privacy-Preserving Data Mining Models and
Algorithms
Charu C. Aggarwal, Philip S. Yu
2.1. Introduction
2.2. The Randomization Method
2.2.1 Privacy Quantiﬁcation
2.2.2 Adversarial Attacks on Randomization
2.2.3 Randomization Methods for Data Streams
2.2.4 Multiplicative Perturbations
2.2.5 Data Swapping
2.3. Group Based Anonymization
2.3.1 The k -Anonymity Framework
2.3.2 Personalized Privacy-Preservation
2.3.3 Utility Based Privacy Preservation
2.3.4 Sequential Releases
2.3.5 The l -diversity Method
2.3.6 The t -closeness Model
2.3.7 Models for Text, Binary and String Data
2.4. Distributed Privacy-Preserving Data Mining
2.4.1 Distributed Algorithms over Horizontally Partitioned Data
Sets
2.4.2 Distributed Algorithms over Vertically Partitioned Data
2.4.3 Distributed Algorithms for k -Anonymity
1
1
3
7
8
11
11
13
15
18
18
19
19
20
20
24
24
25
26
27
27
28
30
31
32
viii
Contents
2.5.
Privacy-Preservation of Application Results
2.5.1 Association Rule Hiding
2.5.2 Downgrading Classiﬁer Effectiveness
2.5.3 Query Auditing and Inference Control
2.6. Limitations of Privacy: The Curse of Dimensionality
2.7. Applications of Privacy-Preserving Data Mining
2.7.1 Medical Databases: The Scrub and Dataﬂy Systems
2.7.2 Bioterrorism Applications
2.7.3 Homeland Security Applications
2.7.4 Genomic Privacy
2.8. Summary
References
3
A Survey of Inference Control Methods for Privacy-Preserving Data
Mining
Josep Domingo-Ferrer
3.1. Introduction
3.2. A classiﬁcation of Microdata Protection Methods
3.3. Perturbative Masking Methods
3.3.1 Additive Noise
3.3.2 Microaggregation
3.3.3 Data Wapping and Rank Swapping
3.3.4 Rounding
3.3.5 Resampling
3.3.6 PRAM
3.3.7 MASSC
3.4. Non-perturbative Masking Methods
3.4.1 Sampling
3.4.2 Global Recoding
3.4.3 Top and Bottom Coding
3.4.4 Local Suppression
3.5. Synthetic Microdata Generation
3.5.1 Synthetic Data by Multiple Imputation
3.5.2 Synthetic Data by Bootstrap
3.5.3 Synthetic Data by Latin Hypercube Sampling
3.5.4 Partially Synthetic Data by Cholesky Decomposition
3.5.5 Other Partially Synthetic and Hybrid Microdata Approaches
3.5.6 Pros and Cons of Synthetic Microdata
3.6. Trading off Information Loss and Disclosure Risk
3.6.1 Score Construction
3.6.2 R-U Maps
3.6.3 k -anonymity
3.7. Conclusions and Research Directions
References
32
33
34
34
37
38
39
40
40
42
43
43
53
54
55
58
58
59
61
62
62
62
63
63
64
64
65
65
65
65
66
66
67
67
68
69
69
71
71
72
73
ix
Contents
4
Measures of Anonymity
Suresh Venkatasubramanian
4.1. Introduction
4.1.1 What is Privacy?
4.1.2 Data Anonymization Methods
4.1.3 A Classiﬁcation of Methods
4.2. Statistical Measures of Anonymity
4.2.1 Query Restriction
4.2.2 Anonymity via Variance
4.2.3 Anonymity via Multiplicity
4.3. Probabilistic Measures of Anonymity
4.3.1 Measures Based on Random Perturbation
4.3.2 Measures Based on Generalization
4.3.3 Utility vs Privacy
4.4. Computational Measures of Anonymity
4.4.1 Anonymity via Isolation
4.5. Conclusions and New Directions
4.5.1 New Directions
References
81
81
82
83
84
85
85
85
86
87
87
90
94
94
97
97
98
99
5
k-Anonymous Data Mining: A Survey
105
V. Ciriani, S. De Capitani di Vimercati, S. Foresti, and P. Samarati
5.1. Introduction
5.2. k -Anonymity
5.3. Algorithms for Enforcing k -Anonymity
5.4. k -Anonymity Threats from Data Mining
5.4.1 Association Rules
5.4.2 Classiﬁcation Mining
5.5. k -Anonymity in Data Mining
5.6. Anonymize-and-Mine
5.7. Mine-and-Anonymize
5.7.1 Enforcing k -Anonymity on Association Rules
5.7.2 Enforcing k -Anonymity on Decision Trees
5.8. Conclusions
Acknowledgments
References
105
107
110
117
118
118
120
123
126
126
130
133
133
134
6
A Survey of Randomization Methods for Privacy-Preserving Data Mining
Charu C. Aggarwal, Philip S. Yu
6.1. Introduction
6.2. Reconstruction Methods for Randomization
6.2.1 The Bayes Reconstruction Method
6.2.2 The EM Reconstruction Method
6.2.3 Utility and Optimality of Randomization Models
137
137
139
139
141
143
x
Contents
6.3.
Applications of Randomization
6.3.1 Privacy-Preserving Classiﬁcation with Randomization
6.3.2 Privacy-Preserving OLAP
6.3.3 Collaborative Filtering
6.4. The Privacy-Information Loss Tradeoff
6.5. Vulnerabilities of the Randomization Method
6.6. Randomization of Time Series Data Streams
6.7. Multiplicative Noise for Randomization
6.7.1 Vulnerabilities of Multiplicative Randomization
6.7.2 Sketch Based Randomization
6.8. Conclusions and Summary
References
7
A Survey of Multiplicative Perturbation for Privacy-Preserving
Data Mining
Keke Chen and Ling Liu
7.1. Introduction
7.1.1 Data Privacy vs. Data Utility
7.1.2 Outline
7.2. Deﬁnition of Multiplicative Perturbation
7.2.1 Notations
7.2.2 Rotation Perturbation
7.2.3 Projection Perturbation
7.2.4 Sketch-based Approach
7.2.5 Geometric Perturbation
7.3. Transformation Invariant Data Mining Models
7.3.1 Deﬁnition of Transformation Invariant Models
7.3.2 Transformation-Invariant Classiﬁcation Models
7.3.3 Transformation-Invariant Clustering Models
7.4. Privacy Evaluation for Multiplicative Perturbation
7.4.1 A Conceptual Multidimensional Privacy Evaluation Model
7.4.2 Variance of Difference as Column Privacy Metric
7.4.3 Incorporating Attack Evaluation
7.4.4 Other Metrics
7.5. Attack Resilient Multiplicative Perturbations
7.5.1 Naive Estimation to Rotation Perturbation
7.5.2 ICA-Based Attacks
7.5.3 Distance-Inference Attacks
7.5.4 Attacks with More Prior Knowledge
7.5.5 Finding Attack-Resilient Perturbations
7.6. Conclusion
Acknowledgment
References
8
A Survey of Quantiﬁcation of Privacy Preserving Data Mining Algorithms
Elisa Bertino, Dan Lin and Wei Jiang
8.1. Introduction
8.2. Metrics for Quantifying Privacy Level
8.2.1 Data Privacy
144
144
145
145
146
149
151
152
153
153
154
154
157
158
159
160
161
161
161
162
164
164
165
166
166
167
168
168
169
170
171
171
171
173
174
176
177
177
178
179
183
184
186
186
Contents
8.2.2 Result Privacy
Metrics for Quantifying Hiding Failure
Metrics for Quantifying Data Quality
8.4.1 Quality of the Data Resulting from the PPDM Process
8.4.2 Quality of the Data Mining Results
8.5. Complexity Metrics
8.6. How to Select a Proper Metric
8.7. Conclusion and Research Directions
References
8.3.
8.4.
9
A Survey of Utility-based Privacy-Preserving Data
Transformation Methods
Ming Hua and Jian Pei
9.1. Introduction
9.1.1 What is Utility-based Privacy Preservation?
9.2. Types of Utility-based Privacy Preservation Methods
9.2.1 Privacy Models
9.2.2 Utility Measures
9.2.3 Summary of the Utility-Based Privacy Preserving Methods
9.3. Utility-Based Anonymization Using Local Recoding
9.3.1 Global Recoding and Local Recoding
9.3.2 Utility Measure
9.3.3 Anonymization Methods
9.3.4 Summary and Discussion
9.4. The Utility-based Privacy Preserving Methods in Classiﬁcation Problems
9.4.1 The Top-Down Specialization Method
9.4.2 The Progressive Disclosure Algorithm
9.4.3 Summary and Discussion
9.5. Anonymized Marginal: Injecting Utility into Anonymized Data Sets
9.5.1 Anonymized Marginal
9.5.2 Utility Measure
9.5.3 Injecting Utility Using Anonymized Marginals
9.5.4 Summary and Discussion
9.6. Summary
Acknowledgments
References
10
Mining Association Rules under Privacy Constraints
Jayant R. Haritsa
10.1. Introduction
10.2. Problem Framework
10.2.1 Database Model
10.2.2 Mining Objective
10.2.3 Privacy Mechanisms
10.2.4 Privacy Metric
10.2.5 Accuracy Metric
xi
191
192
193
193
198
200
201
202
202
207
208
209
210
210
212
214
214
215
216
217
219
219
220
224
228
228
229
230
231
233
234
234
234
239
239
240
240
241
241
243
245
xii
Contents
10.3. Evolution of the Literature
10.4. The FRAPP Framework
10.4.1 Reconstruction Model
10.4.2 Estimation Error
10.4.3 Randomizing the Perturbation Matrix
10.4.4 Efﬁcient Perturbation
10.4.5 Integration with Association Rule Mining
10.5. Sample Results
10.6. Closing Remarks
Acknowledgments
References
11
A Survey of Association Rule Hiding Methods for Privacy
Vassilios S. Verykios and Aris Gkoulalas-Divanis
11.1. Introduction
11.2. Terminology and Preliminaries
11.3. Taxonomy of Association Rule Hiding Algorithms
11.4. Classes of Association Rule Algorithms
11.4.1 Heuristic Approaches
11.4.2 Border-based Approaches
11.4.3 Exact Approaches
11.5. Other Hiding Approaches
11.6. Metrics and Performance Analysis
11.7. Discussion and Future Trends
11.8. Conclusions
References
12
A Survey of Statistical Approaches to Preserving Conﬁdentiality
of Contingency Table Entries
Stephen E. Fienberg and Aleksandra B. Slavkovic
12.1. Introduction
12.2. The Statistical Approach Privacy Protection
12.3. Datamining Algorithms, Association Rules, and Disclosure
Limitation
12.4. Estimation and Disclosure Limitation for Multi-way Contingency
Tables
12.5. Two Illustrative Examples
12.5.1 Example 1: Data from a Randomized Clinical Trial
12.5.2 Example 2: Data from the 1993 U.S. Current Population
Survey
12.6. Conclusions
Acknowledgments
References
13
A Survey of Privacy-Preserving Methods Across Horizontally Partitioned
Data
Murat Kantarcioglu
13.1. Introduction
246
251
252
253
256
256
258
259
263
263
263
267
267
269
270
271
272
277
278
279
281
284
285
286
291
291
292
294
295
301
301
305
308
309
309
313
313
Contents
13.2. Basic Cryptographic Techniques for Privacy-Preserving Distributed
Data Mining
13.3. Common Secure Sub-protocols Used in Privacy-Preserving
Distributed Data Mining
13.4. Privacy-preserving Distributed Data Mining on Horizontally
Partitioned Data
13.5. Comparison to Vertically Partitioned Data Model
13.6. Extension to Malicious Parties
13.7. Limitations of the Cryptographic Techniques Used in PrivacyPreserving Distributed Data Mining
13.8. Privacy Issues Related to Data Mining Results
13.9. Conclusion
References
14
A Survey of Privacy-Preserving Methods Across Vertically Partitioned
Data
Jaideep Vaidya
14.1. Introduction
14.2. Classiﬁcation
14.2.1 Na¨ıve Bayes Classiﬁcation
14.2.2 Bayesian Network Structure Learning
14.2.3 Decision Tree Classiﬁcation
14.3. Clustering
14.4. Association Rule Mining
14.5. Outlier detection
14.5.1 Algorithm
14.5.2 Security Analysis
14.5.3 Computation and Communication Analysis
14.6. Challenges and Research Directions
References
15
A Survey of Attack Techniques on Privacy-Preserving Data Perturbation
Methods
Kun Liu, Chris Giannella, and Hillol Kargupta
15.1. Introduction
15.2. Deﬁnitions and Notation
15.3. Attacking Additive Data Perturbation
15.3.1 Eigen-Analysis and PCA Preliminaries
15.3.2 Spectral Filtering
15.3.3 SVD Filtering
15.3.4 PCA Filtering
15.3.5 MAP Estimation Attack
15.3.6 Distribution Analysis Attack
15.3.7 Summary
15.4. Attacking Matrix Multiplicative Data Perturbation
15.4.1 Known I/O Attacks
15.4.2 Known Sample Attack
15.4.3 Other Attacks Based on ICA
xiii
315
318
323
326
327
329
330
332
332
337
337
341
342
343
344
346
347
349
351
352
354
355
356
359
360
360
361
362
363
364
365
366
367
367
369
370
373
374
xiv
Contents
15.4.4 Summary
15.5. Attacking k-Anonymization
15.6. Conclusion
Acknowledgments
References
16
Private Data Analysis via Output Perturbation
Kobbi Nissim
16.1. Introduction
16.2. The Abstract Model – Statistical Databases, Queries, and Sanitizers
16.3. Privacy
16.3.1 Interpreting the Privacy Deﬁnition
16.4. The Basic Technique: Calibrating Noise to Sensitivity
16.4.1 Applications: Functions with Low Global Sensitivity
16.5. Constructing Sanitizers for Complex Functionalities
16.5.1 k-Means Clustering
16.5.2 SVD and PCA
16.5.3 Learning in the Statistical Queries Model
16.6. Beyond the Basics
16.6.1 Instance Based Noise and Smooth Sensitivity
16.6.2 The Sample-Aggregate Framework
16.6.3 A General Sanitization Mechanism
16.7. Related Work and Bibliographic Notes
Acknowledgments
References
17
A Survey of Query Auditing Techniques for Data Privacy
Shubha U. Nabar, Krishnaram Kenthapadi, Nina Mishra and Rajeev Motwani
17.1. Introduction
17.2. Auditing Aggregate Queries
17.2.1 Ofﬂine Auditing
17.2.2 Online Auditing
17.3. Auditing Select-Project-Join Queries
17.4. Challenges in Auditing
17.5. Reading
References
18
Privacy and the Dimensionality Curse
Charu C. Aggarwal
18.1. Introduction
18.2. The Dimensionality Curse and the k -anonymity Method
18.3. The Dimensionality Curse and Condensation
18.4. The Dimensionality Curse and the Randomization Method
18.4.1 Effects of Public Information
18.4.2 Effects of High Dimensionality
18.4.3 Gaussian Perturbing Distribution
18.4.4 Uniform Perturbing Distribution
375
376
376
377
377
383
383
385
388
390
394
396
400
401
403
404
405
406
408
409
409
411
411
415
415
416
417
418
426
427
429
430
433
433
435
441
446
446
450
450
455
Contents
18.5. The Dimensionality Curse and l-diversity
18.6. Conclusions and Research Directions
References
19
Personalized Privacy Preservation
Yufei Tao and Xiaokui Xiao
19.1. Introduction
19.2. Formalization of Personalized Anonymity
19.2.1 Personal Privacy Requirements
19.2.2 Generalization
19.3. Combinatorial Process of Privacy Attack
19.3.1 Primary Case
19.3.2 Non-primary Case
19.4. Theoretical Foundation
19.4.1 Notations and Basic Properties
19.4.2 Derivation of the Breach Probability
19.5. Generalization Algorithm
19.5.1 The Greedy Framework
19.5.2 Optimal SA-generalization
19.6. Alternative Forms of Personalized Privacy Preservation
19.6.1 Extension of k -anonymity
19.6.2 Personalization in Location Privacy Protection
19.7. Summary and Future Work
References
xv
458
459
460
461
461
463
464
465
467
468
469
470
471
472
473
474
476
478
479
480
482
485
20
Privacy-Preserving Data Stream Classiﬁcation
Yabo Xu, Ke Wang, Ada Wai-Chee Fu, Rong She, and Jian Pei
20.1. Introduction
20.1.1 Motivating Example
20.1.2 Contributions and Paper Outline
20.2. Related Works
20.3. Problem Statement
20.3.1 Secure Join Stream Classiﬁcation
20.3.2 Naive Bayesian Classiﬁers
20.4. Our Approach
20.4.1 Initialization
20.4.2 Bottom-Up Propagation
20.4.3 Top-Down Propagation
20.4.4 Using NBC
20.4.5 Algorithm Analysis
20.5. Empirical Studies
20.5.1 Real-life Datasets
20.5.2 Synthetic Datasets
20.5.3 Discussion
20.6. Conclusions
References
487
488
490
491
493
493
494
495
495
496
497
499
500
501
502
504
506
507
508
Index
511
487
List of Figures
5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
5.10
5.11
5.12
5.13
5.14
5.15
5.16
5.17
5.18
5.19
5.20
6.1
7.1
Simpliﬁed representation of a private table
An example of domain and value generalization hierarchies
Classiﬁcation of k-anonymity techniques [11]
Generalization hierarchy for QI={Marital status, Sex}
Index assignment to attributes Marital status and Sex
An example of set enumeration tree over set I = {1, 2, 3}
of indexes
Sub-hierarchies computed by Incognito for the table in
Figure 5.1
Spatial representation (a) and possible partitioning
(b)-(d) of the table in Figure 5.1
An example of decision tree
Different approaches for combining k-anonymity and
data mining
An example of top-down anonymization for the private
table in Figure 5.1
Frequent itemsets extracted from the table in Figure 5.1
An example of binary table
Itemsets extracted from the table in Figure 5.13(b)
Itemsets with support at least equal to 40 (a) and
corresponding anonymized itemsets (b)
3-anonymous version of the tree of Figure 5.9
Suppression of occurrences in non-leaf nodes in the tree
in Figure 5.9
Table inferred from the decision tree in Figure 5.17
11-anonymous version of the tree in Figure 5.17
Table inferred from the decision tree in Figure 5.19
Illustration of the Information Loss Metric
Using known points and distance relationship to infer
the rotation matrix
108
109
110
111
112
113
114
116
119
120
124
127
128
128
129
131
132
132
132
133
149
175
xviii
List of Figures
9.1
9.2
9.3
9.4
10.1
10.2
A taxonomy tree on categorical attribute Education
A taxonomy tree on continuous attribute Age
Interactive graph
A decomposition
CENSUS (γ = 19)
Perturbation Matrix Condition Numbers (γ = 19)
13.1
Relationship between Secure Sub-protocols and Privacy
Preserving Distributed Data Mining on Horizontally
Partitioned Data
Two dimensional problem that cannot be decomposed
into two one-dimensional problems
Wigner’s semi-circle law: a histogram of the eigenvalues
√
for a large, randomly generated A
of A+A
2 2p
Skeleton of a simulatable private randomized auditor
Some Examples of Generalization for 2-Anonymity
Upper Bound of 2-anonymity Probability in an
Non-Empty Grid Cell
Fraction of Data Points Preserving 2-Anonymity with
Data Dimensionality (Gaussian Clusters)
Minimum Information Loss for 2-Anonymity (Gaussian
Clusters)
Randomization Level with Increasing Dimensionality,
Perturbation level = 8 · σ o (U niDis)
Microdata and generalization
The taxonomy of attribute Disease
A possible result of our generalization scheme
The voter registration list
Algorithm for computing personalized generalization
Algorithm for ﬁnding the optimal SA-generalization
Personalized k-anonymous generalization
Related streams / tables
The join stream
Example with 3 streams at initialization
After bottom-up propagations
After top-down propagations
UK road accident data (2001)
Classiﬁer accuracy
Time per input tuple
14.1
15.1
17.1
18.1
18.2
18.3
18.4
18.5
19.1
19.2
19.3
19.4
19.5
19.6
19.7
20.1
20.2
20.3
20.4
20.5
20.6
20.7
20.8
221
221
232
232
261
262
323
340
363
423
435
439
440
445
457
462
463
466
468
474
478
480
489
489
496
498
499
502
503
503
List of Figures
xix
20.9
20.10
20.11
20.12
20.13
505
505
506
506
507
Classiﬁer accuracy vs. window size
Classiﬁer accuracy vs. concept drifting interval
Time per input tuple vs. window size
Time per input tuple vs. blow-up ratio
Time per input tuple vs. number of streams
List of Tables
3.1
3.2
Perturbative methods vs data types. “X” denotes applicable and “(X)” denotes applicable with some adaptation
Example of rank swapping. Left, original ﬁle; right,
rankswapped ﬁle
Non-perturbative methods vs data types
The original table
A 2-anonymized table with better utility
A 2-anonymized table with poorer utility
Summary of utility-based privacy preserving methods
3-anonymous table by global recoding
3-anonymous table by local recoding
The original table
The anonymized table
The original table
The suppressed table
The original table
The anonymized table
Age Marginal
(Education, AnnualIncome) Marginal
3.3
9.1a
9.2b
9.3c
9.4
9.5a
9.6b
9.7a
9.8b
9.9a
9.10b
9.11a
9.12b
9.13a
9.14b
10.1 CENSUS Dataset
10.2 Frequent Itemsets for supmin = 0.02
12.1 Results of clinical trial for the effectiveness
of an analgesic drug
12.2 Second panel has LP relaxation bounds, and third panel
has sharp IP bounds for cell entries in Table 1.1 given
[R|CST ] conditional probability values
12.3 Sharp upper and lower bounds for cell entries in Table 12.1 given the [CSR] margin, and LP relaxation
bounds given [R|CS] conditional probability values
12.4 Description of variables in CPS data extract
58
62
64
209
209
209
214
215
215
223
223
225
225
229
229
229
229
260
260
302
303
304
305

Author Charu C. Aggarwal Isbn 9780387709918 File size 5.3MB Year 2008 Pages 514 Language English File format PDF Category Security Book Description: FacebookTwitterGoogle+TumblrDiggMySpaceShare Advances in hardware technology have increased the capability to store and record personal data. This has caused concerns that personal data may be abused. This book proposes a number of techniques to perform the data mining tasks in a privacy-preserving way. This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions of a particular topic in privacy. The book is designed for researchers, professors, and advanced-level students in computer science, but is also suitable for practitioners in industry. Download (5.3MB) Database Anonymization Decentralized Computing Using Blockchain Technologies and Smart Contracts User-Centric Privacy and Security in Biometrics Computer Architecture and Security: Fundamentals of Designing Secure Computer Systems Statistical Techniques for Network Security Load more posts