re-identification – Privacy Design®
[protecting people by good design, solid security, efficient processes and trusted services]

AEPD – Survey on Device Fingerprinting
Sun, 26 May 2019 21:08:00 +0000

https://www.aepd.es/media/estudios/estudio-fingerprinting-huella-digital-EN.pdf

Synthea – a Synthetic Patient Population Simulator
Sun, 26 May 2019 19:40:21 +0000

Synthea is a Synthetic Patient Population Simulator. Its goal is to output synthetic, realistic (but not real) patient data and associated health records in a variety of formats.

A nice offline tool for generating synthetic patient data.

https://github.com/synthetichealth/synthea

Researchers re-identify patients from a de-identified patient data set published by the Australian government
Wed, 21 Feb 2018 09:49:39 +0000

The Australian government previously published a de-identified open health data set containing the patient data of a subset of the Australian population. The de-identification process involved not just stripping direct identifiers but also adding some inaccuracies to the data set. However, the data set was still at the person level.

Researchers have been able to successfully re-identify some patients.


Abstract: With the aim of informing sound policy about data sharing and privacy, we describe successful re-identification of patients in an Australian de-identified open health dataset. As in prior studies of similar datasets, a few mundane facts often suffice to isolate an individual.
Some people can be identified by name based on publicly available information. Decreasing the precision of the unit-record level data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility. We also examine the value of related datasets in improving the accuracy and confidence of re-identification. Our re-identifications were performed on a 10% sample dataset, but a related open Australian dataset allows us to infer with high confidence that some individuals in the sample have been correctly re-identified.
Finally, we examine the combination of the open datasets with some commercial datasets that are known to exist but are not in our possession. We show that they would further increase the ease of re-identification.

https://arxiv.org/ftp/arxiv/papers/1712/1712.05627.pdf
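
The abstract's point that "a few mundane facts often suffice to isolate an individual" can be illustrated with a small sketch. This is not the researchers' method or data: the records, the column layout, and the helper names `unique_fraction` and `coarsen_month_to_year` are all invented here for illustration. It only counts how many toy records become unique as more attributes are combined, and how reducing precision lowers that count.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only:
# (year_of_birth, sex, postcode_prefix, procedure_month)
RECORDS = [
    (1975, "F", "30", "2014-03"),
    (1975, "F", "30", "2014-07"),
    (1975, "M", "30", "2014-03"),
    (1982, "F", "31", "2014-03"),
    (1982, "F", "31", "2014-03"),
    (1990, "M", "20", "2015-01"),
]


def unique_fraction(records, columns):
    """Return the fraction of records whose values on `columns` occur exactly once."""
    keys = [tuple(record[c] for c in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


def coarsen_month_to_year(records):
    """Reduce precision: keep only the year part of the procedure month."""
    return [(yob, sex, pc, month[:4]) for yob, sex, pc, month in records]


if __name__ == "__main__":
    # More known attributes isolate more records.
    print("year of birth only:      ", unique_fraction(RECORDS, [0]))
    print("year of birth + sex:     ", unique_fraction(RECORDS, [0, 1]))
    print("... + procedure month:   ", unique_fraction(RECORDS, [0, 1, 3]))
    # Coarsening the date makes isolation harder, at a cost in detail.
    coarse = coarsen_month_to_year(RECORDS)
    print("... with month -> year:  ", unique_fraction(coarse, [0, 1, 3]))
```

A record that is unique on the attributes an attacker already knows can be linked to a named person, which is why person-level data stays risky even after direct identifiers are stripped.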
