Privacy Risks – Privacy Design®
[protecting people by good design, solid security, efficient processes and trusted services]

CNIL Privacy Impact Assessment Knowledge Bases (Thu, 30 May 2019)

https://www.cnil.fr/sites/default/files/atoms/files/cnil-pia-3-en-knowledgebases.pdf

I keep going back to this resource, as it has a good set of examples for privacy risks.

But it also includes a long catalog of technical and organizational measures (TOMs).

AEPD – Survey on Device Fingerprinting (Sun, 26 May 2019)

https://www.aepd.es/media/estudios/estudio-fingerprinting-huella-digital-EN.pdf
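Device fingerprinting works by combining many weakly identifying browser attributes into one quasi-identifier. A minimal conceptual sketch (attribute values and prevalence figures are made up for illustration, not taken from the AEPD study):

```python
import hashlib
import math

# Hypothetical browser attributes a fingerprinting script might collect.
attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080x24",
    "timezone": "Europe/Madrid",
    "language": "es-ES",
    "canvas_hash": "a3f1c2",
}

# A fingerprint is just a stable hash over the concatenated attributes.
fingerprint = hashlib.sha256(
    "|".join(f"{k}={v}" for k, v in sorted(attributes.items())).encode()
).hexdigest()

# Rough identifying power: sum of per-attribute surprisal, using
# illustrative (assumed) probabilities of each value in the population.
assumed_prevalence = {"user_agent": 0.01, "screen": 0.2, "timezone": 0.05,
                      "language": 0.08, "canvas_hash": 0.001}
bits = sum(-math.log2(p) for p in assumed_prevalence.values())

print(fingerprint[:16], round(bits, 1))
```

Even a handful of such attributes can add up to enough bits of entropy to single out a device within a large population.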

Synthea – a Synthetic Patient Population Simulator (Sun, 26 May 2019)

Synthea is a Synthetic Patient Population Simulator. The goal is to output synthetic, realistic (but not real) patient data and associated health records in a variety of formats.

A nice offline tool for generating synthetic patient data.

https://github.com/synthetichealth/synthea
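The idea behind synthetic patient data can be sketched in a few lines (a toy illustration of the concept only; Synthea itself uses detailed clinical modules, demographics and disease-progression models):

```python
import random

random.seed(42)  # reproducible output

# Illustrative value pools (made up for this sketch).
FIRST = ["Ana", "Liam", "Noor", "Kenji"]
LAST = ["Rivera", "Okafor", "Smith", "Tanaka"]
CONDITIONS = ["hypertension", "type 2 diabetes", "asthma", "none"]

def synthetic_patient(pid: int) -> dict:
    """Return one synthetic, realistic-but-not-real patient record."""
    return {
        "id": pid,
        "name": f"{random.choice(FIRST)} {random.choice(LAST)}",
        "birth_year": random.randint(1930, 2015),
        "condition": random.choice(CONDITIONS),
    }

patients = [synthetic_patient(i) for i in range(100)]
print(patients[0])
```

Because no record corresponds to a real person, such data can be shared for development and testing without the re-identification risk of "de-identified" real data.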

Three Artificial Intelligence papers by the DPAs of Norway, UK and France (Tue, 13 Mar 2018)

France

https://www.cnil.fr/en/how-can-humans-keep-upper-hand-report-ethical-matters-raised-algorithms-and-artificial-intelligence

Norway

https://www.datatilsynet.no/globalassets/global/english/ai-and-privacy.pdf

UK

https://ico.org.uk/for-organisations/guide-to-data-protection/big-data/


Blackbox extraction of secrets from deep learning models (Tue, 13 Mar 2018)

]]>
Fascinating paper: “The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets”, Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, Dawn Song at https://arxiv.org/abs/1802.08232

It turns out that a model memorizes the secrets in its training data, even when the model is a lot smaller than the training data itself. My jaw dropped to the floor right here:

“The fact that models completely memorize secrets in the training data is completely unexpected: our language model is only 600KB when compressed, and the PTB dataset is 1.7MB when compressed. Assuming that the PTB dataset can not be compressed significantly more than this, it is therefore information-theoretically impossible for the model to have memorized all training data—it simply does not have enough capacity with only 600KB of weights. Despite this, when we repeat our experiment and train this language model multiple times, the inserted secret is the most likely 80% of the time (and in the remaining times the secret is always within the top 10 most likely). At present we are unable to fully explain the reason this occurs. We conjecture that the model learns a lossy compression of the training data on which it is forced to learn and generalize. But since secrets are random, incompressible parts of the training data, no such force prevents the model from simply memorizing their exact details.”

https://arxiv.org/pdf/1802.08232.pdf
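The paper quantifies memorization with an "exposure" metric: how far up the ranking the inserted secret sits among all possible candidate secrets. A minimal sketch of the metric (the candidate-space size here is a hypothetical example, not the paper's exact setup):

```python
import math

def exposure(rank: int, candidate_space: int) -> float:
    """Exposure metric from the paper: log2 |R| - log2 rank(secret).

    A fully memorized secret has rank 1, i.e. maximal exposure
    equal to log2 of the candidate-space size."""
    return math.log2(candidate_space) - math.log2(rank)

# Hypothetical setup: a 9-digit secret, so 10**9 possible candidates.
space = 10 ** 9

print(exposure(1, space))   # secret is the single most likely candidate
print(exposure(10, space))  # secret only within the top 10
```

The 80%-of-the-time result quoted above corresponds to the rank-1 case: the model assigns the one true secret higher likelihood than every other candidate in the space.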

CNIL: DPIA tools (templates and knowledge bases) (Wed, 21 Feb 2018)

This is one of my favorite documents, one I refer to on a day-to-day basis.

Nice list of privacy risks and severity examples.

https://www.cnil.fr/sites/default/files/typo/document/CNIL-PIA-2-Tools.pdf
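The CNIL method rates both severity and likelihood on four levels, from negligible to maximum, and asks you to plot each risk on that map rather than compute a single score. A toy sketch of one possible (assumed, not CNIL-prescribed) way to aggregate the two dimensions:

```python
LEVELS = ["negligible", "limited", "significant", "maximum"]

def risk_level(severity: str, likelihood: str) -> str:
    """Toy aggregation: treat a risk as being as bad as its worst
    dimension. (Illustrative only; the CNIL tool plots risks on a
    severity x likelihood map instead of collapsing them.)"""
    s = LEVELS.index(severity)
    l = LEVELS.index(likelihood)
    return LEVELS[max(s, l)]

print(risk_level("significant", "limited"))
```

Whatever the aggregation rule, the value of the CNIL document is the catalog of worked severity examples it provides for calibrating these ratings consistently.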

HIPAA – Lessons learnt from the $17.2 million Aetna settlement (Wed, 21 Feb 2018)

From Hogan Lovells:

https://www.hldataprotection.com/2018/01/articles/health-privacy-hipaa/aetna-17-2-million-breach-settlement-brings-lessons-for-handling-health-data/


Norwegian DPA blocks three smart device vendors from processing customer data (Wed, 21 Feb 2018)

The Norwegian DPA has ordered Gator AS to discontinue all processing of personal information about its customers, because the company has not provided adequate information in the smartwatches it sells. In addition, PepCall AS and GPS for children – Smartprodukt AS have been notified of similar decisions.

Use right-click in Chrome to translate:

https://www.datatilsynet.no/aktuelt/2017/palegger-stans-i-behandlingen-av-personopplysninger-i-smartklokker/

Researchers re-identify patients from a de-identified patient data set published by the Australian government (Wed, 21 Feb 2018)

The Australian government previously published a de-identified open health data set containing the patient data of a subset of the Australian population. The de-identification process involved not just stripping direct identifiers, but also adding some inaccuracies to the data set. However, the data remained at the person level.

Researchers have been able to successfully re-identify some patients.


Abstract: With the aim of informing sound policy about data sharing and privacy, we describe successful re-identification of patients in an Australian de-identified open health dataset. As in prior studies of similar datasets, a few mundane facts often suffice to isolate an individual.
Some people can be identified by name based on publicly available information. Decreasing the precision of the unit-record level data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility. We also examine the value of related datasets in improving the accuracy and confidence of re-identification. Our re-identifications were performed on a 10% sample dataset, but a related open Australian dataset allows us to infer with high confidence that some individuals in the sample have been correctly re-identified.
Finally, we examine the combination of the open datasets with some commercial datasets that are known to exist but are not in our possession. We show that they would further increase the ease of re-identification.

https://arxiv.org/ftp/arxiv/papers/1712/1712.05627.pdf
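The core of such attacks is record linkage: a few publicly known facts about a person (birth year, sex, dates of notable medical events) are matched against the quasi-identifiers left in the "de-identified" records. A minimal sketch with entirely made-up data:

```python
# Made-up "de-identified" health records (direct identifiers stripped,
# but still one record per person).
health = [
    {"rid": "r1", "birth_year": 1961, "sex": "F",
     "surgery_dates": {"2014-03-02", "2015-07-19"}},
    {"rid": "r2", "birth_year": 1984, "sex": "M",
     "surgery_dates": {"2013-01-11"}},
]

# Public facts about a known person (e.g. from news reports).
known_person = {"name": "Jane Doe", "birth_year": 1961, "sex": "F",
                "known_events": {"2014-03-02"}}

def link(person: dict, records: list) -> list:
    """Return records consistent with everything publicly known."""
    return [r for r in records
            if r["birth_year"] == person["birth_year"]
            and r["sex"] == person["sex"]
            and person["known_events"] <= r["surgery_dates"]]

matches = link(known_person, health)
# If exactly one record matches, the person is re-identified
# and every other field of that record (all their surgeries,
# prescriptions, etc.) is exposed.
print([m["rid"] for m in matches])
```

This is why the abstract notes that "a few mundane facts often suffice to isolate an individual": each extra known fact shrinks the set of consistent records, often to exactly one.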
