Link to workshop with slides and videos:
https://edps.europa.eu/data-protection/our-work/ipen/ipen-webinar-2021-synthetic-data-what-use-cases-privacy-enhancing_en
A few gems:
Unsorted links from chat
Various statements from chat
Of course, there are things to consider. For example, in 2019 an array of papers criticized that differential privacy can in general reinforce biases: “But it turns out that in reality the matter is actually much more complicated, as pointed out by latest research highlighting an inherent relationship between privacy and fairness. In fact, it becomes apparent that guaranteeing fairness under differentially private AI model training is impossible when one wants to maintain high accuracy. Such incompatibility of data privacy and fairness would have significant consequences. With respect to the potential of unfairness of some of the standard deep learning models, when it comes to fairness, the current differentially private learning methods fare even worse, reinforcing the biases and being even less fair to a great degree. Results like that should not exactly come as a surprise to implementers and deployers of the technology. Hiding data of small groups is actually among the features of differential privacy. In other words, it is not a bug but a feature of differential privacy. However, this feature leading to decrease of precision might not be something desirable in all use cases.” (Source, with links to the papers: https://edps.europa.eu/press-publications/press-news/blog/inviting-new-perspectives-data-protection_en) A small numerical sketch of this small-group effect follows below.
But fairness AND privacy most certainly are possible together. And when we look at synthetic data in particular, there were some promising presentations and talks on fair synthetic data generation at this year’s ICLR conference (e.g. by Amazon): https://www.jmir.org/2020/11/e23139
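To make the “hiding data of small groups” point above concrete, here is a minimal sketch (not from the talk or the blog post) of a Laplace mechanism applied to two group counts; the group sizes and the epsilon value are invented for illustration. The same absolute noise that is negligible for a large group can be a substantial fraction of a small group’s count.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group counts: one large majority group, one small minority group.
true_counts = {"majority": 10_000, "minority": 25}

epsilon = 1.0      # illustrative privacy budget for the count query
sensitivity = 1    # adding/removing one person changes a count by at most 1
scale = sensitivity / epsilon

for group, count in true_counts.items():
    noisy = count + rng.laplace(loc=0.0, scale=scale)
    rel_error = abs(noisy - count) / count
    print(f"{group:8s} true={count:6d} noisy={noisy:10.1f} relative error={rel_error:.2%}")

# The noise scale (1/epsilon) is the same for both groups, so the relative error
# is tiny for the majority group but can be large for the minority group -- one
# way differentially private statistics lose precision exactly on small groups.
```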
As might be expected, the speakers’ company offers its own elaboration and opinion on synthetic data’s anonymity: https://www.replica-analytics.com/web/default/files/public/tutorials/privacy-law-and-synthetic-data/presentation_html5.html
I am a bit surprised by the references. This talk seems to ignore a vast literature on synthetic data (including advanced analyses of privacy risks), e.g. https://arxiv.org/pdf/2011.07018.pdf
It is really important that you can quantitatively measure the re-identification risk at the output level. Otherwise, there is going to be a lack of confidence that identifiability issues have truly been addressed. It could be open to abuse if not done properly (i.e. claims that the output is not personal data when in fact it is). Done well, it has great value for certain use cases.
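One possible output-level metric, sketched here under stated assumptions (invented column names, a crude exact-match notion of singling out on quasi-identifiers; real assessments such as those the speakers describe use more elaborate estimators):

```python
import pandas as pd

def naive_reidentification_rate(real: pd.DataFrame,
                                synthetic: pd.DataFrame,
                                quasi_identifiers: list[str]) -> float:
    """Fraction of real records that are unique on the quasi-identifiers and
    whose quasi-identifier combination also appears in the synthetic output.
    A crude proxy for 'this person could be singled out from the release'."""
    real_qi = real[quasi_identifiers]
    # Real records that are unique in the real data on the quasi-identifiers.
    unique_mask = ~real_qi.duplicated(keep=False)
    # Quasi-identifier combinations that occur in the synthetic output.
    synthetic_combos = set(map(tuple, synthetic[quasi_identifiers].itertuples(index=False)))
    matched = real_qi[unique_mask].apply(tuple, axis=1).isin(synthetic_combos)
    return float(matched.mean()) if len(matched) else 0.0

# Hypothetical usage with invented quasi-identifier columns:
# risk = naive_reidentification_rate(real_df, synth_df, ["zip_code", "birth_year", "sex"])
# print(f"naive re-identification rate: {risk:.1%}")
```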
Totally agree that when epsilon is large, the formal DP guarantee is basically meaningless. That said, two comments: a) you always have to use enough noise for the epsilon to be small; b) the privacy DP actually provides in practical attack scenarios is sometimes stronger than the formal guarantee suggests, see https://arxiv.org/pdf/2101.04535.pdf
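A quick back-of-the-envelope illustration of the “large epsilon” point (standard DP arithmetic, not taken from the chat): under pure ε-DP the output distribution can shift by at most a factor of e^ε between neighbouring datasets, so a membership-inference adversary starting from a 50/50 prior can reach a posterior of at most e^ε / (1 + e^ε).

```python
import math

# Worst-case bounds implied by pure epsilon-DP for a membership-inference
# adversary with a 50/50 prior: likelihood ratio <= e^eps,
# posterior <= e^eps / (1 + e^eps).
for eps in [0.1, 1.0, 5.0, 10.0]:
    ratio = math.exp(eps)               # worst-case likelihood ratio
    posterior = ratio / (1.0 + ratio)   # worst-case posterior from a 0.5 prior
    print(f"epsilon={eps:5.1f}  likelihood ratio <= {ratio:10.1f}  posterior <= {posterior:.4f}")

# For epsilon around 10 the formal bound allows a posterior of ~0.99995, i.e.
# the guarantee on its own says almost nothing; empirical attacks (as in the
# linked paper) typically achieve far less than this worst case, which is
# point b) above.
```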
https://iapp.org/news/a/does-anonymization-or-de-identification-require-consent-under-the-gdpr/ ; also “Does de-identification require consent under the GDPR and English common law?” by Khaled El Emam, Mike Hintze and Ruth Boardman, Journal of Data Protection & Privacy, Volume 3, Number 3, Summer 2020, pp. 291-298, Henry Stewart Publications, https://www.ingentaconnect.com/content/hsp/jdpp/2020/00000003/00000003/art00007
Company: www.intuite.ai
Company: statice