Don’t forget GDPR when untraining your ML – Maverisk / Étoiles du Nord

Training ML systems is bound to use personally identifiable information (PII), usually dubbed ‘personal information’. That latter label narrows the scope way too much: it leaves out that any bit of information that, in conjunction with outside sources of any kind, can be used to identify a person, is PII.[1]
Under GDPR, there’s the right to be forgotten… Now there are two problems:

Sometimes, data points can be retrieved literally from the trained system, like here. Clearly, such data points must not be reproduced anymore. But how do you make an ML system un-learn when the data point involved needs to be forgotten? [2]
Similar, less literal cases apply. E.g., when it’s not a data point that’s regurgitated, but the data point does have an off-average value that left its mark on the weights/trained parameters. Which is probable, since an ML system hardly learns from n times an average value [it may, but then that’s not ML but fitting a fixed function, fixed ‘algorithm’-wise] but from n different values, the one of concern among them. How do you get that contribution out of the weights, and how do you prove (which you may have to under GDPR obligations, though only when push comes to shove) that your ML weights no longer include that one data point’s impact ..?
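
To make that second problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy data, the one off-average record, and the hypothetical helper `fit_line` (plain least squares on a straight line, which is a far cry from a real ML system). It only shows that one off-average record measurably shifts the trained parameters, and that retraining without the record gives you a reference to compare against.

```python
from statistics import mean


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Assumed toy training set: nine records that sit exactly on y = 2x ...
data = [(float(x), 2.0 * x) for x in range(1, 10)]
# ... plus one off-average record whose owner asks to be forgotten.
record_to_forget = (10.0, 50.0)

# Parameters as originally trained, with the record included.
with_record = fit_line(data + [record_to_forget])

# 'Un-learning' the blunt way: train all over again without the record.
retrained = fit_line(data)

# The difference is the record's footprint in the parameters.
footprint = (with_record[0] - retrained[0], with_record[1] - retrained[1])
print("with record:", with_record)
print("retrained:  ", retrained)
print("footprint:  ", footprint)
```

In this toy case the footprint is plain to see and retraining costs nothing. With a large model, retraining from scratch is the expensive part, and showing that some cheaper shortcut ends up at the same parameters as a full retrain is exactly the proof problem raised above.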

It’ll be fun, they said. For lawyers, they said.

Still, the whole thing may need to be figured out before anyone can deploy any ML system that includes European citizens’ data, since the GDPR has global effect.
Now you have fun, I say.

With:

[You probably are on camera here…]

[1] Side note: I was wont to write ‘can and will’, which is true but sounds too much like ‘anything you say can and will be used against you in a court of law’ [disregarding the exact wording]. That statement will in fact alter what I may say, as I now take into account what I say and how I say it. To which I ask: when (not if) not all that I’d say is actually used in a court of law, does this invalidate the statement made to me, rendering the ‘can’ part invalid, i.e., making the speech part(s) that are used illegal(ly obtained) evidence ..? Since I say other things, and/or say them differently, than I would have without the statement at arrest, i.e., based on a statement by a sworn officer that is later proven false, perjurious even. Entrapment? That’s illegal in many circumstances…
Would want to know from a legal scholar how this works.

[2] Most probably, you will not be able/allowed to keep that data point for any specific reason. To say that it’s too difficult to get the data point out of the trained system: Does. not. work. The law just requires you to do the near-impossible; your mistake. Just train the system all over again; why would anyone care for your interests? GDPR requires you to only ask how high you have to jump and then do that, whether you’d have to set a world record or not.
