DeepMind have been back in the news recently for their inappropriate gathering of training data from the NHS. Currently, under common law in the UK, patient consent is implied if the purpose of the data sharing is direct care. However, it's a bit of a stretch to consider training AI algorithms to be direct care.

All of the ML start-ups I talk to are keen to discuss their training data sets: how difficult they were to collect, how unique the data is, and how it supposedly allows them to train better models. Very few, however, talk about the legality or ethics of their data gathering. For most start-ups this won't be an issue, but for some it may be a ticking PR timebomb or worse.

Until regulations and public perception of privacy catch up with ML's need for big data sets (if they do at all), there's going to be a tension between wanting lots of data to train on and living in a world where no-one wants to share their information.