Ah, to add to the previous column: The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t by Nate Silver seems interesting. Though I fear that the conclusion will again be: “It’s not hopeless [it almost is, ed.] but if you work really, really hard, you may succeed in finding nuggets, whether or not they’re worth all the effort [I think: hardly, ed.].”
[Just a pretty picture of Valencia, as a Calatrava … follower but not groupie]
With all the talk about Predictive Analysis lately, it is time we separate, once again, the hype-humbug from the real(ity). Which is that the Predictive part is only partially true, and the Analysis part is by definition the contrary of predictive.
1. Analysis can only look at what’s here and now, or behind us. One can only analyse history. And, as Hegel pointed out, “Only at dusk does Minerva’s owl take flight”: only when the sun sets on some development can we start (sic) to assess its importance. Analysis is devoid of Meaning. It’s mechanical; it may abstract but not create Knowledge, delivering Information at best but usually just more meta metadata. Again, interpretation, the error-prone and still barely understood brain process, is needed to make analysis worthwhile. And the result looks back, not forward.
2. Predictive… If we know anything of the future, it is that it is uncertain by definition. One can predict some trends to continue, but may be disappointed.
Anything near the precision that Predictive Analysis is pictured to deliver is certain not to come true.
This is because the dragnet of the analysis can never be complete. Theoretically, because it would require a model of the universe which, to approach completeness even distantly, would have to contain the model itself. Practically, because analysts have to work with the hammer they’re handed and will view any world as nails only.
Also, trends may continue but ‘never’ (unless by extreme exception) linearly so, and non-linear modeling is still in its infancy (and may prove impossible by lack of suitable data). Hence, you’ll miss the mark by just extending lines.
Oh, it’s not about the detail, you say? Why don’t you stick to qualitative predictions, then?
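To make ‘you’ll miss the mark by just extending lines’ a bit more tangible, here is a minimal Python sketch. It assumes a hypothetical trend that simply curves upward (y = t², as nothing in this column specifies any data): a least-squares line fitted to the first ten observations looks fine in-sample, then misses by an ever-widening margin.

```python
# Minimal sketch: fit a straight line to the early part of a HYPOTHETICAL
# non-linear trend (y = t**2 is an assumption for illustration) and extend it.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def trend(t):
    """Hypothetical 'true' process: curving, not linear."""
    return t * t


history = list(range(10))  # what the analysis can see: t = 0..9
a, b = fit_line(history, [trend(t) for t in history])

# How far the extended line misses the actual value at future points.
misses = {t: trend(t) - (a + b * t) for t in (10, 20, 40)}

for t, miss in misses.items():
    print(f"t={t:>2}  line says {a + b * t:7.1f}  actual {trend(t):5d}  miss {miss:7.1f}")
```

Here the fitted line comes out as y = 9t - 12, so at t = 40 it predicts 348 where the curve sits at 1,600. Any non-linear process would do; the quadratic is just the simplest one that makes the point.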
3. Anything that can be analysed today is tomorrow’s mediocrity, as it was or is already known. The spurious results may be interesting, but we only know afterwards, too late, whether they’re spurious or early indicators.
4. “Sh.t happens!” Why can’t we accept that in the medium-sized world we live in, quantum jumps do happen as well, with similar levels of chance calculus? They may not be binary quantum events, but appear to be. And make life both dangerous and fun! No surprises is boring par excellence. And will fail.
5. Go ahead and hype the mundane. But don’t oversell. You’re not a magician, or the magician is a fraud …
It’s not the size of your data, it’s what you do with it.
Or so claim men that are insecure about theirs. [Disclaimer: I have big hands. ‘nuff said.]
There appears to be confusion about what Big Data may achieve. There’s the Marketing anecdotes (sic) about spurious relations, the long-standing anecdotes of credit card companies’ fraud detection, and there’s talk of ‘smart data’ use in e.g. organizations’ process mining to establish internal control quality. One party bashing the others over ‘abuse’ of terms.
What happen? [for those that know their History of Memes]
1. We have more data available than ever before. We have already had various generations of data analysis tools, every new generation introducing a. a hugely increased capability to deal with the data set sizes of its age, b. ease of use through prettier interfaces, c. fewer requirements on ex ante correlation hypothesis definitions. Now, we seem to come to a phase where we have all the data we might possibly want (not) with tools that allow any fantastic result to appear out of thin air (not).
2. Throwing humongous amounts of (usually, marketing- or generic socmed) data at an analysis tool may automatically deliver correlations, but these come in various sorts and sizes:
- The usual suspects: loyal brand/product followers will probably be hard to get into lift-shift-retention mode. If (not when) valid, such results are still not really interesting, because they would (should!) have been known already from ‘traditional’ research;
- The false positives: spurious correlations, co-variance, etc., induced by data noise. Without human analysis, many wrongs can be assumed. All too often, correlation is taken (emotionally at least) to be close to, or somewhat the same as, causation; over and over again. How dumb can an analyst be, to not be (sufficiently) aware of their own psychological biases! Tons of them are around and impact the work, and the more one is convinced not to be (psychologically) biased, the more one will be and the worse the impact will be. Let alone the systemic biases that can be found all too often;
- The true positives. The hidden gems.
We don’t have any data on the number of spurious results vis-à-vis useful results to know how well (effectively, efficiently) we do with both automated correlation discovery and human analysis, which would be telling in the world of analyse-everything. But what would you expect from this overwhelmingly inductive approach?
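The false-positive bullet can be made concrete in a few lines. Below is a minimal Python sketch, assuming nothing but pure Gaussian noise: one random ‘target’ series and 1,000 random ‘feature’ series of 50 samples each, none of them related by construction. The cut-off |r| > 0.28 is roughly the conventional 5% significance level for 50 samples; sample sizes, seed and threshold are illustrative choices, not data from any real study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_spurious(n_samples=50, n_features=1000, threshold=0.28, seed=42):
    """Correlate a pure-noise target against pure-noise features and count how
    many clear the threshold. Every hit is, by construction, a false positive."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(f"{count_spurious()} of 1000 pure-noise features 'correlate' with the target")
```

With 1,000 candidate columns you should expect on the order of 50 ‘hits’, every single one spurious. Add more columns and you get more ‘discoveries’, not more truth.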
3. Yes, until now, in history we seem to have done quite well with deductive approaches, from the pre-Socratics until the recent discovery of the Higgs boson, in all sciences, including the classic social/sociological scientists like the Greek and Roman authors (yes, the humanities are deductive sociology) and the philosophers, deep thinkers by definition.
The ‘scientists’ who relied on inductive approaches … we don’t even know their names (anymore) because their ‘theories’ were all refuted so completely. Yet, the above data bucket approach is no more than just pure and blind induction.
4. Ah, but then you say, ‘We aren’t that stupid, we do take care to select the right data, filter and massage it until we know it may deliver something useful.’ Well, thank you for dumbing down your data; out go the false but also the true positives..! And the other results you should have had long ago via ‘traditional’ approaches. No need to call what you do now Big Data Analysis. Either take it wholesale, or leave it!
5. Taking it wholesale will take tons of human analysis and control (over the results!); the Big Data element will dwindle to negligible proportions if you do this right. Big Data will be just a small start to a number of process steps that, if done right, will lean towards refinement through deduction much more than towards the induction-only approach that Big Data is trumpeted to be. This can be seen in e.g. some TLA having collected the world’s communications and Internet data: there are so many dots, and so many dots connected, that the significant connected dots are missed time and time again. It appears infeasible to separate the false positives from the true positives, or else we have a Sacrifice Coventry situation. So, to repeat: no need to call all this Big Data.
6. And then there’s the ‘smart data’ approach of not even using much data, but using what’s available, because there aren’t yottabytes out there. I mean, even the databases of business transactions in the biggest global companies don’t hold what we’d call Big Data. But there’s enough (internal) transaction data to establish, through automated analysis, how the data flows through the organization, which is then turned into ‘process flow apparent’ schemes. Handy, but what then …? And there’s no need at all to call this stuff Big Data, either.
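The dots problem in point 5 is, at heart, base-rate arithmetic. A minimal Python sketch, with every number hypothetical (my framing, not a claim about any actual agency’s system): even a screening step that is 99% right in both directions drowns the few real targets in false alarms once the haystack is large enough.

```python
# Minimal base-rate sketch for a "collect everything" dragnet.
# ALL numbers below are hypothetical, chosen only for illustration.

def dragnet_yield(population, true_targets, sensitivity, specificity):
    """Return (true positives, false positives, precision) of a screening
    step with the given sensitivity and specificity."""
    innocents = population - true_targets
    true_pos = true_targets * sensitivity          # real targets correctly flagged
    false_pos = innocents * (1 - specificity)      # innocents wrongly flagged
    precision = true_pos / (true_pos + false_pos)  # share of flags that are real
    return true_pos, false_pos, precision


tp, fp, precision = dragnet_yield(
    population=100_000_000,  # hypothetical: people whose traffic is collected
    true_targets=1_000,      # hypothetical: people actually worth finding
    sensitivity=0.99,        # hypothetical: 99% of real targets get flagged
    specificity=0.99,        # hypothetical: 99% of innocents are left alone
)
print(f"flagged: {tp:,.0f} real vs {fp:,.0f} false; precision {precision:.3%}")
```

Under these assumptions, 990 real targets are flagged alongside 999,990 innocents, so roughly one flag in a thousand points at something real. Adding more innocent haystack while the number of needles stays put only pushes that share further down.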
So, we conclude that ‘Big Data’ is just a tool, and the world is still packed with fools. Can we flip the tools back again to easily test hypotheses? Then we may even allow some inductive automated correlation searches, to hint at possible hidden causations that may be refined and tested before they can be useful.
Or we’ll remain stuck in the ‘My Method is More What Big Data Is Than Yours’ squabble.
So, I can confidently say: Size does matter, and what you do with it.
Next blog will be about how ‘predictive’ ‘analysis’ isn’t on both counts.
That’s the last Porto series so far. Actually beautified (?) pictures will follow sometime soon.
Porto, series II. One to follow.
Just a random selection of (almost) unedited Porto architecture. Series I
Just a question: Would anyone know some definitive source, or pointers, to discussions either formal or informal, on the logic behind double secrets i.e. situations where it is a secret that some secret exists ..?
Yeah, it’s relevant in particular now that some country’s government seems to have failed to keep that double secret completely, but I think it should be dealt with more systematically, also regarding regular business-to-business (and -to-consumer) interactions.
So, if you have some neat write-ups of formal logic systems approaches, I’d be grateful. TIA!
In various discussions in my trade, and in the general public, there seems to be a centre of gravity around the insufficiency of latter-day education. The troubles are many, but they fall into several distinct categories:
[Spoiler: the true point of this entry is somewhere near the bottom…]
- Children know way too little; much knowledge is lost. No, this is not about the simple learning of facts – it is already quite clear that Nick Carr’s Shallows, shallow brains, have taken root and may only be undone by a big swing towards renewed rote memorization and a wholesale write-off of current generations. It is more about culture, the effects of too much freedom in education. Be aware that I tend to think that a great many children would be much happier (as adults, too), and society would benefit in a big, very big way, if children were allowed to develop (start developing) their non-sports skills into Excellence much earlier than now. On the condition that general education in all sorts of subjects is maintained at quite a level too. We don’t want savants who in the end fail to make the genius grade and end up with nothing else. Conclusion/result/solution/requisite to make this possible: see below.
- School-leavers aren’t ready for any type of job available, if any are available. They’re too inexperienced, but they also know too little to understand the most basic, core things of how organizations operate. This complaint is of all times, yes. Solutions have been tried, but have run out of their time. Military or social service, socialist or bureaucratic in nature (there is a distinction between those two, overlooked by those who don’t gauge the depth of the notions behind those simplifying labels!), has worked here and there, but was unsustainable because of free riders (fire them for their lack of character!) and moreover because of a lack of economic equalisation: rewards for services delivered, education and experience gained, [hi there useful Oxford comma] and societal gains haven’t been calculated, estimated or explicitly transferred, and hence remained too invisible. This can be solved by reinstating social service requirements on youth, but that wouldn’t necessarily go down well with any economically developed society where individualism has raged. Conclusion/result/solution/requisite to make this better: internships, plus see below.
- There is so much variation within any profession and at all experience levels that education can only deliver base levels of professionals. More differentiated high-level education is required. But that would splinter course programs and may very well tie far too many young students, still seeking direction and destination, into studies and careers that they are in the end disappointed with. With, due to specialization, too little way out: all the places elsewhere have been taken by others, maybe a little bit less experienced but better specialized. Conclusion/result/solution/requisite to make this better: see below.
What is causing all of this ..? My take is that education as a system is lagging the increases in complexity of society/societies more than ever. Way back in time, when times were slower, education could catch up with societal development in a relatively short time. New generations could be trained, in whatever way, mostly on or near the job. But the exponential speed-up of society’s business, and society’s complexity!, over the past centuries has meant that developments have become so quick, and the one catch-all solution for producing well-rounded members of society through education so unclear, that ever more people feel they (individually and as a group) are lost, not able to improve themselves easily enough to cope with the new world order.
A peasant was a peasant, and only the extremely rare exception would ‘escape’. Those were times when a lord would look down on a peasant for the lack of education, yet would regard the peasant as less of a lesser human being than is generally assumed. The lord knew well that his existence relied on peasants for food, and that the purpose of his lordship (and not the purpose of his individual person) was to govern. Excesses apart, all could settle in their place and destiny, and did not need much education because of this simplicity of society. That has changed…
To educate new generations today to be able to cope with the enormous complexity of society when they have grown up may hence take much more education, in breadth and in depth, than current-day education systems allow. All the compulsory subjects (humanities, math, science) that are already dropped at too low a level, due to too low exit levels being allowed and due to too early specialization (without allowing savants to jump ahead in their specific curiosities of choice), should be taught to all at higher levels throughout.
It is sad or a privilege, but current-day youth may need to attend school much longer to be ready to function in society…!
To be able to arrange for all the variety of students that will be around (including some that may want to broaden their horizon, switch specializations or just out of hobby interest want to keep on educating themselves, at various levels of experience and seniority), course structures may have to be changed. In particular, packaging of education should be reconsidered. E.g., in accountancy, not all certified accountants need to know each and every petty IFRS rule by heart as it may have no relevance to their daily job at all during all of their career. Better offer modules!
But this should be doable, in particular with the use of technology (MOOCs et al.; blended methods), and with other parties (both private and public sector organizations) more aware, involved and transparent, so that students can learn from the sidelines how they operate. To ready the next generations better for their roles.
We still see quite a market for ‘ethical’ hacking out in the information security consulting world. However, if this type of activity should have a name, it would be wise for the name to be descriptive, right? Rather than deceiving, swindling… We certainly won’t do that, sir, no way.
We’d call it ‘ethical’ if the purpose of it all were to further the ethical goals of the ones doing it. Now take a look at who’s doing it. ‘Ethical’ hacking. And for what: Moneyyy! Hey indeed, it is the consultants and Big4 accountants who will only and exclusively do it for the money. You say No? Have you tried to talk just an hour off their bills because the hacking that they do (more on that below) serves some ethical purpose that they are happy to work on for free ..? A great many would consider doing just anything that pays, and not doing any of it otherwise, the direct opposite, the utmost perversion of ‘ethical’ behaviour. Yet, that’s where we are with ‘ethical’ hacking.
Now for the ‘hacking’ part. Most of that is non-existent again. It’s primarily penetration testing using off-the-shelf freeware tools. That can be done from any phablet while driving, or it’s so outdated that it should serve no purpose. OK, you got me there: even antiquated tools will find big holes in clients’ defenses that could and should have been fixed aeons ago, you know, decades of internet time (a couple of years in our time). And as for actually entering through a small hole: it’s still rather common not to go there, to stay virgin and only do some port scanning.
So, [except for the few good men who do understand what they’re concocting] no hacking together of one’s own new baby tools takes place. Yes, hacking, as in state-of-the-art coding (programming, for those of you who have been hibernating the last decade) without the need for any bureaucrat’s architecture principles but with a deep understanding of languages’ strengths and pitfalls.
So there we have it. Let loose some basic scanning tools, write up a fat report with some fancy letterhead and the usual suspects in findings; long live copy-paste, and bill ‘em for some ridiculous amount that goes straight into the coffers of some elderly gentlemen partners that don’t know how to use the Internet … except for, well, you know, searching for pictures.
Therefore, in search of a truthful, descriptive name, let’s either revert to ‘penetration testing’, which for most men wouldn’t feel comfortable, or even just ‘port scanning’, or find some new designation. Mammon scanning, or so. But let’s not call it ‘ethical’ ‘hacking’: two humongous wrongs don’t make a right.
Next up, maybe, a rephrased repost of @meneer’s #ditchcyber argument.
A thought just crossed my mind (oh, don’t worry, any such thing with value quickly passes through w/o leaving a trace… (?)):
When all will constantly have to prove themselves, for lack of trust, isn’t that In Limbo Till Proving Not Guilty instead of innocent till proven otherwise?
Yeah, it’s an old theme, maybe even a decade old already. But in silence, this one creeps on. Out of the business woodwork, into the public sphere. And still, even after a decade, waaay too little debated.