Blog

Auditing the T of ETL for ML-AI is a human affair (still)

Wow that seems like two handfuls of something…

This, about a little part of the AI problem complex at the moment: ETL, as Extract, Transform, Load is usually called but few people care to remember, is Data Analytics’ talk for getting the data from some basic simple systems like SAP (E), pruning it and applying all sorts of other manual corrections (yikes! the T), and then slurping it into some Analytics (or visualisation) tool (L).

The L is the easiest part, once one has gone through the E part plus some T sauce. That T sauce, however, is too hot to handle.

Since it is so error-prone. Error, as in accidental human failing at the medium, data, information and ethical levels. Error, as in failing in bad faith, at the same levels. Bias, anyone?
Problem being, when there would be e.g., bias in the source data, does one toss the respective data points out? Or might they contain valid information [Note: I take it you understand the most kindergarten basic concept of discrimination: distinguishing people on irrelevant criteria – what to do when the criteria are relevant!?] that one misses when dismissing the misfits outright, even before one has a chance to find out whether they had some role in the original data. Ethically-unwantedly biased or not.
And how good would a human be at detecting biases in source data ..? Not very. The very value of many a latter-day ML tool is in finding those hidden patterns that we miss. The experts miss.
Plus, how would you correct? If one were to leave out all cases where it turned out that bias played a role, you’ll end up with ideal cases only. But then your trained ML system will give results that are incomparable with (past) practice and for effective (… → later on) and efficient ML-trained rule-based systems development, one needs to optimise the fit with the past. That’s where your F1 score comes from. Mess with the source data, destroy the learning results.
Above, what to do with later continued learning where self-learning, unsupervised, is all the rage?
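To make the F1 point above a bit more tangible, a minimal sketch. All data are made up for illustration, and the function and variable names are mine, not from any particular tool. It only shows that a model scored against past practice is rewarded for mimicking that past, bias included, and penalised for every 'corrected' deviation.

```python
def f1_score(actual, predicted, positive=1):
    """F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical past decisions (1 = accepted), bias and all.
past_decisions = [1, 1, 0, 0, 1, 0, 1, 0]
# A model that simply mimics the past.
mimicking_model = [1, 1, 0, 0, 1, 0, 1, 0]
# A model trained on 'corrected' data: it deviates from the past in two cases.
corrected_model = [1, 1, 1, 0, 1, 0, 0, 0]

f1_mimic = f1_score(past_decisions, mimicking_model)
f1_corrected = f1_score(past_decisions, corrected_model)
print(f1_mimic, f1_corrected)
```

The 'corrected' model scores lower here purely because the yardstick is the past itself; whether that past was right is not something the score can tell you.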

In between already, when one prunes to get out the rules one wants and dismisses others, why not turn all found patterns and rules, into a classical expert system, without or preferably with, fuzzy logic? Most explainable, transparent…

But above all, what is ethically unwanted ..? Apparently, the inputs lead to relevant outcomes as they turn out to exist in the source bias. The ML is there to detect such patterns; if there would be no relevance, no pattern would be calculated (sic; and leaving aside small-sample errors that aren’t biases but just errors).
Rather, who determines the vague ideas of what ‘society’ happens to consider just, for some time ..!? E.g., many Western societies have a core of values that are proclaimed to be based on Christianity; either some interpretation of how Jesus Christ’s words would apply in those later centuries, i.e., big fat interpretations on very often shady hidden intents, or hearkening back to the original intent as much as possible – where JS (as a full-on Jew) and those of any intellectual propensity above ignorant peasant level, would have found the idea that salvation or support for one’s neighbours would be available for non-Jews quite despicable, bordering on the unthinkable. The Golden Rule wouldn’t apply to anyone outside the close circle… All ‘ethical’ discussions since are very time- and circumstance-bound even when putting it mildly. As e.g., ‘democracy’ is so much on the decline around the world [fact]. And people don’t care about millions starving but do care about stray dogs in the same countries. Those against discrimination don’t bat an eye over discrimination of e.g., white male elderly (over 30) on the job market. When one wants to ‘correct’ a bias through some measure of equally low or worse moral value, one has no right to enforce [what one loathes oneself; or you’re in breach of the Golden Rule again].

OK, so much for difficulties with manual T. Now, …:

[Non-random colour scheme; Dublin]

The Accidental Pairer

[Kinda full book review]
Searching for a sort-of definitive guide on wine-cheese pairing, your intrepid reviewer came across Tasting Wine & Cheese, by Adam Centamore. Browsing through, scepticism set in, only to turn back into optimism over the introductory chapters. These are quite basic, but with hints of systematic treatment of the subject. And, given the introductory level, they miss a few ‘rules’ here and there, in the wine, in the cheese, and in the wine-cheese sections.

Like, what grows together, goes together – yes that is mentioned twice in the book, but elsewhere, piecemeal. Yes, the introduction is about experimenting. Which is what one does when having more experience. If the book is for readers that want to jump into the experimentation, why is the introduction so simple and, except for a few, without systematic rules? If the book is for beginners, why jump to experimentation without first handling the basics thoroughly? And then, near the end of the book, come some (…) classic wine-cheese (and -condiment) combinations that are all (some of) the very grow-together things that could have been treated upfront. Or in all chapters, where ‘the cheese that loves it’ typically is not from the same area. Often, because some American cheese is mentioned that, like many European brands (yes, often not types only but specific brands that very often can’t be had locally), will be available only here and there. Mostly there.

Except Stilton and port – the Anglophile angle on ‘grow together’, overlooking history where port and Dutch cheeses were styled together when anything out of France was impossible to have at the other side of the Channel. Yes, grows together is a way in which centuries of careful crafting of wines and cheeses to make the perfect fit is captured, so why not have this as the foundation for variation?

Except where Langres is thought ideal for Champagne. Yeah, if you mistake (fact) Langres for being in the Champagne instead of being between the utterly Bourgogne Chablis and the Côte d’Or. Like Durango is in Arizona because it’s close to the Four Corners. One, Langres isn’t in the Champagne of the wines nor of the département; two, Chaource comes from much closer, and is indeed the better pairing. Now, Chaource is mentioned but with Crémants in general where (with the better wines) Langres would be ideal. Why? Noting that with Chaource, having some Champagne in the fontaine makes the pairing outright sublime, yes.

There’s quite a number of outright errors as well. Just browsing around a little, far from complete:
Champagne can only be made of the three grapes ..? Foolish mistake, legally and practically; e.g., the Champagnes with Pinot Blanc that are resurfacing, have funky edges that are perfect for surprises and horizontal tastings. Not to mention Arbane, Pinot Gris and Petit Meslier. Blanc de Noirs being only from Pinot Noir? That’s just stupid. Forgetting to mention the tangerine edge of Noirs, in particular from the Meunier (try a Württemberger Lemberger or a still red from the Champagne and you’ll know), also doesn’t convey much experience with Champagnes outside the annual-million-bottle factory produce of the big companies. Wouldn’t call them ‘houses’, reserve that for the small sometimes artisanal honest producers.
White wine, even when crisp, at 4.4° to 10° …? Wine for which that is ideal, isn’t worth much is it?
Riesling not grown in France ..? Hello, Alsace! Yes tacos will not be served anywhere in Texas because it has in history never belonged to Mexico, right? Alsatian Riesling may have different characteristics but when declared not to exist, how can one tell ..?
Bell pepper aroma (a.k.a. paprika everywhere except some local i.e. ‘American’ regions) is mentioned as a characteristic “often found” in Cab Sav (p. 147). Right. When your Cab Sav (p. 110, can anyone explain the out of order of the Tempranillo and Cab Sav in this section?) has paprika notes, it is not well-crafted, it is very-badly-crafted. Paprika is known to be an indication of serious errors in the making of the wine except in wines with clear Cab Franc influence or dominance and even then. Cab Franc badly made: Green paprika, biting. Cab Franc well-made (e.g., the Canadian ones! Try a Fort Berens or a Burrowing Owl and you’ll know): Mellow yellow paprika, perfect!

[And then one encounters the very rare Frappato brand one has in stock. Other Valle dell’Acate’s are much more interesting. Take a Cerasuolo di Vittoria, excellent on its own but with what cheese ..?]

In general, the wine characteristics are unhelpful as they are either either-or qua style, or incomplete. As mentioned on pp. 88-91 and many places elsewhere, the variety outstretches most of the ‘characteristics’—then why pick a few? And the wines list itself is very incomplete, slightly erratic, as well.
Also I expected to have a good cheese list with wine suggestions as well, not the accidental index lookup. What’s there now is random examples, as if cheesemakers from the same region don’t make cheeses that are as different from their neighbours’ as what winemakers make.

“In the end, though, pairing cheese and wine is an inexact science, if a science at all.” This, halfway through the book (p.85) so how’s that for in the end, seems to summarise the book quite well. Though one remains unconvinced the author actually intended the self-reference.
Is this book for beginners that need to learn the bars and chords and some simple music pieces, or is it for a seasoned jazz musician? One is led to believe, both. Starting and ending with general tips & tricks is for the former. But the bewildering details suddenly without too many basics will throw off the beginners, and the lack of systematic treatment (jazz musicians train their bars much more often than beginners..!) will throw off the jazz musician.

Concluding: Not the definitive, systematic wine-cheese and cheese-wine pairing guide I was looking for. Not the guide you should be looking for, either.

We’ll rest with:

[Bayeux is just (?) over the edge of Camembertland, so still perfect with cider – not in the book]

Human’s Wrong, on many levels

Oh my. You just know the hordes of conzultants will have a field day babbling on and on about hoomans being the weakest, very weak link and need to be confined to at most one icon on their desktop because that’s what they need for work and no more.

Not only is the latter a blatant lie except for the handful (literally) of exceptions, but when you read this report (from this site, #17-0444), you’ll notice the same old #FAIL that you see about everywhere else.

Because the civil servant had a little time on his hands (is that a pleonasm or what?), he surfed a bit and some malware got onto his employer’s networks. Where all sorts of things went wrong (further).
Now the user is blamed for not having most carefully followed the weakest of weak organisational controls – organisational: the type one should only apply when all other options are technically infeasible – and nothing would have happened IF the organisation hadn’t FAILed to apply all the other, stronger controls that would have easily prevented anything going pear-shaped.
Like, detection, correction, patching, to name a few. These were found to be so severely lacking that havoc ensued.
Yes, throw in a little reactance if you wish; the ‘it is forbidden hence must be good to try’ clearly is human so must be stamped out with violence or what.

But blaming that on the human for trying to keep his knowledge about his work up-to-date, probably against strict policy that no education is allowed as it would stand in the way of doing nothing while work is slow, and oh yeah passing by some other sites as well, is like driving into a concrete wall at full speed and then blaming the car manufacturer that the colour of your airbag doesn’t go with the interior.
Yes, that’s nonsense. So is the blame the human thing.

On a lighter note, the following are delicious:

[Yes, at Martinez’ outlet/showroom Amstelveen]

The Boring Wine Inn (3 @MichelinGuides stars)

[Repost, edited]
Maybe the relevance of Michelin stars, and accompanying guide, would increase if,
Apart from losing the numbing down, bland-isation of any food innovation by chefs to a style that is either Boring in itself already or a quick to wear off gimmick, that obtaining or even striving for a star(s) often turns into, just to please the judges and don’t forget a bucket of salt (yes, don’t lie to me)
The wine list was innovative, too. By which I don’t mean that the wine list couldn’t have some classics but where the all but most insanely priced items (all tend to sit at some 4-8 times cost anyway, extortionistly – bring that down to 2-3x and your profits go through the roof all the same) have something new. Fresh, beyond the well-trodden paths. The latter, being the average+ quality (if one’s lucky) of the go-with-the-flow (of up to and including last year’s fashion) appellations – with too many New World ones that are so cheap to get. Or from secondary regions of the Old World where the top can still be had at below-top prices – but still with according interestingness of taste. All from the mid-size to big merchants that don’t care anymore about their products and just want to shove as many boxes as they can at incumbent-tied-in margins. Their tell: Aggression towards any that want to offer something off the wine menu for connoisseurs.
As if the chef’s innovation that once was, is enough to stay at the level that once was, qua quality and freshness one wants from top rated places. News flash: The wines can add to the experience. Big time. If one doesn’t see that, well, off you go.
And it also goes for the wine pairing / selection by the glass; how better to showcase one’s innovative wine choices in perfect matches per course ..?
Why not feel free to ask customers for their wine sophistication and preferences? Only a handful of sommeliers seem to understand. Almost all, at the true top places, without food stars.
[One notable exception encountered, in a long life of many attempts… And this one. This one, nearly there; (somewhat) interesting wines but then, with that tad too little wine knowledge transferred and suffering from the above Salt issue plus some very minor other things (though this is a showcase of Don’t Believe the Reviews, peasants flaunting their … lack of knowledge and understanding). Maybe this one may join this class if it’s not my browser through which the link is down…]
[Elsewhere: this place. A drain on your balance, but then …! What great (9) dishes, what excellent wine choices and pairing, even in the ‘simple’ recommended wine pairing.]

And don’t come telling that when the food has to shine, the wines shouldn’t need to shine, too; it would be either-or not AND. That’s just nonsense for n00bs with underdeveloped taste.

So that in the end we may see the return of the true relevance of stars, and see less overhyped craze over joints that suddenly get overbooked way too long in advance and start to double their prices – for nothing of the new but only the already mundane that satisfies only those running after Keeping Up With The Joneses (“Do you know this-and-that [ill-pronounced] winemaker? Isn’t he great oh we once tasted his [name a random year], I’m on a personal basis with him because I was at the camping on the mudfield next to his.” – no joke, heard too often in literal or similar ways…) places. Ruining it for true believers from the humble beginnings.

The latter Dutch link brings me to another point by the way (as mentioned in the Salt posts I believe – haven’t read back my own posts ;-/ ): [wow so many in a row] Where are the veggies …!?!? The top of the top does cook with lots of those (take this one, and this one even almost more), and create a feeling of correlation between (being able to) top cooking with vegetables and completely outflanking the stars.

Oh well, and:
[If you know where, you know what I mean. Wink wink and all. Bourgogne yes but which Clos’ ?]

Preferably inaccurate AI

The as ever poignant Seth Godin (@thisissethsblog) pointed to a problem in browsing; not knowing how to make a system not reductionist/analytical but synthesist/creative.

Would blurring give suitable results ..? As in: Not overly perfect fitting in ML, not going for the one outcome with the highest confidence level in AI. Applying trained AI systems outside their (known?) boundaries which may be equivalent to using ‘wrong’ (imperfect domain overlapping) or ‘weak’ (high noise ratio) training sets. Possibly building hybrid systems, not only with straightforward ML but also applying some form of fuzzy logic and expert systems as tool to manage a tool, to put some randomisation onto the rote learning.
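One way to picture the 'not going for the one outcome with the highest confidence level' idea above is temperature sampling: instead of always taking the top-scoring option, draw from the options with probabilities that are deliberately flattened. A minimal sketch only; the labels and scores are invented, the function names are mine, and this is just one possible blurring technique, not necessarily what anyone has built.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def blurred_pick(labels, scores, temperature=1.0, rng=None):
    """Sample one label according to the blurred probabilities (no argmax)."""
    if rng is None:
        rng = random.Random()
    probabilities = softmax(scores, temperature)
    return rng.choices(labels, weights=probabilities, k=1)[0]


# Hypothetical suggestions with hypothetical confidence scores.
labels = ["more of the same", "adjacent topic", "something else entirely"]
scores = [4.0, 2.0, 0.5]

sharp = softmax(scores, temperature=0.5)   # nearly always the top option
blurry = softmax(scores, temperature=5.0)  # the outsiders get a real chance
choice = blurred_pick(labels, scores, temperature=5.0, rng=random.Random(42))
print(sharp, blurry, choice)
```

The temperature is the knob: low values collapse back into rote top-1 behaviour, high values drift towards pure chance, and the interesting part is presumably somewhere in between.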

Unsure what measure one would use to determine whether a system is effective; how to determine whether the degree of randomness of the system changes randomly over time (2nd – 5th++ derivatives still random etc., I’d prefer to see some ‘Chaotic’/fractal behaviour there).

The trade-off then of course being that to train a system that advanced will for the time being take so much effort it might be better to train a bunch of humans, of moderate intelligence but aren’t they all, to do the same fuzzification for you. A sort of Mechanical Turk job. Might be cheaper. By being so cheap, it may prevent a ‘disruption’ (meh) that costs too much to develop into something even ‘minimally’ viable. But man(ual work) will never get as widely copyable as some ‘software’ app on my mobile.

That is, IF by next year we’re still using ‘mobile’s and not call them completely different names because they have changed usage considerably. ‘Smart device’ is gaining a foothold, now that the ‘phone’ part is dropping from the smartphone (qua use, relatively) or is it ..?

So Seth is stuck. Can try to not be too scientific about it, and just be creative. When you realise you’re in a filter bubble, that is because you ran into its wall and you’ve already cracked it to get out, like an operational equivalent of Plato’s cave now I think of it. [Yes, the latter being much more abstract and much harder to truly understand]
Maybe it’s the direction that counts. Either let your own fear of the unknown, irregular make you pull your straitjacket ever tighter (and making you look and behave ever more like a fool), or realise you still control whether you’re pulling and let go.

Oh and then there’s the pretty picture for your viewing pleasure:

[Ah, ha, no that’s not transparency or so; Barcelona]

Lessons learned – Not, nor used

To all project management related types: Did you do your Lessons learned session at the end of your last project? In earnest? Extensively and deep enough to be potentially useful in future projects? Seriously? In all your previous projects?
I just don’t believe you.

And, where did the lessons learned show up in the projects you did later?

Nowhere, I guess. Only where project set-up standards require the most excruciating detail in planning and activities – the kind that no-one will follow in practice since they’re such an overload of bôle-S yes that’s where the word BS derives from (2nd sentence of this follow the link and now you have already learned something). There, we sometimes (not even always) encounter some vague reference to include past projects’ learnings (bit like this).
But that’s the exception. In the full-detail standards. Qua guideline at the outset.

Common practice (be honest) … not so much. Pino or NePino. Well, …

Lessons will be repeated until they are learned.

Yes, even if (sic) auditors check on your project management at the start, to see that the project is well set up to achieve the objectives – and auditors stick along with project execution to track proper risk control in the project re project governance, progress, and the other half of audit work the deliverables – hardly ever does that cover checking that all the right and applicable (…) lessons learned from past projects (both the preventable errors/slip-ups and the successful risk mitigation actions) are in fact included in this new project under study.

Simple conclusion: To greatly enhance your future projects’ chance of success, include past learnings in earnest.
And require auditors to do the same when they audit projects, e.g., by putting the blame on auditors when projects fail at known, mitigatable risks, for their lack of due advice.

Now, to cheer up:

[Use your Vision; Porto]

ePriv heating up

Just for the record; you noticed how things are heating up regarding the forthcoming ePrivacy Regulation ..?

No wonder. Where GDPR was just a consolidation of existing rules and regulations and shouldn’t have had too much impact anywhere apart from the SOx-style totalitarian bureaucratic paperwork requirements (or you had a backlog on perfectly reasonable information security already, the resolution of which may have been pushed by GDPR but wasn’t anything new due to it) OR, if you have been made to believe otherwise, you can get your money back from your ‘consultants’ due to wrong advice and yes this sentence is getting quite long.
This ePrivacy thing however will cost some businesses (certainly not all) some of their business. By clipping the morally unjust parts of data usage; long overdue anyway.

My only surprise is that the current protests by parti pris lobbyists (for the wrong cause) took so long after 25 May 2018 to pick up steam.
We’ll see.

And, of course, ..:

[For no apparent reason whatsoever; Ronchamp]

Accountants’ morale: Don’t let them near a daycare facility

OK, that title may have been a rather cheap bait, but you took it.
As is now discussed, it seems that the morale of accountants (here in NL – one can guess that elsewhere the same goes to varying degrees) is under scrutiny, now that time and time again non-performance of proper straight-backed (rather: spineless) financial audit work leads to upheaval over, among others, partner fees – with underling fees in tow, too – versus morality.

The comparison with daycare timeliness morality, so elegantly explained here, is obvious – if you want to see it.
Like, when the decision was: ‘I work for the general public, so I make sure to serve their best interest. I will be impartial to “client’s” (quod non) pressures – do I sign or not’, the outcome was as intended by societal structures. Just have a look back at agency theories.
When now the decision is: ‘The ‘client’ is the one who pays me. I decide in their favour of course. The golden rule: Whoever has the gold, makes the rules’, the outcome is as intended by the principals. Only the role of the principal has factually gone over to what were the agents to be checked upon, to be kept on a leash.

And when the money is that good, very much literally like the title of this yes feel the empowerment of the opening part, there indeed is no return to anything like moral value and being a firm pillar of the establishment through one’s qualities not one’s wallet. Worth oppressing the underlings for, right?

Should be locked up, that lot. And:

[If only they’d go to the opera more, they’d learn about morality. Valencia]

Overeffectively fair and transparent

From the scares of ‘AI’ HR algorithms that result in biased outcomes … some musings; your (constructive ..!) comments appreciated:

Biases (i.e., errors!) in input of course will result in biases in outcomes. Overstretched classification statistics (what the ‘algorithms’ mainly do, all too often) of course will lead to improper (biased) outcomes.
Those can be solved. With some difficulty, as here.
But, apart from the obvious misses (link: et al., more have been known in particular in the field of credit scoring), what if the algorithm is 100,0% unbiased and it finds socially-unwanted outcomes ..?

To use the above miss as an example (not because I believe this might be the case but only as an example that misogynists might use …): What if it turns out there are hidden traits in some gender’s workers that result in their eventual performance lacking compared to other candidates’ ..? E.g., possibly becoming pregnant may be picked up as an indicator of future maternity leave, leading to productivity losses and possible costs of hiring temp replacements, flatter experience growth curves etc. No this shouldn’t be allowed to be factored in – although the company that prunes their system to not let this count, will not get any return for their consideration which makes the co less financially competitive than their cheating competitors that are still out there quite a lot and in the end, only the financials count. Don’t cheat yourself in believing otherwise.
But when one starts tweaking away the unwanted outcomes, where does one end? And who checks that the tweaking is correct and unbiased, and that with continued-learning systems the bias does not creep back in (and at what levels of deviation will one re-balance)? Who checks that the inputs are correct anyway, both for training and in future use? Because, on hard requirements the biases can be implicit but hard, like age biases. Any system will pick up that ‘3 to 5 years experience’ will mean all seriously-experienced, extremely fast and efficient and probably highly motivated, loyal and non-career-chasing workers over 40 will not be considered. As is done by human selectors now (as is outright illegal but what do you do), often working in a much more inflexible-algorithmic fashion than the better AI systems.
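The ‘3 to 5 years experience’ point can be sketched in a few lines. The applicants below are entirely made up and the names are mine; the only thing the sketch shows is that a hard requirement which never looks at age can still split the pool along age lines.

```python
# Hypothetical applicants as (age, years_of_experience); invented numbers.
applicants = [
    (24, 1), (27, 4), (29, 5), (31, 3), (35, 8), (38, 4),
    (42, 18), (45, 20), (47, 5), (51, 25),
]


def passes_filter(years, low=3, high=5):
    """The hard requirement: '3 to 5 years experience'. Age is never used."""
    return low <= years <= high


def pass_rate(group):
    """Share of a group that gets through the experience filter."""
    if not group:
        return 0.0
    return sum(1 for _, years in group if passes_filter(years)) / len(group)


under_40 = [a for a in applicants if a[0] < 40]
forty_plus = [a for a in applicants if a[0] >= 40]

rate_under_40 = pass_rate(under_40)
rate_forty_plus = pass_rate(forty_plus)
print(f"under 40: {rate_under_40:.2f}, 40 and over: {rate_forty_plus:.2f}")
```

In this toy pool most of the under-40s get through and most of the over-40s do not, without the word 'age' appearing anywhere in the rule. Any learning system fed the resulting selections will happily pick that up.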

Tweaking the inputs will result in ineffective systems, and the whole exercise was to find ‘rules’ that were not easily determinable, right? When you start on this road, your system will deliver whatever you want it to, not necessarily unbiased. And probably very intransparent.
Tweaking the system or the outputs is introducing biases of your own, badly controllable. Hence probably, very intransparent.

Tweaking doesn’t seem like a path to follow. Strange. (..?)

And how does one get societally-wanted biases in, like quotas? They are unfair against others – how can I help that by accident, I was born a white male!? Should I be punished for that by being considered less than others? Because that is what happens, for a fact and if you deny that you’re just incompetent to discuss the whole subject. That I am over 50 and a white male should not be allowed to be Bad Luck or you’re much, very very much worse than your average misogynist, you ageist sexist racist ..!

Now, can we first change cultures, one person at a time, and then train systems to mimic/outdo humans ..? That’d be great.

Edited to add: Marc Teerlink’s view on things (in Dutch no less – how much Intelligence is needed to understand that? Give Google Translate a spin and … not much I fear).

And so is:

[For 10 points, comment on the femininity of the curves but also the masculinity of the intended imposing posture; Amsterdam – yes I mean the building or are you actively searching for the wrong details?]

Maverisk / Étoiles du Nord