Welcome to TiddlyWiki created by Jeremy Ruston, Copyright © 2007 UnaMesa Association
<!--{{{-->
<link rel='alternate' type='application/rss+xml' title='RSS' href='index.xml' />
<!--}}}-->
Background: #fff
Foreground: #000
PrimaryPale: #8cf
PrimaryLight: #18f
PrimaryMid: #04b
PrimaryDark: #014
SecondaryPale: #ffc
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #841
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #666
Error: #f88
/*{{{*/
body {background:[[ColorPalette::Background]]; color:[[ColorPalette::Foreground]];}
a {color:[[ColorPalette::PrimaryMid]];}
a:hover {background-color:[[ColorPalette::PrimaryMid]]; color:[[ColorPalette::Background]];}
a img {border:0;}
h1,h2,h3,h4,h5,h6 {color:[[ColorPalette::SecondaryDark]]; background:transparent;}
h1 {border-bottom:2px solid [[ColorPalette::TertiaryLight]];}
h2,h3 {border-bottom:1px solid [[ColorPalette::TertiaryLight]];}
.button {color:[[ColorPalette::PrimaryDark]]; border:1px solid [[ColorPalette::Background]];}
.button:hover {color:[[ColorPalette::PrimaryDark]]; background:[[ColorPalette::SecondaryLight]]; border-color:[[ColorPalette::SecondaryMid]];}
.button:active {color:[[ColorPalette::Background]]; background:[[ColorPalette::SecondaryMid]]; border:1px solid [[ColorPalette::SecondaryDark]];}
.header {background:[[ColorPalette::PrimaryMid]];}
.headerShadow {color:[[ColorPalette::Foreground]];}
.headerShadow a {font-weight:normal; color:[[ColorPalette::Foreground]];}
.headerForeground {color:[[ColorPalette::Background]];}
.headerForeground a {font-weight:normal; color:[[ColorPalette::PrimaryPale]];}
.tabSelected{color:[[ColorPalette::PrimaryDark]];
background:[[ColorPalette::TertiaryPale]];
border-left:1px solid [[ColorPalette::TertiaryLight]];
border-top:1px solid [[ColorPalette::TertiaryLight]];
border-right:1px solid [[ColorPalette::TertiaryLight]];
}
.tabUnselected {color:[[ColorPalette::Background]]; background:[[ColorPalette::TertiaryMid]];}
.tabContents {color:[[ColorPalette::PrimaryDark]]; background:[[ColorPalette::TertiaryPale]]; border:1px solid [[ColorPalette::TertiaryLight]];}
.tabContents .button {border:0;}
#sidebar {}
#sidebarOptions input {border:1px solid [[ColorPalette::PrimaryMid]];}
#sidebarOptions .sliderPanel {background:[[ColorPalette::PrimaryPale]];}
#sidebarOptions .sliderPanel a {border:none;color:[[ColorPalette::PrimaryMid]];}
#sidebarOptions .sliderPanel a:hover {color:[[ColorPalette::Background]]; background:[[ColorPalette::PrimaryMid]];}
#sidebarOptions .sliderPanel a:active {color:[[ColorPalette::PrimaryMid]]; background:[[ColorPalette::Background]];}
.wizard {background:[[ColorPalette::PrimaryPale]]; border:1px solid [[ColorPalette::PrimaryMid]];}
.wizard h1 {color:[[ColorPalette::PrimaryDark]]; border:none;}
.wizard h2 {color:[[ColorPalette::Foreground]]; border:none;}
.wizardStep {background:[[ColorPalette::Background]]; color:[[ColorPalette::Foreground]];
border:1px solid [[ColorPalette::PrimaryMid]];}
.wizardStep.wizardStepDone {background:[[ColorPalette::TertiaryLight]];}
.wizardFooter {background:[[ColorPalette::PrimaryPale]];}
.wizardFooter .status {background:[[ColorPalette::PrimaryDark]]; color:[[ColorPalette::Background]];}
.wizard .button {color:[[ColorPalette::Foreground]]; background:[[ColorPalette::SecondaryLight]]; border: 1px solid;
border-color:[[ColorPalette::SecondaryPale]] [[ColorPalette::SecondaryDark]] [[ColorPalette::SecondaryDark]] [[ColorPalette::SecondaryPale]];}
.wizard .button:hover {color:[[ColorPalette::Foreground]]; background:[[ColorPalette::Background]];}
.wizard .button:active {color:[[ColorPalette::Background]]; background:[[ColorPalette::Foreground]]; border: 1px solid;
border-color:[[ColorPalette::PrimaryDark]] [[ColorPalette::PrimaryPale]] [[ColorPalette::PrimaryPale]] [[ColorPalette::PrimaryDark]];}
.wizard .notChanged {background:transparent;}
.wizard .changedLocally {background:#80ff80;}
.wizard .changedServer {background:#8080ff;}
.wizard .changedBoth {background:#ff8080;}
.wizard .notFound {background:#ffff80;}
.wizard .putToServer {background:#ff80ff;}
.wizard .gotFromServer {background:#80ffff;}
#messageArea {border:1px solid [[ColorPalette::SecondaryMid]]; background:[[ColorPalette::SecondaryLight]]; color:[[ColorPalette::Foreground]];}
#messageArea .button {color:[[ColorPalette::PrimaryMid]]; background:[[ColorPalette::SecondaryPale]]; border:none;}
.popupTiddler {background:[[ColorPalette::TertiaryPale]]; border:2px solid [[ColorPalette::TertiaryMid]];}
.popup {background:[[ColorPalette::TertiaryPale]]; color:[[ColorPalette::TertiaryDark]]; border-left:1px solid [[ColorPalette::TertiaryMid]]; border-top:1px solid [[ColorPalette::TertiaryMid]]; border-right:2px solid [[ColorPalette::TertiaryDark]]; border-bottom:2px solid [[ColorPalette::TertiaryDark]];}
.popup hr {color:[[ColorPalette::PrimaryDark]]; background:[[ColorPalette::PrimaryDark]]; border-bottom:1px;}
.popup li.disabled {color:[[ColorPalette::TertiaryMid]];}
.popup li a, .popup li a:visited {color:[[ColorPalette::Foreground]]; border: none;}
.popup li a:hover {background:[[ColorPalette::SecondaryLight]]; color:[[ColorPalette::Foreground]]; border: none;}
.popup li a:active {background:[[ColorPalette::SecondaryPale]]; color:[[ColorPalette::Foreground]]; border: none;}
.popupHighlight {background:[[ColorPalette::Background]]; color:[[ColorPalette::Foreground]];}
.listBreak div {border-bottom:1px solid [[ColorPalette::TertiaryDark]];}
.tiddler .defaultCommand {font-weight:bold;}
.shadow .title {color:[[ColorPalette::TertiaryDark]];}
.title {color:[[ColorPalette::SecondaryDark]];}
.subtitle {color:[[ColorPalette::TertiaryDark]];}
.toolbar {color:[[ColorPalette::PrimaryMid]];}
.toolbar a {color:[[ColorPalette::TertiaryLight]];}
.selected .toolbar a {color:[[ColorPalette::TertiaryMid]];}
.selected .toolbar a:hover {color:[[ColorPalette::Foreground]];}
.tagging, .tagged {border:1px solid [[ColorPalette::TertiaryPale]]; background-color:[[ColorPalette::TertiaryPale]];}
.selected .tagging, .selected .tagged {background-color:[[ColorPalette::TertiaryLight]]; border:1px solid [[ColorPalette::TertiaryMid]];}
.tagging .listTitle, .tagged .listTitle {color:[[ColorPalette::PrimaryDark]];}
.tagging .button, .tagged .button {border:none;}
.footer {color:[[ColorPalette::TertiaryLight]];}
.selected .footer {color:[[ColorPalette::TertiaryMid]];}
.sparkline {background:[[ColorPalette::PrimaryPale]]; border:0;}
.sparktick {background:[[ColorPalette::PrimaryDark]];}
.error, .errorButton {color:[[ColorPalette::Foreground]]; background:[[ColorPalette::Error]];}
.warning {color:[[ColorPalette::Foreground]]; background:[[ColorPalette::SecondaryPale]];}
.lowlight {background:[[ColorPalette::TertiaryLight]];}
.zoomer {background:none; color:[[ColorPalette::TertiaryMid]]; border:3px solid [[ColorPalette::TertiaryMid]];}
.imageLink, #displayArea .imageLink {background:transparent;}
.annotation {background:[[ColorPalette::SecondaryLight]]; color:[[ColorPalette::Foreground]]; border:2px solid [[ColorPalette::SecondaryMid]];}
.viewer .listTitle {list-style-type:none; margin-left:-2em;}
.viewer .button {border:1px solid [[ColorPalette::SecondaryMid]];}
.viewer blockquote {border-left:3px solid [[ColorPalette::TertiaryDark]];}
.viewer table, table.twtable {border:2px solid [[ColorPalette::TertiaryDark]];}
.viewer th, .viewer thead td, .twtable th, .twtable thead td {background:[[ColorPalette::SecondaryMid]]; border:1px solid [[ColorPalette::TertiaryDark]]; color:[[ColorPalette::Background]];}
.viewer td, .viewer tr, .twtable td, .twtable tr {border:1px solid [[ColorPalette::TertiaryDark]];}
.viewer pre {border:1px solid [[ColorPalette::SecondaryLight]]; background:[[ColorPalette::SecondaryPale]];}
.viewer code {color:[[ColorPalette::SecondaryDark]];}
.viewer hr {border:0; border-top:dashed 1px [[ColorPalette::TertiaryDark]]; color:[[ColorPalette::TertiaryDark]];}
.highlight, .marked {background:[[ColorPalette::SecondaryLight]];}
.editor input {border:1px solid [[ColorPalette::PrimaryMid]];}
.editor textarea {border:1px solid [[ColorPalette::PrimaryMid]]; width:100%;}
.editorFooter {color:[[ColorPalette::TertiaryMid]];}
#backstageArea {background:[[ColorPalette::Foreground]]; color:[[ColorPalette::TertiaryMid]];}
#backstageArea a {background:[[ColorPalette::Foreground]]; color:[[ColorPalette::Background]]; border:none;}
#backstageArea a:hover {background:[[ColorPalette::SecondaryLight]]; color:[[ColorPalette::Foreground]]; }
#backstageArea a.backstageSelTab {background:[[ColorPalette::Background]]; color:[[ColorPalette::Foreground]];}
#backstageButton a {background:none; color:[[ColorPalette::Background]]; border:none;}
#backstageButton a:hover {background:[[ColorPalette::Foreground]]; color:[[ColorPalette::Background]]; border:none;}
#backstagePanel {background:[[ColorPalette::Background]]; border-color: [[ColorPalette::Background]] [[ColorPalette::TertiaryDark]] [[ColorPalette::TertiaryDark]] [[ColorPalette::TertiaryDark]];}
.backstagePanelFooter .button {border:none; color:[[ColorPalette::Background]];}
.backstagePanelFooter .button:hover {color:[[ColorPalette::Foreground]];}
#backstageCloak {background:[[ColorPalette::Foreground]]; opacity:0.6; filter:'alpha(opacity:60)';}
/*}}}*/
/*{{{*/
* html .tiddler {height:1%;}
body {font-size:.75em; font-family:arial,helvetica; margin:0; padding:0;}
h1,h2,h3,h4,h5,h6 {font-weight:bold; text-decoration:none;}
h1,h2,h3 {padding-bottom:1px; margin-top:1.2em;margin-bottom:0.3em;}
h4,h5,h6 {margin-top:1em;}
h1 {font-size:1.35em;}
h2 {font-size:1.25em;}
h3 {font-size:1.1em;}
h4 {font-size:1em;}
h5 {font-size:.9em;}
hr {height:1px;}
a {text-decoration:none;}
dt {font-weight:bold;}
ol {list-style-type:decimal;}
ol ol {list-style-type:lower-alpha;}
ol ol ol {list-style-type:lower-roman;}
ol ol ol ol {list-style-type:decimal;}
ol ol ol ol ol {list-style-type:lower-alpha;}
ol ol ol ol ol ol {list-style-type:lower-roman;}
ol ol ol ol ol ol ol {list-style-type:decimal;}
.txtOptionInput {width:11em;}
#contentWrapper .chkOptionInput {border:0;}
.externalLink {text-decoration:underline;}
.indent {margin-left:3em;}
.outdent {margin-left:3em; text-indent:-3em;}
code.escaped {white-space:nowrap;}
.tiddlyLinkExisting {font-weight:bold;}
.tiddlyLinkNonExisting {font-style:italic;}
/* the 'a' is required for IE, otherwise it renders the whole tiddler in bold */
a.tiddlyLinkNonExisting.shadow {font-weight:bold;}
#mainMenu .tiddlyLinkExisting,
#mainMenu .tiddlyLinkNonExisting,
#sidebarTabs .tiddlyLinkNonExisting {font-weight:normal; font-style:normal;}
#sidebarTabs .tiddlyLinkExisting {font-weight:bold; font-style:normal;}
.header {position:relative;}
.header a:hover {background:transparent;}
.headerShadow {position:relative; padding:4.5em 0em 1em 1em; left:-1px; top:-1px;}
.headerForeground {position:absolute; padding:4.5em 0em 1em 1em; left:0px; top:0px;}
.siteTitle {font-size:3em;}
.siteSubtitle {font-size:1.2em;}
#mainMenu {position:absolute; left:0; width:10em; text-align:right; line-height:1.6em; padding:1.5em 0.5em 0.5em 0.5em; font-size:1.1em;}
#sidebar {position:absolute; right:3px; width:16em; font-size:.9em;}
#sidebarOptions {padding-top:0.3em;}
#sidebarOptions a {margin:0em 0.2em; padding:0.2em 0.3em; display:block;}
#sidebarOptions input {margin:0.4em 0.5em;}
#sidebarOptions .sliderPanel {margin-left:1em; padding:0.5em; font-size:.85em;}
#sidebarOptions .sliderPanel a {font-weight:bold; display:inline; padding:0;}
#sidebarOptions .sliderPanel input {margin:0 0 .3em 0;}
#sidebarTabs .tabContents {width:15em; overflow:hidden;}
.wizard {padding:0.1em 1em 0em 2em;}
.wizard h1 {font-size:2em; font-weight:bold; background:none; padding:0em 0em 0em 0em; margin:0.4em 0em 0.2em 0em;}
.wizard h2 {font-size:1.2em; font-weight:bold; background:none; padding:0em 0em 0em 0em; margin:0.4em 0em 0.2em 0em;}
.wizardStep {padding:1em 1em 1em 1em;}
.wizard .button {margin:0.5em 0em 0em 0em; font-size:1.2em;}
.wizardFooter {padding:0.8em 0.4em 0.8em 0em;}
.wizardFooter .status {padding:0em 0.4em 0em 0.4em; margin-left:1em;}
.wizard .button {padding:0.1em 0.2em 0.1em 0.2em;}
#messageArea {position:fixed; top:2em; right:0em; margin:0.5em; padding:0.5em; z-index:2000; _position:absolute;}
.messageToolbar {display:block; text-align:right; padding:0.2em 0.2em 0.2em 0.2em;}
#messageArea a {text-decoration:underline;}
.tiddlerPopupButton {padding:0.2em 0.2em 0.2em 0.2em;}
.popupTiddler {position: absolute; z-index:300; padding:1em 1em 1em 1em; margin:0;}
.popup {position:absolute; z-index:300; font-size:.9em; padding:0; list-style:none; margin:0;}
.popup .popupMessage {padding:0.4em;}
.popup hr {display:block; height:1px; width:auto; padding:0; margin:0.2em 0em;}
.popup li.disabled {padding:0.4em;}
.popup li a {display:block; padding:0.4em; font-weight:normal; cursor:pointer;}
.listBreak {font-size:1px; line-height:1px;}
.listBreak div {margin:2px 0;}
.tabset {padding:1em 0em 0em 0.5em;}
.tab {margin:0em 0em 0em 0.25em; padding:2px;}
.tabContents {padding:0.5em;}
.tabContents ul, .tabContents ol {margin:0; padding:0;}
.txtMainTab .tabContents li {list-style:none;}
.tabContents li.listLink { margin-left:.75em;}
#contentWrapper {display:block;}
#splashScreen {display:none;}
#displayArea {margin:1em 17em 0em 14em;}
.toolbar {text-align:right; font-size:.9em;}
.tiddler {padding:1em 1em 0em 1em;}
.missing .viewer,.missing .title {font-style:italic;}
.title {font-size:1.6em; font-weight:bold;}
.missing .subtitle {display:none;}
.subtitle {font-size:1.1em;}
.tiddler .button {padding:0.2em 0.4em;}
.tagging {margin:0.5em 0.5em 0.5em 0; float:left; display:none;}
.isTag .tagging {display:block;}
.tagged {margin:0.5em; float:right;}
.tagging, .tagged {font-size:0.9em; padding:0.25em;}
.tagging ul, .tagged ul {list-style:none; margin:0.25em; padding:0;}
.tagClear {clear:both;}
.footer {font-size:.9em;}
.footer li {display:inline;}
.annotation {padding:0.5em; margin:0.5em;}
* html .viewer pre {width:99%; padding:0 0 1em 0;}
.viewer {line-height:1.4em; padding-top:0.5em;}
.viewer .button {margin:0em 0.25em; padding:0em 0.25em;}
.viewer blockquote {line-height:1.5em; padding-left:0.8em;margin-left:2.5em;}
.viewer ul, .viewer ol {margin-left:0.5em; padding-left:1.5em;}
.viewer table, table.twtable {border-collapse:collapse; margin:0.8em 1.0em;}
.viewer th, .viewer td, .viewer tr,.viewer caption,.twtable th, .twtable td, .twtable tr,.twtable caption {padding:3px;}
table.listView {font-size:0.85em; margin:0.8em 1.0em;}
table.listView th, table.listView td, table.listView tr {padding:0px 3px 0px 3px;}
.viewer pre {padding:0.5em; margin-left:0.5em; font-size:1.2em; line-height:1.4em; overflow:auto;}
.viewer code {font-size:1.2em; line-height:1.4em;}
.editor {font-size:1.1em;}
.editor input, .editor textarea {display:block; width:100%; font:inherit;}
.editorFooter {padding:0.25em 0em; font-size:.9em;}
.editorFooter .button {padding-top:0px; padding-bottom:0px;}
.fieldsetFix {border:0; padding:0; margin:1px 0px 1px 0px;}
.sparkline {line-height:1em;}
.sparktick {outline:0;}
.zoomer {font-size:1.1em; position:absolute; overflow:hidden;}
.zoomer div {padding:1em;}
* html #backstage {width:99%;}
* html #backstageArea {width:99%;}
#backstageArea {display:none; position:relative; overflow: hidden; z-index:150; padding:0.3em 0.5em 0.3em 0.5em;}
#backstageToolbar {position:relative;}
#backstageArea a {font-weight:bold; margin-left:0.5em; padding:0.3em 0.5em 0.3em 0.5em;}
#backstageButton {display:none; position:absolute; z-index:175; top:0em; right:0em;}
#backstageButton a {padding:0.1em 0.4em 0.1em 0.4em; margin:0.1em 0.1em 0.1em 0.1em;}
#backstage {position:relative; width:100%; z-index:50;}
#backstagePanel {display:none; z-index:100; position:absolute; width:90%; margin:0em 3em 0em 3em; padding:1em 1em 1em 1em;}
.backstagePanelFooter {padding-top:0.2em; float:right;}
.backstagePanelFooter a {padding:0.2em 0.4em 0.2em 0.4em;}
#backstageCloak {display:none; z-index:20; position:absolute; width:100%; height:100px;}
.whenBackstage {display:none;}
.backstageVisible .whenBackstage {display:block;}
/*}}}*/
/***
StyleSheet for use when a translation requires any css style changes.
This StyleSheet can be used directly by languages such as Chinese, Japanese and Korean which need larger font sizes.
***/
/*{{{*/
body {font-size:0.8em;}
#sidebarOptions {font-size:1.05em;}
#sidebarOptions a {font-style:normal;}
#sidebarOptions .sliderPanel {font-size:0.95em;}
.subtitle {font-size:0.8em;}
.viewer table.listView {font-size:0.95em;}
/*}}}*/
/*{{{*/
@media print {
#mainMenu, #sidebar, #messageArea, .toolbar, #backstageButton, #backstageArea {display: none ! important;}
#displayArea {margin: 1em 1em 0em 1em;}
/* Fixes a feature in Firefox 1.5.0.2 where print preview displays the noscript content */
noscript {display:none;}
}
/*}}}*/
<!--{{{-->
<div class='header' macro='gradient vert [[ColorPalette::PrimaryLight]] [[ColorPalette::PrimaryMid]]'>
<div class='headerShadow'>
<span class='siteTitle' refresh='content' tiddler='SiteTitle'></span>
<span class='siteSubtitle' refresh='content' tiddler='SiteSubtitle'></span>
</div>
<div class='headerForeground'>
<span class='siteTitle' refresh='content' tiddler='SiteTitle'></span>
<span class='siteSubtitle' refresh='content' tiddler='SiteSubtitle'></span>
</div>
</div>
<div id='mainMenu' refresh='content' tiddler='MainMenu'></div>
<div id='sidebar'>
<div id='sidebarOptions' refresh='content' tiddler='SideBarOptions'></div>
<div id='sidebarTabs' refresh='content' force='true' tiddler='SideBarTabs'></div>
</div>
<div id='displayArea'>
<div id='messageArea'></div>
<div id='tiddlerDisplay'></div>
</div>
<!--}}}-->
<!--{{{-->
<div class='toolbar' macro='toolbar [[ToolbarCommands::ViewToolbar]]'></div>
<div class='title' macro='view title'></div>
<div class='subtitle'><span macro='view modifier link'></span>, <span macro='view modified date'></span> (<span macro='message views.wikified.createdPrompt'></span> <span macro='view created date'></span>)</div>
<div class='tagging' macro='tagging'></div>
<div class='tagged' macro='tags'></div>
<div class='viewer' macro='view text wikified'></div>
<div class='tagClear'></div>
<!--}}}-->
<!--{{{-->
<div class='toolbar' macro='toolbar [[ToolbarCommands::EditToolbar]]'></div>
<div class='title' macro='view title'></div>
<div class='editor' macro='edit title'></div>
<div macro='annotations'></div>
<div class='editor' macro='edit text'></div>
<div class='editor' macro='edit tags'></div><div class='editorFooter'><span macro='message views.editor.tagPrompt'></span><span macro='tagChooser'></span></div>
<!--}}}-->
To get started with this blank TiddlyWiki, you'll need to modify the following tiddlers:
* SiteTitle & SiteSubtitle: The title and subtitle of the site, as shown above (after saving, they will also appear in the browser title bar)
* MainMenu: The menu (usually on the left)
* DefaultTiddlers: Contains the names of the tiddlers that you want to appear when the TiddlyWiki is opened
You'll also need to enter your username for signing your edits: <<option txtUserName>>
These InterfaceOptions for customising TiddlyWiki are saved in your browser
Your username for signing your edits. Write it as a WikiWord (eg JoeBloggs)
<<option txtUserName>>
<<option chkSaveBackups>> SaveBackups
<<option chkAutoSave>> AutoSave
<<option chkRegExpSearch>> RegExpSearch
<<option chkCaseSensitiveSearch>> CaseSensitiveSearch
<<option chkAnimate>> EnableAnimations
----
Also see AdvancedOptions
[[QR|Quant Results]] emerged from the Analytics department at [[Adknowledge, Inc.|http://adknowledge.com/]]
While the process described here applies to a wide range of analytics teams, our work specifically focuses on large-scale data for an online advertising network. At the heart of our business, we warehouse terabytes of customer data, collected from real-time feeds. Then we perform [[analysis|Analysis]] on that data to create [[collaborative filters|http://en.wikipedia.org/wiki/Collaborative_filtering]] and other kinds of [[predictive models|Modeling]]. Those in turn drive [[control systems|http://en.wikibooks.org/wiki/Control_Systems]] to make hundreds of millions of automated business decisions each day. Almost all of our company's revenue streams are based on that pattern.
This area of data analytics is often called [[Big Data]], based on experiences at Amazon, eBay, Google, Yahoo!, etc. -- dating from the 1997 holiday sales season. It poses several challenges for which most analysts, programmers, and managers are not well-equipped. However, we have learned from the lessons at larger firms in this space, and recognized opportunities for leveraging [[MapReduce|http://en.wikipedia.org/wiki/Mapreduce]] and related technology. In particular, our team makes much use of [[R|http://www.r-project.org/]] and [[Hadoop|http://hadoop.apache.org/core/]], running on elastic resources at [[Amazon AWS|http://aws.amazon.com/]].
The team needs highly-qualified professionals, not merely a collection of "A" students.
People who mimic the accepted standards taught in schools do not necessarily perform well on [[Big Data]].
Evaluate candidates via tabula rasa: Does an analyst handle large data sets or data quality issues with approaches which lead to poor sampling or censored data? Does a programmer insist on using web app patterns or relational databases, even though the problem does not call for them?
Jim Highsmith
http://www.jimhighsmith.com/articles/messy.htm
Speculate, Collaborate, Learn as an adaptive cycle.
http://en.wikipedia.org/wiki/Agile_software_development
Repeated anti-patterns in the organization which generally lead to problems...
* [[Drunk Bus Driver]]
* [[Excluded Middle]]
* [[A Student]]
* [[Lone Wolf]]
* [[Time Slicing]]
* [[Email Overload]]
* [[Wiki Rot]]
Big Data is a name used to describe techniques and business models based on managing large-scale data. It derives from experiences at Amazon, eBay, Google, Yahoo!, etc., dating from the holiday sales season at the end of 1997. Arguments for this approach tend to follow the arc of successes at Amazon and Google since that time. For excellent insights and discussions about the topic in general, see public blogging by [[Greg Linden|http://glinden.blogspot.com/]], one of the authors of the Amazon recommender system.
This area of data analytics poses several challenges and issues -- life or death issues for effective business -- for which most analysts, programmers, and managers are not well-equipped.
For analysts, the size of the data is orders of magnitude larger than data sets which they generally encounter. Positive cases are quite rare and [[data quality|Data Quality]] is generally very poor. In other words, one is faced with an urgent need to find a "needle in a haystack" amidst a lot of noise. However, such large data overwhelms the tools which statisticians tend to use. Consequently, analysts tend to prefer to "sample" from the data, working with smaller subsets, which in turn lose important characteristics of the customer data at scale.
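To make the sampling problem concrete, here is a minimal sketch of how quickly rare positive cases vanish from a sample. The positive rate and the sample sizes are hypothetical, chosen only for illustration, and the sketch assumes each sampled row is an independent draw.

```python
def expected_positives(sample_size: int, positive_rate: float) -> float:
    """Expected number of positive cases in a simple random sample."""
    return sample_size * positive_rate


def prob_no_positives(sample_size: int, positive_rate: float) -> float:
    """Probability that the sample contains no positive case at all,
    assuming independent draws: (1 - p) ** n."""
    return (1.0 - positive_rate) ** sample_size


if __name__ == "__main__":
    rate = 1e-6  # hypothetical: one positive case per million rows
    for n in (100_000, 1_000_000, 10_000_000):
        print(
            f"sample={n:>10,}  "
            f"expected positives={expected_positives(n, rate):6.2f}  "
            f"P(no positives)={prob_no_positives(n, rate):.4f}"
        )
```

Under these assumed numbers, a sample of one hundred thousand rows would most likely contain no positive case at all, so any model fit on it would have nothing to learn from.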
For programmers, paradoxes emerge from the data size and data quality issues. Both programming bugs and application results tend to be elusive and counter-intuitive, quite often leading to non-deterministic contexts. Popular [[software development methodologies|Agile Software Development]] and related [[tools|http://www.eclipse.org/]] which emphasize [[unit testing|http://en.wikipedia.org/wiki/Unit_test]] become almost useless; code which behaves well on a laptop running a test with a few million rows can fail in many unexpected ways when run in production at scale on a cluster of hundreds of multi-core servers processing a trillion rows of data. Statistics provides ways to address these problems, and ultimately the analysts must determine software requirements as well as perform "quality assurance" for the code. However, the bulk of available programmers lack the math background (attention span, humility, temperament, etc.) to accept those terms, which leads to deadlock on teams.
For managers, most lack sufficient math background to make [[good, consistent decisions about analysts and their work|Hype Cycle]]: discern among their disagreements, provide effective project planning, nurture team building, etc. Software engineering managers far too often promote anti-patterns such as the [["Cult of the Lone Wolf"|Lone Wolf]], which make team work virtually impossible. Traditionally, the safest bet as an engineering manager has been to follow a practice called [["scale up"|http://en.wikipedia.org/wiki/Scalability#Scale_vertically_.28scale_up.29]]: buy larger hardware, build larger relational databases, constrain analysts to systems based on SQL, etc. In other words, leverage traditionally trained programmers and their preferred practices. Unfortunately, that approach falls apart at terabyte-scale, and it stops being cost-effective long before that.
Looking at this in another way, consider the [[quote|http://radar.oreilly.com/2008/12/google-walmart-mybarackobama.html]] from Tim O'Reilly: "I came to recognize that web applications, unlike desktop applications, still have the programmers inside them." To a large extent, the process of statistical analysis and modeling has the statisticians "still inside", while the management of large IT infrastructure also has the sysadmins "still inside". A challenge presented by Big Data is to automate the modeling at scale. In other words, one must refactor the statisticians, the programmers, the sysadmins, and the managers ''outside of the apps'' all in one fell swoop -- much against their protest.
On one hand, those kinds of problems can be turned around. Google has led the industry by pioneering and publishing [[highly effective approaches|http://www.youtube.com/watch?v=qsan-GQaeyk]] for handling Big Data. In general, they follow the practice of [["scale-out"|http://en.wikipedia.org/wiki/Scalability#Scale_horizontally_.28scale_out.29]]: they avoid large RDBMS systems and instead promote cost-effective, fault-tolerant parallel processing based on commodity hardware. Amazon, eBay and others have similarly followed that arc of using commodity hardware for parallel processing. In particular, Google pioneered the use of [[MapReduce|http://en.wikipedia.org/wiki/Mapreduce]] and other "batch" techniques on terabyte- or petabyte-scale consumer data.
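For readers unfamiliar with the MapReduce pattern, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce). It is not Hadoop's API; the record format ({{{campaign_id<TAB>clicked}}}) and all function names are hypothetical, invented for illustration. In a real scale-out system, each phase would run in parallel across many commodity machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[Pair]]) -> Iterator[Pair]:
    """Apply the mapper to each input record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def click_mapper(record: str) -> Iterator[Pair]:
    # Hypothetical log line: "campaign_id<TAB>clicked", where clicked is 0 or 1.
    campaign, clicked = record.split("\t")
    yield campaign, int(clicked)


def sum_reducer(key: str, values: List[int]) -> int:
    return sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Count clicks per campaign by chaining map, shuffle, and reduce."""
    pairs = map_phase(records, click_mapper)
    return reduce_phase(shuffle_phase(pairs), sum_reducer)


if __name__ == "__main__":
    log = ["a\t1", "b\t0", "a\t1", "b\t1", "c\t0"]
    print(run_job(log))
```

The mapper and reducer here are pure functions of their inputs, which is what lets a framework rerun them on another machine after a failure.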
On the other hand, the bad news from analysis of Google, et al., is a tendency toward an anti-pattern which we call the [[Excluded Middle]]. In other words, the problems encountered by [[integrating|Integration]] teams of analysts plus programmers simply got ignored.
Initially successful business models in Big Data -- such as Google -- were based on large market shares, leveraging "least common denominator" contexts, where the perceived business need for statistical modeling is superficial. Moreover, during the period since 1997 practices coming out of Computer Science such as [[machine learning|http://en.wikipedia.org/wiki/Machine_learning]] have advanced steadily and fit well for architectures used in scale-out approaches. Comparable advances in Statistics which leverage scale-out have not made so much progress. Programmers can leverage machine learning approaches, even while they lack depth in how to interpret results. In a simplified example, programmers are effective for calculating a mean as an estimator, when 98% of the population can be monetized through the same approach. It works, it earns money, end of discussion.
Not every business enjoys a market share the size of Google. In general, most businesses are swimming in data -- whether they acknowledge it or not -- but most lack the capabilities required to leverage and monetize that data. Niche markets seem to be changing that situation. Effective use of analytics can turn terabytes into profit, but that requires careful team work and management. It is rare that statisticians and programmers work side-by-side, let alone interact without heavy process ([[PRDs|http://en.wikipedia.org/wiki/Product_requirements_document]], etc.) separating their disciplines.
[[QR|Quant Results]] describes our patterns for success, addressing the challenges of Big Data.
Programmers have a weekly gathering called the "Code Review"
* organized as a code review
* anyone in the company with a software background may attend as an active participant
* all who attend as active participants must present / submit to peer review at least once
* rotate to allow voices / cut through cultural hesitations
* analysts may attend as observers, but may not interfere with discussions
Background: #fff
Foreground: #000
PrimaryPale: #8fc
PrimaryLight: #1f8
PrimaryMid: #0b4
PrimaryDark: #041
SecondaryPale: #ffc
SecondaryLight: #fe8
SecondaryMid: #db4
SecondaryDark: #841
TertiaryPale: #eee
TertiaryLight: #ccc
TertiaryMid: #999
TertiaryDark: #666
Error: #f88
[img[Creative Commons License|http://i.creativecommons.org/l/by-sa/3.0/us/80x15.png][http://creativecommons.org/licenses/by-sa/3.0/us/]]
Quant Results by [[Paco Nathan|http://ceteri.org]] is licensed under a [[Creative Commons Attribution-Share Alike 3.0 United States License|http://creativecommons.org/licenses/by-sa/3.0/us/]] based on work at [[ceteri.org|http://ceteri.org]]
Copyright © Paco Nathan, 2009.
[[Quant Results]]
[[General Precepts]]
A less-experienced programmer who tends to tear up mission-critical code, make end-runs around source code control, neglect team coordination, and fixate on the allure of an interesting idea without examining its trade-offs.
Recognizable by history of behavior.
Potentially, a [[Lone Wolf]] given misplaced authority within a toxic environment.
The phrase "Drunk Bus Driver" implies that an organization pushes responsibility for critical applications or infrastructure onto someone who is not ready to handle it.
Overgrown start-up mentality, where executive management must wear many hats and therefore tends to expect/reward individuals who serve multiple roles.
However, don't mistake someone with 2+ specializations for a generalist.
Promoting a person who is good at two extremes (e.g., analysis + infrastructure) defeats the point of having integration, leadership, apps, modeling, etc.
Team suffers, imbalance prevails.
The phenomenology used here derives from work by Ward Cunningham on [[Pattern Language]].
In general, our department workplace leverages concepts based on the successes of Cali Ressler and Jody Thompson in [[Results-Only Work Environment]].
Some principles we use for analysis and modeling in the context of [[Big Data]] derive from work by Jim Highsmith on [[Adaptive Software Development]].
One caution: those who hold [[Aristotelian|http://en.wikipedia.org/wiki/Aristotelian_rhetoric]] notions or [[Cartesian|http://en.wikipedia.org/wiki/Ren%C3%A9_Descartes]] assumptions particularly dear need not read any further.
http://en.wikipedia.org/wiki/Known_unknown
Products are not created by individuals, only by teams.
Zero tolerance is the only acceptable solution: anyone who disagrees, show them the door.
[[Quant Results]]
[[A Brief History]]
[[General Precepts]]
[[Copyright Notice]]
© Paco Nathan
Advice about defining [[pattern languages|http://en.wikipedia.org/wiki/Pattern_language]], quoted from [[Ward Cunningham|http://www.c2.com/cgi/wiki?TipsForWritingPatternLanguages]]:
1) Pick a whole area, not just one idea. I like subject matter that is practical but seldom explored in a text book. You know, the kind of stuff you have to learn from your colleagues on the job. The discussion on the "patterns" list got me thinking about checking data.
2) Make a list of all the little things you have learned through the years about the area. Imagine that your kid brother has just taken responsibility for this area on his first big job. You're getting together this weekend. What are you going to tell him. Make a list.
3) Cast each item on your list as a solution. I like to write a sentence with "therefore" in the middle. You will have to think a little deeper here to figure out the forces that bear on your solutions. It's ok to speculate. I find this to be a rewarding activity since I often find new reason for what I do.
4a) Now write each item as a Pattern. I've come to favor a four paragraph form where the second paragraph ends with the pivotal "therefore:". This is a good time to flip through Alexander's Pattern Language. I feel my work has always improved when I more closely mimic his style. I'm just now learning to make the first and last paragraphs carry weight. These are the ones that link a pattern with others in the language.
4b) Organize your patterns into sections. Write a little introduction to each section that lists each pattern by name. You may find you need to adjust your linking paragraphs as you study the higher level structure of your patterns. Try to keep 4a and 4b fluid as you write. As you become more familiar with your patterns you may find that they organize themselves.
5) Now write an introduction to your pattern language that hints at the forces you will be addressing. Pick a good name too. And send a summary to the "patterns" mail list.
This text describes a process called QR, used for team management and project planning in large-scale data analytics.
Experience it, recognize it, give it a name...
* [[Useful Process]]: Find approaches which tend to work well and describe how they can be generalized and learned.
* [[Anti-Patterns]]: Identify approaches which do not work well and give them names plus a brief list of key indicators.
* [[Known Unknowns]]: Articulate problems and concerns which have no good solutions currently and keep those handy to guide new directions in the [[Roadmap]].
Cali Ressler and Jody Thompson
(ROWE): http://www.culturerx.com/
!Analysis
;provides "decision"
* //focus of//: analysts
* statistical analysis of samples from the data set
* supports special business decisions, ad hoc queries
* generates documented process
* //checklist//: data sample, R script, visualizations, business decision guidance, suggested next steps, wiki report
!Modeling
;provides "technology"
* //focus of//: team lead, principal scientist(s), analysts
* predictive models which agree with available analysis
* must be suitable for automation at scale
* supports frequent decisions
* generates runnable scripts, requirements for apps and infrastructure
* //checklist//: trade-offs, IP protection
!Integration
;provides "planning"
* //focus of//: technical management
* coordinates with customers/stakeholders
* drives business decisions for the department
* executes on team process and project planning
* recruiting, training, coaching, team-building
* determines "stable" branch commits
* decides resource allocation, system architecture trade-offs
* //monitors//: who's consistently ahead? who's behind? which planning isn't realistic? what can be learned from this?
* //checklist//: tasks, priorities, milestones, roadmap, budget, revenue overlays
!Apps
;provides "product"
* //focus of//: team lead, principal engineer(s), engineers
* automates predictive models with data at scale
* implements algorithms, parallelization
* determines data objects, workflows
* batch vs. online, remote vs. local
* //checklist//: build script, unit tests, inline docs, architecture diagram
!Systems
;provides "infrastructure"
* //focus of//: systems engineers
* troubleshooting, helpdesk, wiki how-tos
* develop/commit patches to open source projects
* supports team infrastructure and operations
* manages code repository, servers, config, data services
* //checklist//: crontab updates, wiki instructions
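The per-role checklists above can also be kept as data, so that a deliverable can be checked for gaps before review. What follows is only a minimal sketch: the checklist items are copied from the lists above, while the names {{{CHECKLISTS}}} and {{{missing_items}}} are hypothetical and not part of the QR process itself.

```python
from typing import Dict, Iterable, List

# Checklist items per role, copied from the role descriptions above.
# The dictionary layout is an assumption made for illustration.
CHECKLISTS: Dict[str, List[str]] = {
    "Analysis": ["data sample", "R script", "visualizations",
                 "business decision guidance", "suggested next steps",
                 "wiki report"],
    "Modeling": ["trade-offs", "IP protection"],
    "Integration": ["tasks", "priorities", "milestones", "roadmap",
                    "budget", "revenue overlays"],
    "Apps": ["build script", "unit tests", "inline docs",
             "architecture diagram"],
    "Systems": ["crontab updates", "wiki instructions"],
}


def missing_items(role: str, delivered: Iterable[str]) -> List[str]:
    """Return the checklist items for a role that have not been delivered,
    in the same order as the role's checklist."""
    done = set(delivered)
    return [item for item in CHECKLISTS[role] if item not in done]


if __name__ == "__main__":
    # Example: an Apps deliverable that has a build script and unit tests only.
    print(missing_items("Apps", ["build script", "unit tests"]))
```

An unknown role raises a {{{KeyError}}}, which seems preferable to silently reporting an empty list.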
Analysts have a weekly gathering called the "Secret Statisticians Meeting"
* organized like a graduate seminar
* anyone in the company with a math degree may attend as an active participant
* all who attend as active participants must present / submit to peer review at least once
* rotate presenters so that every voice is heard, cutting through cultural hesitations
* developers may attend as observers, but may not interfere with discussions
Process and Project Management for Analytics on Big Data
.headerForeground {
left: 0;
padding: 1em 0 1em 1em;
position: absolute;
top: 0;
}
.headerShadow {
left: -1px;
padding: 1em 0 1em 1em;
position: relative;
top: -1px;
}
* team meets outside the office, e.g., at a coffeehouse
* 2x per month
* allow time for off-the-record feedback
Team retrospectives:
* look back on projects
* are we repeating approaches?
* have we tried new tricks?
* reward innovation -> conferences, papers, OSS
Build a departmental [["toolbag"|Toolbag]] for patterns to apply for any incoming problem description.
[[TiddlyWiki|http://en.wikipedia.org/wiki/TiddlyWiki]] is the content management system used for this document.
For links to basic instructions, see [[GettingStarted]].
Project planning which emphasizes achieving time-based goals, without giving priority to effective results.
For example, most [["Agile"|Agile Software Development]] methodologies tend to use short time frames for project planning, called [["timeboxes"|Timeboxes]], which emphasize rapid turnaround as a measure of success.
Time periods called "timeboxes" are heuristics used to help organize the team schedule.
|!results|!period|
|priority updates|daily|
|individual tasks - planning cycle|2-3 days|
|[[Secret Statisticians Meeting]]|each week|
|[[Code Review]]|each week|
|[[Team Offsites]]|2x per month|
|team projects - planning cycle|2-3 weeks|
|company goals, board meetings|2 months|
|individual performance review|6 months|
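As a rough illustration, the table above can be expressed as a lookup from activity to period length, from which the next due date follows. This is a sketch under stated assumptions: ranges such as "2-3 days" are stored as a (shortest, longest) pair, "2x per month" is taken as every 14 days, a month is taken as 30 days, and the names {{{TIMEBOXES}}} and {{{next_due}}} are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

# Period lengths in days as (shortest, longest), taken from the table above.
# Assumptions: "2x per month" is every 14 days; one month is 30 days.
TIMEBOXES: Dict[str, Tuple[int, int]] = {
    "priority updates": (1, 1),
    "individual tasks - planning cycle": (2, 3),
    "Secret Statisticians Meeting": (7, 7),
    "Code Review": (7, 7),
    "Team Offsites": (14, 14),
    "team projects - planning cycle": (14, 21),
    "company goals, board meetings": (60, 60),
    "individual performance review": (180, 180),
}


def next_due(activity: str, last_done: date) -> Tuple[date, date]:
    """Return the earliest and latest dates on which the activity is next due."""
    shortest, longest = TIMEBOXES[activity]
    return (last_done + timedelta(days=shortest),
            last_done + timedelta(days=longest))


if __name__ == "__main__":
    earliest, latest = next_due("team projects - planning cycle", date(2024, 1, 1))
    print(earliest, latest)
```

Since timeboxes are described here as heuristics, the dates returned are best read as a guide for the team schedule rather than as hard deadlines.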
Our planning follows calendar patterns based on [["timebox" heuristics|Timeboxes]], which help organize the team schedule.
Team composition follows patterns defined in [[Roles and Responsibilities]].