<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1565309247787646668</id><updated>2011-07-28T21:59:04.875-07:00</updated><category term='scala reservoir sampling interview'/><title type='text'>Learning the Hard Way</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cgordon.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cgordon.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Charles Gordon</name><uri>http://www.blogger.com/profile/18220920684633877544</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>4</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1565309247787646668.post-3188746646226067045</id><published>2008-10-10T10:13:00.000-07:00</published><updated>2008-10-10T11:04:10.211-07:00</updated><title type='text'>Great Programming Quotes</title><content type='html'>I found a great post on &lt;a href="http://stackoverflow.com"&gt;stackoverflow.com&lt;/a&gt;, the new joint venture between &lt;a href="http://joelonsoftware.com"&gt;Joel Spolsky&lt;/a&gt; and &lt;a href="http://www.codinghorror.com/blog/"&gt;Jeff Atwood&lt;/a&gt;. Most of the stuff there hasn't been very interesting to me (I don't work with .Net or Java, and a lot of the other questions seem fairly basic). Recently someone asked people to &lt;a href="http://stackoverflow.com/questions/58640?sort=votes&amp;page=1#sort-top"&gt;list their favorite programming quotes&lt;/a&gt;. The results are great; here are some of my favorites:

&lt;blockquote&gt;
Hofstadter's Law:

It always takes longer than you expect, even when you take into account Hofstadter's Law.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Brian Kernighan:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Tom Cargill

The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Isaac Asimov

The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'
&lt;/blockquote&gt;

&lt;blockquote&gt;
Alan Kay

I invented the term Object-Oriented, and I can tell you I did not have C++ in mind.
&lt;/blockquote&gt;

&lt;blockquote&gt;
P. J. Plauger

My definition of an expert in any field is a person who knows enough about what's really going on to be scared.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Edsger Dijkstra

If debugging is the process of removing software bugs, then programming must be the process of putting them in.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Edsger Dijkstra

Computer Science is no more about computers than astronomy is about telescopes.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Antoine de Saint Exupéry

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Donald Knuth

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Rich Cook

Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Jamie Zawinski

Linux is only free if your time has no value.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Brooks' The Mythical Man-Month

Plan to throw one away; you will anyway.
&lt;/blockquote&gt;

&lt;blockquote&gt;
R. Buckminster Fuller

When I am working on a problem I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.
&lt;/blockquote&gt;

&lt;blockquote&gt;
John Carmack

Fight code entropy.
&lt;/blockquote&gt;

&lt;blockquote&gt;
Douglas Adams

I (...) am rarely happier than when spending an entire day programming my computer to perform automatically a task that would otherwise take me a good ten seconds to do by hand.
&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1565309247787646668-3188746646226067045?l=cgordon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cgordon.blogspot.com/feeds/3188746646226067045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1565309247787646668&amp;postID=3188746646226067045' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/3188746646226067045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/3188746646226067045'/><link rel='alternate' type='text/html' href='http://cgordon.blogspot.com/2008/10/great-programming-quotes.html' title='Great Programming Quotes'/><author><name>Charles Gordon</name><uri>http://www.blogger.com/profile/18220920684633877544</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1565309247787646668.post-6785788710844662256</id><published>2008-06-08T20:02:00.000-07:00</published><updated>2008-06-09T09:29:34.926-07:00</updated><title type='text'>Enterprise Man Boobs</title><content type='html'>&lt;p&gt;
Jim Webber and Martin Fowler gave a hilarious talk about the Enterprise Service Bus at &lt;a href="http://qcon.infoq.com/london-2008/conference/"&gt;QCON 2008&lt;/a&gt; entited "Does My Bus Look Big In This?". You can watch the &lt;a href="http://www.infoq.com/presentations/soa-without-esb"&gt;whole thing&lt;/a&gt;, with slides at &lt;a href="http://www.infoq.com"&gt;InfoQ&lt;/a&gt;, which has joined the &lt;a href="http://www.ted.com"&gt;TED&lt;/a&gt; talks as one of my favorite places to waste my employer's valuable time.
&lt;/p&gt;

&lt;p&gt;
The talk is fairly content free, but they are very animated and have a lot of surprisingly funny jokes scattered throughout, including an improv sequence on man boobs which is destined to spawn an entry in the &lt;a href="http://www.urbandictionary.com"&gt;Urban Dictionary&lt;/a&gt;. Their point is that the Internet is a better model for Enterprise Integration than whatever model is behind things like SOAP. I couldn't agree more: I've been reading &lt;a href="http://steve.vinoski.net/blog/"&gt;Steve Vinoski's&lt;/a&gt; blog just to get that vicarious thrill which comes from someone smarter than you proving that something you hate does suck.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1565309247787646668-6785788710844662256?l=cgordon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cgordon.blogspot.com/feeds/6785788710844662256/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1565309247787646668&amp;postID=6785788710844662256' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/6785788710844662256'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/6785788710844662256'/><link rel='alternate' type='text/html' href='http://cgordon.blogspot.com/2008/06/enterprise-man-boobs.html' title='Enterprise Man Boobs'/><author><name>Charles Gordon</name><uri>http://www.blogger.com/profile/18220920684633877544</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1565309247787646668.post-8059671292349710999</id><published>2008-02-23T11:46:00.000-08:00</published><updated>2008-02-23T16:42:57.452-08:00</updated><title type='text'>Notes for "The Landscape of Parallel Computing Research"</title><content type='html'>&lt;h1&gt;Introduction&lt;/h1&gt;

&lt;p&gt;
First, the paper itself: &lt;a href="http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html"&gt;
The Landscape of Parallel Computing Research: The View From Berkeley
&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Conclusions of interest to software engineers:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.&lt;/li&gt;
  &lt;li&gt;To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.&lt;/li&gt;
  &lt;li&gt;To be successful, programming models should be independent of the number of processors.&lt;/li&gt;
  &lt;li&gt;Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
The thesis of the paper is that real world applications and hardware are
naturally parallel. Therefore we need a programming model, system software and
supporting architectures that are naturally parallel. This will require a
fundamental re-design.
&lt;/p&gt;

&lt;p&gt;
The paper lists some items of conventional wisdom that have been, or will be,
superceded soon. Here are a few of interest to software engineers:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;i&gt;Old:&lt;/i&gt; Multiply is slow, but load and store are fast.&lt;br/&gt;
    &lt;i&gt;New:&lt;/i&gt; Load and store are slow, but multiply is faster (by as much as two orders of magnitude).
  &lt;/li&gt;
  &lt;li&gt;
    &lt;i&gt;Old:&lt;/i&gt; Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.&lt;br/&gt;
    &lt;i&gt;New:&lt;/i&gt; It will be a very long wait for a faster sequential computer (perhaps five years).
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The rest of the paper is organized around four themes: applications, programming
models, hardware and evaluation.
&lt;/p&gt;

&lt;h1&gt;Applications&lt;/h1&gt;

&lt;p&gt;
The paper lists 13 "dwarfs", which are patterns of computation (within a
processor) and communication (between processors) that characterize certain
classes of applications. The first seven are numerical methods from scientific
computing:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Dense Linear Algebra&lt;/li&gt;
  &lt;li&gt;Sparse Linear Algebra&lt;/li&gt;
  &lt;li&gt;Spectral Methods&lt;/li&gt;
  &lt;li&gt;N-body Methods&lt;/li&gt;
  &lt;li&gt;Structured Grids&lt;/li&gt;
  &lt;li&gt;Unstructured Grids&lt;/li&gt;
  &lt;li&gt;Monte Carlo (generalized to "MapReduce" in paper and considered "embarrassingly parallel")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
In addition to those seven, the papers authors add these six:
&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Combinational Logic (CRC codes, Hashing, Encryption)&lt;/li&gt;
  &lt;li&gt;Graph Traversal (Quicksort, Decision Trees)&lt;/li&gt;
  &lt;li&gt;Dynamic Programming&lt;/li&gt;
  &lt;li&gt;Backtrack and Branch-and-Bound&lt;/li&gt;
  &lt;li&gt;Construct Graphical Models (Bayesian Networks, Hidden Markov Models)&lt;/li&gt;
  &lt;li&gt;Finite State Machine (may be "embarrassingly sequential")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There is an interesting note at the end of the section on dwarves:&lt;/p&gt;

&lt;blockquote&gt;
In the era of multicore and manycore. Popular algorithms from the sequential
computing era may fade in popularity. For example, if Huffman decoding proves to
be embarrassingly sequential, perhaps we should use a different compression
algorithm that is amenable to parallelism.
&lt;/blockquote&gt;

&lt;p&gt;
The paper goes on to describe two ways to compose the dwarfs. The first is via
"temporal distribution" which involves using a common set of processors for the
composed dwarfs with some sort of time sharing. The other is via "spatial
distribution" in which the different dwarfs run on separate (sets of)
processors. The paper makes it clear that optimizing the composition of dwarfs
can be incredibly difficult in pratice due to issues with loose coupling (to
promote code reuse), data structure translation (the need for one dwarf to use
the same data with a different structure) and the ability to predict the format
of the data at the start of the computation.
&lt;/p&gt;

&lt;p&gt;
The authors cite
an &lt;a href="http://www.intel.com/technology/magazine/research/it04042.pdf"&gt;Intel
Study&lt;/a&gt; that lists three categories of computing that are expected to increase
in demand:
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Recognition: machine learning approaches in which data is examined and models are constructed.&lt;/li&gt;
  &lt;li&gt;Mining: searches the web to find instances of the models.&lt;/li&gt;
  &lt;li&gt;Synthesis: creation of new models as in graphics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
The paper contains a very interesting graphic (on page 16) that maps some common
tasks from various disciplines to a set of basic mathematic primitives (which
are, presumably, what the hardware manufacturers are targetting).
&lt;/p&gt;

&lt;h1&gt;Hardware&lt;/h1&gt;

&lt;p&gt;
The bulk of the hardware discussion is about
&lt;a href="http://en.wikipedia.org/wiki/Manycore_processing_unit"&gt;MPU&lt;/a&gt; based
systems.
&lt;/p&gt;

&lt;p&gt;
Section 4.1.3 has an interesting discussion about heterogeneous collections of
processors in the same system. The idea is that the sequential parts of a
program can be the bottleneck on a parallel architecture. An interesting
solution for that problem is to have a few more powerful processors in the
system that are dedicated to sequential jobs. They refer
to &lt;a href="http://en.wikipedia.org/wiki/Amdahl's_law"&gt;Amdahl's Law&lt;/a&gt; to
illustrate the point. The question of how to identify and schedule these
sequential elements is left open, but they admit that the overhead of managing a
heterogenous collection of processors could be prohibitive.
&lt;/p&gt;

&lt;p&gt;
Section 4.3 discusses the interconnections between processors in a manycore
system. The make a distinction between point-to-point communication for two
processors and global communication that involves all the processors. They claim
that the latter is composed of smaller messages that are latency-bound. The
former are somewhat larger and are bandwidth-bound as a result. They posit using
a different set of interconnects for each of these use cases.
&lt;/p&gt;

&lt;p&gt;
The bandwidth-bound communication can take place using internal packet
switch-like networks on the chip. Then the topology of the interconnect can be
modified on demand to meet the needs of a particular application. For
latency-bound communication they recommend something like
what &lt;a href="http://en.wikipedia.org/wiki/Blue_Gene"&gt;BlueGene/L&lt;/a&gt; does: a
tree network for collective communcation (BlueGene/L also uses a torus
interconnect for point-to-point messages).
&lt;/p&gt;

&lt;p&gt;
Section 4 discusses communication primitives for MPU systems, which they are now
calling "chip-scale multiprocessors" or CMPs (no wikipedia link on that name,
yet). They claim that CMPs have higher inter-core bandwidth and lower
latency. They also claim that they can offer new lightweight coherency
primitives that operate between cores. They use these claims to motivate the
notion that programming a CMP is fundamentally different than programming an
SMP.
&lt;/p&gt;

&lt;p&gt;
The first stop is to note
that &lt;a href="http://en.wikipedia.org/wiki/Cache_coherency"&gt;cache coherency&lt;/a&gt;
schemes will need an overhaul to handle the number of processors in a CMP. They
don't give any clear examples of what they mean by this, except to note that
on-chip caches with "private or shared configurations" could work. They go on to
discuss &lt;a href="http://en.wikipedia.org/wiki/Software_transactional_memory"&gt;
software transactional memory&lt;/a&gt; as an approach to synchronization between
processors. It isn't clear, at this point, what the memory model for CMP systems
looks like. I.e., it isn't obvious how much of the memory is shared (as it is in
a traditional SMP system) and how much of it is private to each processor. The
note on message passing in section 4.4.5 seems to indicate that the memory is
shared, and that it is more efficient in CMP than in SMP. This seems to be
because the CMP systems are implemented on a single chip, whereas SMP systems
use multiple chips plugged into the same memory subsystem.
&lt;/p&gt;

&lt;h1&gt;Programming&lt;/h1&gt;

&lt;p&gt;
The section starts with a well-written piece on why the tradeoff between
implementation efficiency and programmer productivity is so difficult. If the
level of abstraction level is too high, it becomes impossible to optimize the
performance. If the abstraction level is too low all the time is spent dealing
with the underlying details. In both cases, productivity suffers, and IT
productivity is already low enough thanks to reddit and digg.
&lt;/p&gt;

&lt;p&gt;
The paper separates programming languages into three groups: hardware-centric
(IXP-C), application-centric (MatLab) and formalism-centric (Actor-based). They
describe these as related to efficiency, productivity and correctness (in that
order). They decry the lack of effor in devising languages that have proven
effects on the psychology of the people writing programs in them. They give, as
an example, the notion that software transactional memory is easier to reason
about than other traditional concurrency primitives. They promote the use of
actual user studies to validate the efficacy (or lack thereof) of languges. This
section is far too light on details, and misses major issues with leaky
abstractions (i.e., even with "simple" models there are cases where a real
knowlege of the underlying complexity is crucial). It is difficult to shield
programmers from the reality of the hardware they are working with, although god
knows Java is doing its level best here.
&lt;/p&gt;

&lt;p&gt;
There is a brief section on making the programming model independent of the
number of processors. They
describe &lt;a href="http://en.wikipedia.org/wiki/Message_Passing_Interface"&gt;
MPI &lt;/a&gt; as the dominant paradigm in parallel programming for scientific
computing, and how it requires the programmer to explicity map tasks to
processors. The paper doesn't have a lot to offer here, other than the fact that
this is still a very open research area, and it would be nice if it wasn't.
&lt;/p&gt;

&lt;p&gt;
As an aside, it is interesting that they don't mention languages
like &lt;a href="http://erlang.org"&gt;Erlang&lt;/a&gt;, as these are the hot new thing in
distributed/parallel computation (at least, for "alpha geeks", as Tim O'Reilly
loves to call them). They do briefly mention
the &lt;a href="http://www.c2.com/cgi/wiki?ActorsModel"&gt;Actor&lt;/a&gt; model, which has
roots back into the seventies. Erlang is a member of this family, which they
refer to as a "formalism-centric" approach, even though it is also very
application-centric (at least, it was for Ericsson).
&lt;/p&gt;

&lt;p&gt;
The next few sections deal with compilers and operating systems. They discuss
"auto-tuners" which can automatically tune a kernel for use on a system. It does
this by mutating the optimization parameters and testing until it finds an ideal
(or near ideal) setting for the target hardware. This seems very promising,
although it isn't clear how widely applicable it is. They go on to discuss the
need for a more flexible OS that allows applications to only use the
capabilities they really need. They point to virtual machines as an indication
that the time has come for this sort of optimization.
&lt;/p&gt;

&lt;p&gt;
This is the section I was most looking forward to, and I found it
disappointing. It felt like there was more hand-waving than real science here. I
agree, in principal, that programming language development should be the subject
of user testing, but it isn't clear (at all) how one would go about such a
monumental project, or that the result would be any better than our current
languages. In the end, the humans that program computers are significantly more
flexible than the computers themselves.
&lt;/p&gt;

&lt;p&gt;
The real wins, in that area, have been tools like compilers and auto-tuners, as
they point out. These are the things that allow us to work at a much higher
level of abstraction. Perhaps it is the case that the growing power of computers
will continue that trend, so that close optimization won't be as important as
productivity. The success of languages like Ruby seems to indicate that this is
true, as it's performance is rather awful. In that case, however, we're making
an explicit trade off between programmer productivity and implementation
efficiency, with little regard for the latter. The paper seems to want to take a
more nuanced position, but doesn't back it up with enough examples. The example
of software transactional memory (which is one of the few they use) is good, but
limited. It isn't clear that you can wholly replace existing models of
concurrency with an STM model (or that you would want to if you could).
&lt;/p&gt;

&lt;h1&gt;Links&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="http://view.eecs.berkeley.edu/wiki/Main_Page"&gt;Project Wiki at UC Berkeley&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1565309247787646668-8059671292349710999?l=cgordon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cgordon.blogspot.com/feeds/8059671292349710999/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1565309247787646668&amp;postID=8059671292349710999' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/8059671292349710999'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/8059671292349710999'/><link rel='alternate' type='text/html' href='http://cgordon.blogspot.com/2008/02/notes-for-landscape-of-parallel.html' title='Notes for &quot;The Landscape of Parallel Computing Research&quot;'/><author><name>Charles Gordon</name><uri>http://www.blogger.com/profile/18220920684633877544</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1565309247787646668.post-1633957306539658977</id><published>2007-04-27T23:32:00.000-07:00</published><updated>2008-04-03T09:10:42.231-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scala reservoir sampling interview'/><title type='text'>Hello Scala: Reservoir Sampling</title><content type='html'>&lt;p&gt;
I've started to climb the rather steep learning curve
for &lt;a href="http://www.scala-lang.org/"&gt;Scala&lt;/a&gt;. The documentation for Scala
is mostly fairly high level, and there aren't as many code samples as I would
have liked. I'm hoping to help out with that problem here. I've selected what I
think is a fairly interesting problem and written Scala code to solve it in
three different ways. I had a lot of fun doing it, and I think the code really
shows how flexible this language is. I actually wrote it in another dozen ways,
but these are the ones that turned out best (some of the others were really
horrendous). Hopefully some of you will find this helpful in your own efforts to
learn Scala.
&lt;/p&gt;

&lt;p&gt;
I have interviewed nearly four hundred software engineers in the last four
years. As a result I have developed an unhealthy fascination with interview
questions, and I have a lot of opinions on their relative merit. A friend who
has been out interviewing sent me this one recently:
&lt;/p&gt;

&lt;blockquote&gt;
Write a program to select &lt;code&gt;k&lt;/code&gt; random lines (with uniform
probability) from a file in one pass. You do not
know &lt;code&gt;n&lt;/code&gt; (the total number of lines) ahead of time.
&lt;/blockquote&gt;

&lt;p&gt;
This is a tough problem if you've never heard it before (or don't happen to be
quite good with probability). It took me about twenty minutes to come up with a
correct algorithm (along with a half dozen incorrect ones). The correct
algorithm had a righteous feeling to it, but I had to enlist the help of a
friend (with a degree in math) to help me prove it. I don't think this is a
particularly fair question to ask in an interview, unless you are willing to
offer a lot of help. A good interview question has an involved algorithm, but an
easy way to check its correctness. This is the opposite: the algorithm is
fairly simple, but checking its correctness is time consuming.
&lt;/p&gt;

&lt;p&gt;
It turns out that there are a number of correct algorithms to this problem, and
some of them run in asymptotic time &lt;i&gt;less&lt;/i&gt; than O(n)! The problem is one of
a set of "reservoir sampling" algorithms, and there have been a number of papers
written in the area. The earliest one that I could find was from Jeffrey Scott
Vitter and was published in the ACM Transactions on Mathematical Software,
Vol. 11, No. 1, March 1985, Pages 37-57. It lists four algorithms, which it
refers to as &lt;code&gt;R&lt;/code&gt;, &lt;code&gt;X&lt;/code&gt;,
&lt;code&gt;R&lt;/code&gt; and &lt;code&gt;Z&lt;/code&gt;. The average CPU
time for &lt;code&gt;Z&lt;/code&gt; is O(k(1 + log(n/k))). That probably isn't
much better than the O(n) for algorithm &lt;code&gt;R&lt;/code&gt;, since the
I/O time is likely to dominate. In a situation where you are reading a stream of
data from memory, or a network card, it could be a big savings, however.
&lt;/p&gt;

&lt;p&gt;
This problem seemed custom designed for a first project in Scala. It includes a
lot of different features: file I/O, simple mathematics, conditional logic and
some iterative processing. It is also quite interesting, and likely to be very
useful for things like log parsing and file sampling (as input to unit tests,
for instance). I decided to implement algorithm &lt;code&gt;R&lt;/code&gt;
since it was the most straightforward.
&lt;/p&gt;

&lt;p&gt;
The intuition behind algorithm &lt;code&gt;R&lt;/code&gt; is something like the
following. Any correct solution must return &lt;code&gt;min(k,n)&lt;/code&gt;
lines. In addition, all &lt;code&gt;n&lt;/code&gt; lines in the file must have
an identical probability of being selected: &lt;code&gt;k / n&lt;/code&gt;.
The first constraint can be satisfied by just choosing the
first &lt;code&gt;k&lt;/code&gt; lines with probability one. Each line after
the &lt;code&gt;k&lt;/code&gt;'th should be selected with
probability &lt;code&gt;k / i&lt;/code&gt; where &lt;code&gt;i&lt;/code&gt; is
the line number of the file. If &lt;code&gt;i == n&lt;/code&gt; then we have
selected the last line with the correct probability. If we select
the &lt;code&gt;i&lt;/code&gt;'th line, then we choose a random line from the
already selected set to replace.
&lt;/p&gt;

&lt;p&gt;
I started by writing the algorithm using standard imperative constructs. Using
this approach with Scala produces code that looks a lot like Java:
&lt;/p&gt;

  &lt;pre&gt;
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; java.io._
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; Console._
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; Math._

&lt;span style="color: rgb(250, 128, 114);"&gt;object&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;ReservoirSampleImperative&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; {
  &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;algorithmR&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;readers&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Iterator[BufferedReader]&lt;/b&gt;&lt;/span&gt;, &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;k&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;int&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Array[String]&lt;/b&gt;&lt;/span&gt; = {
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;a&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; Array[String](k)

      &lt;span style="color: rgb(250, 128, 114);"&gt;var&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;t&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = 0

      &lt;span style="color: rgb(250, 128, 114);"&gt;while&lt;/span&gt; (readers.hasNext) {
          &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;reader&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = readers.next
          &lt;span style="color: rgb(250, 128, 114);"&gt;var&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;line&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = reader.readLine
          &lt;span style="color: rgb(250, 128, 114);"&gt;while&lt;/span&gt; (line != &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;null&lt;/b&gt;&lt;/span&gt;) {
              &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (t &amp;lt; k)
                  a(t) = line
              &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt; &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (random &amp;lt; (k.toDouble/t.toDouble))
                  a((random * k.toDouble).toInt) = line

              t = t + 1
              line = reader.readLine
          }
      }

      a
  }

  &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;main&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;args&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Array[String]&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Unit &lt;/b&gt;&lt;/span&gt;= {
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;iter&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = args.elements
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;k&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = iter.next.toInt

      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;readers&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (args.length &amp;gt; 0)
          iter.map (f =&amp;gt; &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; BufferedReader(
              &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; FileReader(f)))
      &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt;
          Array[BufferedReader](&lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; BufferedReader(
              &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; InputStreamReader(System.in))).elements

      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;a&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = algorithmR(readers, k)
      &lt;span style="color: rgb(250, 128, 114);"&gt;for&lt;/span&gt; (&lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;e&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; &amp;lt;- a) println(e)
  }
}
&lt;/pre&gt;

This version actually has a lot going for it. Neither
the &lt;code&gt;Array&lt;/code&gt; class (which is where it gets its arguments)
nor the Java I/O modules are very functional, so the OO/imperative approach
works very well with them. Writing the program in a more functional style
required that the I/O be converted to a &lt;code&gt;Stream&lt;/code&gt;:

  &lt;pre&gt;
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; java.io._
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; Math._
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; Console._

&lt;span style="color: rgb(250, 128, 114);"&gt;object&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;ReservoirSampleFunct&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; {
  &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;inputStream&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;i&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Iterator[BufferedReader]&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Stream[String]&lt;/b&gt;&lt;/span&gt; = {
      &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;inputStreamString&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;b&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;BufferedReader&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Stream[String]&lt;/b&gt;&lt;/span&gt; = {
          &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;line&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = b.readLine()
          &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (line == &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;null&lt;/b&gt;&lt;/span&gt;)
              inputStream(i)
          &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt;
              Stream.cons(line, inputStreamString(b))
      }

      &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (i.hasNext)
          inputStreamString(i.next)
      &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt;
          Stream.empty
  }

  &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;algorithmR&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;k&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;int&lt;/b&gt;&lt;/span&gt;)(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;p&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;(Int,Array[String])&lt;/b&gt;&lt;/span&gt;, &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;s&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;String&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;(Int, Array[String])&lt;/b&gt;&lt;/span&gt; = {
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; (&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;i&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;,&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;a&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;) = p

      &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (i &amp;lt; k)
          a(i) = s
      &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt; &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (random &amp;lt; (k.toDouble/i.toDouble))
          a((random * k.toDouble).toInt) = s
      (i+1, a)
  }

  &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;main&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;args&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Array[String]&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Unit &lt;/b&gt;&lt;/span&gt;= {
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;i&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = args.elements
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;k&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = i.next.toInt

      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;readers&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (i.hasNext)
          i.map (f =&amp;gt; &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; BufferedReader(&lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; FileReader(f)))
      &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt;
          Array[BufferedReader](&lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; BufferedReader(
              &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; InputStreamReader(System.in))).elements

      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; (&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;_&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;,&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;a&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;) = inputStream(readers).foldLeft (
          (0, &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; Array[String](k))
      ) (algorithmR(k))

      &lt;span style="color: rgb(250, 128, 114);"&gt;for&lt;/span&gt; (&lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;e&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; &amp;lt;- a) println(e)
  }
}
&lt;/pre&gt;

&lt;p&gt;
There is still a lot of non-functional, side-effect producing code here. I'm
using &lt;code&gt;Array&lt;/code&gt; for both the command line arguments and
the list of k sampled lines. I think that is the real power of Scala,
actually. Both of those things are cases where having side effects really makes
the programming easier. I could have written a helper function to replace on of
the &lt;code&gt;k&lt;/code&gt; lines in a &lt;code&gt;List&lt;/code&gt;, but I
think the resulting code would have been longer and no more understandable (in
fact, probably less).
&lt;/p&gt;

&lt;p&gt;
A good friend, who has been at this longer than me, contributed this version of
the code, which uses an implicit constructor. He notes that it probably isn't a
great idea to do this in large scale production code. The proliferation of
implicit constructors can lead to some truly bizarre behavior due to unintended
conversions. Still, it does nicely simplify some of the code from my previous
example:
&lt;/p&gt;

  &lt;pre&gt;
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; java.io._
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; Console._
&lt;span style="color: rgb(250, 128, 114);"&gt;import&lt;/span&gt; Math._

&lt;span style="color: rgb(250, 128, 114);"&gt;object&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;ReservoirSampleImplicit&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; {
  &lt;span style="color: rgb(250, 128, 114);"&gt;implicit&lt;/span&gt; &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;readerToStream&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;r&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;BufferedReader&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Stream[String]&lt;/b&gt;&lt;/span&gt; = {
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;line&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = r.readLine()
      &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (line == &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;null&lt;/b&gt;&lt;/span&gt;)
          Stream.empty
      &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt;
          Stream.cons(line, readerToStream(r))
  }

  &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;main&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;args&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Array[String]&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Unit &lt;/b&gt;&lt;/span&gt;= {
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;k&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;Int &lt;/b&gt;&lt;/span&gt;= args(0).toInt
      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;r&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; = &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; BufferedReader(&lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; FileReader(args(1)))

      &lt;span style="color: rgb(250, 128, 114);"&gt;def&lt;/span&gt; &lt;span style="color: rgb(123, 104, 238);"&gt;&lt;b&gt;algorithmR&lt;/b&gt;&lt;/span&gt;(&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;p&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;(Int,Array[String])&lt;/b&gt;&lt;/span&gt;, &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;e&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;: &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;String&lt;/b&gt;&lt;/span&gt;) : &lt;span style="color: rgb(154, 205, 50);"&gt;&lt;b&gt;(Int, Array[String])&lt;/b&gt;&lt;/span&gt; = {
          &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; (&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;i&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;,&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;a&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;) = p

          &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (i &amp;lt; k)
              a(i) = e
          &lt;span style="color: rgb(250, 128, 114);"&gt;else&lt;/span&gt; &lt;span style="color: rgb(250, 128, 114);"&gt;if&lt;/span&gt; (random &amp;lt; (k.toDouble/i.toDouble))
              a((random * k.toDouble).toInt) = e

          (i+1,a)
      }

      &lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; (&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;_&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;,&lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;rs&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;) = r.foldLeft (
          (0, &lt;span style="color: rgb(250, 128, 114);"&gt;new&lt;/span&gt; Array[String](k))
      ) (algorithmR)

      &lt;span style="color: rgb(250, 128, 114);"&gt;for&lt;/span&gt; (&lt;span style="color: rgb(250, 128, 114);"&gt;val&lt;/span&gt; &lt;span style="color: rgb(127, 255, 212);"&gt;&lt;b&gt;&lt;i&gt;r&lt;/i&gt;&lt;/b&gt;&lt;/span&gt; &amp;lt;- rs) println(r)
  }
}
&lt;/pre&gt;

That's it for now. I'm going to try to tackle some of Scala's Actor libraries
for the next week or so. I'll post here if I come up with any interesting
applications of them. If you know Scala and have a better way to solve the
Reservoir Sampling problem, I would &lt;b&gt;love&lt;/b&gt; to hear about it. Scala is a
huge language with a lot of features, and I don't feel like I've even started
to explore the space of possible ways to solve this problem using it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1565309247787646668-1633957306539658977?l=cgordon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cgordon.blogspot.com/feeds/1633957306539658977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1565309247787646668&amp;postID=1633957306539658977' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/1633957306539658977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1565309247787646668/posts/default/1633957306539658977'/><link rel='alternate' type='text/html' href='http://cgordon.blogspot.com/2007/04/hello-scala-reservoir-sampling.html' title='Hello Scala: Reservoir Sampling'/><author><name>Charles Gordon</name><uri>http://www.blogger.com/profile/18220920684633877544</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
