Scala - end of an era

March 7th, 2014

I have used Scala since 2006, almost 8 years now. It was a bumpy ride in the beginning, and I had a lot of success with it, but now it seems that I have ended up in a swamp.

The IDE story with Scala was always difficult. It kind of worked somehow, but mixed compilation was never for the faint of heart. This works now - SOMETIMES. “Sometimes” is the worst thing you can have in a professional environment: when you commit code that falls apart on a clean build, for example, or code that builds fine from clean but shows unpredictable errors on other machines after a refresh.

This wastes so much time that almost all productivity gains are lost. Now add terribly slow compiles and almost certain forced restarts after the most innocent fixes (unless the compiler fails to insert them into the running machine, so that you fix them “twice” before realizing the fix is never executed), and the balance sheet turns red.

Try to find call sites: negative, 90% of the callers aren’t found. No wonder there isn’t any refactoring worth mentioning. Coding Scala feels like coding C in vi in the early 90’s. Yes, I am old enough to tell. Inspecting a variable in the debugger? Only in the variable view (oh, you are in a case - sorry, no variable bound). Inspecting an expression (the thing that is SOOO great about functional languages)? Sorry, we don’t have that. I debug mostly with println now. Early 90’s.

While I was writing this, the Scala IDE processed the save of a single .scala file (1100 lines) and the “Building Workspace…” progress dropped from 73% (after staying there for 10 min) to 70%. No one laughs about Windows progress bars anymore. Needless to say, 100% is followed by 0% just one minute later.

So what is the problem?

One of the compiler hackers, Paul Phillips, gave some insights from the technical side.

I think the problem is rooted more in the concept. Scala needs to know a lot of source code to infer types and verify that the types are consistent. In the end the compile time grows with the complexity of the code. Note that this is not linear in the code size; it can easily be worse. My experience is that it is at least a higher-degree polynomial: 1000 lines - fine, 10kloc - OK, 50kloc - wait, 8Mloc - write a long blog post.

It can be fast, Haskell is living proof, but Scala aimed too high. Scala has classes and inheritance. I feel inheritance is a questionable feature that is inherited (unintended pun) from the JVM. But Java, and thus the JVM, has a very “generic” type system. Now Scala comes with higher-kinded types and whatever, while the Java libraries always accept “Object”. This creates a constant mismatch between what Scala thinks a type is (or should be) and what the JVM can cope with. To compensate for this, the Scala compiler needs to see through the code, and this takes time … and more time.

And even more.

So where to go?

Clojure? Go? Dart?

All three are capable languages, but with the exception of Clojure they don’t integrate with Java code. Mlocs of Java are the reality. It is delusional to write a large system today without reference to Java code (Googlers, I envy you, but you are not the norm). The typical systems today are huge Java codebases that are maintained incrementally. A one-shot migration is out of the question. There are normally only incomplete automated tests, so you have to migrate step by step to get feedback. And you have to apply the steps quickly: you can’t refactor for 12 months and then integrate. Real systems are a moving target, otherwise the migration is not worth the effort.

Clojure is a great language. My experience with it was mixed: Great coding (small scale), but when showing this to coworkers: WTF!

The main problem is the syntax. A bit more “C” syntax wouldn’t hurt. I think the success of JavaScript is largely due to its simple C-style syntax that is in some weird sense “clear”. DSLs are the antithesis of this.

DSLs are a good indicator. Almost all good uses of them, as far as I have seen, are to implement a kind of meta-object protocol.

Good old CLOS… give me a better syntax and I’ll be all yours (as long as you run on the JVM) 

 

Great JVM news

April 8th, 2009

The Google AppEngine opens up for Java. Python is a great language, but still a niche language (compared to Java). A light-weight alternative for hosting Java applications in the cloud.

What some might have guessed is now official: Twitter is doing its heavy lifting with Scala. We don’t have to feel bad anymore about writing statically typed code :-)

Automatic Resource Management blocks

December 30th, 2007

In his (perhaps successful) attempt to sink the BGGA proposal, Captain Bloch sent the closuralists packing to Scala island in the Swiss Sea and promised milk (CICE) and honey (ARM) for the rest.

Apart from some remaining degrees of freedom, an ARM block will look something like this:

do (InputStream in   = new FileInputStream(src);
    OutputStream out = new FileOutputStream(dest)) {
  byte[] buf = new byte[1024];
  int n;
  while ((n = in.read(buf)) >= 0)
    out.write(buf, 0, n);
}

I thought this should be pretty easy to imitate in Scala as well. Ideally I’d like to have

with (fi<-  new FileInputStream(src);
      fo<- new FileOutputStream(dest)) {
  val buf = new Array[byte](1024);
  def cp() {
    val n = fi.read(buf)
    if (n>0) {
      fo.write(buf,0,n)
      cp
    }
  }
  cp
}

But writing a “with” is not possible - the 2nd idea was to have a with where the resource is “this”:

with (new DbConnection) {
  executeUpdate("delete from scratch")
  commit()
}

As Jamie Webb pointed out, a simple wrapper is easy to write and might do the job:

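// Wraps any resource that has a close() method (structural type).
class ManagedResource[T<:{def close()}] (resource:T) {
  // Runs f with the resource and always closes the resource afterwards.
  def foreach(f : T => Unit) : Unit =
    try {
      f(resource)
    } finally {
      resource.close()
    }
}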

and so it is easy to write:

for(pi<-new ManagedResource(new PipedInputStream);
    po<-new ManagedResource(new PipedOutputStream(pi))) {
  po.write('A');
  Console.println((pi.read).asInstanceOf[Char]);
}

So the interesting part is: how does this magic work? On Scala island everybody knows Swiss army knives. We usually use “for”. The code above is equivalent to

for(pi<-new ManagedResource(new PipedInputStream)) {
  for(po<-new ManagedResource(new PipedOutputStream(pi))) {
    po.write('A');
    Console.println((pi.read).asInstanceOf[Char]);
  }
}

and each block (or closure) is passed to foreach:

new ManagedResource(new PipedInputStream).foreach(pi =>
  new ManagedResource(new PipedOutputStream(pi)).foreach(po => {
    po.write('A'); //where is po defined?
    Console.println((pi.read).asInstanceOf[Char]);
  })
)

Here you see the for-magic in action. “for” defines the nice symbols pi and po for us, which makes writing the function much easier than wrapping it all up in nested anonymous functions.

As the expansion of “for” shows, we get the same nesting of resources as we would if we wrote everything by hand. Less visible is the cleanup. Note that the argument to each foreach is a closure. This means the code is executed where it is written, inside the nest, and is not evaluated first with its result then passed to foreach. Because of this the finally block in foreach executes two times:

  1. After the print
  2. After po has been closed
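The cleanup order can be made visible with a small Java sketch. It is only an illustration of the nesting described above, not the Scala wrapper itself: the names ManagedResourceSketch, LoggedResource and CleanupOrderDemo are made up here, and the wrapper takes a java.util.function.Consumer in place of the Scala function T => Unit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical Java counterpart of the ManagedResource wrapper above.
final class ManagedResourceSketch<T extends AutoCloseable> {
    private final T resource;

    ManagedResourceSketch(T resource) {
        this.resource = resource;
    }

    // Runs the body with the resource and closes the resource afterwards,
    // whether the body returns normally or throws.
    void foreach(Consumer<? super T> body) {
        try {
            body.accept(resource);
        } finally {
            try {
                resource.close();
            } catch (Exception e) {
                // Simplification for the sketch: wrap the checked exception.
                throw new RuntimeException(e);
            }
        }
    }
}

// Illustrative resource that records in a shared log when it gets closed.
final class LoggedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    LoggedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

final class CleanupOrderDemo {
    // Nested foreach calls, mirroring the hand-written expansion of "for".
    static List<String> run() {
        List<String> log = new ArrayList<>();
        new ManagedResourceSketch<>(new LoggedResource("pi", log)).foreach(pi ->
            new ManagedResourceSketch<>(new LoggedResource("po", log)).foreach(po ->
                log.add("body")));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [body, close po, close pi]
    }
}
```

The inner finally runs right after the body, the outer one only after the inner resource has been closed, which is the order listed above.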

So far so simple, but why, some might ask, can I refer to pi and po as PipedInputStream/PipedOutputStream instead of ManagedResource[T]s?

The reason is that “for” is just some compiler magic; what eventually gets executed is foreach, which takes a function T=>Unit (i.e. a function with a single argument of type T and return value void). T here is either PipedInputStream or PipedOutputStream, and pi and po are typed accordingly.

In practice it is a bit more complicated to define ManagedResource because there are three different possible expansions (via map and flatMap). We also investigated lazy resource acquisition. This might open some other holes, but it is possible to write ManagedResources that acquire the underlying resource if and only if it is accessed. Exposing a constructor to the application developer is too dangerous, as an already open resource might inadvertently get passed to the ManagedResource, which cannot tell that it has already been opened.

Resources with explicit open methods, as Josh mentioned in his text, would solve this problem easily with no overhead for the application developer. An alternative would be linear types, but they don’t go well with a language like Scala.
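One way to picture deferred acquisition, sketched in Java under my own assumptions: the wrapper is handed a recipe for opening the resource (a java.util.function.Supplier) and never an opened instance, so nothing can already be open when it arrives. DeferredResource and DeferredResourceDemo are hypothetical names, and acquiring on entry to foreach is a simplification of "acquire when accessed".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical variant: holds a way to open the resource, not the resource.
final class DeferredResource<T extends AutoCloseable> {
    private final Supplier<? extends T> opener;

    DeferredResource(Supplier<? extends T> opener) {
        this.opener = opener;
    }

    void foreach(Consumer<? super T> body) {
        T resource = opener.get(); // acquired here, not at construction time
        try {
            body.accept(resource);
        } finally {
            try {
                resource.close();
            } catch (Exception e) {
                // Simplification for the sketch: wrap the checked exception.
                throw new RuntimeException(e);
            }
        }
    }
}

final class DeferredResourceDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // The "resource" is a stand-in that only logs its open and close.
        DeferredResource<AutoCloseable> managed =
            new DeferredResource<AutoCloseable>(() -> {
                log.add("open");
                return () -> log.add("close");
            });
        log.add("wrapped");                    // nothing has been opened yet
        managed.foreach(r -> log.add("body")); // open, body, close happen here
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [wrapped, open, body, close]
    }
}
```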

Java should stay Java (?|!)

December 16th, 2007

Josh Bloch’s talk at Javapolis caused quite a stir in the blogosphere. Why is best summed up in a blog post and its comments: Is Java a simple, less skill-demanding language, or is it (or rather, can it be) also a home for programming geeks?

There are two sides, and a point can be made for both of them. When Java was new it was a geeky language, admittedly fueled by the internet hype, which attracted brilliant minds. Thanks to these forerunners Java is now one of the top choices for mainstream programming. Those of them who still stick around in the Java world now look with envy at Ruby, Scala and sometimes back at Smalltalk, and want to use the ideas and techniques prevalent there in Java as well.

But in practice - unless you happen to work at Google - most Java programmers are not of this class. Big applications written in Java are written and maintained by ordinary developers, not übergeeks that think in monads. The typical Java programmer in a large enterprise is more of the VB programmer type; perhaps he did some C or COBOL before. These people and their thinking about programming are deeply rooted in the imperative style.

A point could - and personally I think it should - be made about whether projects really require such large staff, and whether it wouldn’t be better to use more advanced tools to build them with fewer but appropriately skilled developers (this is the Ruby claim). But face it, the enterprise world changes too slowly to accommodate that.

More than 10 years ago we evaluated some new development environments as a replacement for Oracle Forms 3. The most interesting candidates were Forms 4.5 (obviously), Smalltalk (I don’t remember the vendor) and Franz Allegro CLOS (OK, this was the one I fancied). Java was at 1.1 or so in those days and the first looks at it were disappointing. C++ was already in use, but there had been so many problems with it that it was never really a consideration.

Eventually we went for Forms because it would give our staff the easiest transition. The main problem with Smalltalk and CLOS was the paradigm shift. Neither the pure OO of Smalltalk nor the functional programming in Lisp was accessible to them.

I think that the same could happen to Java if all the fancy functional programming (I love it!) got into it. There are three problems:

  • FP is not compatible with the mindset of the majority of enterprise programmers. FP programs are also harder to debug - this is a problem for the prevalent trial-and-error programmer (aka debugging into existence)
  • Java is not Scala. One problem of BGGA is the non-local return. This isn’t an issue in true FP languages, which are expression oriented; the statement-oriented Java makes it awkward to deal with
  • DSLs are in fashion and a powerful concept for layering a complex application, but they also create new languages: You can’t hire a Lisp or C++ programmer and expect him to be immediately productive, as he has to learn the DSLs of your projects first

I initially found CICE (which Bloch promotes) to come up a bit short, but now I am convinced that this is the right way to change Java, as it keeps the characteristics of the language and can easily be integrated into larger teams in any organization.

This is the curse of being a mainstream language: you must not leave your users out in the cold.

For the other classes of problems, languages like Scala that run on the JVM could be chosen. Here the system architect(s) have to do some research into how such code can integrate with Java code and how maintenance can be organized. This isn’t a trivial problem, as not everything from another JVM language integrates seamlessly with Java code (i.e. so that you can use it as if it were written in Java). Each individual organization also has to answer the question of whether there is enough stability in the project to support another language throughout the life cycle - imagine a freelancer writing some core code in a language known only to him.

Expression idiom for Java

November 17th, 2007

Scala has spoiled me: accepting that an if-else is not an expression gets harder and harder for me. As my day job is still Java based, I looked for some replacement for

class Foo {
    val bar = Map(1->2, 2->3)
}

The first step is to javaize it:

class Foo {
    val bar = {
        val map = new java.util.HashMap()
        map put(1,2);
        map put(2,3);
        map
    }
}

And now in Java this can be translated to:

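import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class Foo {

	// the anonymous class exists only to compute the initial value of bar
	final Map<Integer,Integer> bar = new Object(){
		Map<Integer,Integer> map() {
			Map<Integer,Integer> map = new HashMap<Integer,Integer>();
			map.put(1,2);
			map.put(2,3);
			return Collections.unmodifiableMap(map);
		}}.map();
}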

Why the anonymous class? I could wrap it into a method, but this method must be private, otherwise bad things can happen when the class gets extended (never call non-finals from a constructor). In the anonymous class it is clear that this code serves only this single purpose and is not meant to be used elsewhere; nobody would lightheartedly widen the visibility of this code.

What might be new for some is that I can call map() on an Object. It works directly on an anonymous class definition, not elsewhere. Thus you can’t store away your initializer: if accessed through a reference it degenerates back to an ordinary Object. Things that don’t work are:

final Object initializer = new Object() {
  String name() { return "MU";}
};
final String name = initializer.name(); //method doesn't exist

and

final String name = new Object() {
  Object me() { return this;}
  String name() { return "MU";}
}.me()
  .name(); // me returns a simple Object, no name() method

Apart from the Java clutter, if you can establish it as an idiom it is still fairly readable and expresses your intent well - it is just the syntax imposed by the language which makes it different from the first bit of code. It also helps eliminate non-final variables from a method, as you can construct a value completely before exposing it. Of course there is a runtime penalty for the extra Object to be created and collected. The limitations can play to your advantage: The method (why singular?:-) you declared can’t be called elsewhere (unless you use reflection), a “private” method you can call!
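As a small illustration of eliminating a non-final local, here is a sketch of the same idiom inside a method. The class Greeting and the method describe are invented for this example; the only point is that the if-else chain produces a value that can be bound to a final variable.

```java
// Hypothetical example of the anonymous-class idiom used as an "expression".
final class Greeting {
    static String describe(int hour) {
        // Without the idiom, partOfDay would be declared first and assigned
        // in each branch; here it is final and initialized in one step.
        final String partOfDay = new Object() {
            String pick() {
                if (hour < 12) {
                    return "morning";
                } else if (hour < 18) {
                    return "afternoon";
                } else {
                    return "evening";
                }
            }
        }.pick();
        return "Good " + partOfDay;
    }

    public static void main(String[] args) {
        System.out.println(describe(9)); // Good morning
    }
}
```

Capturing the parameter hour without declaring it final assumes Java 8 or later; on older versions the parameter would have to be marked final.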

Anyway, one question remains: Why did the “return this” idiom, heavily used in C++ (at least in my days), go out of fashion? In Java you find it in StringBu(ild|ff)er, but rarely elsewhere. Some attempts to create DSLs in Java used it, but apart from that it seems to be considered bad style - does anyone know a good reason for that? Doing so would allow one to write

final Map<Integer,Integer> bar = new HashMap<Integer,Integer>()
    .put(1,2)
    .put(2,3);
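With java.util.HashMap itself this chain does not compile, because put returns the previous value for the key and not the map. A thin wrapper that does return this makes the idea concrete. FluentMap and FluentMapDemo are hypothetical helpers written for this sketch, not part of the JDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a map builder whose put returns the builder itself.
final class FluentMap<K, V> {
    private final Map<K, V> entries = new HashMap<>();

    FluentMap<K, V> put(K key, V value) {
        entries.put(key, value);
        return this; // the "return this" idiom that enables chaining
    }

    // Hands out an unmodifiable snapshot of what has been collected so far.
    Map<K, V> build() {
        return Collections.unmodifiableMap(new HashMap<>(entries));
    }
}

final class FluentMapDemo {
    static final Map<Integer, Integer> BAR = new FluentMap<Integer, Integer>()
        .put(1, 2)
        .put(2, 3)
        .build();

    public static void main(String[] args) {
        System.out.println(BAR.get(1) + " " + BAR.get(2)); // 2 3
    }
}
```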

Reflection

If you have read this far, here is the reflection leak:

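// needs java.lang.reflect.Method in addition to the java.util imports
Object o = new Object(){
    Map<Integer,Integer> map() {
        Map<Integer,Integer> map = new HashMap<Integer,Integer>();
        map.put(1,2);
        map.put(2,3);
        return Collections.unmodifiableMap(map);
    }};
// look the hidden method up by name on the anonymous class
Method mapFn = o.getClass().getDeclaredMethod("map");
// invoke throws checked exceptions; from another package setAccessible(true) is needed first
final Map map= (Map) mapFn.invoke(o);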

Even more interesting - this way you can pass hidden methods (why plural?:-) through your system. I can even imagine a good use for that: it allows hooks into the defining object so that you can tell where it comes from. Worst OO style, but for some intricate manipulation within an API perhaps worth considering - or simply an unobtrusive way to debug with a decorated instance.

Monads are Elephants

October 20th, 2007

Monads are usually associated with being something highly theoretical that nobody really understands - unless you either worked with category theory or came to Haskell via another route. An elephant in some sense - big grey matter…

James Iry provides a monad introduction for Scala programmers. He leaves out most of what a monad is, but explains more pragmatically what the qualities of monads are. So far parts 1, 2, 3 and 4 (of a series of 5?) are available on his blog.

Give it a read. You might learn that you have already been using monads without knowing it, and why this perfectly nice Java code is hard for other programmers to understand. If this is the case, give this intro to those programmers; in any other case, learn it yourself.

(updated 15.11.2007)

Scala on .NET

October 9th, 2007

Scala is now again able to compile to IL code so that you can use it from a .NET program!

This feature vanished a while ago as the compiler got overhauled, and now it is back. According to Martin Odersky, the main figure behind Scala, “There are some known problems with exceptions, which we might be able to solve soon.” Still, it is the best thing since sliced bread.

Here is how to do it (get your paths properly set!):

  1. Get Scala (JVM >1.4.2 needs to be installed)
  2. Install the MSIL package with “sbaz install scala-msil”
  3. Compile your Scala files with “scalac-net <files>”
  4. Build a DLL from them with “ilasm /DLL yourfiles.msil” (use /EXE if you have a main method to get an executable)
  5. Reference the DLL from your .NET project

You can also use .NET libraries from Scala code. For the compilation you have to add the referenced DLLs with the parameter -Xassem-path First.dll;Second.Dll to the scalac-net command.

Not so difficult, and with some scripts pretty quick to integrate. Of course it will not work to use Scala classes that reference Java libraries! Neither will you be able to run (or even compile) Scala classes that use external DLLs - unless you want to dive into JNI hell. A switch for the Scala compiler that warns of “impure code” would be helpful here (something I should ask for, hopefully I won’t forget).

The limitations are not such a problem. Having a modern language (one that is even more powerful than C#, and more modern than Java 7 with all discussed extensions) in which to implement business functions that can be used directly on both platforms is a huge win. I am curious what nice libraries will be published for use on both platforms - distributed as a DLL for .NET or a simple JAR for Java environments.

Designing in Code

October 8th, 2007

From time to time I re-read two books: The Deadline and The Soul of a New Machine. Both are a sanity check for me and for what is going on around me. Both are written as novels, which also makes them a bit of fun to read. Tonight I picked up The Deadline again, went to my favorite brasserie for some dinner and wine, flipped through the pages and started reading somewhere in the middle.

 

Fine grained design (sorry I have only the German translation)

 

The key statement was that you can move quicker if you spend less time on debugging. This is undoubtedly true, I think, but their remedy is something I have always disliked: fine-grained design that you try to verify before starting to code. I never liked these design documents for a simple reason: They spend lots of pages on the obvious to hide the sloppiness of what is poorly (if at all) understood. When verifying such a document you spend much time on the obvious and happily overlook the parts that aren’t well defined, as they can’t be that important - otherwise the author(s) would have spent more space on them, wouldn’t they? When implementing it, you’ll get the 90% syndrome: ask the developer after about half of the estimated time how far he has got and he will answer “90% completed”. The problem is that these ten percent will stay around forever; after 200% of the estimated time the programmer (now with tired eyes and a bad shave) will still say “about 90% done”.

 

In short, I believe the only true design is the code (“The system is fully documented - granted you can read C++” - Z. Köntös), but of course not just any code; it has to be expressive. In C days a basic design was made from documented header files; in Java you create interfaces or, if someone pays you to do so, you sketch endless variations of UML diagrams. UML is believed to be a superset of what you can code - given a smart generator it transforms your design into code where “you just have to fill the gaps” and changes to the design in the code “can be reverse-engineered”. NOT. There are some cases where certain diagrams are really useful to give you an overview of a project, but all examples I know fall into two categories:

  1. Reverse engineered and handcrafted for better readability

  2. Metacode that is fed into interpreters and MDD tools

 

If we have to accept that the design and the code have nothing in common, why not simply skip this stage? 20 years ago this wasn’t an option. But that is no longer true. Modern languages and IDEs provide browsing and refactoring or mix-in mechanisms that keep the design malleable even if it is (already) in code. With C and COBOL it might be possible as well, but it wasn’t possible in their time. This means that your design process is less of a constant than perhaps the design itself! If we accept this, the consequences for the work are huge: If you use a language that is capable of expressing your design, it is idiotic to use an outside design tool to describe it. You’ll break your neck describing in prose or diagrams what is a common idiom in your language (…API, DSL) - only to end up with something that is weaker than the code you have written. And this is the crucial point: Design tools and generators assume that their representation is complete, as any generator can only lose information - if your code is more expressive, the generation and the reverse engineering of code cannot work at all. If you are inclined to follow, beware of the caveat: If your language is expressive enough but you don’t use it appropriately, the representation of your design will stay incomplete and you might need additional documentation - or better (but fewer) programmers.

If you use an expressive language to get around formal design, what do you get?

  1. Instead of an intermediate layer, you have something you can work with

  2. Your design is as live as the code is – any refactoring updates your documentation

But back to the claim of DeMarco’s book. It is indeed still true that verification of the design saves debugging, but that doesn’t mean that the design has to be made in a medium other than the code. I came to this during my recent experiences learning Scala. I wrote some pretty complicated stuff and naturally went a bit wild with some features of the language, but I noted one thing: I spent very little time on debugging (one thing I wrote was a specialized implementation of nauty). The language forced me (or perhaps I forced myself, as I wanted to learn as much as possible from my examples) to express concisely what I wanted to do. And as the language allows you to be extremely type-hard in your code, it actually enabled me to do so. I also discovered that I wrote code that tended to be more declarative, to avoid debugging into deeply nested list comprehensions. This topped even the positive experience I had years ago when I started using Java, which I found pretty crippled as a language coming from C++.

 

One of my famous war stories is that I once wrote 400 lines of C++ in a single afternoon that compiled with no errors and also worked as I intended. I never managed to do it again. It was pretty convoluted code (technically a small interpreter) and I never managed to do the same in another language I know very well: PL/SQL. And I also have Java code, less complex than those 400 lines of C++, that I think I will never be able to get to work correctly.

 

So what is the key difference? Abstractions. Scala makes it easy to build abstractions, also because you get instantly rewarded for it. C++ as a very flexible language allows it too (though it is much harder). Java knows abstractions in the form of APIs and frameworks - but this doesn’t help you as soon as you get into a region far from the API and your framework. These abstractions can be called DSLs: you shape your environment to your needs, perhaps even in multiple layers. In C++ operator overloading has been the tool for that, in Java frameworks and APIs (quite clumsy in terms of syntax), whereas in Scala the difference between operators and methods dissolves completely (as does the difference between a block and a parameter; and there are the extractors as antonyms to constructors). My C++ example was, by the way, a good example of abstractions: many classes, few methods with an orthogonal interaction - such code either works or can’t be written at all. Making good abstractions in PL/SQL happens, as in Java, on the API level.

 

Where is the dynamic world?

 

As much as I share the thoughts of Paul Graham, I can’t relate to his unquestioning preference for dynamic typing; there he is missing the point. Yes, malleable code is important, but if you have to keep track of each instance you might reinterpret, it will likely blow your mind. I think he meant that you are coding DRY (Don’t Repeat Yourself), something that can be done perfectly (not in that hacking style that is considered expert Ruby) in Lisp (if you know macros!). Static languages can offer you the same comfort without resorting too often to macros (hard to implement in any language with syntax - Lisp doesn’t have any, which is why it is the only language having this feature) if you have type inference as well (Scala, C# 3.0). If you need to change a type, do it; the program will follow, but the compiler will shout at you if you cross the line. This even makes it possible for others to change your program, because they get quick feedback when they do something your initial design didn’t foresee.

 

The others

 

How is most software written today? Unfortunately with the methods of the 70’s and the tools of the 80’s (if at all). Even if people happen to use a technology of the 90’s, the methods are the old ones and the code will not exploit what is possible, only what is in line with what the environment gives you. In summary: You have a spotty design, verbose inexpressive code (“it is simple, everyone can write it” - “if you have some dozens of programmers in India”) and exploding budgets and/or disgruntled customers. Poorly understood design leads to missing or faulty features, and correcting these in a mess of thousands of lines of indistinguishable rubbish slows you down; the actual development moves to the debugging stage (if you are lucky, still with the customer as a beta(?) tester) - I call this “debugging into existence”. Even if development is always trial and error, debugging is different: When you discover a misconception during development, you go and update the spec. In debugging, the only trace will be a “resolved” in the bug database. Moreover, debugging should fix issues and mustn’t create new ones, so most people who correct bugs correct them as locally as they can, even at the price of copy&paste code and a compromised architecture.

 

Apply on Tuples

June 10th, 2007

After playing with implicit anonymous functions I came up with the following useful extension to Tuple. Here is the new method for Tuple2:

def apply[R] (fn : Function2[T1,T2,R]) = fn(_1,_2);

This makes the following very easy to write:

def add(a: Int, b:Int) = a+b;

val a = 1::2::3::Nil
// calling binary functions with implicit parameter of Tuple is not possible
val li246 = (a zip a){_(add)}
//nested implicit parameter - acts like an explode on Tuple
val liSquared = (a zip a){_{_*_} }

Perhaps this is best implemented on the Product traits.

What bothers me a bit is that there are now two equivalent ways of adding

(2,2)(add) and add(2,2)

Anyway, I think having a tuple and calling a function with it as the parameter list is quite a common idiom. It would be really nice if this also worked for currying; I have to check this.
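The idea carries over to other languages. Here is a rough Java sketch of a pair that applies a binary function to its two components. Pair and PairApplyDemo are hypothetical classes for illustration, and java.util.function.BiFunction plays the role of Function2.

```java
import java.util.function.BiFunction;

// Hypothetical pair type with an apply method in the spirit of the Tuple2 extension.
final class Pair<A, B> {
    final A first;
    final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    // "Explodes" the pair into the two arguments of fn.
    <R> R apply(BiFunction<? super A, ? super B, ? extends R> fn) {
        return fn.apply(first, second);
    }
}

final class PairApplyDemo {
    static int add(int a, int b) {
        return a + b;
    }

    // The two equivalent ways of adding mentioned above.
    static int viaTuple() {
        return new Pair<Integer, Integer>(2, 2).apply(PairApplyDemo::add);
    }

    static int direct() {
        return add(2, 2);
    }

    public static void main(String[] args) {
        System.out.println(viaTuple() + " " + direct()); // 4 4
    }
}
```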

Why not add this to Function2? I thought about it (and tried it) and then decided otherwise:

  1. It would weaken the typing; afterwards it would be impossible for the compiler to tell whether I was calling a Function1[Tuple2,R] or the Function2[T1,T2,R]
  2. It gets very confusing for the parser, especially with the new uses of ‘_’ for partial function application and implicit args

Java-generics in Scala

April 6th, 2007

Something I overlooked in the tool-docs and complained about:

-Xgenerics
Use generic Java types.

This makes the following possible:

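// Relies on the implicit Iterator conversion below to turn the iterator into a List.
implicit def JavaCollectionToList[T>:Null<:Object] (coll: java.util.Collection[T]) : List[T] = {
  coll.iterator();
}

// Builds the list recursively, consuming the iterator element by element.
implicit def JavaIteratorToScalaList[T>:Null<:Object](it: java.util.Iterator[T]):List[T]={
  it.hasNext match{
    case true=> val head = it.next(); head :: JavaIteratorToScalaList(it);
    case false=> Nil;
  }
}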

With these coercions a List[String] can be assigned from a java.util.Collection<String>.

Open question: How will this work with .NET generics? (Not an issue for me at the moment, and the IL-code generation seems to have a problem.)