While browsing through the test-code of the open-source project Sonar, I noticed package imports from the Mockito namespace. The mocking test-code looked similar to EasyMock but less cluttered and more readable. So I gave Mockito (version 1.8.3 back then) a try when implementing new test-cases and did not regret it :).
Easymock before
Around 2005 there were several mocking frameworks available. The main reason I chose to work with EasyMock was that it was both powerful and refactoring-friendly. It supports automatic safe refactorings well because the expectations on method calls are not set up as loose string snippets but on statically typed information (method call expectations are bound directly to the object type).
Although I found EasyMock great, and it made stubbing and mocking much easier than before, it had some drawbacks (speaking of version 2.5):
- The mocking/stubbing of interfaces vs. classes is not transparent. It is done through different main classes (EasyMock and classextension.EasyMock). As a result, mocking both interfaces and classes inside one test-class led to cluttered code and import hell.
- The error messages of EasyMock are sometimes confusing. Often it is not clear whether the test-case has failed or EasyMock was used incorrectly (e.g. forgetting to call replay()).
- The mandatory call of replay() after setting up the mocked object always felt redundant and made test-cases longer.
- There is no clear separation between setting up a mock and verifying it. Setting up a mock also added a verification of all expectations as soon as you called verify(). When writing and reading test-code this always confused me, because you already had to cope with verification logic in the setup part of the test-case.
Mockito after
The Mockito authors say that they were inspired by EasyMock, and indeed you can see its heritage. After using it for about three months, my hands-on impressions are great, and I now use Mockito exclusively for writing unit-tests.
My positive experiences were:
- Test-code is still safe with regard to automatic refactorings based on static typing.
- Transparency of classes vs. interfaces. In both cases you call Mockito.mock(MyInterface.class) or Mockito.mock(MyClass.class).
- Clear separation between setting up a mock and verifying it. This feels more intuitive, and the clear setup/exercise/verify order of the test-code is preserved.
- Helpful error messages when an assertion was not met or the tool suspects a framework usage error.
- The naming of methods is intuitive (like when(), thenReturn()).
- Where I earlier used real domain-objects as test-data (i.e. by filling in data through setters/constructors), I now use Mockito to stub them (i.e. stubbing the getters). Domain code logic now has much less impact on test-runs.
- Nice short, straightforward documentation.
- A great name + logo ;)
In summary: the Mockito folks did a great job (they took the good ideas from the EasyMock creator and fixed its drawbacks). Looking at old test-code using EasyMock now, I subjectively need much more time to grasp the intent of the test. With Mockito the test-cases read more like a clear, sequential “requirements story”, as test-cases always should.
Migration of test-code
If you are already using EasyMock, the tool switch is amazingly quick. The following migration path helped me:
- Give yourself and your colleagues around two weeks to work with the tool and get comfortable with it. Write all your new test-classes with Mockito.
- If you like it, make the switch: explicitly communicate that using the old mocking framework is deprecated (if possible, use static code analysis tools that let you mark packages such as org.easymock.* as deprecated). From now on, using Mockito for new test-classes should be mandatory.
- If you already have a big test-codebase, I do NOT recommend a big-bang test-code migration. Such migration work is time-consuming and boring. The incremental approach is better: only migrate EasyMock code to Mockito when you touch the class anyway, i.e. when you are modifying or adding test-cases.
Looking at the test-migrations I have done so far, migrating EasyMock code to Mockito is quite straightforward. Get rid of all replay() and verify(mock) calls and adjust to the slight API changes. The one thing you have to watch out for is the explicit verification of mocked calls. EasyMock implicitly verified all expectations when you called verify() on the mock-object; with Mockito you have to verify each method call explicitly. The same applies to strict mocks: you have to add the respective verifications.
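To illustrate the explicit verification, here is a minimal sketch of what a migrated test could look like. It assumes Mockito is on the classpath. MailService, Customer and WelcomeNotifier are hypothetical types invented for this example; only mock(), when(), thenReturn(), verify() and verifyNoMoreInteractions() are Mockito calls.

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.verifyNoMoreInteractions;
import static org.mockito.Mockito.when;

public class WelcomeNotifierExample {

    // Hypothetical collaborator (an interface) that the test mocks.
    public interface MailService {
        void send(String address, String text);
    }

    // Hypothetical domain class; its getter is stubbed instead of filling in real data.
    public static class Customer {
        public String getEmail() {
            throw new IllegalStateException("real domain logic should not run in this test");
        }
    }

    // Hypothetical class under test.
    public static class WelcomeNotifier {
        private final MailService mailService;

        public WelcomeNotifier(MailService mailService) {
            this.mailService = mailService;
        }

        public void welcome(Customer customer) {
            mailService.send(customer.getEmail(), "Welcome!");
        }
    }

    public static void runExample() {
        // Setup: the same mock() call works for the interface and the class; no replay().
        MailService mailService = mock(MailService.class);
        Customer customer = mock(Customer.class);
        when(customer.getEmail()).thenReturn("jane@example.org");

        // Exercise
        new WelcomeNotifier(mailService).welcome(customer);

        // Verify: each call that matters is verified explicitly.
        verify(mailService).send("jane@example.org", "Welcome!");
        // Optional: fail if the mock received any other call.
        verifyNoMoreInteractions(mailService);
    }

    public static void main(String[] args) {
        runExample();
        System.out.println("verification passed");
    }
}
```

Whether verifyNoMoreInteractions() is needed depends on how much the old EasyMock test relied on unexpected calls failing; in many migrated tests the single verify() is enough.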
Tags: Continuous Integration · Software Engineering · Technologies/Tools
An appropriate modularization of your codebase is important. It has positive side-effects in many directions (testability, code comprehensibility, build-speed etc.). Still, there are different levels of modularization. These levels can be categorized from fine-grained to coarse-grained.
Note: For simplicity I will use Java terms (class, method, package) throughout the post. Of course these can be mapped to other languages' constructs like functions, file-includes or plain script statements.

Advantages of a more fine-grained modularization:
- Maintaining fewer artifacts (e.g. files, packages, libraries) makes the build/deployment lifecycle easier.
- For simple features code browsing gets easier (“You see everything at one glance”).
The drawback is that the fine-grained approach doesn’t scale: the bigger the codebase gets, the more difficult it is to “see” any modularization or separation of concerns. The coarse-grained modularization gives you the advantage here:
- Bigger systems are easier to comprehend if the “split” is done on package or library level.
- Refactorings get easier because inside the modules the number of direct dependencies decreases (submodules only import the dependencies they need).
- Unit-test setup gets easier (for the same reason as refactoring).
Where to start?
The question is at which modularization level you should start. There are two major antipatterns: either developers stuck to a too fine level (e.g. 6000 LOC inside one single file) or they started on a too coarse level (e.g. each of the many packages contains only one or two classes). The too coarse pattern often occurs if you overengineer a solution.
Code from scratch
To avoid the too fine/coarse pitfall I follow the Inside-Out modularization approach:
- Start by editing a single file. Implement the highest-priority requirements of the feature inside the most fine-grained “module” your language offers (e.g. a class method).
- When the code inside a method gets bigger and you lose the overview, try to cluster statements (e.g. by separating them with line-breaks).
- When there are too many statement clusters and you see duplications, extract these sections to new methods.
- When there are too many methods and the single file has too many lines of code, cluster your methods by problem domain and extract them to a new class in the same package.
- When there are too many classes inside one package, create either a subpackage or a sibling-package that fits the problem domain (separation of concerns).
- When a package hierarchy/tree gets too deep and wide, create a new package hierarchy (e.g. com.foo becomes com.foo.system1 and com.foo.system2).
- When there are too many package hierarchies inside one library (like a .jar), create another library-project (e.g. Maven project/module).
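The second and third step of the list above can be sketched as follows. This is only an illustration: the OrderTotals class, its method names and the tax calculation are made up for the example. totalClustered() shows statements clustered by line-breaks, total() shows the same logic after each cluster has been extracted to its own method.

```java
import java.util.Arrays;
import java.util.List;

public class OrderTotals {

    // Step 2: everything lives in one method, clusters are separated by line-breaks.
    public static long totalClustered(List<Long> pricesInCents, int taxPercent) {
        long net = 0;
        for (long price : pricesInCents) {
            net += price;
        }

        long tax = net * taxPercent / 100;

        return net + tax;
    }

    // Step 3: each cluster has been extracted to a method with a telling name.
    public static long total(List<Long> pricesInCents, int taxPercent) {
        long net = sum(pricesInCents);
        return net + tax(net, taxPercent);
    }

    public static long sum(List<Long> pricesInCents) {
        long net = 0;
        for (long price : pricesInCents) {
            net += price;
        }
        return net;
    }

    public static long tax(long net, int taxPercent) {
        return net * taxPercent / 100;
    }

    public static void main(String[] args) {
        List<Long> prices = Arrays.asList(1000L, 500L);
        System.out.println(totalClustered(prices, 20)); // prints 1800
        System.out.println(total(prices, 20));          // prints 1800
    }
}
```

Once sum() and tax() are joined by many more methods of their kind, the next step would be to move them to a class of their own in the same package.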
Integrate changes in existing code
The above is a more or less complete list for starting with code from scratch. But how does it apply to existing code and integrating changes? The main principle is the same, but you start your Inside-Out modularization on a different level. For example: if you have to add code inside a class and see that adding your feature would result in too many methods, you start with step number four (extracting a class).
At which step to level up?
The question is always when to level up from a fine-grained to a more coarse-grained modularization. It is very difficult to give a rule of thumb, because this depends heavily on code-style taste, on the density of ‘if/else’ logic and also on the problem-domain you are working in. A very good test is either to ask colleagues to review whether the modularization is intuitive, or to take another look the next day or a week later to get a fresh view.
Tags: Software Engineering · Software Maintenance
The following discusses the implications of big codebases. Codebase size can be measured with the well-known ‘lines of code’ (LOC) metric.
Here, codebase size and the LOC metric are not looked at on the fine-grained function or class level but for the complete codebase, or at least on subcomponent level.
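As a rough sketch of how such a measurement on subcomponent level could look, the following counts non-blank lines of .java files and groups them by the first directory below a root folder. The LocCounter class and the rule "first directory equals subcomponent" are assumptions made for this example, comment lines are counted as code, and files are assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocCounter {

    // Counts all lines that contain more than whitespace (comments included).
    public static long countNonBlankLines(List<String> lines) {
        return lines.stream().filter(line -> !line.trim().isEmpty()).count();
    }

    // Sums the non-blank lines of all .java files below root,
    // grouped by the first directory below root (assumed to be the subcomponent).
    public static Map<String, Long> locBySubcomponent(Path root) throws IOException {
        Map<String, Long> result = new TreeMap<>();
        List<Path> javaFiles;
        try (Stream<Path> files = Files.walk(root)) {
            javaFiles = files
                    .filter(Files::isRegularFile)
                    .filter(path -> path.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        for (Path file : javaFiles) {
            Path relative = root.relativize(file);
            String subcomponent = relative.getNameCount() > 1
                    ? relative.getName(0).toString()
                    : "(root)";
            long loc = countNonBlankLines(Files.readAllLines(file));
            result.merge(subcomponent, loc, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        locBySubcomponent(root)
                .forEach((name, loc) -> System.out.println(name + ": " + loc));
    }
}
```

Run regularly (e.g. on the CI server) and stored with a date, such numbers give the trend per subcomponent rather than a single snapshot.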
Bad (anti-pattern): Codebase size as progress metric
Sometimes (though fortunately rarely) QA or project management takes codebase size and LOC as a progress metric to see what state the project is in. The more lines of code have been written, the closer the project is considered to be to completion. This is a definite anti-pattern for the following reasons:
- It is extremely difficult to estimate how much code will be necessary for a certain scope or set of requirements. This implies that project or product management cannot know how much code is still missing before the requirements can be marked as done.
- It is more about the quality than the quantity of code. Well-structured code that avoids duplication tends to have fewer lines of code.
- It is very important and valuable to throw away dead code (code which isn’t used or executed anywhere). Using lines of code as a progress metric would mean that this important refactoring causes negative project progress.
Good: Codebase size as complexity metric
With a higher LOC metric you are likely to face the following problems:
- Increase of feedback time: It takes longer to build deployable artifacts, to start up the application and to verify implementation behaviour (this applies both to local development and to CI servers).
- Tougher requirements on development tools: Working on large codebases often makes the IDE run less smoothly (e.g. while refactoring or debugging).
- Code comprehension: More time has to be spent on reverse engineering or on reading/understanding documentation. Code comprehension is vital for integrating changes and for debugging.
- More complex test-setup: Bigger codebases tend to have more complicated test-setup. This includes setting up external components (like databases, containers, message-queues) and also defining test-data (the domain model is likely to be rich).
- Fixing bugs: First of all, exposing a bug is harder (see test-setup). Furthermore, localizing a bug is tougher, because more code has to be narrowed down. Potentially there are more theories about what caused the bug.
- Breaking code: New requirements are more difficult to implement and integrate without breaking existing functionality.
- Product knowledge leakage: Bigger codebases tend to cover more functionality. The danger increases that at some point the organization no longer knows which functionality the software supports. This blindness has very bad implications for defining further requirements or strategies.
- Compatibility efforts: The larger a codebase, the more likely it is that it already has a long lifetime (codebases tend to grow over the years). As software ages, backward compatibility is a constant requirement, which adds (a lot of) effort.
- Team size + fluctuation: Bigger codebases tend to have been touched by a large number of developers, which can cause knowledge leakage. Due to communication complexity, each developer knows only a small part of the system and does not share that knowledge. Even worse, with a bigger team fluctuation is likely to be higher, and knowledge gets lost for the company completely.
- etc. …
Quantification of LOC impact is hard
The above statements are qualitative and not quantifiable, because an exact mapping of a certain LOC number to a magic complexity number is unfeasible. For instance, there are other criteria which have an impact on the complexity of a software system and which are independent of LOC:
- Choice of programming language/system: Maintaining 1,000 LOC of assembly is a completely different story than maintaining 1,000 LOC of Java code.
- Problem domain: Complex algorithms (e.g. to be found in AI or image processing) tend to have fewer lines of code but are still complicated.
- Heterogeneity of chosen technology in your complete source-code ecosystem: E.g. using 10 different frameworks and/or programming-languages and integrating them into the overall system is harder than concentrating on one framework.
- Quality and existence of documentation: E.g. APIs aren’t documented or the motivations for major design decisions are unknown. From a developer's point of view such a system is effectively more complex because a lot of effort has to be spent on reverse engineering.
- etc. …
Conclusion
The LOC metric, representing codebase size, has a big impact on your whole software development cycle. Therefore it should be measured, observed and tracked over time (also by subcomponent). Apart from showing you the current state and the historical evolution of your codebase, it can also be used proactively for the future:
- Estimation/planning: When estimating features, take the LOC metric as an influencing criterion. The higher the LOC, the more complicated it will be to integrate the feature.
- YAGNI: Take the YAGNI (“you ain’t gonna need it”) principle to the extreme. Only implement features that are really necessary. Do not make your software over-extensible; keep it as simple as possible.
- Refactor out dead code: Being aware of LOC as a complexity metric, you can create a culture of dead-code awareness. Throw away as much unused code as you can.
- Refactor out dead functionality: Software products are often unnecessarily complex. Push business towards a simpler product strategy as well, throw away unused features and achieve a smaller codebase.
Tags: Software Engineering · Software Maintenance · Uncategorized