A few months ago I wrote a short blog post about nodejs, the technology webinos is based on, and its module system NPM. I asked why, considering how lucrative it might be for a malware writer, there was no malicious software listed in the NPM repository.

After receiving some great comments – thanks to all who got in touch with me – I decided to write a follow-up article expanding further on this question.

Many people pointed out (as I did in the original article) that nodejs and the NPM repository were just one example of the problem. The same could be argued of Ruby Gems, PHP, or any package management and web framework system. Indeed, one could ask the same question of the Debian package management system, which has been going for much longer and has many more packages available. This is all absolutely true. It seems that malware is extraordinarily rare on most package management systems. Even the Google Play Market for Android – much maligned for containing malware – actually has a good track record* considering the number of applications available. Compared to downloading files from arbitrary websites and shareware, app stores appear to be significantly more reliable. Why is that? What makes package management systems such a good security mechanism, despite the apparent lack of oversight or significant security controls?

App stores and package repositories do offer some accountability, as well as revocation and, potentially, mediation. Developers have to ‘sign up’ in one way or another to an app store or package repository, and can then be held accountable when their submission is found to be harmful. This is only of limited value in most repositories, as new user accounts can easily be created, but makes a reputation for good software something worth protecting. Furthermore, if the repository charges money to create an account or upload a new module, then this money is lost when the terms of service are violated. Revocation is probably more important: when an application or module is identified to be malicious, it can be removed before doing further damage to other people. This provides no protection for early adopters, but the more cautious certainly benefit. Of course, the benefit is lessened if automatic updates are allowed, as malware can be disguised as legitimate software until it reaches maximum market penetration, and then can be modified to exploit the end user. Mediation is often cited as an advantage of a package repository or app store, and the reason why Apple have avoided significant security incidents. Unfortunately, unless you have Apple-like resources, this is unlikely to be cost effective. One of the few exceptions to this rule is third-party mobile app stores. There is something of a malware issue on Android app stores outside the US and western Europe (essentially anywhere that Google Play doesn’t dominate).

Where does nodejs fit in? When I wrote the last article, there was no evidence of any malware, but also no evidence of any accountability, mediation or revocation mechanisms. The NPM repository offers no way to report bad modules and accountability appears minimal. Nodejs certainly isn’t special, but should malware writers take the time to focus on it, it could be especially bad.

Perhaps it is more interesting to ask whether it is worth writing malware for nodejs. As I mentioned in my first post, the potential targets are certainly rich enough. Almost all server-side web apps will have some privileged access to data or services. Good system design can mitigate some of the threats (I received several comments suggesting that TLS connections should not be terminated by the nodejs app, but by a reverse proxy such as Nginx, to protect private keys), but it can’t mitigate them all. Furthermore, the reality is likely to be far from best practice: I suspect many people do run nodejs applications with super-user privileges, and without trying to protect against this kind of attack.

If malware is a possibility (and is still potentially lucrative) then why does it appear to be so scarce? Having thought about this problem for a while, I can only assume that it is because such malware does not offer a high reward/effort ratio. There is too much low-hanging fruit elsewhere (phishing end users, for example) to make this particular avenue of attack worthwhile. For one thing, it is probably too hard to automate the exploitation of server-side malware once it has been deployed. There are thousands of ways in which each endpoint might be configured, and no obvious single set of malicious actions that might be performed. As a targeted attack it would be effective (an attacker could spend time exploiting the specific machine it was installed on) but this would rely upon the target making use of the malicious module, which is relatively unlikely. This kind of attack can only work if planned a long way in advance, and if it can be effectively deployed and run on multiple targets at the same time. Indeed, for this kind of malware to be successful, the author would have to develop a popular nodejs module in the first place, which is a significant amount of work. If the exploit is enabled by a later update, it would also rely on developers frequently updating their modules. On balance, therefore, it seems that nodejs malware simply isn’t worth the effort.

A more worrying possibility was exposed by Adam Baldwin recently. He discovered that a CSRF flaw on npmjs potentially allowed anyone to update any package. This dramatically shifts the reward/effort ratio. If an attacker simply has to repackage a legitimate module with a few additional malicious components, and then wait for developers to run updates, the impact could be enormous. For example, the ‘underscore’ module has been downloaded nearly 10,000 times in the last day, and has over 1,000 other modules depending on it. There will be hundreds of developers who may not even realize that they are using this module. If it were updated to include, say, virus.js with a more malicious payload, thousands of production boxes might be at risk.
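To make the update scenario concrete, here is a minimal sketch of why a hijacked release spreads without anyone choosing to install it: any dependent whose declared version range admits the new version picks it up on its next update. The dependents, their ranges and the version numbers below are hypothetical, and the matcher only understands "*", ">=x.y.z", "~x.y.z" and exact versions; it is an illustration, not npm's actual range grammar.

```typescript
// Simplified range matching, assumed for illustration only.
// Supported forms: "*", ">=x.y.z", "~x.y.z", and an exact "x.y.z".
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfies(version: string, range: string): boolean {
  const v = parseVersion(version);
  if (range === "*") return true;
  if (range.startsWith(">=")) {
    return compare(v, parseVersion(range.slice(2))) >= 0;
  }
  if (range.startsWith("~")) {
    // Tilde: same major and minor, patch at least the stated one.
    const base = parseVersion(range.slice(1));
    return v[0] === base[0] && v[1] === base[1] && v[2] >= base[2];
  }
  return compare(v, parseVersion(range)) === 0;
}

// Hypothetical dependents and the range each declares for the hijacked package.
const dependents: Record<string, string> = {
  "app-a": "*",
  "app-b": "~1.4.0",
  "app-c": "1.4.4",
  "app-d": ">=1.3.0",
};

// Names of dependents that would accept the given (malicious) release.
function exposedTo(release: string, deps: Record<string, string>): string[] {
  return Object.keys(deps).filter((name) => satisfies(release, deps[name]));
}

// exposedTo("1.4.5", dependents) returns ["app-a", "app-b", "app-d"]
```

Only the dependent pinned to an exact version stays on the old code in this sketch. Pinning has its own cost, of course, since it also delays legitimate security fixes.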

However, the main threat from nodejs packages is currently badly written software rather than malicious software. Of the 32,000 nodejs modules on NPM, it’s not crazy to suggest that several hundred will have exploitable vulnerabilities. Many modules will be vulnerable to simple attacks, such as malicious content injection. Native modules may suffer from all the classic exploits found in any other piece of software. To combat this, the Node Security project aims to audit and inspect every nodejs module and provide “advisories, issues and pull requests so modules get fixed”. This is a laudable goal, although the sheer amount of effort required is daunting. Inspecting this number of modules seems impractical and it is inevitable that only a subset of security issues will be identified. There is a good opportunity for program analysis: if modules can be assessed in an automated way to identify common flaws then the general level of security can be raised without too high a cost. Indeed, the idea of automatic exploit generation is something that the University of Oxford has an interest in, and was recently discussed at the Crest Open Workshop on Malware (PDF presentation by Daniel Kroening). There are many questions as to how such automated vulnerability analysis should be performed in a responsible manner, but the technology exists to make a big difference to a very large number of systems.
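As a rough illustration of what the cheapest form of automated assessment might look like, the sketch below scans module source line by line for a few risky constructs. The rule names and patterns are my own assumptions, chosen for the example. Real program analysis would work on a parsed syntax tree and track data flow, so this will both miss real flaws and raise false alarms (for instance, it would flag a regular expression's exec method called with a concatenated argument).

```typescript
// A naive pattern-based scan, assumed for illustration; not a real analyser.
interface Finding {
  line: number; // 1-based line number in the scanned source
  rule: string; // name of the rule that matched
  text: string; // the offending line, trimmed
}

// Hypothetical rules: each flags a construct that often deserves a closer look.
const rules: { name: string; pattern: RegExp }[] = [
  { name: "eval-call", pattern: /\beval\s*\(/ },
  { name: "function-constructor", pattern: /\bnew\s+Function\s*\(/ },
  { name: "exec-with-concatenation", pattern: /\bexec\s*\([^)]*\+/ },
];

function scan(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of rules) {
      if (rule.pattern.test(text)) {
        findings.push({ line: index + 1, rule: rule.name, text: text.trim() });
      }
    }
  });
  return findings;
}

// A small made-up module body to scan.
const sample = [
  'var cp = require("child_process");',
  'cp.exec("ls " + userInput, done);',
  "var f = eval(body);",
].join("\n");

// scan(sample).map((f) => f.rule) returns ["exec-with-concatenation", "eval-call"]
```

Even something this crude could help prioritise which of the thousands of modules a human auditor looks at first, which is where the cost saving would come from.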

One potential system-level mitigation for vulnerable nodejs modules is the use of least-privilege permissions like those found in mobile applications. If developers could intentionally limit themselves to only certain built-in nodejs modules (e.g., just “URL” and “HTTP” modules) then this would greatly reduce the impact of a vulnerability being exploited. Of course, it would not help those modules that have “file” or “process” permissions, but it would aid the auditing effort as only privileged modules would need extensive review. I expect that the same security controls we see employed to protect user-focused applications will slowly begin to find their way into developer-focused tools.
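A minimal sketch of the idea, assuming a module declares up front which built-in modules it needs: the loader it is handed refuses everything else. The function name and the stub loader are hypothetical, and I use the lowercase names that Node's built-ins carry (url, http, fs). This is not a sandbox, since code with access to the real loader could simply bypass it; it only shows the shape of a declared-permissions check.

```typescript
// Anything that resolves a module name to a module object.
type Loader = (name: string) => unknown;

// Wrap a loader so that only the declared module names can be loaded.
function makeRestrictedRequire(allowed: string[], load: Loader): Loader {
  const permitted = new Set(allowed);
  return (name: string) => {
    if (!permitted.has(name)) {
      throw new Error(`module "${name}" is not in the declared permissions`);
    }
    return load(name);
  };
}

// Stub loader standing in for the platform's real one (an assumption for
// the example; under Node the real require function would be passed here).
const stubLoader: Loader = (name) => ({ loaded: name });

// A module that declared only "url" and "http".
const restricted = makeRestrictedRequire(["url", "http"], stubLoader);

restricted("url"); // allowed, returns the stub module object
// restricted("fs") would throw: "fs" was never declared
```

The declared list also doubles as the audit signal described above: a module asking for fs or child_process goes on the review pile, while one that asks only for url can be given a lighter check.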

Finally, I’ll end this somewhat rambling blog post by proposing that this is a very promising area of research. With NPM we have a huge, open source repository of source code that has not, in general, been subject to much security analysis. This is ripe for studies on how good developers are at implementing secure software, how effective particular mitigations might be, and identification of the most common mistakes.