Overview - Student Internship Position - Understand, investigate and explain complex and/or malicious websites
Lookyloo started as a side project aiming to help internal teams in media organisations get an overview of the content loaded on their websites. It is now used on a daily basis by CIRCL to analyse phishing and other malicious websites in the context of incident response.
The current state of the system makes it relatively simple for an analyst with a good understanding of web technologies to understand what is going on on a website, but there is still work to do to make a capture easier to understand and analyse for less technical users, especially for big websites, which often load massive amounts of content from a vast array of third-party services.
The core goals of this internship are the following:
- Getting used to the concepts and limitations of web scraping with the following tutorial
- Getting used to Lookyloo by writing test cases using the testing page, and expanding the examples if necessary
- Adding a feature in Lookyloo to let the user pass a proxy when making a capture, in order to compare captures made from different locations
- Writing a report explaining the findings made along the way
Proxy feature
Websites will often look very different depending on the origin of the visitor. For phishing websites, it often means the visitor will either be redirected to the page of a service used in their country, or to something legitimate if the attacker is specifically targeting victims in a particular place. For websites using ad networks, it will impact the bidding process, and the ads shown to the user will be completely different.
The goals of this feature are as follows:
- Allow a user of Lookyloo to pass a proxy configuration from the capture page to the service doing the capture (Splash, see below for references); a minimal sketch follows this list
- Make sure every request goes through the proxy
- If everything works as expected, make it possible for the administrator of a Lookyloo instance to set up a list of proxies that can be picked from the capture page
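As a starting point, here is a minimal sketch of what passing a proxy to Splash could look like. It is not Lookyloo code: the Splash endpoint, the target URL and the proxy URL are placeholders, and the real implementation would go through ScrapySplashWrapper rather than calling Splash directly.

```python
# Hedged sketch: asking a local Splash instance for a capture through a proxy.
# The Splash URL, target URL and proxy URL below are placeholders.
import requests

SPLASH_RENDER = 'http://127.0.0.1:8050/render.json'  # assumed local Splash instance

params = {
    'url': 'https://www.circl.lu',   # page to capture
    'har': 1,                        # include the HTTP Archive (HAR) in the response
    'html': 1,                       # include the rendered HTML
    'wait': 2,                       # give third-party resources time to load
    # Splash accepts a proxy profile name or a proxy URL of the form
    # [protocol://][user:password@]proxyhost[:port]
    'proxy': 'http://user:password@proxy.example.com:3128',
}

response = requests.get(SPLASH_RENDER, params=params)
response.raise_for_status()
capture = response.json()

# Every request made while rendering the page ends up as an entry in the HAR,
# which is one place to verify that the proxy was actually used for all of them.
for entry in capture['har']['log']['entries']:
    print(entry['request']['url'])
```

Comparing two such captures made through proxies in different countries (or one made with and one without a proxy) is a simple way to check that the feature behaves as described above.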
Write-up of the findings
Web technologies are extremely versatile, allowing developers to do a lot of extremely odd things. Many of them are untangled by Lookyloo and the libraries used in the project, and some of them are covered in the testing page, but all of these techniques are poorly documented.
Writing a report describing the techniques used by attackers, but also by legitimate(-ish) services in order to track their users would be a great contribution to privacy research in general.
Current status of the project
Lookyloo connects together a few tools in a consistent manner (a minimal sketch of the capture-to-tree pipeline follows the list):
- Scrapy, a web crawling framework (Python).
- Splash, a web service used for rendering the website and generating the HTTP Archive (HAR) file (runs in a Docker container).
- ETE Toolkit, a Python framework for the analysis and visualization of (phylogenetic) trees (Python).
- D3JS, for the visualisation of the tree in the browser (JavaScript).
- ScrapySplashWrapper, a simplistic library relying on Scrapy to filter out the resources to open on the website to investigate. It then queries Splash, formats the data Splash generates, and returns it (Python 3.7+).
- har2tree, a library that generates an ETE Toolkit tree from the HAR file and other data returned by Splash (Python 3.7+).
- Lookyloo glues all the parts together (Python 3.7+, JavaScript, CSS, HTML). Note that the web framework used is Flask.
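To make the glue concrete, here is a hedged sketch of the HAR-to-tree step. It assumes a HAR file produced by Splash already sits in a capture directory, and that har2tree's CrawledTree constructor takes a list of HAR file paths and a capture UUID, which is how Lookyloo calls it at the time of writing; check the current har2tree API before relying on it.

```python
# Hedged sketch of the HAR-to-tree step; not the actual Lookyloo code.
# Assumption: 'capture_example/' is a hypothetical directory already holding
# the *.har files written after a Splash capture (see the proxy sketch above).
from pathlib import Path
from uuid import uuid4

from har2tree import CrawledTree  # builds an ETE Toolkit tree from HAR data

har_files = sorted(Path('capture_example').glob('*.har'))
tree = CrawledTree(har_files, str(uuid4()))  # one tree per capture, keyed by UUID
```

The resulting ETE Toolkit tree is what Lookyloo then serialises and renders in the browser with D3JS.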
The current code is stable but needs a lot of improvements in order to support the required features.
Your task is to understand the code and interfaces to other services and bring the code to the next level.
Your work will be part of the daily activities of CIRCL and will benefit the countless people doing lookups against our web service.
If this is a challenge you would like to accept, talk to us!
Qualification
- Must be an EU citizen with a valid work permit in Luxembourg
- Must be eligible for a student internship in the field of information security and/or computer science
- Must have a high-level of ethics due to the nature of the work
- Must be fluent in English, Unix, git, and Python. JavaScript and web development in general would be a plus.
- Contributions performed under this internship will be released as free software
How to apply
The application package must include the following:
- A resume in ASCII text format
- A motivation letter explaining why you are interested in the internship
The package is to be sent to info(@)circl.lu indicating reference internship-lookyloo-02.
Application deadline
The deadline for the application is the 15th of March 2021. Applications received after the deadline will not be considered.
Classification of this document
TLP:WHITE information may be distributed without restriction, subject to copyright controls.