milieau.01 plans and legality
25.11.13 The home for source code: github.com/kaipapar/milieau-analysis
Primary ideas and project scope
The goal is to
- create a CLI program that scrapes a specified property listing site for relevant data about properties for sale in the Turku city centre,
- combine the resulting data with the Turku city center environmental types dataset, and
- analyze whether or not milieau affects property value in the data.
I’m setting the scope so that I can get the scrape app up and running as soon as possible, within the functional and non-functional requirements, and set up the milieau dataset so it can be used with the scraped data, i.e. a proof of concept. Then I’ll scrape the bare minimum and try to get the analysis tools set up, so that further analysis is easier if I have time to scrape more data. I’m limiting my scope to the centre of Turku because the milieau data only covers that area.
It will also be a learning experience for me in statistical analysis, not just programming. At least I hope I get that far.
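As a rough illustration of the analysis step in the goal list, here is a minimal sketch of comparing price per square metre across milieau types. Everything in it is an assumption for illustration: the field names (`price_eur`, `area_m2`, `milieau`), the type labels and the numbers are made up, and it presumes each listing has already been matched to a milieau type, which in practice would need a spatial join against the dataset.

```python
from collections import defaultdict
from statistics import median

# Hypothetical listings: field names, labels and values are invented for illustration.
listings = [
    {"price_eur": 200_000, "area_m2": 50.0, "milieau": "A"},
    {"price_eur": 300_000, "area_m2": 60.0, "milieau": "A"},
    {"price_eur": 150_000, "area_m2": 50.0, "milieau": "B"},
]


def median_price_per_m2(rows):
    """Group listings by milieau type and return the median EUR/m2 per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["milieau"]].append(row["price_eur"] / row["area_m2"])
    return {name: median(values) for name, values in groups.items()}


result = median_price_per_m2(listings)
print(result)
```

A per-group median is only a first look. Whether a difference between types means anything would still need a proper statistical test and far more than three listings.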
Inspiration
I’ve wanted to utilize open data somehow for a few years now! I dipped my toes into open data through a university course where we cleaned and uploaded a dataset on theses from our uni and then analyzed it. I was the team lead in that project, which we worked on through the whole course.
Since then (or maybe even before), I have been envious of people clever enough to utilize open data for informative and useful purposes. For example, quite recently there was an app that showed how much the buses in the Helsinki area were speeding: ylinopeudet.com. There are other great ones as well, which I am forgetting now.
Another one catalogues movie reviews by Finnish critics: https://ollisulopuisto.github.io/kriitikot/. I couldn’t find the source code, so I’m not sure whether the data is gathered manually or with a web crawler. Either way, the dataset is most likely generated by Olli Sulopuisto and not found somewhere. As a sidenote: Mr. Sulopuisto has had a very interesting career in media.
I’m also very interested in properties, the city and architecture, and I currently live in Turku. So it is only natural that I would drift towards those subjects when browsing avoindata.fi on a rainy day. And there it was: Turku city center environmental types. I call the types milieaus, as they are called miljöö in the Finnish version. The idea of combining data about the built environment of my hometown was instantly in my head. I was probably browsing oikotie.fi at the time, because the idea of utilizing property listings didn’t take long to develop.
Legality
The legality of systematically browsing and scraping a website was, and still is, a matter I am concerned with. I have little interest in violating copyright, and when it comes to working with price tags and businesses I get antsy. But after some research I came to the conclusion that scraping a site is legal if:
- the data gathered is only factual in its nature (price, size etc) and not creative output,
- scraping is not forbidden in the TOS of the site,
- sources are cited properly and
- it is done in good taste, e.g. for educational reasons.
I might be forgetting something, but those four points ease my mind. In fact, the first point alone already keeps my usage clear of copyright law, and the third and fourth points reinforce that, but I like to have most bases reasonably covered.
If you have any additions or corrections, get in touch. I’m not that well versed in law and would like to know whether I am correct or not.