Data Deutsch Privacy Uncategorized


The big data collectors such as Facebook and Google have many interfaces in their systems. They are called APIs and provide access to data. Under certain conditions, external providers can access this data and build a business model around it. In principle, this is similar to a phone case manufacturer offering a case for a particular phone model, or a car accessory manufacturer offering accessories for a particular type of car. When the phone is no longer built, the case business is over, and the same goes for the car accessories.

With data, it is always about monetization, about combining and enriching existing data, and about creating even more incentives to collect new data, which can then be used for yet more services. Customers are always sold the “good side”.

Data Life

We Lost The War

I have been living in France for 11 years. Since November 2015, France has been under a so-called state of emergency (état d’urgence).

On 13 November 2015, …, due to expire after four extensions in 2017. As of 23 July 2016, almost 3,600 houses had been raided under the state of emergency, leading to more than 400 arrests, the seizure of more than 500 weapons including 40 war weapons, and four or five of these raids led to a terrorism-linked judicial investigation. Some Muslim rights groups criticized the raids as unfairly targeting French Muslims, especially those of North African descent, claiming that they are conducted with little concern for civil rights, and pointing out that only one terrorism-related investigation led to prosecution by August 2016. On 16 November 2016, President François Hollande and Prime Minister Manuel Valls announced that the state of emergency would be extended until the 2017 presidential elections, stating that the measure would be necessary to protect rallies and other events during the electoral campaign.

France is still in this state of emergency …

12 years ago (2005) I attended the CCC congress and saw this talk …

Data English Fun Germany Internet Open-Source-Software server Software Website

Mobile Devices, Drupal, Composer and 217 km

Besides my Drupal work, I attended the Mobile Users FFM monthly meeting on Wednesday. The group has existed for 13 years and started as the “Palm user group, Frankfurt”. Do you remember Palm? During the last 13 years the world has changed, and it was interesting for me to hear stories about different types of smartphones, smartwatches, phone contracts and gadgets like the Yota Phone 2.

Data Software Website

Start your Blog – Today!

A blog is a discussion or informational site published on the World Wide Web consisting of discrete entries (“posts”) typically displayed in reverse chronological order (the most recent post appears first).

A conceivably simple concept.

By now, everyone knows that blogs exist. Blogs are read, and the traditional media take them seriously too. I also have a blog, the one you are reading :)

When you post messages on Facebook, Twitter, Instagram, Snapchat, Pinterest, Foursquare, CouchSurfing, DeviantArt, Ello, Flickr, Google+, LinkedIn, Meetup, SoundCloud, Tumblr, YouTube and all the other platforms, you are also running something like a blog.

If you are looking for an entry (status, check-in, photo, video), you realize many things. It is not easy to search, archive or download your “own” data. It is often impossible to transfer your data from one platform to another. You cannot analyze “your” data.

You’ll also realize that it is not “your” data. Depending on the platform’s terms of use, you have granted many or all rights to your content to the company that operates the corresponding service.

My data is not worth anything anyway

When Microsoft bought LinkedIn for US$ 26,000,000,000, it paid about US$ 60 per user (433,000,000 users). Do you have a LinkedIn account? Do you also have accounts with other services?

I have accounts at 12 major services where I occasionally post content. If I assume US$ 60 per service, my data is currently worth around US$ 720.
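The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: the variable names are my own, and it assumes (as the text does) that every service values a user the way the LinkedIn deal did.

```python
# Figures quoted in the post; variable names are illustrative.
purchase_price_usd = 26_000_000_000   # price paid for LinkedIn
linkedin_users = 433_000_000          # number of LinkedIn users

value_per_user = purchase_price_usd / linkedin_users
print(f"Value per user: about US$ {value_per_user:.0f}")

# Assumption from the post: each of my 12 services values me like LinkedIn does.
number_of_services = 12
estimated_total = round(value_per_user) * number_of_services
print(f"Estimated worth across {number_of_services} services: US$ {estimated_total}")
```

The per-user figure is a rough average, so the total is an order-of-magnitude estimate rather than a price anyone would actually pay.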

Your data is worth something!

On LinkedIn, users network to do business. Please consider briefly what kind of data you are generating: texts, photos, music, videos, your fitness bracelet, your car, your bank card, your home automation and all the other stuff that creates data.

Collect Data

Do you also have a box of keepsakes from your childhood? Handicrafts, postcards, pictures, souvenirs and other things. Sometimes it takes 30 years or longer until you look into that box again, often to show the contents to your children. Maybe you don’t have a box and keep everything in your memory.

In the age of online communication, most memorabilia are made of bits and bytes and are stored on data storage media over which you have little control. That means, even if you have no box, other people you do not know personally have a “collection box with your experiences”.

Private companies, and increasingly governments, collect treasures from historical data. They are used for predictions about the future. Based on this data, decisions are made by smart algorithms. Of course, in the case of LinkedIn, Microsoft is only interested in the details of 433,000,000 business contacts and the raw data. Microsoft sells software, hardware and services to this target group and expects the US$ 26,000,000,000 to pay off with a profit.

But back to your blog project.

A private blog

Now a private blog is of course not the solution to all problems related to data and not the ultimate archiving machine, but it is a bit “more ownership” than on the platforms with their services.

When I wrote the STOP BÜPF article (German), it struck me how important any blog can be, even if it’s a small one.

This tweet led to some blog posts by Swiss providers.

Swiss providers, whether access, web or communication providers of any kind: Please call on your clients to sign! #StopBÜPF

In the end it was possible to collect more than 50,000 signatures, which is the basis for a referendum against the law. BÜPF is a proposed law about censorship and surveillance in Switzerland. Have a look at this video to get an idea of what could be possible afterwards (subtitles in English are available).

It’s a good feeling when you publish your text on your platform and then post the link to your platform in different services. The principle is called POSSE (Publish on your Own Site, Syndicate Elsewhere).

With POSSE, your content is stored in an environment over which you have control. If you post a link to your own blog on Facebook, for example, then Facebook “visits” your blog, copies the first words and an image, and displays both in your status message. It works similarly on all platforms. When you offer an RSS feed, your data can be read in an external feed reader. However, the data remains under your control.

The consequence is (among others):

You get noticed. This cannot be prevented, because everybody wants your data and is curious about what you have to say.

The longer you think about it, the sooner you’ll probably notice why it is good to have your own blog.

This is a small list of benefits:

  • It helps you to learn new things
  • You begin to think more clearly
  • You learn to write better
  • Your self-confidence grows
  • You talk in a more structured way about topics you have written about
  • You can make money (if you wish)
  • You can support a good cause
  • You need no prior knowledge
  • It is a real challenge :)
  • It’s free (or affordable)
  • You always learn something about yourself, others and the issues you write about

How to start?

The most popular software for blogs is WordPress. Blogging also works well with many other programs, including all the known and unknown content management systems, but WordPress is simply practical and has become the de facto standard.

First steps

If you do not have a blog, you can set one up for free in minutes with a hosted WordPress service. Such a service (from the USA) also stores your data on its hard disks, just like the platforms described above. But you are the one who decides what happens to your data. You can export it at any time and import it into a self-hosted WordPress installation. From that moment on, you take over full responsibility for your data. You can also import the data into many other systems (Joomla, Drupal).

The WordPress software is open source and is developed by a large community. You can download the source code and install it locally or at a hosting provider of your choice. You can also move your data to another place at any time.


The farther you move away from “all-round carefree” services, the more responsibility you take on yourself. It’s a bit like growing up. Suddenly you have a car / phone / bike / boyfriend / girlfriend / family / apartment / house / boat and you learn that you have to take care of it so that it continues to work well.

A blog is therefore also a good exercise in “growing up”.


You need a blog, and if you already have one, then please post your URL as a comment.

Content Management System Data English India Internet jwc15 Open-Source-Software Software Statistics

How to make money with Joomla? SURVEY

As you see above, I’m on my way to India to present at the Joomla World Conference in Bangalore (November 8th).

The title of the session sounds simple, but it’s a big issue around the globe. In addition to this great community, the whole Joomla love, peace, any Joomla Pizza, each Joomla beer and the ultimate happiness for all, many people just want to make money with Joomla :). This is good and often not easy. In this session, I would like to show examples of real companies and people from different countries and how they make money with Joomla. I’ll show business models and opportunities, and you will probably start your Joomla-based career immediately after this session, or even set up a company and sponsor the #jwc16 next year :)

Let’s do it together

To prepare the session I want to ask just a few questions about money!
Don’t worry,

  • I don’t want to have your identity
  • the survey takes less than a minute :)
  • I’m just interested in some data.

So, click the link NOW!

Take the “Earning money with Joomla?” survey!

Don’t hesitate to contact me or provide your email address if you want to talk with me about your “Making money with Joomla” story.

Hope to meet you at #jwc15 !

Data Politics Privacy server Software

My Data in the Time of Cholera


Many people divide the history of the Internet into a time before Edward Snowden’s revelations and a time after. I feel the same way. Over the last 12 months (2013/2014) I was simply angry and disappointed about what I heard and read every day (NSA, GCHQ, BND, etc.), and I remembered a song by Kraftwerk from 1981, a song that is more than 30 years old!

Interpol and Deutsche Bank,
FBI and Scotland Yard,
Flensburg and the BKA
have our data there.
Numbers, figures, trade, people,
because time is money
(Video on YouTube)

Edward Snowden was born 2 years later (1983).

In 2005, about 25 years later, this new computer world was already reality (Frank Rieger: We lost the war. Welcome to the world of tomorrow).

What about you?

Depending on your personal situation, you have surely asked yourself some questions over the last year about your privacy, your personal data, and the activities of governments and large companies.

Here are a few of the more harmless questions that come to mind right away:

  • E-mail: Which provider do you use to send your e-mails? Do you encrypt your e-mails?
  • Multimedia: Where do you store your photos and videos?
  • Books: Where do you store your PDF and EPUB files? Do you own these files, or do you merely possess them?
  • Music: Where do you store your music files? Do these files belong to you?
  • Type of storage: Do you sync or stream?
  • Instant messaging: Which kind of messenger do you use to communicate with whom?
  • Documents: When you create your own or shared documents, how do you do that?
  • Website: Do you have a website? Where do you host it?
  • Backup: Do you actually have a backup of this data anywhere?
  • Costs: Do you pay anything at any point for these services?
  • Social networks: Do you have a Facebook account? Twitter? Google+?
  • Mobile: Do you use a smartphone?
  • House and garden: Do you have lamps, garage doors, irrigation systems, door openers or heating that can be switched via an app?
  • Health: Do you measure your blood pressure or other parameters with the help of apps?

I assume that you have an e-mail account with Google or another large company, as well as a Facebook account, and that you use a few other services.

All of these services have to earn money (mostly with advertising) and, depending on the legal situation, hand over your data to the relevant government agencies. Most companies are not allowed to talk about these disclosures. The specific laws differ from country to country. When copyright infringement, terrorism or child pornography is suspected, the authorities are usually allowed to analyze data very quickly and very officially. Without any grounds for suspicion, most countries try to store the metadata of communication for varying lengths of time (keyword: data retention, in German “Vorratsdatenspeicherung”).

Drone strikes are planned and carried out on the basis of metadata (worth reading: “Wir töten auf Basis von Metadaten”, in English “We kill on the basis of metadata”).

To cut a long story short: we are all standing pretty naked on the net.

And yes: “We have something to hide!”

What can you and I do?

I don’t believe that I can abolish the practice of eavesdropping (the thoughts behind this sentence alone could fill books :) )

I can make it harder for the eavesdroppers by trying to find alternatives to the services of the large companies and by encrypting my data wherever possible.

And I can write about it, so that more people do the same :-)


I have decided to host as much of my data as possible myself, in addition to my websites, and to manage my e-mail myself. To do this, I am configuring a server that allows encrypted communication wherever possible and that offers the services I need, all based on open source software.
See the table for examples; I am happy to hear further ideas.

Use | Service of a large company | Self-hosted alternative
E-mail | Gmail, Yahoo!, etc. | E-mail server, Exim + add-on
Website/Intranet | Tumblr, Trello, Basecamp, wordpress.com | Self-hosted website, web server/DB, FTP
Photos | Flickr | Owncloud, other ideas?
Files (music, film, PDF, eBooks) | Dropbox | Owncloud, other ideas?
Collaborative documents | Google Docs | Owncloud, EtherPad
Surfing without restrictions | hidemyass.com | Self-hosted VPN
Backup | iCloud | Duply? But where to?
Messenger | Skype, Telegram | Jabber
Telephony | Skype | VoIP, Asterisk
OS | OSX, Windows | Linux



Trust is important. But this is where it gets tricky.

Running a server requires hardware. You can buy or rent it. Normally you do that at a hosting provider. The hosting provider in turn rents space in a data center and connectivity from network operators (carriers). Network operators lay and rent out the cables. The operator of a data center builds the premises and rents them to the hosting provider. In principle, all three can read my data.

  • The hosting provider of virtual private servers and dedicated servers can access these servers even without the root password.
  • A server without root access for third parties means bringing your own machine into a data center (colocation). However, third parties can then physically get hold of this hardware at any time.
  • The only “secure” option is a server that is run from home or from a place over which you have control.

Every option has advantages and disadvantages.

I will report on my experiences.

Data European Projects

How to collect, structure and publish data?

This workshop provides an overview of the collection, structuring and publishing of data.

You will learn about different types of data, data models and publishing methods.

The workshop is for everyone who collects data and needs to structure and share it.

Data, Information and Knowledge

Data is the lowest level of abstraction, information is the next level, and finally, knowledge is the highest level among all three.

Data on its own carries no meaning. For data to become information, it must be interpreted and take on a meaning. For example, the height of Mt. Everest is generally considered as “data”, a book on Mt. Everest geological characteristics may be considered as “information”, and a report containing practical information on the best way to reach Mt. Everest’s peak may be considered as “knowledge”.

To create information and knowledge based on data it’s necessary to know

  • about the different manifestations of data
  • various structuring processes
  • ways to collect and enrich the data
  • ways to publish the enriched data


In computer science, data is information in a form suitable for use with a computer. Data can be divided into human-readable data such as text and binary data such as images, audio and video.


A written text is the representation of a spoken language by means of a writing system. A writing system can be a pencil, a piece of paper and a defined set of characters (alphabet). In computing, it’s nearly the same. The pencil is the keyboard, where you can choose characters from a character set. The characters of most existing languages are defined in Unicode; the Unicode Transformation Formats (UTF) define how these characters are stored as bytes.

Unicode provides a unique number for every character,

  • no matter what the platform,
  • no matter what the program,
  • no matter what the language.

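Today (April 2012) UTF-8 is widely used on the internet, and every e-mail client and browser is able to display it. The first 65,536 Unicode code points form the Basic Multilingual Plane, which covers most characters in everyday use; in total, Unicode has room for more than one million code points. UTF-8 and UTF-16 are two ways of encoding the same characters as bytes. UTF-16 is used internally by operating systems such as Windows, while UTF-8 dominates on the web.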
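To make the difference between a character’s Unicode number and its encoded bytes more tangible, here is a small Python sketch. The sample characters and the helper name `describe` are my own choice, not part of the workshop material.

```python
def describe(char):
    """Return the code point and the encoded lengths of one character."""
    code_point = ord(char)                       # the unique Unicode number
    utf8_length = len(char.encode("utf-8"))      # bytes needed in UTF-8
    utf16_length = len(char.encode("utf-16-be")) # bytes in UTF-16 (no byte order mark)
    return code_point, utf8_length, utf16_length


# Sample characters chosen for illustration.
for char in ["A", "ü", "€", "😀"]:
    code_point, utf8_length, utf16_length = describe(char)
    print(f"{char!r}: U+{code_point:04X}, "
          f"UTF-8 = {utf8_length} byte(s), UTF-16 = {utf16_length} byte(s)")
```

The number of the character stays the same on every platform; only the byte representation depends on the chosen encoding.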

Human Readable Plain Text

Plain text is used as content of an ordinary file readable as textual material without much processing. On a website like Facebook it is used in text area fields (Figure 1).

Plain Text

Figure 1: Plain text in a text area field of a website

Formatted Text

Because plain text consists only of characters and so-called white space, there is a need to give people the possibility to format text. Formats can be bold, italic, strikethrough, underlined, superscript, subscript, different sizes of headlines and countless other options.

To format text, you usually mark the part of the text that you want to format and then click on a button or press a special keyboard combination (e.g. ctrl-b for bold). You need a word processor with special features to be able to format text.

‘Binary’ Word Processor

A well known word processor is Microsoft Word. Microsoft created an application that offers the possibility to format and structure text in the year 1985 and it is now part of the Microsoft Office package (Figure 2).

The advantage is and was the ease of use.

The disadvantages are that you have to buy a license for the right to use it, that you have to install it on your PC, and that the .doc format is not human readable if Microsoft Word is not installed on your device. Later versions of Microsoft Word introduced a machine-readable, so-called XML format, but this is not widely used. The .doc files are still very common.

MS Word 2007

Figure 2: Microsoft Word

Microsoft Word is used for creating and editing documents on a PC.

‘Human Readable’ Word Processor

Alternatives to Microsoft Word include LibreOffice. It is Free and Open Source Software (FOSS), which means that you don’t have to pay for it and that you are free to change the software (if there is a need and you are able to do so).

The files (e.g. .odt) are compressed text files.

Note: An .odt file is a ZIP archive. To see the human-readable content without using a word processor, rename the file example.odt so that it has a .zip extension. Then extract the file (usually with a right mouse click), and afterwards you’ll see a folder with many files inside (Figure 3).

Figure 3: Content of an example.odt file

Word processors like LibreOffice are used for creating and editing documents on a PC.
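The same look inside an .odt file can be sketched with Python’s standard `zipfile` module. Because no real example.odt is at hand here, the sketch first builds a tiny stand-in archive in memory; its content is made up, and a real document contains more files than these two.

```python
import io
import zipfile

# Build a tiny stand-in for example.odt in memory (hypothetical content),
# so the sketch runs without a real document on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    archive.writestr("content.xml", "<text>Hello from the document body</text>")

# For a real file you would open zipfile.ZipFile("example.odt") instead.
buffer.seek(0)
with zipfile.ZipFile(buffer) as odt:
    names = odt.namelist()                       # files inside the archive
    body = odt.read("content.xml").decode("utf-8")

print(names)
print(body)
```

No renaming is needed in this case, because `zipfile` looks at the content of the file and not at its extension.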

Formatting Text using HTML

On the internet, the markup language HTML (Hypertext Markup Language) is used to format texts.

If you want to have part of your text in bold, italic, strikethrough, underlined, superscript or subscript, you have to do it with the help of HTML tags. There is always a “tag” in front of the text you want to format and a tag behind that text. The browser will display (render) it in a nice way:

  • HTML: <strong>bold</strong> = result in a browser: bold
  • HTML: <em>italic</em> = result in a browser: italic

It is of course not very comfortable to work like this, and as a solution so-called What You See Is What You Get (WYSIWYG) editors are used. These editors are usually FOSS, like TinyMCE (Figure 4).


Figure 4: WYSIWYG editor on a website

These editors sometimes offer the possibility to see the underlying HTML code by clicking a button called Source or HTML (Figure 5).

TinyMCE Source Code

Figure 5: WYSIWYG editor in Source mode with HTML display

Switching between the WYSIWYG mode and an HTML mode is not possible in word processors, and most users don’t want that possibility as it confuses them.

The general handling of formatting in an HTML WYSIWYG editor is nearly the same as in word processors. The main difference is that you do not save the content as a file on your PC. The content is saved in the database of the server where the website is hosted.

The advantages are that it is not necessary to buy and install something on your PC and that everything is human readable, even when you access the website on your mobile device.

WYSIWYG editors are used on websites to create content in a particular field.

A hybrid word processor

Services like Google Docs allow you to combine the advantages of both worlds. The service is web/browser based. You write using a WYSIWYG editor that allows you to create and edit documents online while collaborating in real time with other users. You can store the documents in Google’s server cloud or save (download) them in various binary and human-readable formats.

Web based word processors are used like binary word processors to create and store documents.


An image is something that you draw on a piece of paper, take with a camera or capture as a screenshot from your PC or mobile device. If the image is on paper, it is possible to scan it. In computing, an image is a big lump of bits and bytes which is usually stored in a file. The information is binary, and it is not possible to display an image without using a special application like an image viewer or a web browser.

Most images consist of dots, so-called pixels.

A pixel is a little dot on a screen with a related color. If you come closer to your screen you’ll see something like this:


The more pixels an image consists of, the higher its resolution. That means it is displayed better and sharper. The more colors you use, the better a photo looks.

But how to display and store it?

The data of images is not human readable.

Like word processors for text we have image processors to create and edit images. Examples are Adobe Photoshop or the FOSS version Gimp. Images are usually stored in files.

In most operating systems, image file viewers are included in the file explorer. Sometimes, these viewers are able to edit the image, e.g. change the file size, the contrast and reduce red eyes on photos.

The formats .jpg, .png and .gif are widely used on the internet. Let’s have a closer look.

What does image resolution mean?

Resolution refers to the number of pixels in an image. For example, an image that is 2048 pixels wide and 1536 pixels high (2048 x 1536) contains 3,145,728 pixels (or 3.1 Megapixels).  You could call it a 2048 x 1536 or a 3.1 Megapixel image.

How is the image resolution related to the resolution of your computer monitor?

Your computer screen is able to display different resolutions. Usually, you can configure your resolutions in your operating system.

The larger the screen, the higher your screen resolution is likely to be set.

If your monitor is set to 1024 x 768 and you open up an image that is 640 x 480, it will only fill up a part of your screen. If you open an image that is 2048 x 1536 (3.1 megapixels), then you will find yourself moving the slider bar around to see all the different parts of the image. It just won’t fit.
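The resolution arithmetic and the screen comparison from the last paragraphs can be written down as a short Python sketch. The function names are my own, and the check simply assumes that the image is shown at full size, one image pixel per screen pixel.

```python
def pixel_count(width, height):
    """Total number of pixels in an image of the given size."""
    return width * height


def fits_on_screen(image_size, screen_size):
    """True if the image can be shown at full size without scrolling."""
    image_width, image_height = image_size
    screen_width, screen_height = screen_size
    return image_width <= screen_width and image_height <= screen_height


pixels = pixel_count(2048, 1536)
print(pixels, "pixels =", round(pixels / 1_000_000, 1), "megapixels")

screen = (1024, 768)
print(fits_on_screen((640, 480), screen))     # small image on this screen
print(fits_on_screen((2048, 1536), screen))   # 3.1 megapixel image on this screen
```

Image viewers usually avoid the scrolling problem by scaling the image down for display, which this sketch does not model.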

What does image quality mean?

In addition to image size, the quality of the image can also be manipulated. By using compression, you can keep the pixel dimensions of the image the same and reduce the amount of disk space required to store it, but you will be sacrificing the quality of the image.

Joint Photographic Experts Group (JPEG)

The .jpg format was created by the Joint Photographic Experts Group. If you take a photo with a digital camera, it is usually stored in a compressed .jpg format (Figure 6).


Figure 6: Typical .jpg photo

The .jpg format is used for publishing photos on websites

Graphics Interchange Format (GIF)

The Graphics Interchange Format (GIF) is an image format that was introduced by CompuServe in 1987 and has since come into widespread usage on the World Wide Web due to its wide support and portability. Controversy over the licensing agreement between the patent holder, Unisys, and CompuServe in 1994 spurred the development of the Portable Network Graphics (PNG) standard; since then, all the relevant patents have expired.

  • GIFs are suitable for sharp-edged line art (such as logos) with a limited number of colors. This takes advantage of the format’s lossless compression, which favors flat areas of uniform color with well defined edges.
  • GIFs can be used for small animations and low-resolution film clips.
  • In view of the general limitation on the GIF image palette to 256 colors, it is not usually used as a format for digital photography.

Portable Network Graphics (PNG)

Portable Network Graphics (PNG) is an image format that employs lossless data compression. PNG was created to improve upon and replace GIF (Graphics Interchange Format) as an image-file format not requiring a patent license. The initials PNG can also be interpreted as a recursive initialism for “PNG’s Not GIF”.

PNG was designed for transferring images on the Internet, not for professional-quality print graphics.
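What “lossless” means, and why flat areas of uniform color compress so well, can be illustrated with Python’s `zlib` module, which implements the DEFLATE method that PNG also uses. This is a simplified sketch: the “pixels” are plain bytes that I made up, and a real PNG file adds filtering and a file structure on top.

```python
import random
import zlib

pixel_count = 10_000
flat_area = bytes([200]) * pixel_count     # one uniform gray value, like a logo background
random.seed(0)                             # fixed seed so the run is repeatable
noisy_area = bytes(random.randrange(256) for _ in range(pixel_count))

flat_compressed = zlib.compress(flat_area)
noisy_compressed = zlib.compress(noisy_area)

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(flat_compressed) == flat_area
assert zlib.decompress(noisy_compressed) == noisy_area

print(len(flat_area), "->", len(flat_compressed), "bytes (flat area)")
print(len(noisy_area), "->", len(noisy_compressed), "bytes (noisy area)")
```

The flat area shrinks to a tiny fraction of its size, while the noisy data hardly shrinks at all, which is why photo-like images are better served by a lossy format such as JPEG.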

Comparison to Graphics Interchange Format (GIF)
  • On small images, GIF can achieve greater compression than PNG.
  • On most images, except for the above cases, GIF will be bigger than PNG.
  • PNG gives a much wider range of transparency options than GIF (Figure 7)
  • Whereas GIF is limited to 256 colors, PNG gives a much wider range of color depths (millions of colors), allowing for greater color precision, smoother fades, etc.
  • GIF intrinsically supports animated images. PNG supports animation only via unofficial extensions.
Comparison to JPEG
  • JPEG format can produce a smaller file than PNG for photographic (and photo-like) images, since JPEG uses a lossy encoding method specifically designed for photographic image data, which is typically dominated by soft, low-contrast transitions, and an amount of noise or similar irregular structures. Using PNG instead of a high-quality JPEG for such images would result in a large increase in file size with negligible gain in quality. By contrast, when storing images that contain text, line art, or graphics – images with sharp transitions and large areas of solid color – the PNG format can compress image data more than JPEG can, and without the noticeable visual artifacts which JPEG produces around high-contrast areas. Where an image contains both sharp transitions and photographic parts a choice must be made between the two effects. JPEG does not support transparency.
  • The PNG specification does not include a standard for embedded Exif image data from sources such as digital cameras.


Figure 7: transparent .png file


Sound recording and reproduction is an electrical or mechanical inscription and re-creation of sound waves, such as spoken voice, singing, instrumental music, or sound effects.

Digital recording stores audio as a series of binary numbers representing samples of the amplitude of the audio signal at equal time intervals, at a sample rate high enough to convey all sounds capable of being heard. A digital audio signal must be reconverted to analog form during playback before it is applied to a loudspeaker or ear phones.
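Sampling can be sketched with Python’s standard library: the amplitude of a tone is measured at equal time intervals and stored as binary numbers. The tone (440 Hz), the sample rate (44,100 samples per second) and the 16-bit sample size are assumptions chosen for the illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100   # samples per second (assumed value)
FREQUENCY = 440.0      # pitch of the example tone in Hz (assumed value)
DURATION_MS = 10       # length of the example in milliseconds

sample_count = SAMPLE_RATE * DURATION_MS // 1000

# Measure the amplitude of the sine wave at equal time intervals.
samples = [
    int(32767 * math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE))
    for n in range(sample_count)
]

# Store each sample as a 16-bit little-endian signed integer.
frames = struct.pack(f"<{sample_count}h", *samples)

# Write the samples as an uncompressed WAV file (kept in memory here).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)          # mono
    wav_file.setsampwidth(2)          # 2 bytes = 16 bits per sample
    wav_file.setframerate(SAMPLE_RATE)
    wav_file.writeframes(frames)

print(sample_count, "samples,", len(frames), "bytes of audio data")
```

A compressed format like MP3 starts from such samples and then discards detail that is hard to hear, which is what makes it lossy.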

MP3 (MPEG-1 or MPEG-2 Audio Layer III) is a patented digital audio encoding format using a form of lossy data compression. It is a common audio format for consumer audio storage, as well as a de facto standard of digital audio compression for the transfer and playback of music on digital audio players.

The data is stored in files with the extension .mp3.

These files can be edited with digital audio editors like Audacity.


A video is a mixture of everything mentioned above. It is the technology of electronically capturing, recording, processing, storing, transmitting, and reconstructing a sequence of still images representing scenes in motion.

There are many different digital encoding formats. The most common is MPEG-4 Part 14 or MP4. It is most commonly used to store digital video and digital audio streams, but can also be used to store other data such as subtitles and still images. It allows streaming over the Internet. The only official filename extension for MPEG-4 Part 14 files is .mp4.

Structuring Process and Databases

To structure all this data you need to understand the relations between the different types of data. Data about a house can be:

  • a written history
  • photos
  • ground plan
  • an interview with the owner
  • a video about the house


Usually the analogy with an object is a good way to start. A “real” girl or a boy is an object.

It has properties, like size, name, hair color, etc. It is possible to define a kind of abstract description of an object called “girl” or “boy”. This abstract description can be called a class, a content type or a structure (Figure 8).


Figure 8: abstract and real objects

The important part here is that each boy or girl can be described in an abstract way, and the “real” boy or girl has values for all the abstract properties.
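In a programming language, the abstract description and the “real” objects could look like the following Python sketch. The class name `Child`, the property names and the example values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Abstract description: the properties every boy or girl has."""
    name: str
    size_cm: int
    hair_color: str


# "Real" objects: concrete values for all the abstract properties.
girl = Child(name="Anna", size_cm=140, hair_color="brown")
boy = Child(name="Tom", size_cm=150, hair_color="blond")

print(girl)
print(boy.name, "is", boy.size_cm, "cm tall")
```

The class says which properties exist; each object fills them with its own values.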

The same works for objects like a house or a car. It works even for a news article.


All of these objects have relations to each other.

  • a “boy” lives in a “house”,
  • the house is built in a city,
  • other houses are in the city too,
  • the city is an object in a region/country too

Databases and data models

Any type of data can represent an object. To store the data technically, usually a database is used. A database consists of structures, data and ways to add, edit, select and delete data.

The structures are related to one another.

Common relational database systems use tables. A table is a kind of object structure. Each property is represented by a field; HId, for example, is the identification number of a house (HouseId) (Table 1).

Square meters | numerical | 6

Table 1: Structure of the house table

Each row in the table represents one “real” house (Table 2).

HId | Title | City | Square meters
1 | Village house | Fitou | 200
2 | Passive house | Freiburg | 120
3 | Modern style house | Istanbul | 400

Table 2: House objects as rows in the houses table
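As a minimal sketch, the single house table could be created and filled with Python’s built-in `sqlite3` module. The choice of SQLite, the column types and the column name `SquareMeters` are my assumptions; the rows are taken from Table 2.

```python
import sqlite3

# An in-memory database is enough for a demonstration.
connection = sqlite3.connect(":memory:")

# The structure of the table (column types are assumed).
connection.execute(
    """
    CREATE TABLE house (
        HId INTEGER PRIMARY KEY,
        Title TEXT,
        City TEXT,
        SquareMeters INTEGER
    )
    """
)

# Each tuple is one "real" house, as in Table 2.
houses = [
    (1, "Village house", "Fitou", 200),
    (2, "Passive house", "Freiburg", 120),
    (3, "Modern style house", "Istanbul", 400),
]
connection.executemany("INSERT INTO house VALUES (?, ?, ?, ?)", houses)

rows = connection.execute("SELECT * FROM house ORDER BY HId").fetchall()
for row in rows:
    print(row)
```

The `CREATE TABLE` statement corresponds to the structure of the table, and every inserted tuple corresponds to one row.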


Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. Our last example with one table was a simple representation.

Let’s normalize this structure and create a City table too, so that we have two tables: a house table and a city table. The city table could have fields for all the “City properties” like country, inhabitants, size, etc.

The house table could then have a field that refers to an existing city. In this case the city field would no longer contain the name of the city. It would contain the CId (City Id).

See Tables 3 to 6 for an example.


Table 3: Structure of the City table

Square meters | numerical | 6

Table 4: Structure of the House table

Each row in the City table represents one “real” City.


Table 5: City objects as rows in the City table

Each row in the table represents one “real” house (Table 6).

HId | Title | CId | Square meters
1 | Village house | 1 | 200
2 | Passive house | 2 | 120
3 | Modern style house | 1 | 400

Table 6: House objects as rows in the Houses table
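The benefit of the two-table layout can be tried out with a small sketch in Python's sqlite3 module. The house rows follow Table 6; the city rows and the column name `Title` in the city table are assumptions, because the content of the city table is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city (
        CId   INTEGER PRIMARY KEY,
        Title TEXT                      -- assumed column name
    );
    CREATE TABLE house (
        HId           INTEGER PRIMARY KEY,
        Title         TEXT,
        CId           INTEGER REFERENCES city(CId),
        Square_meters INTEGER
    );
    """
)
# Assumed city rows; only the CId values are taken from Table 6.
conn.executemany("INSERT INTO city VALUES (?, ?)", [(1, "Fitou"), (2, "Freiburg")])
conn.executemany(
    "INSERT INTO house VALUES (?, ?, ?, ?)",
    [
        (1, "Village house", 1, 200),
        (2, "Passive house", 2, 120),
        (3, "Modern style house", 1, 400),
    ],
)

# The city name is stored once, so a correction touches exactly one row ...
changed = conn.execute(
    "UPDATE city SET Title = 'Fitou, France' WHERE CId = 1"
).rowcount

# ... and every house that refers to CId 1 sees the new value.
houses_in_city_1 = conn.execute(
    "SELECT house.Title, city.Title FROM house JOIN city ON house.CId = city.CId "
    "WHERE city.CId = 1 ORDER BY house.HId"
).fetchall()
```

With a single table, the same correction would have to be repeated in every house row that mentions the city.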

User interface

The database itself has no user interface. It is just a place to store data. One possible user interface for adding data to a database is a web-based tool like phpMyAdmin.

Retrieving Data

To retrieve data it is necessary to describe what you want to have. An example could be: “Give me the data of all houses located in Istanbul”.

Because there is no user interface, it is a bit complicated to tell your database what you want to achieve :) A common way is to use the Structured Query Language (SQL).

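“Give me the data of all houses located in Istanbul” could look like this in SQL (the column name Title in the city table is an assumption, since the structure of that table is not shown):

SELECT house.*
FROM house JOIN city
ON house.CId = city.CId
WHERE city.Title = 'Istanbul';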

When we send this SQL statement to our database, it returns the desired data.
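Such a query can be tried out from Python with the built-in sqlite3 module. This is only a sketch: the city rows, the column name `Title` in the city table and the helper `houses_in` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city  (CId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE house (HId INTEGER PRIMARY KEY, Title TEXT,
                        CId INTEGER REFERENCES city(CId), Square_meters INTEGER);
    INSERT INTO city  VALUES (1, 'Fitou'), (2, 'Freiburg'), (3, 'Istanbul');
    INSERT INTO house VALUES (1, 'Village house', 1, 200),
                             (2, 'Passive house', 2, 120),
                             (3, 'Modern style house', 3, 400);
    """
)


def houses_in(city_name):
    """Return (HId, Title, Square_meters) of all houses located in city_name."""
    query = """
        SELECT house.HId, house.Title, house.Square_meters
        FROM house JOIN city
        ON house.CId = city.CId
        WHERE city.Title = ?
    """
    # The ? placeholder lets the database driver handle quoting of the value.
    return conn.execute(query, (city_name,)).fetchall()


result = houses_in("Istanbul")
```

The placeholder is used instead of pasting the city name into the SQL text, so that the same function works for any city name.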

Collecting Process

When collecting data, it is necessary to decide in advance how you will use it afterwards and how you want to make it available and publish it. In the age of the internet, data is usually stored in databases. Therefore, define the data structure, the collection methods and the target data before you start collecting.

The structure of the defined tables is a blueprint for creating forms. These forms can be available on paper or on a website.
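How a table structure can serve as a blueprint for a web form is sketched below. The field list, the lengths of the text fields, the form target `/houses` and the function `build_form` are assumptions; only the “Square meters, numerical, 6” entry is taken from the table structure shown earlier.

```python
from html import escape

# Assumed table structure: (field name, field type, maximum length).
HOUSE_FIELDS = [
    ("Title", "text", 50),
    ("City", "text", 50),
    ("Square meters", "numerical", 6),
]


def build_form(fields, action="/houses"):
    """Render one labelled <input> per field of the table structure."""
    lines = [f'<form method="post" action="{escape(action)}">']
    for name, field_type, max_length in fields:
        field_id = name.lower().replace(" ", "_")
        if field_type == "numerical":
            # A length of 6 digits means values up to 999999.
            attrs = f'type="number" max="{10 ** max_length - 1}"'
        else:
            attrs = f'type="text" maxlength="{max_length}"'
        lines.append(f'  <label for="{field_id}">{escape(name)}</label>')
        lines.append(f'  <input {attrs} id="{field_id}" name="{field_id}">')
    lines.append('  <button type="submit">Save</button>')
    lines.append("</form>")
    return "\n".join(lines)


form_html = build_form(HOUSE_FIELDS)
```

A paper form could be derived from the same field list; only the rendering would differ.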


Data collection can be a tricky process. Depending on the kind of data to be collected (text, audio, photos, videos), there are different ways to collect it.

Paper form

Only text and sketches can be collected using paper-based forms. Photos (nowadays), sound and video cannot be collected in a paper-based process. Data collected on a paper-based form needs to be entered into the database in a separate step.

Web based form

Data collected using web-based forms is automatically added to the database after the form is submitted. Web-based forms can contain several validation methods to avoid inconsistent data. It is also possible to collect audio, video, spoken and written text using a mobile device (Figure 9).
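What such a validation step might look like on the server side is sketched below. The field names `title` and `square_meters`, the rules and the error messages are assumptions chosen for the house example.

```python
def validate_house(form_data):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []

    title = form_data.get("title", "").strip()
    if not title:
        errors.append("Title is required.")

    raw_size = form_data.get("square_meters", "").strip()
    if not raw_size.isdecimal():
        errors.append("Square meters must be a whole number.")
    elif not 0 < int(raw_size) <= 999999:
        errors.append("Square meters must be between 1 and 999999.")

    return errors


ok = validate_house({"title": "Village house", "square_meters": "200"})
bad = validate_house({"title": " ", "square_meters": "big"})
```

Only data that comes back with an empty error list would be written to the database.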

Mobile first

Figure 9: Collecting data and author text using a mobile device

Present Findings

To present the findings one could use

an interactive map


Figure 10: Interactive Google Map

a table


Figure 11: Table

a grid


Figure 12: Grid

statistics of data


Figure 13: Statistics

Publishing Process

The publishing process of data was and still is discussed by scientists around the world. For the last 500 years, books printed on paper were the main method of publishing data. The whole concept of reading and publishing is still based on the idea of printing content on paper. But this is changing rapidly!

With the advent of computers and the internet, scientists and researchers were looking for ways to use and share documents and data. Nearly 20 years ago, the world wide web was “invented”, and even today everything that is available on the web consists of HTML files that include other files such as .css, .mp3, .mp4, .jpg, .png, .gif and more.

The World Wide Web (WWW)

The world wide web has been in common use for about 15 years. It consists of web browsers, web servers, markup languages and the availability of internet access.

Hypertext Markup Language (HTML)

HTML was specified by Tim Berners-Lee in 1990. The idea was to combine different types of files and text in an HTML document and to link to other HTML documents in a world wide web.

HTML elements still today form the building blocks of all websites. HTML allows images and objects to be embedded and can be used to create interactive forms. It provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items.

HTML is usually stored in .html or .htm files.

Example of an HTML tag for a headline: <h1>Headline </h1>

Cascading Style Sheets (CSS)

To improve web presentation capabilities a language called CSS was published in 1996.

CSS is stored in .css files.

Example of a CSS statement that colors a headline: h1 { color:red;}

Web server

A web server is a service running on a computer that stores and delivers .html and all the other files. The most common server is called Apache.

Web browser

A web browser is the software that runs on your device. Most devices (computers) today are connected to the world wide web 24/7. By typing a URL into your browser, you ask a server (for example, one of Google’s servers) to deliver an .html page to your device. The browser reads the files and renders them as the webpage you know. Currently the most used web browsers are Internet Explorer, Chrome, Firefox and Safari (Figure 14).


Figure 14: Wikimedia usage share of web browser

Content Management Systems

The described workflow of storing data in static HTML pages makes publishing and sharing documents possible in general. At the time, this was enormous progress compared to paper-based books. It changed and still changes the world!

But the pages were not interactive!

For that reason, scripting languages like PHP (originally “Personal Home Page”) were invented in the mid-1990s. With PHP it became possible to generate HTML pages on the fly, based on data from database queries and files from different resources. Database servers already existed at that time.
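The principle of generating a page on the fly can be sketched in a few lines. The sketch uses Python and its sqlite3 module instead of PHP, and the table, the sample rows and the function `render_page` are assumptions for illustration.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house (HId INTEGER PRIMARY KEY, Title TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO house VALUES (?, ?, ?)",
    [(1, "Village house", "Fitou"), (2, "Passive house", "Freiburg")],
)


def render_page(connection):
    """Build an HTML page from whatever is in the database right now."""
    items = [
        f"  <li>{escape(title)} ({escape(city)})</li>"
        for title, city in connection.execute(
            "SELECT Title, City FROM house ORDER BY HId"
        )
    ]
    return "\n".join(["<h1>Houses</h1>", "<ul>", *items, "</ul>"])


page_before = render_page(conn)
conn.execute("INSERT INTO house VALUES (3, 'Modern style house', 'Istanbul')")
page_after = render_page(conn)   # same code, new content
```

Nobody edits an HTML file here: the page changes because the data changed.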

Most web servers, database servers and PHP itself were and are Free and Open Source Software, and so are the Content Management Systems that are based on these foundations.

The most common CMS’s today are WordPress, Joomla! and Drupal. The idea behind them is to let people publish content easily by using a web browser to add and edit data.

Today, text-based content is widely created with the help of CMS’s. Nearly all newspaper websites use a CMS to create and edit their content.

But even today only 30% of all websites in the world use a CMS; 70% are still made in the “old-fashioned” static way of writing HTML code that was invented 20 years ago.

Web Applications

The “next generation” CMS’s are called Web Applications. They are as powerful as applications that have to be installed on your device like Microsoft Word or

As an example, have a look at Google Docs, a web-based office system.

Platforms like Flickr offer browser-based image editing, YouTube offers browser-based video recording and editing, and more and more services try to lower the barriers to collecting, recording, editing and storing data. The data is still stored in databases, but another way of storing data is becoming more and more common.

The cloud!

The marketing buzzword “Cloud” is a mixture of the “good old” hard disk and a kind of database service. Cloud can mean Software as a Service (SaaS), Infrastructure as a Service (IaaS) or Platform as a Service (PaaS). The most important thing to know about “the cloud” is that it is much easier to use than a hard disk and that you don’t have to deal with physical “things”.

If your device is connected to the internet 24/7, all of your data is available anytime and anywhere through a cloud system.

App Ecosystems

A parallel development started in 2007 with Apple’s iPhone. Small applications (apps) that are not browser-based became available in so-called app stores. The apps are able to use the camera, the microphone, the GPS and all the other features of the device for collecting and publishing data. The store concept allowed developers to earn money by writing apps.


In the app ecosystems, ebooks are playing a more and more important role. They consist of HTML/CSS files and are packed in a format called EPUB. It is possible to sell them in app stores and they look like a book printed on paper. Inside they are HTML based like every website.
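The idea that an ebook is a package of HTML and CSS files can be sketched with Python's zipfile module. This is a simplified illustration, not a complete EPUB: a real EPUB additionally needs a META-INF/container.xml file and a package document, which are left out here. The file names and the function `pack_book` are assumptions.

```python
import io
import zipfile


def pack_book(files):
    """Pack HTML/CSS files into a ZIP container the way EPUB does.

    Simplified: container.xml and the package document are omitted.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # EPUB expects an uncompressed "mimetype" entry as the first file.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for name, content in files.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


book = pack_book({
    "chapter1.xhtml": "<html><body><h1>Headline</h1></body></html>",
    "style.css": "h1 { color: red; }",
})
names = zipfile.ZipFile(io.BytesIO(book)).namelist()
```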

In 2012 the biggest ecosystems are Amazon Kindle and Apple iBookstore.

Paper based books

Traditional books will still play a role in the future, but that role will shrink. They cannot be linked to other resources and are not interactive. Storing and delivering them is expensive and complicated.

The current role of CMS’s

Content Management Systems are still used to combine all these data. They retrieve data from file systems or databases in the cloud or elsewhere and provide the data to web browsers or apps.
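As a rough sketch of this role, the same piece of content can be delivered as HTML for a web browser and as JSON for an app. The record, the function names and the choice of JSON as the app format are assumptions for illustration.

```python
import json
from html import escape

# Hypothetical content record, as a CMS might load it from a database or file.
article = {"title": "Village house", "body": "A house in Fitou."}


def for_browser(record):
    """Deliver the record as HTML for a web browser."""
    return f"<h1>{escape(record['title'])}</h1>\n<p>{escape(record['body'])}</p>"


def for_app(record):
    """Deliver the same record as JSON, a format apps commonly consume."""
    return json.dumps(record, sort_keys=True)


html_output = for_browser(article)
json_output = for_app(article)
```

The content is stored once; only the output format depends on who asks for it.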


Since the advent of the world wide web, the publishing process has changed massively. In former times, you needed a publishing house to create and distribute your book. Today it is possible to create, write and publish paperback books, content on interactive websites and content for mobile phones entirely on your own.

Thus, knowing how to use the world wide web for publishing data has become steadily more important.