7 June 2008

SoundIndex. The supercomputer's "Top 1000"

What a bore. Here is yet another piece of software on the BBC website. I mean, here is yet another service that is clever, well designed, well built, well interfaced, good to look at and to use, and "a great value for your money", considering that this feat too is funded with taxpayers' money. This time it is SoundIndex, a very unusual "crawler" that explores the main music repositories on the Internet (iTunes, Last.fm, Bebo, YouTube, MySpace and so on) and tries to "understand" the musical tastes of the moment. It generates, on its own, a virtual hit parade of the most listened-to bands and singers, then displays it through an attractive interface developed by NovaRising, a British IPTV specialist. Not just any hit parade but a Top Thousand list, a thousand and not one more, of the music young people like best, built by software algorithms: a revolution compared with the historic record "charts" compiled by counting units sold. SoundIndex, which is still in beta and has so far had very little visibility outside the blogosphere, takes the pulse of the music people like, not the music they buy (and one can imagine under what kinds of influence). Naturally, there is also an associated social component, MyIndex, which lets you refine your personal charts and exchange opinions with the community. Can you believe it?
The crawling software runs on IBM's Semantic Super Computing. Here is the explanation provided by the service's producer, Beth Garrod, who, as if that were not enough, is one of those good-looking women with whom you would want to do anything but talk about software.

The Sound Index is a massive index of the hottest bands and tracks that are being talked about on the internet right now.
Every six hours the Sound Index crawls some of the biggest music sites on the internet - Bebo, MySpace, Last.FM, iTunes, Google and YouTube - to find out what people are writing about, listening to, watching, downloading and logging on to. It then counts and analyses this data to make an instant list of the most popular 1000 artists and tracks on the web. The more blog mentions, comments, plays, downloads and profile views an artist or track has, the higher up the Sound Index they are. So, the Sound Index is a music buzz index controlled entirely by the public.
As we know which artists are being enjoyed by which people, not only can you filter the Sound Index to reflect the sites you use the most, or your favourite music styles, you can also tailor it to represent the views of people of different ages and locations.
All the demographic data we collect and use is entirely anonymous, so we can never attribute any age or location data to any specific person. So, if you are a user of any of these sites, you don't need to worry that the Sound Index has any information about you. However, if you are concerned, or want more details, please contact us at soundindex (at) bbc (dot) co (dot) uk.
You can also watch some of the newest bands' hottest new tracks on Sound, the music show on BBC every Saturday.
The Sound Index is currently in a public service beta phase, with data sources being finalised. During this beta phase we shall also be implementing and tweaking the data currently in the Index, and investigating a weighting system, to allow the more active forms of interaction to contribute more heavily to the Index.
IBM's Semantic Super Computing is used to crawl and analyse our partners' sites. The Sound Index is then produced and rendered by NovaRising.

Yesterday the BBC Internet Blog carried this post by the head of technology at SoundSwitch, Geoffrey Goodwin, which discusses at length the principles behind the algorithm used to create this popularity index, an index that also draws on comments from participants on social networking sites.
I am more and more astonished, and I am beginning to wonder whether the BBC will one day manage to produce something that simply does not work.

The Sound Index is not meant to be a definitive chart (like a sales chart). Rather it's a gauge of who and what is currently driving conversation and interaction about music online. It's a great tool for music discovery, and to find out who's currently hot in the music world of teenagers. However, we have taken steps to ensure that our data collection is as accurate as possible, and have implemented an algorithm to help us create the most editorially relevant and robust Index.
The Sound Index is currently in a 4 month beta stage, so that (among other things) the technical, editorial and cost implications of various algorithm options can be assessed.
After viewing an Index based on the raw data we felt an algorithm was needed to allow all the sources to contribute to the Index, and for all forms of activity on the internet - plays, comments and downloads - to affect the rankings. Without an algorithm the large volumes of the more dominant forms of interaction - mainly plays and downloads - drowned out the smaller numbers of comments, which we felt were important to reflect in the Index.
Therefore, my team has developed the following algorithm, which I feel gives an editorially relevant and justified chart, without any bespoke manipulation or input, meaning that the Sound Index can be viewed as an accurate gauge of online buzz.
For each type of interaction (play, comment, download) all the data for each artist for each individual site has been added. Then each artist is given a score depending on how popular they are on each site. This score is directly related to how many artists are on that site. For example, if there were 200 artists from MySpace, the number 1 artist (with the most counts) would have a score of 200, whereas if they had the least, they would have a score of 1.
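The per-site scoring described in this paragraph can be sketched in a few lines of Python. This is only an illustration of the rank-based idea, not the BBC's or IBM's code: the function name `rank_scores`, the sample data and the tie-breaking rule (alphabetical by artist name) are all assumptions.

```python
def rank_scores(counts):
    """Turn raw counts for one site and one interaction type into rank scores.

    The artist with the lowest count scores 1 and the artist with the highest
    count scores len(counts), as in the 200-artist MySpace example.
    Ties are broken alphabetically by artist name (an assumption; the quoted
    post does not say how ties are handled).
    """
    ordered = sorted(counts, key=lambda artist: (counts[artist], artist))
    return {artist: rank for rank, artist in enumerate(ordered, start=1)}


# Hypothetical play counts collected from a single site.
myspace_plays = {"Band A": 5000, "Band B": 120, "Band C": 43000}
scores = rank_scores(myspace_plays)
```

Because only the ordering matters, a site with huge raw numbers cannot outweigh a smaller site simply by volume.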
We didn't want sites with massive amounts of only one type of data to totally dominate the Sound Index. So each type of data - play, download, comment - is limited to make up a set proportion of each artist's popularity. This is determined by how many different sources there are for each type of data. So, if there were ten sources in total made up of five play counts, three download and two comments, we would multiply the ranks from each source in the following way: 5/10 for play counts, 3/10 for downloads and 2/10 for comments.
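The arithmetic of the ten-source example can be checked with a short Python sketch. The function name `type_weights` and the way the sources are listed are assumptions made for illustration; the quoted post gives only the resulting proportions.

```python
from collections import Counter
from fractions import Fraction


def type_weights(source_types):
    """Weight of each interaction type: its share of all data sources."""
    per_type = Counter(source_types)
    total = len(source_types)
    return {kind: Fraction(n, total) for kind, n in per_type.items()}


# Ten sources in total: five play counts, three downloads, two comments.
sources = ["play"] * 5 + ["download"] * 3 + ["comment"] * 2
weights = type_weights(sources)
```

Exact fractions are used so that the proportions always sum to exactly 1, which would not be guaranteed with floating-point division.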
These figures from each type of activity from each site for each artist multiplied by this fixed proportion are then added together, to give a total buzz score, which is used to create the Sound Index. The same method is applied separately for individual tracks. We have also put in processes with our data collection methods to reduce the impact of gaming. Our partner IBM have implemented spam filtering, porn filtering, multiple post detection and MusicBrainz verification to help the data be as clean as possible.
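Putting the steps together, one possible reading of the description is sketched below in Python. It is not the production algorithm: the names `rank_scores` and `buzz_scores`, the sample feeds, the alphabetical tie-breaking and the choice to let an artist missing from a source contribute nothing are all assumptions, and the spam filtering and MusicBrainz verification steps are left out.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def rank_scores(counts):
    # Lowest count scores 1, highest scores len(counts).
    # Ties are broken by artist name (assumption).
    ordered = sorted(counts, key=lambda artist: (counts[artist], artist))
    return {artist: rank for rank, artist in enumerate(ordered, start=1)}


def buzz_scores(feeds):
    """feeds: list of (interaction_type, {artist: raw_count}), one per source."""
    per_type = Counter(kind for kind, _ in feeds)
    totals = defaultdict(Fraction)
    for kind, counts in feeds:
        # Fixed proportion for this type: its share of all sources.
        weight = Fraction(per_type[kind], len(feeds))
        for artist, score in rank_scores(counts).items():
            totals[artist] += weight * score
    return dict(totals)


# Hypothetical feeds: two play-count sources and one comment source.
feeds = [
    ("play", {"A": 900, "B": 100, "C": 500}),
    ("play", {"A": 70, "B": 50}),
    ("comment", {"A": 1, "B": 12, "C": 3}),
]
totals = buzz_scores(feeds)
# Highest total buzz score first; names break ties (assumption).
chart = sorted(totals, key=lambda artist: (-totals[artist], artist))
```

In this toy data artist B leads on comments, but A still tops the chart because play sources carry two thirds of the weight.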
The Sound Index is a project based on trialing new technology. I think that in its current form it's been successful in achieving an exciting way of discovering which bands and artists are creating the most buzz. We are not using it to define any charts or create any definitive lists. Anything editorial around the Sound Index should not use it as an absolute measure - it's a gauge of what is hot. It's a great example of innovation and collaboration with major music sites. We're still learning about what the Sound Index can do.
