{"id":14832,"date":"2021-12-22T10:32:19","date_gmt":"2021-12-22T08:32:19","guid":{"rendered":"https:\/\/www.dase-analytics.com\/blog\/?p=14832"},"modified":"2021-12-22T11:31:54","modified_gmt":"2021-12-22T09:31:54","slug":"velka-analyza-vianocnych-pesniciek","status":"publish","type":"post","link":"https:\/\/www.dase-analytics.com\/blog\/sk\/velka-analyza-vianocnych-pesniciek\/","title":{"rendered":"The Great Christmas Song Analysis"},"content":{"rendered":"<p>What if we built our Christmas playlists from data instead of personal taste? I decided to give it a try, even though the outcome was uncertain. And yes, working on this project meant searching for and listening to a large number of Christmas songs. And yes, that means my personalized Spotify recommendations have been heavily skewed by the experiment. 
Still, I sacrificed myself for the greater good so that you would not have to go through the same ordeal.\u00a0<strong>I built 6 data-driven Christmas playlists for you and picked up a few interesting statistics along the way.<\/strong><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-14922 size-full\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/pesnicky.jpg\" alt=\"\" width=\"1200\" height=\"628\" srcset=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/pesnicky.jpg 1200w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/pesnicky-300x157.jpg 300w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/pesnicky-1024x536.jpg 1024w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/pesnicky-600x314.jpg 600w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/p>\n<p>The first part of the article, on collecting and evaluating the data, is for hardcore analysts. More interesting statistics follow later. And if none of that interests you, you can scroll straight to the playlists at the end of the article. \ud83d\ude09<\/p>\n<h2>No miracle happened<\/h2>\n<p>Deep down I was hoping that Christmas would come a little early for me this year and that I would find a dataset of Christmas songs suitable for analysis. I do not think I was asking for much. It would have been enough for the dataset to contain the song title, the artist and the release date (plus any other interesting fields that could be used in the analysis). 
The Christmas miracle did not happen, though, and I found nothing usable (unless I was willing to download and then dig through <a href=\"http:\/\/millionsongdataset.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">300 GB of data.<\/a>\u00a0Spoiler: no, I was not \ud83d\ude09 ). <strong>Luckily, after a bit of Googling I came across the <a href=\"https:\/\/developer.spotify.com\/documentation\/web-api\/\" target=\"_blank\" rel=\"noopener noreferrer\">Spotify API<\/a><\/strong>. On top of that, I found that there is an <a href=\"https:\/\/www.rcharlie.com\/spotifyr\/\" target=\"_blank\" rel=\"noopener noreferrer\">R wrapper<\/a> for the Spotify API, which makes it possible to download the data and keep working with it in R.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-14917\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/giphy-2.gif\" alt=\"Smutn\u00fd santa :(\" width=\"480\" height=\"352\" \/><\/p>\n<p>Since my experience with <a href=\"https:\/\/www.dase-analytics.com\/blog\/vyraz\/r-programovaci-jazyk\/\" target=\"_blank\" rel=\"noopener noreferrer\">R<\/a> is at freezing point (pun intended), I have to admit that getting the data and analyzing it was hard work. I decided to pull the Christmas songs from the four most-played Christmas playlists &#8211; &#8220;Christmas Hits&#8221;, &#8220;Christmas Classics&#8221;, &#8220;Christmas Pop&#8221; and &#8220;Christmas Is Coming&#8221;. Adding more playlists no longer made much sense, since they would mostly contain the same songs anyway. 
In the worse case, they would even contain the same song sung by 42 different artists, in 13 remixes, released on 3 different albums.<\/p>\n<p>I found that the Spotify API gives access to a really large amount of data. Every track comes with basic fields such as the track name and id, the artist name, the release year, the track duration (in ms) and the album it was released on. Beyond this basic information, I also pulled further interesting metrics for every track through the Spotify API (unless noted otherwise, each metric ranges from 0 to 1):<\/p>\n<ul>\n<li><strong>Danceability<\/strong> &#8211; this metric describes how well a track lends itself to dancing. The algorithm behind the value takes several factors into account, e.g. tempo, rhythm stability, regularity of the track and so on. I did not try dancing to the songs, though, so I have not verified the accuracy of this metric empirically. Maybe a task for you. \ud83d\ude09<\/li>\n<li><strong>Energy &#8211;\u00a0<\/strong>energetic tracks are typically fast and loud. According to Spotify, death metal has high energy, for example, while a Bach prelude has low energy.<\/li>\n<li><strong>Loudness\u00a0<\/strong>&#8211; the overall loudness of the track, reported in decibels (dB) rather than on a 0 to 1 scale.<\/li>\n<li><strong>Speechiness <\/strong>&#8211; this metric represents how much spoken word the track contains. 
The higher the value, the more spoken word (a value above roughly 0.66 most likely means the track is almost entirely speech, such as a podcast).<\/li>\n<li><strong>Acousticness\u00a0<\/strong>&#8211; the higher the value, the higher the probability that the track is an acoustic recording.<\/li>\n<li><strong>Valence\u00a0<\/strong>&#8211; an interesting metric that measures how positive a track sounds. Tracks with high valence (closer to 1) sound positive, happy or even euphoric, while tracks with low valence (closer to 0) sound more negative (sad, depressed, angry).<\/li>\n<li><strong>Tempo\u00a0<\/strong>&#8211; the tempo of the track in beats per minute (BPM).<\/li>\n<li><strong>Popularity<\/strong> &#8211; the popularity of the track, a value from 0 to 100, where 100 is the most popular track at the moment. This metric is closely tied to how much the track is currently being played: a track with more recent plays will have a higher popularity.<\/li>\n<\/ul>\n<p>These metrics are fairly black-box (we do not know exactly how they are computed), but let us assume that the people at Spotify know what they are doing. \ud83d\ude42<\/p>\n<h2>Selecting the songs<\/h2>\n<p>Once the data about all the songs had been collected, it was time to clean the dataset. The first step was removing duplicate songs. I started by dropping all rows with the same track.id. This did not remove identical songs that appear on different albums, however (such tracks have different track.id values even though they are the same song by the same artist). 
It also did not remove the various &#8216;Remastered&#8217; versions or covers (I always tried to keep the original version). And since I could not think of a more reliable way to remove these duplicates with a script, I went through them by hand. In addition, some songs (mostly the older ones) had the wrong release year assigned, so I fixed that as well. <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1Y922gbPVyDld0u5NAAagPoaPefmuHz55JZILEFrYWuo\/edit?usp=sharing\" target=\"_blank\" rel=\"noopener noreferrer\">I exported the final dataset to a Google Sheet<\/a>.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-14918\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/giphy-3.gif\" alt=\"Dataset kone\u010dne hotov\u00fd!\" width=\"460\" height=\"353\" \/><\/p>\n<h2>Interesting statistics<\/h2>\n<p><strong>After cleaning, 118 songs remained in the dataset.<\/strong> It contains the biggest bangers from Frank Sinatra to Justin Bieber, with release years ranging from 1942 to 2019. The artists with the most songs in the dataset are:<\/p>\n<ul>\n<li><strong>Bing Crosby<\/strong> (9),<\/li>\n<li><strong>Mariah Carey<\/strong> (6)<\/li>\n<li>and, surprisingly, <strong>Sia<\/strong> (5 &#8211; did you know that Sia released a Christmas album last year? 
Neither did I.).<\/li>\n<\/ul>\n<p>You do not need to be a data scientist to read from the following chart that the last decade has seen a flood of Christmas songs.<\/p>\n<p><a href=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/release-distribution-bar.png\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-14833 size-large\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/release-distribution-bar-1024x391.png\" alt=\"anal\u00fdza viano\u010dn\u00fdch songov\" width=\"1024\" height=\"391\" srcset=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/release-distribution-bar-1024x391.png 1024w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/release-distribution-bar-300x115.png 300w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/release-distribution-bar.png 1776w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em>Distribution of songs by release year &#8211; x: release year, y: number of songs in the dataset.<\/em><\/p>\n<blockquote><p><strong>The shortest song in the dataset is the well-known carol &#8220;We Wish You a Merry Christmas&#8221; (John Denver)<\/strong> &#8211; it lasts only 65 seconds.<\/p><\/blockquote>\n<p>At the other end, <strong>the longest song is &#8220;You&#8217;re a Mean One, Mr. Grinch&#8221;<\/strong> (Thurl Ravenscroft), which you get to enjoy for a respectable 5 minutes and 16 seconds (I listened to it, and the version on Spotify contains a lot of spoken word, hence the daunting running time). The average song length, however, is an acceptable (?) 
3 minutes and 12 seconds.<\/p>\n<p>Since I still felt short on data, I decided to get the lyrics of the individual songs. <a href=\"https:\/\/caitlinhudon.com\/2017\/12\/22\/blue-christmas\/\" target=\"_blank\" rel=\"noopener noreferrer\">In this great blog post<\/a>, Caitlin describes how to download lyrics quickly and easily using the Genius API. And indeed, a few hours later I had the lyrics for almost all of the songs. A few of them I could not reach through the Genius API. In those cases it was back to good old Googling and copying and adding the lyrics by hand.<\/p>\n<p><strong>After another three hours of preprocessing the lyrics, I started to doubt whether this article would really come out before Christmas 2019. <\/strong>One of the problems was removing so-called stopwords. Stopwords are words that carry little meaning on their own (for example the, a, but, and, or, what&#8230;). With these words removed, I could generate further statistics.<\/p>\n<blockquote><p><strong>The songs with the fewest unique words are &#8220;Ho Ho Ho&#8221; (Sia) and &#8220;Holly Jolly Christmas&#8221; (Michael Bubl\u00e9) &#8211; 30 each<\/strong>. 
In contrast, the most creative song (at least as far as word count goes) is <strong>&#8220;Christmas Wrapping&#8221; (Kylie Minogue) with 175 unique words.<\/strong><\/p><\/blockquote>\n<p><a href=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/booze.png\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-14901 size-full\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/booze.png\" alt=\"anal\u00fdza viano\u010dn\u00fdch pesni\u010diek\" width=\"381\" height=\"301\" srcset=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/booze.png 381w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/booze-300x237.png 300w\" sizes=\"(max-width: 381px) 100vw, 381px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em>Even though &#8220;Ho Ho Ho&#8221; contains only 30 unique words, a few of them are very interesting.<\/em><\/p>\n<p>It will probably surprise no one that the most frequent word in Christmas songs is <strong>the word &#8220;Christmas&#8221; &#8211; across the 118 songs it appears exactly 833 times<\/strong> (which works out to about 7 occurrences per song on average). 
Interestingly, the word &#8220;Christmas&#8221; appears at least once in only 92 songs (which raises the average to about 9 occurrences among the songs that actually contain it).<\/p>\n<blockquote><p>But that means that <strong>in 26 Christmas songs the word &#8220;Christmas&#8221; does not appear at all!<\/strong> I could not resist and had to check which ones they are (you will find the playlist at the end of the article).<\/p><\/blockquote>\n<p>55 songs have &#8220;Christmas&#8221; directly in the title.\u00a0The word &#8220;Christmas&#8221; also shows up in interesting variations &#8211; for example, the song &#8220;Merry Christmas Darling&#8221; (Carpenters) contains the word &#8220;Christmasing&#8221;, which does not appear in any other song in the dataset.<\/p>\n<p><strong>Other popular words are &#8220;love&#8221;, &#8220;time&#8221;, &#8220;year&#8221;, &#8220;merry&#8221;, &#8220;snow&#8221;, &#8220;make&#8221;, &#8220;santa&#8221;, &#8220;like&#8221;, &#8220;baby&#8221; and so on.<\/strong> You can look at the remaining words in this visualization in the shape of a Christmas tree (because why not).<\/p>\n<p><a href=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/wordcloud.png\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-14836 size-large\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/wordcloud-1024x768.png\" alt=\" data driven anal\u00fdza viano\u010dn\u00fdch pesni\u010diek - oblak zna\u010diek, wordcloud\" width=\"1024\" height=\"768\" srcset=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/wordcloud.png 1024w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/wordcloud-300x225.png 300w\" sizes=\"(max-width: 1024px) 
100vw, 1024px\" \/><\/a><\/p>\n<h2>Here they are! The data-driven playlists<\/h2>\n<p>At the start I did not have a clear idea of how I actually wanted to approach the analysis. Ideas came gradually, either while building the dataset or while reading blog posts (I have listed the useful ones at the very end of the article).<\/p>\n<h3>Playlist: <a href=\"https:\/\/open.spotify.com\/playlist\/7Bxj34lXvrFdmyZYBq1ppj?si=z12NeHATRhiAsCfA_FsDIQ\" target=\"_blank\" rel=\"noopener noreferrer\">Cheesiest Christmas!<\/a><\/h3>\n<p>This playlist contains the cheesiest and least original Christmas songs. To determine the cheesiest song, I defined the so-called <em>Christmas Cheese Ratio<\/em>. The ratio is computed with a very simple formula:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-14858\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/CodeCogsEqn.gif\" alt=\"\" width=\"528\" height=\"42\" \/><\/p>\n<p>The <em>Christmas Cheese Ratio<\/em> tells you what percentage of a song is made up of the 20 most common words in Christmas songs. If the ratio is greater than 33% (more than a third of the song consists of the most commonly occurring words), the song is on the playlist! The eighteen least creative Christmas songs &#8211; just what you wanted!<\/p>\n<p>As for the fact that this playlist contains as many as 3 songs by Mariah Carey, judge for yourselves. 
<strong>The cheesiest Christmas song is Make It To Christmas (Alessia Cara)<\/strong>.<\/p>\n<h3>Playlists: <a href=\"https:\/\/open.spotify.com\/playlist\/040Tf5OzjreX4ZXoF4TX64?si=ls54xIL-RJmeMaN47uzj-Q\" target=\"_blank\" rel=\"noopener noreferrer\">Joyful Christmas \ud83d\ude42<\/a> &amp; <a href=\"https:\/\/open.spotify.com\/playlist\/3XDJdf53Jol0ajft2qT9XV?si=dFxvX_KyTMascooqc9I3hg\" target=\"_blank\" rel=\"noopener noreferrer\">Sad Christmas \ud83d\ude41<\/a><\/h3>\n<p>I was thinking about how to use the lyrics in the analysis. What caught my interest was sentiment analysis (words can be positive, negative or neutral) and the emotion of the text. Most words are assigned to a category (some words can belong to several categories) according to the emotion they evoke in people. I used the\u00a0<a href=\"http:\/\/saifmohammad.com\/WebPages\/NRC-Emotion-Lexicon.htm\" target=\"_blank\" rel=\"noopener noreferrer\">NRC lexicon<\/a>, which <strong>sorts words into ten categories<\/strong> &#8211;\u00a0<strong>positive and negative<\/strong>, <strong>anger and anticipation<\/strong>, <strong>disgust and fear<\/strong>, <strong>joy and sadness<\/strong>, and\u00a0<strong>surprise and trust<\/strong>. As a first step, I looked at the words of all the songs together. As expected, the top five places went to positive, joy, anticipation, trust and surprise. 
Conversely, the five least used groups of emotionally charged words fell into the categories negative, sadness, anger, fear and disgust (although, believe me, after a week of constantly listening to Christmas playlists I was more than disgusted enough).<\/p>\n<p><a href=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/sentiment.png\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-14871 size-full\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/sentiment.png\" alt=\"skladby pod\u013ea sentimentu - data driven anal\u00fdza viano\u010dn\u00fdch pesni\u010diek\" width=\"797\" height=\"493\" srcset=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/sentiment.png 797w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/sentiment-300x186.png 300w\" sizes=\"(max-width: 797px) 100vw, 797px\" \/><\/a><\/p>\n<p>So I wanted to identify the happiest songs. I also wanted to see whether any of the songs could be labelled sad or even depressing, or whether the negative words are spread fairly evenly across the songs. 
I defined two formulas:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-14898\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/endorphin.gif\" alt=\"\" width=\"575\" height=\"38\" \/><\/p>\n<p>I defined the overall positivity of a song (<em>christmasEndorphinIndex<\/em>) as the average of three normalized values &#8211;\u00a0<em>positiveWords<\/em>,\u00a0<em>lyricalDensity<\/em>\u00a0and\u00a0<em>energy<\/em>.\u00a0<em>positiveWords<\/em>\u00a0is the ratio of positive words to all words in the song (excluding stopwords).\u00a0<em>lyricalDensity\u00a0<\/em>is the ratio of the number of words in the song to its duration (so it expresses the number of words per second).\u00a0<em>energy\u00a0<\/em>is the Spotify metric that is supposed to capture how energetic a track is. I included <em>lyricalDensity<\/em> and <em>energy<\/em> in the formula because, besides positive words, a joyful Christmas song should also have a cheerful, fast tempo. If a song contained many positive words but its tempo and energy were low, it would come across as calming rather than joyful (e.g. Silent Night). The singing tempo is captured by\u00a0<em>lyricalDensity<\/em>\u00a0and the musical tempo by\u00a0<em>energy<\/em>. All three metrics were normalized to values from 0 to 1. 
<strong>The most positive song is &#8220;Merry Christmas, Happy Holidays&#8221;<\/strong> (*NSYNC).<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-14897\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/depression.gif\" alt=\"\" width=\"593\" height=\"40\" \/><\/p>\n<p>I took a similar approach when looking for negative songs. I created an indicator called <em>christmasDepressionIndex<\/em>. Naturally, instead of positive words, the\u00a0<em>negativeWords\u00a0<\/em>metric measures the frequency of negative words. In this case, however, I inverted the <em>lyricalDensity<\/em> and <em>energy<\/em> metrics &#8211; that is, the fewer words per second a song has and the less energetic it is, the sadder I consider it. A high frequency of negative words combined with high\u00a0<em>energy<\/em>\u00a0and <em>lyricalDensity<\/em> would rather imply anger (I do not think any metal band made it into the final dataset). That is why I focused on songs with low\u00a0<em>energy<\/em>\u00a0and\u00a0<em>lyricalDensity<\/em> &#8211;\u00a0these songs should evoke sadness. <strong>The saddest song turned out to be &#8220;You&#8217;re A Mean One, Mr. Grinch&#8221;<\/strong> (Thurl Ravenscroft). 
Interestingly, the song &#8220;Let It Snow!\u00a0Let It Snow!\u00a0Let It Snow!&#8221; (Frank Sinatra) also made it onto the playlist. I perceive it as rather positive,<strong> but since we are all data-driven, nobody cares about personal preferences<\/strong>.<\/p>\n<h3>A few bonus playlists to finish<\/h3>\n<p>While preprocessing the dataset I came up with a few more interesting playlists. Do not look for any deep analysis behind them (they are more of an interesting statistic).<\/p>\n<h3>Playlist: <strong><a href=\"https:\/\/open.spotify.com\/playlist\/6J2L5WGxCuLVT3MGSS8b31?si=nXX80skMRcGppGwr9UCxaw\" target=\"_blank\" rel=\"noopener noreferrer\">Christmas songs without word &#8222;Christmas&#8220; in their lyrics, because why not<\/a><\/strong><\/h3>\n<p>This playlist is made up of Christmas songs that do not contain the word &#8220;Christmas&#8221;.*<\/p>\n<p><em>*Although the analysis showed that &#8220;Wonderful Christmastime&#8221; (Paul McCartney) and &#8220;Merry Xmas Everybody&#8221; (Slade) do not contain the word &#8220;Christmas&#8221; &#8211; it appears only in the variations &#8220;Christmastime&#8221; and &#8220;X-mas&#8221;, respectively &#8211; I left them out of the playlist in the end (to avoid any complaints).<\/em><\/p>\n<h3>Playlist: <a href=\"https:\/\/open.spotify.com\/playlist\/7gbTEFhyOO9ypvn85Sb2tw?si=WJOEaoOSQKGDhzn4Mqx_IA\" target=\"_blank\" rel=\"noopener noreferrer\">Let&#8217;s Dance this Christmas<\/a><\/h3>\n<p>I also wanted to look at the other Spotify metrics and &#8220;verify&#8221; their accuracy. 
In this playlist you will find energetic songs that should also be good to dance to (let me know once you have tried them out at the office Christmas party).<\/p>\n<h3>Playlist: <a href=\"https:\/\/open.spotify.com\/playlist\/6859MBkceFfYYy9UTuSMFy?si=jDthUE7yRx6S8hbg90V_Mw\" target=\"_blank\" rel=\"noopener noreferrer\">Is it over yet? [Christmas]<\/a><\/h3>\n<p>This playlist is for everyone who would prefer Christmas to be over as quickly as possible but does not want to spoil the joy of those who have been looking forward to it all year. 20 songs in under 40 minutes? No problem! Not a single song on this playlist is longer than two and a half minutes!<\/p>\n<h2>Sources<\/h2>\n<p>As I mentioned above, my knowledge of R was close to zero before I started this project, so this article would never have come about without these great resources:<\/p>\n<p><a href=\"https:\/\/www.rcharlie.com\/spotifyr\/\" target=\"_blank\" rel=\"noopener noreferrer\">spotifyr<\/a>, Charlie Thompson, Josiah Parry, Donal Phipps, Tom Wolff<\/p>\n<p><a href=\"https:\/\/caitlinhudon.com\/2017\/12\/22\/blue-christmas\/\" target=\"_blank\" rel=\"noopener noreferrer\">Blue Christmas: A data-driven search for the most depressing song<\/a>, Caitlin Hudon<\/p>\n<p><a href=\"https:\/\/towardsdatascience.com\/angriest-death-grips-data-anger-502168c1c2f0\" target=\"_blank\" rel=\"noopener noreferrer\">Using Data to Find the Angriest Death Grips Song<\/a>, Evan\u00a0Oppenheimer<\/p>\n<h2 style=\"text-align: center;\">We wish you a Merry Christmas<br \/>\nand many data-driven decisions in the New Year!<\/h2>\n<p><em><strong>A final warning:<\/strong> Listen to any of these playlists at your own risk! 
Pe\u0165o, the author of this article, is no longer the person he was before writing it&#8230;<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What if we built our Christmas playlists from data instead of personal taste? I decided to&#8230;<\/p>\n","protected":false},"author":62,"featured_media":18062,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[200],"tags":[603],"_links":{"self":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts\/14832"}],"collection":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/users\/62"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/comments?post=14832"}],"version-history":[{"count":46,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts\/14832\/revisions"}],"predecessor-version":[{"id":18064,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts\/14832\/revisions\/18064"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/media\/18062"}],"wp:attachment":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/media?parent=14832"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/categories?post=14832"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/tags?post=14832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}