{"id":15258,"date":"2020-02-12T10:07:30","date_gmt":"2020-02-12T08:07:30","guid":{"rendered":"https:\/\/www.dase-analytics.com\/blog\/?p=15258"},"modified":"2020-02-14T09:54:57","modified_gmt":"2020-02-14T07:54:57","slug":"potrebujete-spracovat-velke-mnozstvo-dat-bigquery-je-riesenie","status":"publish","type":"post","link":"https:\/\/www.dase-analytics.com\/blog\/sk\/potrebujete-spracovat-velke-mnozstvo-dat-bigquery-je-riesenie\/","title":{"rendered":"Need to process large volumes of data? BigQuery is the answer"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Connecting users' online behavior with their offline purchases is a dream of many companies and analysts. There are several ways to do it, some more reliable than others. One of them is BigQuery, Google's data warehouse platform, which serves as an automated central system connecting multiple data sources. Does that sound as sexy to you as it does to us? Then read on. \ud83d\ude09\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When analyzing purchasing behavior, it is easy to forget that users interact with a brand in other ways than just in the \u201conline world\u201d. For some companies, these interactions (on the web or through an app) may represent<\/span><b> only a small fraction of the customer's overall purchase journey.<\/b><span style=\"font-weight: 400;\"> This is especially true of lead generation websites and other \u201comnichannel\u201d businesses, where customers search for and compare products online but the transaction ultimately takes place in a brick-and-mortar store. 
Relying only on web data in such a case can be very misleading. There are <\/span><a href=\"https:\/\/www.dase-analytics.com\/blog\/ako-dostat-offline-interakcie-do-google-analytics\/\"><span style=\"font-weight: 400;\">ways to send offline interactions to Google Analytics<\/span><\/a><span style=\"font-weight: 400;\">, but such an implementation is often complicated and too error-prone.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, if web data is supplemented with data from other sources, you can build a much more realistic picture of your customers' overall purchase journey. These sources may include:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">CRM data,\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data from mobile apps,<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">data obtained from various APIs (for example, weather data),<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">and many more.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">If you are trying to combine data from multiple sources,<\/span><b> you need to build a single central system<\/b><span style=\"font-weight: 400;\"> that will store, transform, and join it (ideally automatically). 
Last but not least, the system must let you <\/span><b>quickly and easily<\/b><span style=\"font-weight: 400;\"> pull out of the data whatever you currently need to make a better-informed decision.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Is an in-house solution a good idea?<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">For large companies with a capable IT department and analysts, perhaps yes. For most companies, however, it raises a number of problems:\u00a0<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Building such a system on your own servers, using your own computing power, is<\/span><b> very demanding in terms of time and money.\u00a0<\/b><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">This applies not only to the initial setup but also to the <\/span><b>subsequent maintenance<\/b><span style=\"font-weight: 400;\"> that comes with building such infrastructure.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">On top of that, if you want to<\/span><b> scale<\/b><span style=\"font-weight: 400;\"> such a system, further complications arise. Simply put: the computing power you have is no longer sufficient and you need to invest again, which brings additional time and financial costs.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">The opposite problem arises if you have invested in equipment that you currently use at only about 30% of its capacity. 
<\/span><b>You have to maintain the system just the same<\/b><span style=\"font-weight: 400;\"> as if you were using it at 100%, and you still pay unnecessarily high electricity bills.<\/span><\/li>\n<\/ol>\n<h2><span style=\"font-weight: 400;\">What is BigQuery?<\/span><\/h2>\n<p><a href=\"https:\/\/cloud.google.com\/bigquery\/\"><span style=\"font-weight: 400;\">BigQuery<\/span><\/a><span style=\"font-weight: 400;\"> is a highly scalable data warehouse hosted on Google Cloud. BigQuery was originally developed for Google's internal needs, and Google later made it available to the public through Google Cloud. Its main purpose is to provide a single central place for collecting, transforming, and joining data from various sources.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you are familiar with database systems, you know they are built on tables. In BigQuery, data is likewise stored in tables: columns represent attributes and rows represent individual records. To retrieve data from a table, we have to write a query. Queries in BigQuery are written in SQL. What does that look like in practice?<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let's look at a very simple example. Imagine we have a table of orders from our online store. The table contains the transaction ID (transaction_id), the user ID (user_id), and the total value of the transaction (revenue). 
If we wanted to determine the lifetime value (LTV) of our customers, we would write the following query:<\/span><\/p>\n<p><code>SELECT SUM(revenue) AS ltv, user_id<\/code><br \/>\n<code>FROM `ourdatasource.ecommerce`<\/code><br \/>\n<code>GROUP BY user_id<\/code><br \/>\n<code>ORDER BY ltv DESC<\/code><\/p>\n<p><span style=\"font-weight: 400;\">Of course, as the questions get more demanding, the queries get more complex. The advantage of querying directly, however, is that questions can be translated straight into SQL. Results can either be exported (CSV, Google Sheets) or saved as a table directly in BigQuery. Because Google Data Studio includes a BigQuery connector, the results of an analysis can easily be visualized in that tool, for example.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Main advantages of BigQuery<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">So what are the advantages of building a central data hub in the cloud over building one on your own physical servers?<\/span><b>\u00a0<\/b><\/p>\n<h3><span style=\"font-weight: 400;\">Cheap storage<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Storage is very cheap, and the first 10 GB are free. At the time of writing (January 2020), 1 GB of storage costs 2 cents per month. If you have not modified the data in the last 90 days, the price even drops to 1 cent per GB. 
That means <\/span><b>one terabyte of data in active storage costs you $20 per month.<\/b><span style=\"font-weight: 400;\"> If you have not changed the stored tables for 90 days, the price drops by half, to $10.<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Less infrastructure and maintenance = more work with data<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">If you want to build your own big data solution, you have to invest in infrastructure. Besides storage (physical disks) to hold the data, you need computing power capable of running computations over that data quickly enough. And of course, all of that comes with time-consuming maintenance to keep everything working as it should. With BigQuery, all of these worries go away. Resources are allocated to you as needed and released again once your work is done. 
Everything happens automatically in the background, so you can focus purely on working with the data.\u00a0<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-15256 aligncenter\" src=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/image1-24.png\" alt=\"Comparison of a cloud solution (BigQuery) with an in-house physical solution.\" width=\"933\" height=\"526\" srcset=\"https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/image1-24.png 933w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/image1-24-300x169.png 300w, https:\/\/www.dase-analytics.com\/blog\/wp-content\/uploads\/image1-24-600x338.png 600w\" sizes=\"(max-width: 933px) 100vw, 933px\" \/><\/p>\n<h3><span style=\"font-weight: 400;\">A favorable pricing model<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Another advantage is the way BigQuery is billed. This applies not only to BigQuery but to all components available within Google Cloud. <\/span><b>The best name for it is probably \u201cpay as you go\u201d,<\/b><span style=\"font-weight: 400;\"> meaning that your total bill depends on which products you use and how intensively you use them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">BigQuery is billed by the amount of data processed. Before any query is run, you are shown how much data it will process (in MB or GB). 
<\/span><b>The first terabyte each month is free.<\/b><span style=\"font-weight: 400;\"> That is more than enough if you want to try out BigQuery on one of the many <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/public-data\"><span style=\"font-weight: 400;\">public datasets<\/span><\/a><span style=\"font-weight: 400;\">. Once the first terabyte is used up, each additional terabyte costs $5. Naturally, optimization matters, so that queries are as efficient as possible (processing only the data they really need).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can find more details on pricing <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/pricing\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">, or use the <\/span><a href=\"https:\/\/cloud.google.com\/products\/calculator\"><span style=\"font-weight: 400;\">online calculator<\/span><\/a><span style=\"font-weight: 400;\">, which estimates your monthly costs.\u00a0<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Scalability<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The advantage of a solution on Google Cloud is that resources are allocated according to what you actually need at a given moment. When you try to process a large amount of data in a single query, more computing power is assigned to your query in the background so that you do not have to wait an eternity for the result. 
And when you finish your work, the compute units are released again.\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Are you ready to get started with BigQuery?<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">BigQuery helps companies manage and process data more securely, faster, and more efficiently. It is part of the Google Cloud ecosystem, which lets you connect your data directly with other Google Cloud components, such as:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Dataprep for quick and easy preparation of your data,\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Cloud Storage as cheap cloud storage,\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">AI and machine learning models built directly by Google.\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">All of this is available to you immediately, without having to invest large amounts of time and money in building and maintaining complicated infrastructure.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We will be covering BigQuery and Google Cloud more on our blog. If you already have any questions, let us know and we will try to answer them in upcoming articles \ud83d\ude42<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Connecting users' online behavior with their offline purchases is a dream of many companies and analysts. 
There are several ways to do it, some&#8230;<\/p>\n","protected":false},"author":62,"featured_media":15262,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[640,200],"tags":[672,673,671],"_links":{"self":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts\/15258"}],"collection":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/users\/62"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/comments?post=15258"}],"version-history":[{"count":6,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts\/15258\/revisions"}],"predecessor-version":[{"id":15265,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/posts\/15258\/revisions\/15265"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/media\/15262"}],"wp:attachment":[{"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/media?parent=15258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/categories?post=15258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dase-analytics.com\/blog\/sk\/wp-json\/wp\/v2\/tags?post=15258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}