If you've ever watched a demo model breeze through a tiny test load and then freeze the moment real users show up, you've met the villain: scaling. AI is hungry for data, compute, memory, bandwidth, and a few less obvious things. So what is AI Scalability, really, and how do you get it without rewriting everything every week?
What is AI Scalability? 📈
AI Scalability is the ability of an AI system to handle more data, requests, users, and use cases while keeping performance, reliability, and cost within acceptable limits. It is not just bigger servers; it is an architecture that keeps latency low, throughput high, and quality consistent as the curve climbs. Think elastic infrastructure, optimized models, and observability that actually tells you what is on fire.

What makes good AI Scalability ✅
When AI Scalability is done well, you get:
- Predictable latency under spiky or sustained load 🙂
- Throughput that grows roughly in proportion to added hardware or replicas
- Cost efficiency that doesn't balloon per request
- Stable quality as inputs diversify and volume grows
- Operational calm thanks to autoscaling, tracing, and sane SLOs
Under the hood, this usually blends horizontal scaling, batching, caching, quantization, robust serving, and thoughtful release policies tied to error budgets [5].
AI Scalability vs performance vs capacity 🧠
- Performance is how fast a single request completes in isolation.
- Capacity is how many of those requests you can handle at once.
- AI Scalability is whether adding resources or using smarter techniques increases capacity and keeps performance consistent - without blowing up your bill or your pager.
Tiny distinction, giant consequences.
Why scale works in AI at all: the idea behind scaling laws 📚
A widely used insight in modern ML is that loss improves in predictable ways as you scale model size, data, and compute - within reason. There is also a compute-optimal balance between model size and training tokens; scaling both together beats scaling only one. In practice, these ideas inform training budgets, dataset planning, and serving trade-offs [4].
Quick translation: bigger can be better, but only when you scale inputs and compute in proportion - otherwise it's like putting tractor tires on a bicycle. Looks intense, goes nowhere.
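To make the compute-optimal balance a little more concrete, here is a minimal Python sketch. It rests on two assumptions that are not stated in this article: the common approximation that training compute is about 6 × parameters × tokens FLOPs, and a rule-of-thumb ratio of roughly 20 training tokens per parameter, often associated with [4]. Treat both constants as illustrative, not as fitted values.

```python
import math

# Assumptions (not from this article): training FLOPs ~ 6 * params * tokens,
# and a rule-of-thumb ratio of ~20 training tokens per parameter.
FLOPS_PER_PARAM_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Return (params, tokens) that spend the budget at the assumed ratio."""
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Solve 6 * N * (20 * N) = C for N, then derive D = 20 * N.
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    n, d = compute_optimal_split(5.88e23)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

The point of the sketch is the shape of the trade-off: under these assumptions, doubling the compute budget grows both model size and token count by about 1.4x, rather than pouring everything into one of them.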
Horizontal vs vertical: the two scaling levers 🔩
- Vertical scaling: bigger boxes, beefier GPUs, more memory. Simple, sometimes pricey. Good for single-node training, low-latency inference, or when your model refuses to shard nicely.
- Horizontal scaling: more replicas. Works best with autoscalers that add or remove pods based on CPU/GPU or custom application metrics. In Kubernetes, the HorizontalPodAutoscaler scales pods in response to demand - your basic crowd control for traffic spikes [1].
Anecdote (composite): During a high-profile launch, simply enabling server-side batching and letting the autoscaler react to queue depth stabilized p95 without any client changes. Unflashy wins are still wins.
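For intuition about how a horizontal autoscaler reacts to a signal such as queue depth, the sketch below applies the proportional rule described in the Kubernetes HPA documentation [1]: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance. The function name, the per-pod queue-depth metric, and the min/max bounds are illustrative assumptions; this is a simulation of the arithmetic, not a call into Kubernetes.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical metric: average queue depth per pod, target of 10.
    print(desired_replicas(current_replicas=4, current_metric=30, target_metric=10))
```

With 4 replicas and a queue depth three times the target, the rule asks for 12 replicas and the cap brings it down to 10, which is why sensible bounds matter as much as the signal.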
The full stack of AI Scalability 🥞
- Data layer: fast object stores, vector indexes, and streaming ingestion that won't throttle your trainers.
- Training layer: distributed frameworks and schedulers that handle data/model parallelism, checkpointing, and retries.
- Serving layer: optimized runtimes, dynamic batching, paged attention for LLMs, caching, and token streaming. Triton and vLLM are frequent heroes here [2][3].
- Orchestration: Kubernetes for elasticity via HPA or custom autoscalers [1].
- Observability: traces, metrics, and logs that follow user journeys and model behavior in production; design them around your SLOs [5].
- Governance and cost: per-request economics, budgets, and controls for runaway workloads.
Comparison table: tools and patterns for AI Scalability 🧰
A little uneven on purpose - because real life is.
| Tool / Pattern | Audience | Price-ish | Why it works | Notes |
|---|---|---|---|---|
| Kubernetes + HPA | Platform teams | Open source + infra | Scales pods horizontally as metrics spike | Custom metrics are gold [1] |
| NVIDIA Triton | Inference SRE | Free server; GPU $ | Dynamic batching boosts throughput | Configure via config.pbtxt [2] |
| vLLM (PagedAttention) | LLM teams | Open source | High throughput via efficient KV-cache paging | Great for long prompts [3] |
| ONNX Runtime / TensorRT | Performance engineers | Free / vendor tools | Kernel-level optimizations reduce latency | Export paths can be fiddly |
| RAG pattern | App teams | Infra + index | Offloads knowledge to retrieval; scales the index rather than the model | Excellent for freshness |
Deep dive 1: Serving tricks that move the needle 🚀
- Dynamic batching groups small inference calls into larger batches on the server, dramatically increasing GPU utilization without client changes [2].
- Paged attention keeps far more conversations in memory by paging KV caches, which improves throughput under concurrency [3].
- Request coalescing and caching avoid duplicate work for identical prompts or embeddings.
- Speculative decoding and token streaming reduce perceived latency, even when total wall-clock time changes little.
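The dynamic batching idea from the list above can be sketched as a small, deterministic simulation. This is not Triton's implementation or configuration; `form_batches`, `max_batch_size`, and `max_queue_delay` are hypothetical names chosen for illustration. The policy assumed here: dispatch a batch as soon as it is full, or once the oldest queued request has waited the maximum delay.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    dispatch_time: float      # seconds, when the batch is sent to the model
    request_ids: list[int]    # indices of the requests grouped together


def form_batches(arrival_times: list[float], max_batch_size: int = 4,
                 max_queue_delay: float = 0.010) -> list[Batch]:
    """Group requests (sorted by arrival time) into batches."""
    batches: list[Batch] = []
    pending: list[tuple[int, float]] = []  # (request id, arrival time)
    for req_id, t in enumerate(arrival_times):
        # The oldest queued request hit its deadline before this arrival.
        if pending and t > pending[0][1] + max_queue_delay:
            deadline = pending[0][1] + max_queue_delay
            batches.append(Batch(deadline, [i for i, _ in pending]))
            pending = []
        pending.append((req_id, t))
        # A full batch is dispatched immediately.
        if len(pending) == max_batch_size:
            batches.append(Batch(t, [i for i, _ in pending]))
            pending = []
    if pending:  # flush whatever is left at its deadline
        batches.append(Batch(pending[0][1] + max_queue_delay,
                             [i for i, _ in pending]))
    return batches


if __name__ == "__main__":
    for b in form_batches([0.000, 0.001, 0.002, 0.003, 0.004, 0.030]):
        print(b)
```

The two knobs pull in opposite directions: a larger batch cap improves GPU utilization, while a shorter queue delay protects tail latency when traffic is thin.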
Deep dive 2: Model-level efficiency - quantize, distill, prune 🧪
- Quantization reduces parameter precision (e.g., 8-bit/4-bit) to shrink memory and speed up inference; always re-evaluate task quality after changes.
- Distillation transfers knowledge from a large teacher to a smaller student that your hardware actually likes.
- Structured pruning trims the weights/heads that contribute least.
Let's be honest, it's a bit like downsizing your suitcase and then insisting all your shoes still fit. Somehow it does, mostly.
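As a rough illustration of what 8-bit quantization does to a weight vector, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. This is one simple scheme among many and an assumption on my part; production toolchains typically use per-channel scales, calibration data, and other refinements. The function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats; the gap is the quantization error."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, scale, worst)
```

Each weight lands within half a quantization step of its original value, which is exactly why the advice above says to re-run task evals: small per-weight errors can still add up in ways a unit test will not catch.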
Deep dive 3: Data and training scaling without tears 🧵
- Use distributed training that hides the gnarly parts of parallelism so you can ship experiments faster.
- Remember those scaling laws: allocate budget across model size and tokens thoughtfully; scaling both together is compute-efficient [4].
- Curriculum and data quality often swing outcomes more than people admit. Better data sometimes beats more data - even if you've already ordered the bigger cluster.
Deep dive 4: RAG as a scaling strategy for knowledge 🧭
Instead of retraining a model to keep up with changing facts, RAG adds a retrieval step at inference time. You can keep the model steady and scale the index and retrievers as your corpus grows. It's elegant, and often cheaper than full retrains for knowledge-heavy apps.
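A toy version of that retrieval step might look like the sketch below. Everything in it is hypothetical: the three-dimensional "embeddings" are hand-written stand-ins for a real embedding model, the in-memory list stands in for a vector index, and the prompt template is just one plausible shape.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Put retrieved passages in front of the question for the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical corpus with made-up vectors.
INDEX = [
    ("HPA scales pods on metrics.", [1.0, 0.0, 0.1]),
    ("Quantization shrinks weights.", [0.0, 1.0, 0.0]),
    ("Queue depth is a good autoscaling signal.", [0.9, 0.1, 0.3]),
]

if __name__ == "__main__":
    passages = retrieve([1.0, 0.0, 0.2], INDEX, k=2)
    print(build_prompt("How should I autoscale inference?", passages))
```

Adding knowledge here means appending to `INDEX`, not touching the model, which is the scaling property the section is describing.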
Observability that pays for itself 🕵️♀️
You can't scale what you can't see. Two essentials:
- Metrics for capacity planning and autoscaling: latency percentiles, queue depths, GPU memory, batch sizes, token throughput, cache hit rates.
- Traces that follow a single request across gateway → retrieval → model → post-processing. Tie what you measure to your SLOs so dashboards answer questions in under a minute [5].
When dashboards answer questions in under a minute, people use them. When they don't, well, they pretend they do.
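Since latency percentiles come up repeatedly, here is a minimal sketch of how a p95 could be computed from raw samples using the nearest-rank method. That method is my choice for simplicity; monitoring systems often use histograms or interpolation instead, so numbers may differ slightly from what a dashboard reports.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: mostly fast, two slow outliers.
    latencies_ms = [10.0] * 18 + [200.0, 900.0]
    print("p50:", percentile(latencies_ms, 50))
    print("p95:", percentile(latencies_ms, 95))
```

In this made-up sample the median looks perfectly healthy while p95 is twenty times higher, which is the reason SLOs are usually written against tail percentiles rather than averages.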
Reliability guardrails: SLOs, error budgets, sane rollouts 🧯
- Define SLOs for latency, availability, and result quality, and use error budgets to balance reliability with release velocity [5].
- Deploy behind traffic splits, do canaries, and run shadow tests before global cutovers. Your future self will send snacks.
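The error-budget idea reduces to simple arithmetic, sketched below. The function names, the 80% release threshold, and the example numbers are all hypothetical; real policies are a team decision and are usually evaluated over a rolling window.

```python
def error_budget(slo_target: float, total_requests: int,
                 failed_requests: int) -> dict[str, float]:
    """Summarize how much of the error budget a service has used."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    allowed = (1 - slo_target) * total_requests  # failures the SLO tolerates
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed_requests,
        "burned_fraction": failed_requests / allowed if allowed else float("inf"),
    }


def can_release(budget: dict[str, float], max_burn: float = 0.8) -> bool:
    """Hypothetical gate: pause risky rollouts once most of the budget is gone."""
    return budget["burned_fraction"] < max_burn


if __name__ == "__main__":
    budget = error_budget(slo_target=0.999, total_requests=1_000_000,
                          failed_requests=250)
    print(budget, can_release(budget))
```

A 99.9% SLO over a million requests tolerates about a thousand failures; having spent roughly a quarter of that, this hypothetical gate would still allow a canary to proceed.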
Cost control without the drama 💸
Scaling isn't just technical; it's financial. Treat GPU hours and tokens as first-class resources with unit economics (cost per 1k tokens, per embedding, per vector query). Add budgets and alerting; celebrate deleting things.
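One way to get a first unit-economics number is to derive cost per 1k tokens from GPU price and measured throughput, as in this sketch. The prices, throughput, and utilization figures are invented for illustration, and the model ignores everything except GPU time (no networking, storage, or idle replicas beyond the utilization factor).

```python
def cost_per_1k_tokens(gpu_hourly_usd: float, tokens_per_second: float,
                       utilization: float = 1.0) -> float:
    """Approximate serving cost in USD per 1,000 generated tokens."""
    if gpu_hourly_usd < 0 or tokens_per_second <= 0:
        raise ValueError("price must be >= 0 and throughput must be > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually served per paid GPU hour.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1000


if __name__ == "__main__":
    # Hypothetical: $2.00/hour GPU, 1,000 tokens/s peak, busy half the time.
    print(f"${cost_per_1k_tokens(2.00, 1000, utilization=0.5):.6f} per 1k tokens")
```

The utilization term is the interesting one: halving idle time cuts the per-token cost just as effectively as negotiating a cheaper GPU, which ties cost back to batching and autoscaling.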
A simple roadmap to AI Scalability 🗺️
- Start with SLOs for p95 latency, availability, and task accuracy; wire up metrics/traces on day one [5].
- Pick a serving stack that supports batching and continuous batching: Triton, vLLM, or equivalents [2][3].
- Optimize the model: quantize where it helps, enable faster kernels, or distill for specific tasks; validate quality with real evals.
- Architect for elasticity: Kubernetes HPA with the right signals, separate read/write paths, and stateless inference replicas [1].
- Adopt retrieval when freshness matters, so you scale your index instead of retraining every week.
- Close the loop on cost: establish unit economics and weekly reviews.
Common failure modes and quick fixes 🧨
- GPU at 30% utilization while latency is bad
  - Turn on dynamic batching, raise batch caps carefully, and recheck server concurrency [2].
- Throughput collapses with long prompts
  - Use serving that supports paged attention and tune the maximum number of concurrent sequences [3].
- Autoscaler flaps
  - Smooth metrics with windows; scale on queue depth or custom tokens-per-second instead of pure CPU [1].
- Costs explode after launch
  - Add request-level cost metrics, enable quantization where safe, cache top queries, and rate-limit the worst offenders.
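The "smooth metrics with windows" fix for a flapping autoscaler can be illustrated with a small sketch. It is loosely modeled on the scale-down stabilization idea in Kubernetes HPA [1], where the highest recent recommendation wins; the class name and the use of a count-based window instead of a time-based one are simplifying assumptions.

```python
from collections import deque


class ScaleDownStabilizer:
    """Scale up immediately, but scale down only after a quiet window."""

    def __init__(self, window: int = 5) -> None:
        # Keep only the most recent `window` raw recommendations.
        self._recent: deque[int] = deque(maxlen=window)

    def recommend(self, raw_desired: int) -> int:
        self._recent.append(raw_desired)
        # Using the max means a brief dip cannot remove replicas,
        # while a spike is honored on the very next evaluation.
        return max(self._recent)


if __name__ == "__main__":
    stabilizer = ScaleDownStabilizer(window=3)
    noisy = [8, 3, 9, 2, 2, 2, 2]
    print([stabilizer.recommend(r) for r in noisy])
```

The noisy raw signal would bounce between 2 and 9 replicas; the stabilized one holds steady and only drops once the low value has persisted for the whole window.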
AI Scalability playbook: quick checklist ✅
- SLOs and error budgets exist and are visible
- Metrics: latency, tps, GPU mem, batch size, token/s, cache hit
- Traces from ingress to model to post-processing
- Serving: batching on, concurrency tuned, warm caches
- Model: quantized or distilled where it helps
- Infra: HPA configured with the right signals
- Retrieval path for knowledge freshness
- Unit economics reviewed often
Too Long Didn't Read and final remarks 🧩
AI Scalability isn't a single feature or a secret switch. It's a pattern language: horizontal scaling with autoscalers, server-side batching for utilization, model-level efficiency, retrieval to offload knowledge, and observability that makes rollouts boring. Sprinkle in SLOs and cost hygiene to keep everyone sane. You won't get it perfect the first time - nobody does - but with the right feedback loops, your system will grow without that cold-sweat feeling at 2 a.m. 😅
References
[1] Kubernetes Docs - Horizontal Pod Autoscaling
[2] NVIDIA Triton - Dynamic Batcher
[3] vLLM Docs - PagedAttention
[4] Hoffmann et al. (2022) - Training Compute-Optimal Large Language Models
[5] Google SRE Workbook - Implementing SLOs