That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed, so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
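The command itself didn't survive this copy of the post. Ollama can pull a GGUF model straight from Hugging Face by repository path, so with the Unsloth repository named above it would look something like this (a reconstruction, not the verbatim original):

```bash
# Download the quantized model from Hugging Face and start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
```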
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
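The exact invocation is missing here too. Since llm-ollama exposes Ollama models to LLM under the same name they were pulled as, a plausible reconstruction (the model name is assumed from the Ollama step above):

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' \
  'a joke about a pelican and a walrus who run a tea room together'
```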
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the </think> tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
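This command didn't survive either. By analogy with the 8B recipe above it presumably points at an Unsloth 70B GGUF repository; the quantization tag below is a guess, picked because a 3-bit quantization of a 70B model lands in the right ballpark for a 34GB download:

```bash
# Hypothetical: repository path and Q3_K_M tag are inferred, not taken from the post
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```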
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was much more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
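The YAML block itself is missing from this copy. Based on LLM's documented extra-openai-models.yaml format, and matching the deepseek-reasoner model ID and deepseek key name used in the next paragraph, it would look roughly like this:

```yaml
# Reconstruction: registers DeepSeek's OpenAI-compatible endpoint with LLM
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```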
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts, as in the sketch below.

This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM does not yet have a way to display them.
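A minimal end-to-end session along those lines, with a stand-in prompt:

```bash
llm keys set deepseek
# paste the API key when prompted, then:
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```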