AI Foundations
Local LLM Inference: llama.cpp, GGUF, Quantizations and GGML Explained
Learn how the llama.cpp runtime, GGML backend concepts, and GGUF model format fit together for fast local inference across devices.
Mar 3
•
Alex Razvant
Upcoming Livestream: GPUs for AI (Shaped by You)
You can choose the topics for a Live Session on GPUs in AI
Jan 27
•
Alex Razvant
The Smartest AI Engineers Will Bet on This in 2026
A no-BS breakdown of where to invest your time, backed by real industry insights.
Jan 13
•
Alex Razvant
An AI Engineer's Guide To Choosing GPUs
A deep dive into the hardware and software details of NVIDIA GPUs for AI workloads.
Dec 7, 2025
•
Alex Razvant
My Best Recent Guides for AI Engineers
A curated list of the most actionable guides I’ve published in the past few months.
Nov 29, 2025
•
Alex Razvant
Video Lesson on Advanced Multimodal AI Concepts
Low-level details of the video format, contrastive learning, the CLIP model, how VLMs work, Transformers vs. CNNs, and context learning.
Sep 13, 2025
•
Alex Razvant
How to add structure to your LLM Applications using SGLang
Unpacking SGLang's technical details, RadixAttention, and fast decoding for structured output.
Apr 3, 2025
•
Alex Razvant
How does vLLM serve LLMs at scale?
The Online/Offline API modes, PagedAttention and distributed inference with Ray.
Mar 27, 2025
•
Alex Razvant
Understanding LLM Optimization Techniques
Weight quantization using GPTQ and BitsAndBytes. Parallelism techniques, KV caching, Flash Attention, and speculative decoding.
Mar 1, 2025
•
Alex Razvant
Understanding LLM Inference
Explaining LLM pre-fill and generation phases, unpacking model configuration files from HuggingFace.
Feb 20, 2025
•
Alex Razvant
The AI/ML Engineer's starter guide to GPU Programming
#1 Programming on GPUs from scratch by implementing CUDA Kernels in C++, CuPy Python and OpenAI Triton.
Jan 30, 2025
•
Alex Razvant
Complete Overview in Vision AI 2025
From Pixels to Vision Language Models. Updated periodically.
Jan 21, 2025
•
Alex Razvant