TL;DR
A software engineer demonstrates running smaller AI models locally on a 24GB M4 MacBook Pro. While not comparable to state-of-the-art models, the setup offers a usable experience for basic tasks and reduces reliance on cloud services.
A software engineer has demonstrated that certain smaller AI models can run locally on a 24GB M4 MacBook Pro, enabling basic tasks without internet access. The engineer experimented with various models and setup options, ultimately arriving at a workable configuration: Qwen 3.5 9B (Q4) running in LM Studio. This model can handle tasks such as code suggestions and research, but it does not match the capabilities of larger, state-of-the-art models. The setup requires non-trivial configuration, including model selection, inference settings, and enabling features like ‘thinking’ mode. Performance is limited to approximately 40 tokens per second, and the model sometimes gets distracted or loops. Despite these limitations, the setup lets users reduce dependence on cloud-based AI services and work offline, which is especially relevant for privacy-conscious users or those with limited internet bandwidth.
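For readers who want to try a similar workflow, the sketch below shows one way to query a model loaded in LM Studio through its local OpenAI-compatible server (started from the app's server tab; default port 1234). The model identifier and prompt are placeholders, not the exact configuration described in the article.

```python
import requests

# LM Studio exposes an OpenAI-compatible API at http://localhost:1234/v1
# once a model is loaded and the local server is running.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "qwen-9b-q4",  # hypothetical identifier; use the one LM Studio shows
        "messages": [
            {"role": "user",
             "content": "Suggest a docstring for: def double(x): return x * 2"}
        ],
        "temperature": 0.7,
        "max_tokens": 256,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint mimics the OpenAI API, existing client libraries and editor integrations can usually be pointed at the local server with only a base-URL change.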
Why It Matters
This development shows that accessible hardware like a 24GB M4 MacBook Pro can handle smaller AI models effectively, opening possibilities for offline AI use, privacy preservation, and reduced reliance on large cloud providers. While not replacing high-end AI, it democratizes basic AI tasks for more users, especially developers and researchers, and highlights the growing viability of local AI deployment.
Background
Recent advances in AI model compression and optimization have made it feasible to run smaller models locally. Previous efforts centered on cloud-based solutions, but hardware improvements and better software tooling now enable more capable local inference. The experiment aligns with ongoing trends toward edge AI and privacy-focused computing. The engineer’s setup builds on existing tools such as llama.cpp, LM Studio, and Pi, which facilitate local model deployment. Before this, most users relied on remote cloud services for AI tasks, with local options limited to very small models or requiring specialized hardware.
“It’s surprisingly good for something that can run on a 24GB MacBook Pro while leaving space for other applications.”
— Johanna Larsson, Software Engineer
“While it’s not as powerful as SOTA models, it encourages a more engaged workflow and reduces dependency on big tech cloud services.”
— Johanna Larsson
What Remains Unclear
It is not yet clear how well this setup will scale with different models or tasks, or how it performs over extended use. The performance and stability may vary depending on hardware, configuration, and model choice, and further testing is needed to establish broader applicability.
What’s Next
Further experimentation will focus on optimizing configurations, testing additional models, and assessing long-term stability. Developers may explore automating setup processes and improving model performance, with potential community sharing of best practices. Future updates could include support for larger context windows or more advanced model features as hardware and software tools evolve.
Key Questions
Can I run larger models on my MacBook Pro with 24GB RAM?
Currently, only smaller models such as Qwen 3.5 9B (Q4) are feasible. Larger models need more memory and compute: on Apple Silicon, the quantized weights and KV cache must fit in unified memory that is shared with the OS and other applications, so the practical budget is well under the full 24GB (see the rough estimate below).
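As a back-of-envelope check (an approximation, not a benchmark from the article): a Q4-style quantization stores roughly 4–5 bits per parameter, so a 9B-parameter model needs on the order of 5–6 GB for weights alone, while a 70B model at the same quantization needs closer to 40 GB and cannot fit in 24GB of unified memory.

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough weight-only memory estimate for a quantized model.

    ~4.5 bits/weight approximates a Q4_K-style quantization.
    Ignores KV cache, activations, and runtime overhead, which add more.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

for size in (9, 14, 32, 70):
    print(f"{size}B model @ ~4.5 bits: ~{approx_weight_gb(size):.1f} GB of weights")
```

On a 24GB machine this leaves the 9B-class models comfortable, mid-size models tight, and 70B-class models out of reach.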
What are the main challenges in setting up local models?
Configuring the model, enabling features like ‘thinking’ mode, and optimizing inference settings require technical expertise and trial-and-error. Compatibility issues and performance tuning are common hurdles.
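To make "inference settings" concrete, here is a minimal sketch using the open-source llama-cpp-python bindings, one of several local runtimes (the article's setup uses LM Studio, which exposes similar knobs in its UI). The path and values are illustrative assumptions, not the author's configuration.

```python
from llama_cpp import Llama

# Illustrative values only; the right context size, GPU offload, and
# sampling settings depend on the model and available memory.
llm = Llama(
    model_path="models/qwen-9b-q4.gguf",  # hypothetical local path
    n_ctx=8192,        # context window; larger values cost more RAM (KV cache)
    n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Review this diff for obvious bugs: ..."}],
    temperature=0.7,     # sampling randomness
    top_p=0.9,           # nucleus sampling cutoff
    repeat_penalty=1.1,  # discourages the looping behaviour mentioned above
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

Most of the trial-and-error the article describes happens in exactly these places: how much context to allocate, how much to offload to the GPU, and which sampling settings keep the model on task.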
Does running models locally compromise their capabilities compared to cloud-based models?
Yes, smaller local models typically lack the complexity and long-term reasoning abilities of state-of-the-art cloud models. They are suitable for basic tasks and research but not for solving complex, multi-step problems.
What are the benefits of running models locally?
Offline operation, enhanced privacy, reduced reliance on internet connectivity, and potential cost savings are key benefits. It also allows more control over the environment and data.