Run your own LLM on Hugging Face in 7 steps
![Warren Bickley Avatar](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2Fwarren-crayola-red.png&w=3840&q=75)
Warren Bickley | 27 August 2023
![Screenshot of the Hugging Face hero banner on the home page](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-27%20at%2015.27.11.png&w=3840&q=75)
TL;DR
Running open-source LLMs is easy with Hugging Face's Inference Endpoints. The future of both software and work will incorporate AI in various capacities.
Step 1: Create a Hugging Face account
To get started, you will need to create an account on the Hugging Face website or log in if you already have an account.
![Screenshot of the Hugging Face sign up page](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-15%20at%2008.21.07.png&w=3840&q=75)
Step 2: Choose a pre-trained model
Hugging Face hosts a variety of pre-trained models that you can run. If this is your first time working with LLMs, we’d recommend any of the Meta Llama 2 models, with the 7B variant being the most cost-efficient to run.
You can find models in the Hugging Face model repository. Once you have chosen one, copy the model name by clicking the clipboard icon next to it.
![Screenshot of the Llama-2-7b model on the Hugging Face model repository](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-15%20at%2008.21.54.png&w=3840&q=75)
Step 3: Create a new inference endpoint
With the model name you want to run now on your clipboard, go to Hugging Face’s Inference Endpoints by clicking “Solutions” in the top menu and then “Inference Endpoints”. From there you can start creating your own inference endpoint.
![Screenshot of the menu with a link to the inference endpoints management page.](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-15%20at%2008.22.05.png&w=3840&q=75)
Step 4: Enter your model details
In the “Model Repository” field, paste the model name you chose earlier. You should also give your endpoint a unique name.
![Screenshot of the endpoint creation screen.](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-15%20at%2008.22.53.png&w=3840&q=75)
Step 5: Choose an instance type
You can choose the cloud platform you wish to run the model on, but we’d recommend sticking with AWS us-east-1 for now due to its wider choice of instance types. This usually only matters when running very large models (~70B parameters, for example).
GPU [medium] should be enough to run a 7B model. If the platform doesn’t think your chosen instance type is capable of running the model, it will present you with a helpful warning so you can click through and experiment with what works.
![Screenshot of the select instance type menu.](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-15%20at%2008.23.10.png&w=3840&q=75)
Step 6: Enable scale-to-zero (optional)
Hugging Face has a really handy “scale-to-zero” option which we highly recommend for non-production environments (and even production environments if handled correctly!). This shuts your instance down if there have been no requests for a specified period of time (currently only 15 minutes is available). If you make a request whilst the instance is down, the gateway will return a 502 status code and start the instance. Starting takes some time, but if handled well in the UI you can present the user with this information and auto-retry until the endpoint comes back online.
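The retry-on-502 behaviour described above can be sketched in Python. This is a minimal illustration, not Hugging Face's API: the `send_request` callable, attempt counts, and wait times are all placeholder assumptions you would adapt to your own client code.

```python
import time


def query_with_retry(send_request, max_attempts=6, wait_seconds=10):
    """Retry while the gateway returns 502 (instance scaling up from zero).

    `send_request` is any zero-argument callable returning (status, body),
    e.g. a small wrapper around an HTTP POST to your endpoint URL.
    """
    for _ in range(max_attempts):
        status, body = send_request()
        if status == 502:
            # Instance is still starting; wait, then try again.
            time.sleep(wait_seconds)
            continue
        if status != 200:
            raise RuntimeError(f"Request failed with status {status}")
        return body
    raise RuntimeError("Endpoint still starting after all retries")
```

In a UI you would surface a "model is waking up" message between attempts rather than blocking silently.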
![Screenshot of the scale-to-zero options in Hugging Face.](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-15%20at%2008.23.26.png&w=3840&q=75)
Step 7: Make a request to your model
With your model now running, scroll to the bottom of the endpoint page and you will find example calls in different languages to experiment with.
If you’re using a tool like Postman, you can copy the cURL request and import it (File → Import…) for easy experimentation. You can also click “Add API token” within the Hugging Face UI to automatically add your API token to the request code.
![Screenshot of the call examples.](/_next/image/?url=https%3A%2F%2Fmedia.eject.tech%2FScreenshot%202023-08-26%20at%2010.44.33.png&w=3840&q=75)
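As a minimal sketch of such a request in Python using only the standard library: the URL and token below are placeholders to replace with the real values from your endpoint's overview page, and the `{"inputs": ..., "parameters": ...}` body follows the standard text-generation request shape shown in the call examples.

```python
import json
import urllib.request

# Placeholders: copy the real URL and token from your endpoint's overview page.
API_URL = "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud"
API_TOKEN = "hf_..."  # placeholder: your Hugging Face API token


def build_payload(prompt, max_new_tokens=50):
    """Build the JSON body a text-generation endpoint expects."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt):
    """POST a prompt to the inference endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `generate("Once upon a time")` against a running endpoint should return the model's generated text as JSON.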
Running your own LLM couldn’t be easier thanks to Hugging Face’s intuitive platform. It’s a great way to get started with AI and massively reduces the barrier to entry.