Hugging Face Chat API Adapter

[Introduction]
When using Hugging Face's Serverless Inference API for chat, by default at most 100 new tokens are generated and responses are cached.
This adapter changes these two defaults; all other parameters are passed through unchanged, consistent with the official API.
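As a rough illustration, the two overrides described above could look like the sketch below. The parameter name max_tokens and the x-use-cache header come from the HF Inference API documentation; the function name and the exact token limit are assumptions, not the adapter's actual code.

```python
def apply_adapter_defaults(payload: dict, headers: dict) -> tuple[dict, dict]:
    """Hypothetical sketch of the adapter's two overrides:
    lift the 100-new-token default and disable response caching."""
    # Only set max_tokens if the caller did not specify one (limit value assumed).
    payload.setdefault("max_tokens", 4096)
    # HF's Inference API honors this header to skip the response cache.
    headers["x-use-cache"] = "false"
    return payload, headers

payload, headers = apply_adapter_defaults({"model": "some/model"}, {})
```

All other fields in the payload are left untouched, matching the "other parameters are consistent with the official API" behavior.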

[How to use]
1. Create a token with the "Make calls to the serverless Inference API" permission and use it as the API key.
2. Set the Base URL of the OpenAI compatible client to "https://tastypear-sia-chat-adapter.hf.space/api".
3. Use the full model name (e.g. mistralai/Mistral-Nemo-Instruct-2407).
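The steps above can be sketched with the Python standard library alone. The base URL and model name are taken from the instructions; the /chat/completions path is what OpenAI-compatible clients append to the base URL, and the token value is a placeholder.

```python
import json
import urllib.request

# Base URL from step 2 above.
BASE_URL = "https://tastypear-sia-chat-adapter.hf.space/api"

def build_chat_request(token: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.
    The /chat/completions suffix follows the OpenAI-compatible convention."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # token from step 1 (placeholder below)
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "hf_xxxx",  # placeholder API key
    "mistralai/Mistral-Nemo-Instruct-2407",
    [{"role": "user", "content": "Hello!"}],
)
# To actually send the request: urllib.request.urlopen(req)
```

Any OpenAI-compatible client works the same way: point its base URL at the adapter and pass the HF token as the API key.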

[Supported models]
Most of the available models can be found HERE.
Some "cold" models may also work (e.g. meta-llama/Meta-Llama-3.1-405B-Instruct); please test them yourself.
Some models require a token created by a PRO user to use.

[Avoid reaching the call limit]
If you have multiple tokens, you can join them with semicolons (";") and the API will pick one at random (e.g. "hf_aaaa;hf_bbbb;hf_...").
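The rotation described above amounts to splitting the key string on ";" and choosing one token per request; a minimal sketch (the function name is hypothetical, not the adapter's actual code):

```python
import random

def pick_token(key_string: str) -> str:
    """Split a semicolon-joined API key into tokens and pick one at random,
    mirroring the behavior described above. Empty segments are ignored."""
    tokens = [t.strip() for t in key_string.split(";") if t.strip()]
    return random.choice(tokens)

# Each call may return a different token from the pool.
token = pick_token("hf_aaaa;hf_bbbb;hf_cccc")
```

Spreading requests across several tokens this way makes it less likely that any single token hits its rate limit.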