Commit
4d1c78a6a58ebe214912a4da75a4394336d5d96d
Author
Longin-Yu <[email protected]>
Date
2023-10-30 15:23:56 +0800
Diffstat
 DEPLOYMENT_en.md | 42 ++++++++
 PROMPT_en.md | 198 ++++++++++++++++++++++++++++++++++++++
 README_en.md | 200 +++++++++++++++++++++++++++++++++++++++
 composite_demo/README_en.md | 85 ++++++++++++++++
 tool_using/README_en.md | 75 ++++++++++++++

Add en doc


diff --git a/DEPLOYMENT_en.md b/DEPLOYMENT_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..46513279d8d8fa027b527c3ba911358d293342a2
--- /dev/null
+++ b/DEPLOYMENT_en.md
@@ -0,0 +1,42 @@
+## Low-Cost Deployment
+
+### Model Quantization
+
+By default, the model is loaded with FP16 precision; running the above code requires about 13GB of VRAM. If your GPU's VRAM is limited, you can try loading the model with quantization, as follows:
+
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
+```
+
+Model quantization brings some performance loss, but testing shows that ChatGLM3-6B can still generate naturally and smoothly under 4-bit quantization.
+
+### CPU Deployment
+
+If you don't have GPU hardware, you can also run inference on the CPU, but the inference speed will be slower. The usage is as follows (requires about 32GB of memory):
+
+```python
+model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
+```
+
+### Mac Deployment
+
+For Macs equipped with Apple Silicon or AMD GPUs, the MPS backend can be used to run ChatGLM3-6B on the GPU. Refer to Apple's [official instructions](https://developer.apple.com/metal/pytorch) to install PyTorch-Nightly (the correct version number should be 2.x.x.dev2023xxxx, not 2.x.x).
+
+Currently, only [loading the model locally](README_en.md#load-model-locally) is supported on macOS. Change the model loading in the code to load from a local path, and use the MPS backend:
+
+```python
+model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
+```
+
+Loading the half-precision ChatGLM3-6B model requires about 13GB of memory. Machines with smaller memory (such as a 16GB memory MacBook Pro) will use virtual memory on the hard disk when there is insufficient free memory, resulting in a significant slowdown in inference speed.
+
+### Multi-GPU Deployment
+
+If you have multiple GPUs, but no single GPU has enough VRAM to hold the complete model, you can split the model across multiple GPUs. First install accelerate (`pip install accelerate`), then load the model as follows:
+
+```python
+from utils import load_model_on_gpus
+model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
+```
+
+This allows the model to be deployed on two GPUs for inference. You can change `num_gpus` to the number of GPUs you want to use. The model is split evenly by default, but you can also pass the `device_map` parameter to specify the mapping yourself, as sketched below.
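+
+As a rough illustration of passing `device_map` yourself (the module names below are hypothetical placeholders; inspect the loaded model, e.g. via `model.named_modules()`, to find the real names before adapting this):
+
+```python
+from utils import load_model_on_gpus
+
+# Hypothetical mapping from sub-module names to GPU indices; the keys are
+# placeholders and must be adjusted to the model's actual module names.
+device_map = {
+    "transformer.embedding": 0,
+    "transformer.encoder.layers.0": 0,
+    "transformer.encoder.layers.1": 1,
+    "transformer.output_layer": 1,
+}
+model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2, device_map=device_map)
+```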
\ No newline at end of file




diff --git a/PROMPT_en.md b/PROMPT_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..46963605061450600736cb76f1d25274a82d54ab
--- /dev/null
+++ b/PROMPT_en.md
@@ -0,0 +1,198 @@
+## ChatGLM3 Chat Format
+To avoid injection attacks from user input, and to unify the inputs for Code Interpreter, Tool & Agent, and other tasks, ChatGLM3 adopts a brand-new dialogue format.
+
+### Regulations
+#### Overall Structure
+A ChatGLM3 dialogue consists of several messages, each of which contains a header and content. A typical multi-turn dialogue structure is as follows:
+```text
+<|system|>
+You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.
+<|user|>
+Hello
+<|assistant|>
+Hello, I'm ChatGLM3. How can I assist you today?
+```
+
+#### Chat Header
+The chat header occupies a complete line, formatted as:
+```text
+<|role|>{metadata}
+```
+The `<|role|>` part is a special token that cannot be produced by encoding plain text with the tokenizer, which prevents injection attacks. The `metadata` part is plain text and is optional.
+* `<|system|>`: System information; by design it can be interspersed throughout the dialogue, **but currently it only appears at the beginning**
+* `<|user|>`: User
+  - Multiple messages from `<|user|>` will not appear continuously
+* `<|assistant|>`: AI assistant
+  - There must be a message from `<|user|>` before it appears
+* `<|observation|>`: Result returned by an external tool
+  - Must follow a message from `<|assistant|>` (see the illustrative history sketch below)
+
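+As a rough illustration (not part of the format itself): when calling the model through its `chat` method, these roles typically appear as entries in the `history` list, in the same style as the tool-invocation example in [tool_using/README_en.md](tool_using/README_en.md). The field layout below is illustrative only.
+
+```python
+# Illustrative history entries corresponding to the roles above.
+history = [
+    # <|system|>
+    {"role": "system", "content": "Answer the following questions as best as you can."},
+    # <|user|>
+    {"role": "user", "content": "What's the weather in Beijing today?"},
+    # <|assistant|> whose metadata names the tool being called
+    {"role": "assistant", "metadata": "get_current_weather",
+     "content": 'tool_call(location="beijing", unit="celsius")'},
+    # <|observation|>: the tool's return value, fed back with role="observation"
+    {"role": "observation", "content": '{"temperature": 22}'},
+]
+```
+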
+### Example Scenarios
+#### Multi-turn Dialogue
+* There are only three roles: `<|user|>`, `<|assistant|>`, and `<|system|>`.
+```text
+<|system|>
+You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.
+<|user|>
+Hello
+<|assistant|>
+Hello, I'm ChatGLM3. How can I assist you today?
+```
+
+#### Tool Calling
+````
+<|system|>
+Answer the following questions as best as you can. You have access to the following tools:
+[
+    {
+        "name": "get_current_weather",
+        "description": "Get the current weather in a given location",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA",
+                },
+                "unit": {"type": "string"},
+            },
+            "required": ["location"],
+        },
+    }
+]
+<|user|>
+What's the weather in Beijing today?
+<|assistant|>
+Okay, let's look up the weather in Beijing today.
+<|assistant|>get_current_weather
+```python
+tool_call(location="beijing", unit="celsius")
+```
+<|observation|>
+{"temperature": 22}
+<|assistant|>
+According to the query results, the temperature in Beijing today is 22 degrees Celsius.
+````
+
+#### Code Execution
+* Four available roles: `<|user|>`, `<|assistant|>`, `<|system|>`, and `<|observation|>`, where the `metadata` of `<|assistant|>` can only be `interpreter`.
+
+`````text
+<|system|>
+You are an intelligent AI assistant, named ChatGLM3. You are connected to a computer, but please note that you cannot connect to the internet. When using Python to solve tasks, you can run the code and get the results. If there are errors in the running results, you need to improve the code as much as possible. You can process files uploaded by users to your computer, and the default storage path for files is /mnt/data/.
+<|user|>
+#File: /mnt/data/metadata.jsonl
+#Size: 35380
+#File uploaded
+Are there missing or abnormal values in the file?
+<|assistant|>
+Firstly, I will read the file you provided and review its contents to determine if there are missing or abnormal values.
+
+We start by loading the file:
+<|assistant|>interpreter
+```python
+import json
+
+# Load the uploaded file
+data = []
+with open("/mnt/data/metadata.jsonl", 'r') as file:
+    for line in file:
+        data.append(json.loads(line))
+
+# Display the first few records to get a sense of the data
+data[:5]
+```
+<|observation|>
+```result
+[{'file_name': 'aceinthehole.png',
+  'name': 'Ace in the Hole',
+  'type': 'survivor',
+  'description': 'Lady Luck always seems to be throwing something good your way.'},
+ {'file_name': 'adrenaline.png',
+  'name': 'Adrenaline',
+  'type': 'survivor',
+  'description': 'You are fuelled by unexpected energy when on the verge of escape.'},
+ {'file_name': 'aftercare.png',
+  'name': 'Aftercare',
+  'type': 'survivor',
+  'description': "Unlocks potential in one's aura reading ability."},
+ {'file_name': 'agitation.png',
+  'name': 'Agitation',
+  'type': 'killer',
+  'description': 'You get excited in anticipation of hooking your prey.'},
+ {'file_name': 'alert.png',
+  'name': 'Alert',
+  'type': 'survivor',
+  'description': 'Your acute senses are on high alert.'}]
+```
+<|assistant|>
+This file appears to contain metadata about certain items, each with the following fields:
+
+- `file_name`
+- `name`
+- `type` (e.g. "survivor" or "killer")
+- `description`
+
+Next, I will check for missing values in these fields.
+<|assistant|>interpreter
+```python
+# Check for missing values in each column
+missing_values = {key: sum(1 for item in data if key not in item) for key in data[0].keys()}
+
+missing_values
+```
+<|observation|>
+```result
+{'file_name': 0, 'name': 0, 'type': 0, 'description': 0}
+```
+<|assistant|>
+Each field has no missing values. This means that each entry is complete.
+
+Next, we can check for any potential outliers. This usually requires deeper analysis and understanding of the data, but I can start by examining some basic statistics and distributions. To do this, we can check the value distribution of the "type" field to see if there are any unexpected categories.
+<|assistant|>interpreter
+```python
+# Check the distribution of values in the 'type' column
+type_distribution = {key: sum(1 for item in data if item['type'] == key) for key in set(item['type'] for item in data)}
+
+type_distribution
+```
+<|observation|>
+```result
+{'survivor': 116, 'killer': 103}
+```
+<|assistant|>
+The 'type' field has two values: 'survivor' and 'killer'. Among them, 'survivor' has 116 entries, while 'killer' has 103 entries.
+
+From this preliminary examination, we did not find any obvious abnormal values.
+
+If you have any specific standards or other fields that need to be checked, I can provide you with a more detailed analysis. What else do you want me to check?
+<|user|>
+Help me draw a heart
+<|assistant|>interpreter
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# The parametric equation for the shape of a heart
+def heart(t):
+    x = 16 * np.sin(t) ** 3
+    y = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)
+    return x, y
+
+t = np.linspace(0, 2 * np.pi, 1000)
+x, y = heart(t)
+
+plt.figure(figsize=(6, 6))
+plt.plot(x, y, color='red')
+plt.axis('equal')
+plt.axis('off')
+plt.show()
+```
+<|observation|>
+```result
+【image】
+```
+<|assistant|>
+This is the shape of a heart. I used parametric equations to describe this shape and drew it with Matplotlib. If you have any other needs or questions, please feel free to let me know.
+<|user|>  # End
+````
\ No newline at end of file




diff --git a/README_en.md b/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..52deef92efd996d3dcd9743c630e914bf3d13fa0
--- /dev/null
+++ b/README_en.md
@@ -0,0 +1,200 @@
+# ChatGLM3
+
+<p align="center">
+🤗 <a href="https://huggingface.co/THUDM/chatglm3-6b" target="_blank">HF Repo</a> • 🤖 <a href="https://modelscope.cn/models/ZhipuAI/chatglm3-6b" target="_blank">ModelScope</a>  • 🐦 <a href="https://twitter.com/thukeg" target="_blank">Twitter</a> • 📃 <a href="https://arxiv.org/abs/2103.10360" target="_blank">[GLM@ACL 22]</a> <a href="https://github.com/THUDM/GLM" target="_blank">[GitHub]</a> • 📃 <a href="https://arxiv.org/abs/2210.02414" target="_blank">[GLM-130B@ICLR 23]</a> <a href="https://github.com/THUDM/GLM-130B" target="_blank">[GitHub]</a> <br>
+</p>
+<p align="center">
+    👋 Join our <a href="https://join.slack.com/t/chatglm/shared_invite/zt-25ti5uohv-A_hs~am_D3Q8XPZMpj7wwQ" target="_blank">Slack</a> and <a href="resources/WECHAT.md" target="_blank">WeChat</a>
+</p>
+<p align="center">
+📍Experience the larger-scale ChatGLM model at <a href="https://www.chatglm.cn">chatglm.cn</a>
+</p>
+
+## Introduction
+
+ChatGLM3 is a new generation of pre-trained dialogue models jointly released by Zhipu AI and Tsinghua KEG. ChatGLM3-6B is the open-source model in the ChatGLM3 series, maintaining many excellent features of the first two generations such as smooth dialogue and low deployment threshold, while introducing the following features:
+
+1. **Stronger Base Model:** The base model of ChatGLM3-6B, ChatGLM3-6B-Base, employs a more diverse training dataset, more training steps, and a more reasonable training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that **ChatGLM3-6B-Base has the strongest performance among base models below 10B**.
+
+2. **More Complete Function Support:** ChatGLM3-6B adopts a newly designed [Prompt format](PROMPT_en.md), supporting multi-turn dialogues as usual. It also natively supports [tool invocation](tool_using/README_en.md) (Function Call), code execution (Code Interpreter), and Agent tasks in complex scenarios.
+
+3. **More Comprehensive Open-source Series:** In addition to the dialogue model [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b), the basic model [ChatGLM3-6B-Base](https://huggingface.co/THUDM/chatglm3-6b-base), and the long-text dialogue model [ChatGLM3-6B-32K](https://huggingface.co/THUDM/chatglm3-6b-32k) have also been open-sourced. All these weights are **fully open** for academic research, and **free commercial use is also allowed** after registration via a [questionnaire](https://open.bigmodel.cn/mla/form).
+
+-----
+
+The ChatGLM3 open-source model aims to promote the development of large-model technology together with the open-source community. Developers and users are earnestly requested to comply with the [open-source license](MODEL_LICENSE), and not to use the open-source models, code, or derivatives for any purpose that might harm the nation or society, or for any service that has not undergone safety evaluation and registration. Currently, our project team has not developed any applications based on the **ChatGLM3 open-source model**, including web, Android, Apple iOS, or Windows apps.
+
+Although every effort has been made to ensure the compliance and accuracy of the data at various stages of model training, the accuracy of output content cannot be guaranteed due to the smaller scale of the ChatGLM3-6B model and the influence of probabilistic randomness, and the model output is easily misled by user input. **This project does not assume any risks or liabilities arising from data security or public-opinion risks, or from any misleading, abuse, dissemination, or improper use of the open-source models and code.**
+
+## Model List
+
+| Model | Seq Length | Download |
+| :---: | :---: | :---: |
+| ChatGLM3-6B | 8k | [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b) |
+| ChatGLM3-6B-Base | 8k | [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-base) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base) |
+| ChatGLM3-6B-32K | 32k | [HuggingFace](https://huggingface.co/THUDM/chatglm3-6b-32k) \| [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k) |
+
+## Evaluation Results
+
+### Typical Tasks
+
+We selected 8 typical Chinese-English datasets and conducted performance tests on the ChatGLM3-6B (base) version.
+
+| Model            | GSM8K | MATH | BBH  | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
+|------------------|:-----:|:----:|:----:|:----:|:------:|:-----:|:----:|:-------:|
+| ChatGLM2-6B-Base | 32.4  | 6.5  | 33.7 | 47.9 |  51.7  | 50.0  |  -   |    -    |
+| Best Baseline    | 52.1  | 13.1 | 45.0 | 60.1 |  63.5  | 62.2  | 47.5 |  45.8   |
+| ChatGLM3-6B-Base | 72.3  | 25.7 | 66.1 | 61.4 |  69.0  | 67.5  | 52.4 |  53.7   |
+> "Best Baseline" refers to the pre-trained models that perform best on the corresponding datasets with model parameters below 10B, excluding models that are trained specifically for a single task and do not maintain general capabilities.
+
+> In the tests of ChatGLM3-6B-Base, BBH used a 3-shot test; GSM8K and MATH, which require reasoning, used a 0-shot CoT test; MBPP used 0-shot generation followed by running test cases to calculate Pass@1; and the other multiple-choice datasets all used a 0-shot test.
+
+We have conducted manual evaluation tests on ChatGLM3-6B-32K in multiple long-text application scenarios. Compared with the second-generation model, its effect has improved by more than 50% on average. In applications such as paper reading, document summarization, and financial report analysis, this improvement is particularly significant. In addition, we also tested the model on the LongBench evaluation set, and the specific results are shown in the table below.
+
+| Model            | Average | Summary | Single-Doc QA | Multi-Doc QA | Code | Few-shot | Synthetic |
+|------------------|:-------:|:-------:|:-------------:|:------------:|:----:|:--------:|:---------:|
+| ChatGLM2-6B-32K  |  41.5   |  24.8   |     37.6      |     34.7     | 52.8 |   51.3   |   47.7    |
+| ChatGLM3-6B-32K  |  50.2   |  26.6   |     45.8      |     46.1     | 56.2 |   61.2   |    65     |
+
+
+## How to Use
+
+### Environment Installation
+First, you need to download this repository:
+```shell
+git clone https://github.com/THUDM/ChatGLM3
+cd ChatGLM3
+```
+
+Then use pip to install the dependencies:
+```shell
+pip install -r requirements.txt
+```
+It is recommended to use version `4.30.2` of the `transformers` library and version 2.0 or above of `torch` to achieve the best inference performance.
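+
+For example, you can pin these versions explicitly when installing:
+
+```shell
+pip install "transformers==4.30.2" "torch>=2.0"
+```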
+
+### Integrated Demo
+
+We provide an integrated demo that incorporates the following three functionalities. Please refer to [Integrated Demo](composite_demo/README_en.md) for how to run it.
+
+- Chat: Dialogue mode, where you can interact with the model.
+- Tool: Tool mode, where in addition to dialogue, the model can also perform other operations using tools.
+    ![tool](resources/tool.png)
+- Code Interpreter: Code interpreter mode, where the model can execute code in a Jupyter environment and obtain results to complete complex tasks.
+    ![code](resources/heart.png)
+
+### Code Usage
+
+You can call the ChatGLM model to generate a dialogue with the following code:
+
+```python
+>>> from transformers import AutoTokenizer, AutoModel
+>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
+>>> model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
+>>> model = model.eval()
+>>> response, history = model.chat(tokenizer, "Hello", history=[])
+>>> print(response)
+Hello 👋! I'm ChatGLM3-6B, the artificial intelligence assistant, nice to meet you. Feel free to ask me any questions.
+>>> response, history = model.chat(tokenizer, "What should I do if I can't sleep at night", history=history)
+>>> print(response)
+If you're having trouble sleeping at night, here are a few suggestions that might help:
+
+1. Create a relaxing sleep environment: Make sure your bedroom is cool, quiet, and dark. Consider using earplugs, a white noise machine, or a fan to help create an optimal environment.
+2. Establish a bedtime routine: Try to go to bed and wake up at the same time every day, even on weekends. A consistent routine can help regulate your body's internal clock.
+3. Avoid stimulating activities before bedtime: Avoid using electronic devices, watching TV, or engaging in stimulating activities like exercise or puzzle-solving, as these can interfere with your ability to fall asleep.
+4. Limit caffeine and alcohol: Avoid consuming caffeine and alcohol close to bedtime, as these can disrupt your sleep patterns.
+5. Practice relaxation techniques: Try meditation, deep breathing, or progressive muscle relaxation to help calm your mind and body before sleep.
+6. Consider taking a warm bath or shower: A warm bath or shower can help relax your muscles and promote sleep.
+7. Get some fresh air: Make sure to get some fresh air during the day, as lack of vitamin D can interfere with sleep quality.
+
+If you continue to have difficulty sleeping, consult with a healthcare professional for further guidance and support.
+```
+
+#### Load Model Locally
+The above code will automatically download the model implementation and parameters via `transformers`. The complete model implementation is available on [Hugging Face Hub](https://huggingface.co/THUDM/chatglm3-6b). If your network environment is poor, downloading the model parameters may take a long time or even fail. In this case, you can first download the model to your local machine and then load it from there.
+
+To download the model from Hugging Face Hub, you need to [install Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) first, then run
+```Shell
+git clone https://huggingface.co/THUDM/chatglm3-6b
+```
+
+If the download from HuggingFace is slow, you can also download it from [ModelScope](https://modelscope.cn/models/ZhipuAI/chatglm3-6b).
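+
+A minimal sketch of loading from such a local copy (the `./chatglm3-6b` path is a placeholder for wherever you cloned or downloaded the weights):
+
+```python
+from transformers import AutoTokenizer, AutoModel
+
+tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)
+model = AutoModel.from_pretrained("./chatglm3-6b", trust_remote_code=True, device='cuda').eval()
+```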
+
+### Web-based Dialogue Demo
+![web-demo](resources/web-demo.gif)
+You can launch a web-based demo using Gradio with the following command:
+```shell
+python web_demo.py
+```
+
+![web-demo](resources/web-demo2.png)
+
+You can launch a web-based demo using Streamlit with the following command:
+```shell
+streamlit run web_demo2.py
+```
+
+The web-based demo will run a Web Server and output an address. You can use it by opening the output address in a browser. Based on tests, the web-based demo using Streamlit runs more smoothly.
+
+### Command Line Dialogue Demo
+
+![cli-demo](resources/cli-demo.png)
+
+Run [cli_demo.py](cli_demo.py) in the repository:
+
+```shell
+python cli_demo.py
+```
+
+The program interacts in the command line: enter an instruction and press Enter to generate a response. Enter `clear` to clear the dialogue history, or `stop` to terminate the program.
+
+### API Deployment
+Thanks to [@xusenlinzy](https://github.com/xusenlinzy) for implementing the OpenAI format streaming API deployment, which can serve as the backend for any ChatGPT-based application, such as [ChatGPT-Next-Web](https://github.com/Yidadaa/ChatGPT-Next-Web). You can deploy it by running [openai_api.py](openai_api.py) in the repository:
+```shell
+python openai_api.py
+```
+The example code for API calls is as follows:
+```python
+import openai
+if __name__ == "__main__":
+    openai.api_base = "http://localhost:8000/v1"
+    openai.api_key = "none"
+    for chunk in openai.ChatCompletion.create(
+        model="chatglm3-6b",
+        messages=[
+            {"role": "user", "content": "你好"}
+        ],
+        stream=True
+    ):
+        if hasattr(chunk.choices[0].delta, "content"):
+            print(chunk.choices[0].delta.content, end="", flush=True)
+```
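+
+The same endpoint can also be exercised from the command line; this sketch assumes the server exposes the standard OpenAI-style `/v1/chat/completions` route implied by the format above:
+
+```shell
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "chatglm3-6b", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'
+```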
+
+### Tool Invocation
+
+For methods of tool invocation, please refer to [Tool Invocation](tool_using/README_en.md).
+
+## Low-cost Deployment
+
+Please see [DEPLOYMENT_en.md](DEPLOYMENT_en.md).
+
+## Citation
+
+If you find our work helpful, please consider citing the following papers.
+
+```
+@article{zeng2022glm,
+  title={Glm-130b: An open bilingual pre-trained model},
+  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
+  journal={arXiv preprint arXiv:2210.02414},
+  year={2022}
+}
+```
+```
+@inproceedings{du2022glm,
+  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
+  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
+  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+  pages={320--335},
+  year={2022}
+}
+```




diff --git a/composite_demo/README_en.md b/composite_demo/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..67d6d5a5e04f0c271703ff2939a48f99fbe1db60
--- /dev/null
+++ b/composite_demo/README_en.md
@@ -0,0 +1,85 @@
+# ChatGLM3 Web Demo
+
+![Demo webpage](assets/demo.png)
+
+## Installation
+
+We recommend managing environments through [Conda](https://docs.conda.io/en/latest/).
+
+Execute the following commands to create a new conda environment and install the necessary dependencies:
+
+```bash
+conda create -n chatglm3-demo python=3.10
+conda activate chatglm3-demo
+pip install -r requirements.txt
+```
+
+Please note that this project requires Python 3.10 or higher.
+
+Additionally, installing the Jupyter kernel is required for using the Code Interpreter:
+
+```bash
+ipython kernel install --name chatglm3-demo --user
+```
+
+## Execution
+
+Run the following command to load the model locally and start the demo:
+
+```bash
+streamlit run main.py
+```
+
+Afterward, the demo's address will be printed to the command line; open it in a browser to access the demo. The first visit requires downloading and loading the model, which may take some time.
+
+If the model has already been downloaded locally, you can point the demo at the local copy via `export MODEL_PATH=/path/to/model`. If you need to customize the Jupyter kernel, you can specify it via `export IPYKERNEL=<kernel_name>`.
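+
+For example, a typical invocation with both variables set might look like this (the model path is a placeholder; `chatglm3-demo` is the kernel installed earlier):
+
+```bash
+export MODEL_PATH=/path/to/chatglm3-6b   # placeholder: your local model directory
+export IPYKERNEL=chatglm3-demo           # the Jupyter kernel installed above
+streamlit run main.py
+```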
+
+## Usage
+
+ChatGLM3 Demo has three modes:
+
+- Chat: Dialogue mode, where you can interact with the model.
+- Tool: Tool mode, where the model, in addition to dialogue, can perform other operations through tools.
+- Code Interpreter: Code interpreter mode, where the model can execute code in a Jupyter environment and obtain results to complete complex tasks.
+
+### Dialogue Mode
+
+In dialogue mode, users can directly modify parameters such as `top_p`, `temperature`, and the System Prompt in the sidebar to adjust the model's behavior. For example,
+
+![The model responses following system prompt](assets/emojis.png)
+
+### Tool Mode
+
+You can enhance the model's capabilities by registering new tools in `tool_registry.py`. Simply use the `@register_tool` decorator to complete the registration. For tool declarations, the function name is the name of the tool and the function docstring is the description of the tool; for tool parameters, use `Annotated[typ: type, description: str, required: bool]` to annotate each parameter's type, description, and whether it is required.
+
+For example, the registration of the `get_weather` tool is as follows:
+
+```python
+@register_tool
+def get_weather(
+    city_name: Annotated[str, 'The name of the city to be queried', True],
+) -> str:
+    """
+    Get the weather for `city_name` in the following week
+    """
+    ...
+```
+
+![The model uses a tool to query the weather of Paris.](assets/tool.png)
+
+Additionally, you can enter manual mode through `Manual mode` on the page. In this mode, you can specify the tool list directly in YAML, but you need to feed the tool's output back to the model manually.
+
+### Code Interpreter Mode
+
+Because it has a code execution environment, the model in this mode can perform more complex tasks, such as drawing charts and performing symbolic operations. The model will automatically execute multiple code blocks in succession, based on its understanding of the task's completion status, until the task is complete. Therefore, in this mode you only need to specify the task you want the model to perform.
+
+For example, we can ask ChatGLM3 to draw a heart:
+
+![The code interpreter draws a heart according to the user's instructions.](assets/heart.png)
+
+### Additional Tips
+
+- While the model is generating text, it can be interrupted by the `Stop` button at the top right corner of the page.
+- Refreshing the page will clear the dialogue history.
+
+# Enjoy!
\ No newline at end of file




diff --git a/tool_using/README_en.md b/tool_using/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e53479e781ac878629526b156b1ee98404b318f
--- /dev/null
+++ b/tool_using/README_en.md
@@ -0,0 +1,75 @@
+# Tool Invocation
+This document introduces how to use ChatGLM3-6B for tool invocation. Currently, only the ChatGLM3-6B model supports tool invocation; the ChatGLM3-6B-Base and ChatGLM3-6B-32K models do not.
+
+## Building System Prompt
+Here are two examples of tool invocation. First, prepare the descriptions of the tools to be used.
+
+```python
+tools = [
+    {
+        "name": "track",
+        "description": "Track the real-time price of a specified stock",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "symbol": {
+                    "description": "The stock code that needs to be tracked"
+                }
+            },
+            "required": ['symbol']
+        }
+    },
+    {
+        "name": "text-to-speech",
+        "description": "Convert text to speech",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "text": {
+                    "description": "The text that needs to be converted into speech"
+                },
+                "voice": {
+                    "description": "The type of voice to use (male, female, etc.)"
+                },
+                "speed": {
+                    "description": "The speed of the speech (fast, medium, slow, etc.)"
+                }
+            },
+            "required": ['text']
+        }
+    }
+]
+system_info = {"role": "system", "content": "Answer the following questions as best as you can. You have access to the following tools:", "tools": tools}
+```
+
+Please ensure that the definition format of the tool is consistent with the example to obtain optimal performance.
+
+## Asking Questions
+Note: Currently, tool invocation with ChatGLM3-6B only supports the `chat` method and does not support the `stream_chat` method.
+```python
+history = [system_info]
+query = "Help me inquire the price of stock 10111"
+response, history = model.chat(tokenizer, query, history=history)
+print(response)
+```
+The expected output here is
+```json
+{"name": "track", "parameters": {"symbol": "10111"}}
+```
+This indicates that the model needs to call the tool `track`, and the parameter `symbol` needs to be passed in.
+
+## Invoke Tool, Generate Response
+Here, you need to implement the logic of calling the tool yourself. Assuming the return result has been obtained, return it to the model in JSON format to get a response.
+```python
+result = json.dumps({"price": 12412}, ensure_ascii=False)
+response, history = model.chat(tokenizer, result, history=history, role="observation")
+print(response)
+```
+Here, `role="observation"` indicates that the input is the return value of a tool invocation rather than user input; it cannot be omitted.
+
+The expected output is
+```
+Based on your query, after the API call, the price of stock 10111 is 12412.
+```
+
+This indicates that the tool invocation has ended and the model has generated a response based on the returned result. For more complex questions, the model may need to make multiple tool invocations in a row. In that case, you can check whether the returned `response` is a `str` or a `dict` to determine whether it is a generated reply or another tool invocation request, for example as sketched below.
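+
+A minimal sketch of such a loop, assuming the tools are implemented as local Python functions keyed by name (the `get_stock_price` helper here is a hypothetical stand-in for your own implementation):
+
+```python
+import json
+
+def get_stock_price(symbol: str) -> dict:
+    # Hypothetical local implementation of the "track" tool.
+    return {"price": 12412}
+
+available_tools = {"track": get_stock_price}
+
+response, history = model.chat(tokenizer, "Help me inquire the price of stock 10111", history=[system_info])
+# Keep answering tool requests until the model returns plain text.
+while isinstance(response, dict):
+    result = json.dumps(available_tools[response["name"]](**response["parameters"]), ensure_ascii=False)
+    response, history = model.chat(tokenizer, result, history=history, role="observation")
+print(response)
+```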
\ No newline at end of file