ActGPT

We built a chatbot that takes human instructions and performs tasks in a web browser, to validate the concept of LLMs in action. AGI (artificial general intelligence) with internet access can benefit people with real-time information, higher productivity, and a wider source of truth, and could be the next trend in AIGC (AI-generated content). The chatbot is built on GPT-3 and acts in the browser through Selenium, a browser automation tool. Our demo shows how ActGPT searches Wikipedia to compose and post a tweet about “ChatGPT” on Twitter.
Video preview

Methodology

Navigating a web browser is hard for an LLM because it was never trained to do so. However, we can leverage its experience in writing code. An LLM generalizes well to writing new browser automation code for different websites. For example, when searching for a specific element on a webpage, it can come up with the XPath even though I never taught it how.
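The idea above can be sketched as a tiny generate-then-execute loop. This is only an illustration under stated assumptions, not the project's actual code: `fake_llm` is a placeholder for a real GPT-3 completion call, and `RecordingEnv` is a hypothetical stand-in for the Selenium-backed `env`, so the example runs without a browser.

```python
class RecordingEnv:
    """Hypothetical stand-in for the browser environment; it only records calls."""

    def __init__(self):
        self.calls = []

    def get(self, url):
        self.calls.append(("get", url))

    def send_keys(self, text):
        self.calls.append(("send_keys", text))


def fake_llm(prompt):
    """Placeholder for a real completion call; returns canned automation code."""
    return 'env.get("www.wikipedia.org")\nenv.send_keys("ChatGPT")\n'


def run_instruction(instruction, env, llm=fake_llm):
    """Ask the model for code that carries out `instruction`, then execute it."""
    prompt = "write code:\n" + instruction + "\n"
    code = llm(prompt)
    # Only `env` is passed in as a name for the generated code to use.
    # exec() is not a sandbox: real model output should be reviewed or
    # isolated before it is run.
    exec(code, {"env": env})
    return code


env = RecordingEnv()
run_instruction("go to www.wikipedia.org and type in ChatGPT", env)
print(env.calls)
```

Swapping `fake_llm` for a real model call and `RecordingEnv` for a browser-backed object keeps the same shape: the model only ever writes code against the small `env` interface.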
You should also check out natbot (by nat, updated Aug 30, 2023), which takes a different approach.
 
As a CV researcher, I’m new to LLMs. A few things I learned:
  • End the prompt with the backquotes ```python, so that GPT understands it is writing code. Credit to @goodside.
  • As the prompt grows longer, GPT sometimes gets confused. It is better to dynamically load only the relevant rules or to use chain-of-thought prompting.
  • Setting constraints helps reduce hallucinations, for example “Does not call any functions besides those given above and those defined by the base language spec.”
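The three tips above can be combined in a small prompt builder. This is a minimal sketch, not the project's implementation: the keyword-to-rule table `RULES` and the function `build_prompt` are hypothetical, and matching rules by simple keyword lookup is just one assumed way to "dynamically load" them. The rule and constraint wording is taken from the prompt in this post.

```python
FENCE = "`" * 3  # three backquotes, built this way to avoid a literal fence here

# Hypothetical rule table: keyword that triggers the rule -> rule text.
RULES = {
    "textbox": "The xpath of a textbox is usually "
               "//div[@role = 'textarea']|//div[@role = 'textbox']|//input",
    "button": "The xpath for a button is usually //div[@role = 'button']|//button",
    "like": "The xpath for the like button is usually "
            "//div[@role != '' and @data-testid='like']|//button",
}

CONSTRAINT = ("Does not call any functions besides those given above "
              "and those defined by the base language spec.")


def build_prompt(instruction, rules=RULES):
    """Keep only rules whose keyword appears in the instruction, add the
    constraint, and end with an opening Python code fence."""
    lowered = instruction.lower()
    relevant = [text for keyword, text in rules.items() if keyword in lowered]
    parts = relevant + [
        "Your code must obey the following constraints:",
        "1. " + CONSTRAINT,
        "write code:",
        instruction,
        FENCE + "python",
    ]
    return "\n".join(parts)


prompt = build_prompt("1. find all textboxes\n2. click on the textbox")
print(prompt)
```

Because the prompt ends with an opening fence, the completion can be cut at the first closing fence to recover just the code.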
You have an instance `env` with the following methods:
- `env.driver.find_elements(by='id', value=None)` which finds and returns list of WebElement. The arguement `by` is a string that specifies the locator strategy. The arguement `value` is a string that specifies the locator value. `by` is usually `xpath` and `value` is the xpath of the element.
- `env.find_nearest(e, xpath)` can only be used to locate an element that matches the xpath near element e.
- `env.send_keys(text)` is only used to type in string `text`. string ENTER is Keys.ENTER
- `env.get(url)` goes to url.
- `env.get_openai_response(text)` that ask AI about a string `text`.
- `env.click(element)` clicks the element.
WebElement has functions:
1. `element.text` returns the text of the element
2. `element.get_attribute(attr)` returns the value of the attribute of the element. If the attribute does not exist, it returns ''.
3. `element.find_elements(by='id', value=None)` it's the same as `env.driver.find_elements()` except that it only searches the children of the element.
4. `element.is_displayed()` returns if the element is visible
The xpath of a textbox is usually "//div[@role = 'textarea']|//div[@role = 'textbox']|//input".
The xpath of text is usually "//*[string-length(text()) > 0]".
The xpath for a button is usually "//div[@role = 'button']|//button".
The xpath for an element whose text is "text" is "//*[text() = 'text']".
The xpath for the tweet is "//span[contains(text(), '')]".
The xpath for the like button is "//div[@role != '' and @data-testid='like']|//button".
The xpath for the unlike button is "//div[@role != '' and @data-testid='unlike']|//button".
Your code must obey the following constraints:
1. respect the lowercase and uppercase letters in the instruction.
2. Does not call any functions besides those given above and those defined by the base language spec.
3. has correct indentation.
4. only write code
5. only do what I instructed you to do.
write code:
1. go to www.wikipedia.org
2. find all textboxes. find one from them that is visible
3. click on the textbox
4. type in "ChatGPT" + Keys.ENTER
5. sleep 3 seconds
6. find all elements that contains text longer than 50 characters
7. combine their text in a string `text` and print it
8. go to url 'www.twitter.com'
9. ask AI about "write a tagline tweet given:" + `text` and store it in variable `response`
10. find an element whose text is Tweet
11. find a textbox near the element
12. click the textbox
13. type in existing variable `response` + Keys.ENTER
14. click the element whose text is Tweet.
18. wait 3 seconds
19. find an element that has text longer than 50 characters
20. click on the nearest like button ```python
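Constraint 2 in the prompt above can also be checked after the fact, before the generated code is run. The following is a rough sketch using Python's standard `ast` module. The helper `unknown_env_calls` is hypothetical and not part of the project, and it only inspects direct `env.<method>(...)` calls, so calls such as `env.driver.find_elements(...)` or plain built-ins pass through unchecked.

```python
import ast

# Direct `env` methods listed in the prompt above.
ALLOWED_ENV_METHODS = {
    "find_nearest",
    "send_keys",
    "get",
    "get_openai_response",
    "click",
}


def unknown_env_calls(code):
    """Return the sorted names of `env.<name>(...)` calls not in the allow-list."""
    tree = ast.parse(code)
    unknown = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "env"
            and node.func.attr not in ALLOWED_ENV_METHODS
        ):
            unknown.append(node.func.attr)
    return sorted(unknown)


# Example of generated code that invents a method (`scroll_down`).
generated = (
    'env.get("www.wikipedia.org")\n'
    "env.scroll_down()\n"
    'boxes = env.driver.find_elements(by="xpath", value="//input")\n'
)
print(unknown_env_calls(generated))
```

A non-empty result suggests the model hallucinated a method, in which case the code could be rejected or regenerated instead of executed.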
 
This project was built at the Scale AI hackathon with https://twitter.com/Tiancaixinxin
The code is available on GitHub: ActGPT (by ethanhe42, updated Apr 22, 2024)