Deep learning models for natural language processing
Another instalment in our AI model development, and this is by far the biggest project yet. The problem we are working on this time is being able to read medical literature with ease. The SkimLit model is specifically designed to skim through the literature, classify it, and summarize it. Let me give you a graphical example:
SkimLit
So essentially, we want the text on the left to become like the one on the right, which is much easier to read: the text is split into the specific sections we skim through to understand the main meaning of the research paper.
So this time we built six different models and evaluated their accuracy to see how this problem can be solved best. As always, we started with a baseline model. The most important lesson here is that not all problems require deep learning: we begin with a Naive Bayes model, a type of classic machine learning model that has no deep learning properties. The six models we built are:
AI Models comparison
Model 0: TF-IDF Multinomial Naive Bayes
Model 1: Conv1D model with token embeddings
Model 2: Feature extraction model with pretrained token embeddings
Model 3: Conv1D model with character embeddings
Model 4: Combining pretrained token embeddings + character embeddings (hybrid embedding layer)
Model 5: Transfer Learning with pretrained token embeddings + character embeddings + positional embeddings
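To give a feel for how little code the baseline takes, here is a minimal sketch of a Model 0-style TF-IDF + Multinomial Naive Bayes pipeline in scikit-learn. The example sentences and labels are made up for illustration; the real model is trained on the labelled sentences from the medical literature.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-ins for labelled abstract sentences
sentences = [
    "To investigate the efficacy of the new treatment.",
    "Patients were randomly assigned to two groups.",
    "The treatment group showed significant improvement.",
    "These findings support the use of the treatment.",
]
labels = ["OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]

# TF-IDF turns each sentence into a sparse word-weight vector,
# and Multinomial Naive Bayes classifies those vectors.
model_0 = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])
model_0.fit(sentences, labels)

print(model_0.predict(["Participants were split into groups."]))
```

This is exactly why a baseline matters: two off-the-shelf components and a `fit` call give you a reference accuracy that every deep model afterwards has to beat.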
As you can see from the image below, the most successful model was Model 5: Transfer Learning with pretrained token embeddings + character embeddings + positional embeddings
Deep learning model comparison
So let me show graphically what the model layers look like:
Model 5: Transfer Learning with pretrained token embeddings + character embeddings + positional embeddings
So, as you can see, we started with a simple machine learning model and iterated on the problem: first we embedded the tokens (individual words), then we embedded the individual characters, and then we combined those two models into one. But the biggest success came when we added the line-position embedding as well as the total-lines embedding, turning it into a tribrid embedding model.
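To make the tribrid idea concrete, here is a minimal NumPy sketch of how the three kinds of signal can be combined for a single sentence. The embedding sizes and one-hot depths below are illustrative assumptions, not the dimensions of the actual model, and the token/character vectors are random stand-ins for the outputs of the sub-networks.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes (assumptions, not the real model's dimensions)
TOKEN_DIM, CHAR_DIM = 128, 24
LINE_DEPTH, TOTAL_DEPTH = 15, 20  # one-hot depths for positional features

def one_hot(index, depth):
    """Return a one-hot vector, clipping the index into range."""
    vec = np.zeros(depth)
    vec[min(index, depth - 1)] = 1.0
    return vec

# Pretend outputs of the token and character sub-models for one sentence
token_embedding = rng.normal(size=TOKEN_DIM)  # pretrained token encoder output
char_embedding = rng.normal(size=CHAR_DIM)    # character-level encoder output

# Positional signals: where the sentence sits, and how long the abstract is
line_number_embedding = one_hot(2, LINE_DEPTH)   # e.g. the 3rd sentence
total_lines_embedding = one_hot(9, TOTAL_DEPTH)  # e.g. a 10-sentence abstract

# The tribrid representation is simply all four signals concatenated;
# this combined vector then feeds the final classification layers.
tribrid = np.concatenate([
    token_embedding, char_embedding,
    line_number_embedding, total_lines_embedding,
])
print(tribrid.shape)  # (128 + 24 + 15 + 20,) = (187,)
```

The intuition for why the positional features help so much: in structured abstracts, objectives tend to appear early and conclusions late, so telling the model where a sentence sits gives it signal that no amount of word or character information can recover.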
If you wish to explore how organizations solve problems with embedding models, there is a great article below on how Airbnb uses this type of model, based on image, text, and dense features, to classify rooms:
WIDeText: A Multimodal Deep Learning Framework | by Wayne Zhang | The Airbnb Tech Blog | Medium
As usual, the caveat is: do not experiment with production data. Hopefully you learned something new again. If you wish to explore or learn more, do not hesitate to reach out.