Generate plots corresponding to where recent (mid-2020+) research on a given topic, or related to a given paper, has appeared on ArXiv.
The examples below use the pre-trained astro-ph-GA-23May2021 model together with a compilation of author affiliations from ADS. The model is used to find relevant papers, and the author affiliations of those papers are then used to gauge how strongly a given place/institute contributes to research on the topic or paper in question. This extension to the project is meant mainly to help prospective grad students and postdocs find places to apply to.
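As a rough illustration of the affiliation step described above, the sketch below tallies how many of the retrieved papers each institute appears on. It is not the package's own implementation: the function `tally_affiliations` and the data layout in `similar_paper_affils` are hypothetical, chosen only to show the idea.

```python
from collections import Counter

# Hypothetical stand-in data: the affiliations listed on each of the
# most similar papers (NOT the actual structure of recent_affils).
similar_paper_affils = [
    ["Institute A", "Institute B"],
    ["Institute A"],
    ["Institute C", "Institute A", "Institute B"],
]


def tally_affiliations(paper_affils):
    """Count how many of the retrieved papers each institute appears on."""
    counts = Counter()
    for affils in paper_affils:
        # set() so an institute is counted once per paper,
        # not once per author on that paper
        counts.update(set(affils))
    return counts


counts = tally_affiliations(similar_paper_affils)
for place, n_papers in counts.most_common():
    print(place, n_papers)
```

Counting once per paper rather than once per author is a design choice made here for simplicity; a weighting by author position or by similarity score would be equally plausible.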
Available options are:
return_n: specifies how deep the search should go. ~3000 is the full dataset; numbers in the 3-100 range generally return useful results, depending on how broad you want the search to be.
doc_id and input_type: the query and its type. The query can be keywords or an ArXiv id (see examples below for usage).
plt_radius: sets the radius of the circle corresponding to each point. Change it in concert with return_n.
Note
This tutorial uses a very small list of affiliations (corresponding to ~2500 recent papers), so the results may not generalise well beyond that sample. If you’re interested in expanding this, please get in touch with me.
[1]:
import chaotic_neural as cn
[2]:
#mapper_model_data = cn.load_trained_doc2vec_model('galaxies_all', cn_dir = '../../chaotic_neural/')
model_data = cn.load_trained_doc2vec_model('astro-ph-GA-23May2021', cn_dir = '../../chaotic_neural/')
model, all_titles, all_abstracts, all_authors, train_corpus, test_corpus = model_data

# Load the author affiliations of the recent papers
with open("../../chaotic_neural/data/astro-ph-GA-23May2021_recent_affils.pkl", "rb") as fp:
    recent_affils = cn.pickle.load(fp)

# Load the place names, their locations, and the paper ids
with open("../../chaotic_neural/data/astro-ph-GA-23May2021_recent_latlon.pkl", "rb") as fp:
    [place_names, place_locs, all_ids] = cn.pickle.load(fp)

# Bundle everything in the order used by the calls below
mapper_model_data = [model, all_titles, all_abstracts, all_authors, all_ids, train_corpus, test_corpus, recent_affils, place_names, place_locs]
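The second `with` block above unpacks three objects from a single pickle file. A minimal, self-contained sketch of that pattern follows, using made-up placeholder values and a temporary file. The contents and types of the real `.pkl` files are not shown in this tutorial, so everything in the example data is an assumption.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in values; the real files hold the project's own data.
place_names = ["Institute A", "Institute B"]
place_locs = [(40.0, -75.0), (52.0, 13.0)]  # assumed (lat, lon) pairs
all_ids = ["paper-1", "paper-2"]            # placeholder ids

path = os.path.join(tempfile.mkdtemp(), "example_latlon.pkl")

# Write the three objects as one list ...
with open(path, "wb") as fp:
    pickle.dump([place_names, place_locs, all_ids], fp)

# ... and unpack them again in the same order when reading.
with open(path, "rb") as fp:
    loaded_names, loaded_locs, loaded_ids = pickle.load(fp)

print(loaded_names, loaded_locs, loaded_ids)
```

The order of the names on the left-hand side must match the order used when the file was written, since pickle stores the list positionally.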
Keyword search example
[3]:
cn.list_similar_locations(mapper_model_data, doc_id=['sed', 'fitting'],
                          input_type='keywords',
                          return_n=100)
Keyword(s): ['sed', 'fitting']
multi-keyword
----
ArXiv ID search example
[4]:
cn.list_similar_locations(mapper_model_data, doc_id=2001.00952,
                          input_type='arxiv_id',
                          return_n=10)
ArXiv id: 2001.00952
Title: The First Habitable Zone Earth-sized Planet from TESS. I: Validation of
the TOI-700 System
----
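The example above passes the ArXiv id as a bare number, which Python reads as a float. That works for 2001.00952, but a float does not keep trailing zeros, so an id such as 2001.00950 would print as 2001.0095. The sketch below shows one way to turn such a number back into a full id string. The helper `arxiv_id_to_str` is hypothetical and not part of chaotic_neural, and whether `list_similar_locations` accepts string ids is not covered here.

```python
def arxiv_id_to_str(arxiv_id):
    """Format a new-style ArXiv id (YYMM.NNNNN) as a string.

    Assumes the five-digit sequence numbers used since January 2015;
    older ids with four-digit sequence numbers need a different width.
    """
    if isinstance(arxiv_id, str):
        return arxiv_id
    # Fixed five decimal places restores any dropped trailing zeros
    return f"{arxiv_id:.5f}"


print(arxiv_id_to_str(2001.00952))
print(arxiv_id_to_str(2001.00950))
print(str(2001.00950))  # default conversion drops the trailing zero
```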
Showing (roughly) the full sample, to get an idea of the implicit prior.
[5]:
cn.list_similar_locations(mapper_model_data, doc_id=['galaxy'],
                          input_type='keywords',
                          return_n=3000, plt_radius=3)
Keyword(s): ['galaxy']
----