Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias (2024)

Andres Algaba1  Carmen Mazijn1  Vincent Holst1
Floriano Tori1  Sylvia Wenmackers2  Vincent Ginis1,3
Corresponding author. Data and code are available at
https://github.com/AndresAlgaba/LLM_citation_patterns.

Abstract

Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) introduces a new dynamic to these practices. Interestingly, the characteristics and potential biases of references recommended by LLMs that rely entirely on their parametric knowledge, and not on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date. In our experiment, LLMs are tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced bias towards highly cited papers, which persists even after controlling for publication year, title length, number of authors, and venue. The results hold for GPT-4 as well as for the more capable models GPT-4o and Claude 3.5, for which the papers are part of the training data. Additionally, we observe a large consistency between the characteristics of the LLM's existing and non-existent generated references, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the recommended references are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases, such as the Matthew effect, and introduce new ones, potentially skewing scientific knowledge dissemination.

Introduction


Large Language Models (LLMs) have revolutionized natural language understanding and generation, driving scientific research forward by assisting in all steps of the scientific process, ranging from identifying research gaps to accelerating complex data analysis (Boiko, MacKnight, and Gomes 2023; Merchant et al. 2023; Romera-Paredes et al. 2024; Zheng et al. 2023). One particularly interesting application is the generation of suggestions for appropriate scholarly references (Qureshi et al. 2023; Walters and Wilder 2023). Yet, without the aid of web browsing or retrieval-augmented generation, these models rely entirely on their parametric knowledge encapsulated during their (pre-)training (Brown et al. 2020; Bubeck et al. 2023; Kaddour et al. 2023; Wei et al. 2022a). Our research focuses on this intrinsic citation behavior of GPT-4, exploring how the model recommends references based on its training data, and highlighting the potential biases that arise from this internalized knowledge (Acerbi and Stubbersfield 2023; Manerba et al. 2023).

Biases in citation practices have long been a subject of scrutiny in the scientific community (Fortunato et al. 2018; Smith 2012). Common biases include a preference for recent publications (Bornmann and Daniel 2008), shorter titles (Letchford, Moat, and Preis 2015), and high-profile publication venues (Lawrence 2003). Moreover, a well-documented phenomenon is the "Matthew effect," where highly cited papers tend to accumulate even more citations (Wang 2014). The number of authors on a paper can also influence its likelihood of being cited, with solo and small author groups often cited less frequently than larger teams (Gazni and Didegah 2011). By examining how these biases manifest in LLM-generated references, we aim to uncover the underlying patterns and their potential to amplify existing biases and introduce new ones, potentially skewing scientific knowledge dissemination.

In our experiment, we let GPT-4, GPT-4o, and Claude 3.5 suggest scholarly references for anonymized in-text citations within a paper and compare the characteristics and citation networks of the LLM-generated references against the ground truth. We provide a comprehensive analysis of 166 papers published in the main tracks of AAAI, NeurIPS, ICML, and ICLR, encompassing 3,066 references in total. All papers belong to the cs.LG category and first became available online on arXiv after GPT-4-0613's knowledge cut-off date. While this experimental setup may not fully reflect real-world usage of LLMs for citation generation, which often involves more interactivity and reliance on external data sources, it provides a controlled laboratory setting to assess the parametric knowledge and inherent biases of LLMs. Furthermore, our focused sample of papers ensures a homogeneous dataset, which allows us to minimize confounding factors that could arise from cross-disciplinary differences in citation practices.

Our setting differs from previous work, which either lets the LLM generate short papers or literature reviews, or prompts it for the most important papers on a certain topic (Walters and Wilder 2023). We argue that these methods are more susceptible to the LLM's memorization capabilities (Chen et al. 2024; Kadavath et al. 2022). Moreover, the evaluation of the suggested references mostly focuses on their existence, bibliometric accuracy, or qualitative judgement by domain experts (Qureshi et al. 2023). Finally, another strand of the literature focuses on improving LLMs via search and retrieval-augmented generation (Lewis et al. 2020) or on reducing their hallucination rate via self-consistency (Agrawal et al. 2024) to enhance their capabilities in systematic literature reviews (Susnjak et al. 2024).

In our experiment, we find that GPT-4 exhibits strong preferences for highly cited papers, which persist even after controlling for multiple confounding factors such as publication year, title length, venue, and number of authors. Additionally, we observe a large consistency between GPT-4's existing and non-existent generated references, indicating the model's internalization of citation patterns. The same results hold for the more capable models GPT-4o and Claude 3.5, for which the papers are part of the training data. By analyzing citation graphs, we show that the references recommended by GPT-4 are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, we find they may also amplify existing biases and introduce new ones, potentially skewing scientific discourse. Our results underscore the need for identifying the model's biases and for developing balanced methods to interact with LLMs in general (Navigli, Conia, and Ross 2023).

Generating Citations with LLMs

Our data consists of 166 papers published at AAAI (25), NeurIPS (72), ICML (38), and ICLR (31) for a total of 3,066 references. Our data collection process is depicted in Figure 1 (see Appendix B for more details) and begins by retrieving all the relevant papers from arXiv, focusing on those within the machine learning category (cs.LG) and posted between March 2022 and October 2023 (after GPT-4-0613's knowledge cut-off date). The papers are verified on Semantic Scholar, where we store additional metadata, such as all the reference titles with corresponding Semantic Scholar IDs to construct the citation networks (see the paper-details table in the Appendix for a full list of all the included papers).

We split the main content, which includes the author information, conference information, abstract, and introduction, from the ground truth references. Next, we prompt GPT-4, GPT-4o, and Claude 3.5 with this main content and a prompt asking the models to suggest a scholarly reference for each anonymized in-text citation (see Appendix B).

We then post-process the responses to extract the title, venue, publication year, author names, and number of authors for each generated reference (see Appendix B for more details). To assess the robustness of this approach, we repeat this "vanilla" approach three to five times for all 166 papers.

A well-known issue in text generation by LLMs is hallucination or confabulation, which refers to generated content that is nonsensical or untruthful in relation to certain sources, e.g., factual mistakes about historical events (Zhang et al. 2023). This is particularly problematic for the generation of scholarly references, as LLMs can fabricate references that do not exist or introduce subtle errors, making it impossible to retrieve the actual references (Walters and Wilder 2023). There are two main approaches to verify the existence of LLM-generated references: one involves asking additional questions to the LLM to verify its self-consistency (Agrawal et al. 2024), and the second utilizes external databases to verify a reference's existence (Fabiano et al. 2024). In our experiment, we opt for the latter and determine via title and author-name matching with Semantic Scholar entries whether the generated references exist (see Appendix B for more details). Finally, we also build on our "vanilla" approach by introducing an "iterative" approach, where we continue to prompt GPT-4 after having indicated which generated references do not exist and ask it to replace those with existing ones (see Appendix B for more details). The previously existing generated references and the newly generated references are then merged.

In Table 1, we report the GPT-4 summary statistics for each of the five vanilla runs (iterative results in brackets). On average, 65% (86%) of the generated references match with an entry in Semantic Scholar, while 13% (14%) and 17% (20%) of them appear in the introduction or paper itself, respectively. We further show that about 7% (7%) of the generated and ground truth references match pairwise. However, this number grows to 13% (14%) if we only consider the uniquely identifiable references (i.e., omitting references cited in groups such as [4–8], for which there is no one-to-one correspondence). In Appendix Table C1, we show that the average overlap between generated sets is 17%.

Table 1: GPT-4 summary statistics (%) per vanilla run, with iterative results in brackets.

                                          Run 1        Run 2        Run 3        Run 4        Run 5
Existence                                 64.3 (87.0)  63.3 (85.5)  62.8 (88.0)  64.2 (86.8)  67.6 (86.3)
Cited in paper                            17.5 (20.0)  17.1 (20.1)  15.7 (18.4)  16.8 (19.2)  18.0 (20.8)
Cited in introduction                     13.4 (14.5)  13.2 (15.0)  12.2 (13.5)  12.9 (14.3)  13.9 (15.3)
Pairwise Match (PM) for all references     7.0 (7.1)    7.2 (7.3)    6.3 (6.6)    6.9 (7.0)    6.7 (7.1)
PM for uniquely identifiable references   12.5 (12.5)  13.7 (14.1)  12.5 (12.9)  13.7 (13.7)  13.3 (14.0)

Reflecting Human Citation Patterns

Figure 2 displays the characteristics of the ground truth and GPT-4 generated references, and separately the characteristics of the generated references which match with a Semantic Scholar entry and those which do not exist according to this database. Overall, we observe a remarkable similarity between human and LLM citation patterns and a large consistency between GPT-4's existing and non-existent generated references, indicating the model's internalization of citation patterns. In Appendix Figure B1, we also show that the newly generated references from the "iterative" approach have nearly identical distributions.

The distributions of the title lengths show that existing generated reference titles tend to be the shortest, while non-existent generated reference titles are more similar in length to the ground truth, which indicates a learned pattern. Overall, the first effect dominates, so the average is skewed to shorter titles for generated references (Figure 2b). The temporal analysis reveals a similar pattern, where non-existent generated references follow a distribution that is more similar to the ground truth than the existent ones (Figure 2c).

The distribution of the number of authors highlights a notable difference, with ground truth references typically involving three authors versus two for generated references, though the frequent use of "et al." in the generated references complicates exact author counts (Figure 2d). To further examine the potential impact of the "et al." problem, we only consider the existing generated references and their ground truth counterparts in Appendix Figure B2. There, we compare the characteristics of the references between two data sources, namely the original source (the paper or GPT generation) and the available information on Semantic Scholar. The similarity between the distributions of all characteristics shows that the data source has no impact and that "et al." does not drive this observation.

The publication venue distributions show that for most venues the ground truth has the highest relative representation, followed closely by existing generated references, with non-existent generated references displaying the largest proportion of "Others" (Figure 2e). In Appendix Figure B3, we observe that the distributions of publication venues for both ground truth and generated references are very similar across the various conferences, i.e., AAAI, NeurIPS, ICML, and ICLR. The pairwise transition matrix from ground truth to generated publication venues at the reference level indicates a large overall agreement, but with a strong preference in GPT-4 generated references for arXiv, NeurIPS, and "Others" in the case of disagreement. The preference for NeurIPS may be due to the relatively large number of NeurIPS papers in our sample, while the large share of arXiv and "Others" points to favoring a wider array of venues, which may dilute the perceived relevance of key conferences. Finally, the scatter plot confirms the strong pairwise correlation between the ground truth and generated references to the top conferences at the individual paper level.

Most prominently, we observe a significant citation bias in the existing generated references, whose median citation count is 1,326 higher than that of the ground truth references (Figure 2f). In Appendix Figure B6, we compare the characteristics of the corresponding ground truth references of existing and non-existent references, and of the existing references which also appear in the paper itself. We observe that the ground truth papers which correspond to existing references that appear in the paper itself have by far the most citations, followed by the existing references, while the ground truth papers corresponding to non-existent references have the lowest numbers of citations. These findings further indicate the tendency of GPT-4 to more easily generate references to highly cited papers. Finally, the distribution of references indicates that ground truth references cite slightly more papers than existing generated references (Figure 2g).

In Appendix Figures B4 and B5 and Table C2, we find similar results for three runs each of GPT-4o and Claude 3.5, but with a higher existence rate, which may be due to the models' capabilities or the papers being part of the training data.


Heightened Citation Bias

Figure 3 demonstrates that the citation bias observed in GPT-4 generated references is not merely a consequence of the recency of ground truth references. Specifically, the existing generated references show consistently higher citation counts compared to their ground truth counterparts across various subperiods. Figure 3a illustrates that ground truth references, particularly the most recent ones, tend to have lower citation counts. Despite the ground truth references being more recent on average, the citation counts of existing generated references remain significantly higher. Figure 3b further breaks down the citation distributions by subperiods, reaffirming that generated references consistently have higher citation counts than their corresponding ground truth references. Figure 3c highlights that this citation discrepancy is most pronounced in both the earliest (≤1988) and the most recent (2010-2016 and 2017-2023) subperiods, indicating that the citation bias persists across different time frames.
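
To illustrate the subperiod analysis, the following sketch computes median citation counts per subperiod with pandas. It assumes a flat table with columns "source" (ground truth or generated), "year", and "citation_count"; the column names and the intermediate bin edges are illustrative assumptions, not the released pipeline.

```python
import pandas as pd

# One row per reference; column names are assumed for illustration.
refs = pd.read_csv("references.csv")

# Bins chosen to match the subperiods named in the text (<=1988, ...,
# 2010-2016, 2017-2023); the intermediate edges are assumptions.
bins = [0, 1988, 1995, 2002, 2009, 2016, 2023]
labels = ["<=1988", "1989-1995", "1996-2002", "2003-2009",
          "2010-2016", "2017-2023"]
refs["subperiod"] = pd.cut(refs["year"], bins=bins, labels=labels)

# Median citation count per subperiod, ground truth vs. generated.
print(refs.groupby(["subperiod", "source"])["citation_count"].median())
```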

In Appendix Figure B7, we find that the heightened citation bias in generated references persists after controlling for other possible confounding factors, such as title length, number of authors, and publication venue. In Appendix Figures B8 and B9, we confirm that our findings are robust for the influential citation count, which can be retrieved from Semantic Scholar (Valenzuela-Escarcega, Ha, and Etzioni 2015). This consistency across multiple factors underscores the inherent bias of LLMs towards generating references to highly cited papers, irrespective of other characteristics of the references.


LLMs and Human Citation Networks

Figure 4 displays the properties of the ground truth and GPT-4 generated citation networks. In Figure 4a, we identify the focal paper (in blue), generated references that appear in the introduction (in green) or later in the paper (in yellow), generated references that are linked to ground truth or other generated references (in orange), generated references that are completely isolated (in purple), and ground truth references that are not cited by GPT-4 (in gray). The majority of generated references (>50%) are non-isolated, i.e., linked to the ground truth or generated references but not present in the focal paper itself, followed by a substantial share of generated references appearing in the introduction and only a small fraction that do not appear in the introduction but still within the focal paper (Figure 4b). The remainder of the generated references is completely isolated from the citation network. If GPT-4 did not pick up on human citation patterns, the generated citation network would resemble a random network containing only isolated citations. The heightened citation bias is also most pronounced for references that appear within the introduction or paper, with isolated generated references having the lowest number of citations (Figure 4c). This finding further indicates the tendency of GPT-4 to more easily identify and generate references to highly cited papers. The number of references is similar across all categories, except for the isolated generated references, which have substantially fewer references (Figure 4f).

The normalized average clustering coefficients of the ground truth (green and gray nodes) and the existing generated references (green, yellow, orange, and purple nodes) indicate that GPT-4's internalization of citation patterns extends to citation network properties (Figure 4d). This internalization is also reflected in the tight connection between the non-isolated generated and ground truth references. The connection appears on an individual level, as measured by the Boolean edge density, as well as on the aggregate level, as measured by the edge expansion. For instance, in the central graph shown in Figure 4a, a Boolean edge density of 2/3 suggests one non-isolated generated reference links only within its group, while an edge expansion of 2⅓ indicates strong connections between the other two non-isolated generated references and the actual ground truth references. So, we can exclude the possibility of GPT-4 generating suggestions of scholarly references that are connected to each other but drift further and further away from the actual content of the introduction. Overall, three of the four categories (green, yellow, and orange) are well embedded in the given citation context, which reflects how tight the connection between the non-isolated generated references and the ground truth references is, and points to a deeper conceptual internalization of the citation networks.
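
To make these graph measures concrete, the sketch below computes them with networkx on a toy citation graph; the node names, edges, and the exact normalization of the clustering coefficient are illustrative assumptions rather than the released implementation.

```python
import networkx as nx

# Toy citation graph: four ground truth and three generated references.
G = nx.Graph()
ground_truth = ["gt1", "gt2", "gt3", "gt4"]
generated = ["gen1", "gen2", "gen3"]
G.add_edges_from([
    ("gt1", "gt2"), ("gt2", "gt3"), ("gt3", "gt4"), ("gt1", "gt3"),
    ("gen1", "gen2"),                                   # within-group link
    ("gen2", "gt2"), ("gen3", "gt1"), ("gen3", "gt3"),  # links to ground truth
])

# Average clustering coefficient per group (normalization omitted here).
print(nx.average_clustering(G, nodes=ground_truth))
print(nx.average_clustering(G, nodes=generated))

# Boolean edge density: fraction of non-isolated generated references
# with at least one link to a ground truth reference (2/3 here).
linked = [g for g in generated if any(n in ground_truth for n in G[g])]
print(len(linked) / len(generated))

# Edge expansion: edges crossing from the generated set to the rest,
# normalized by the size of the generated set (3/3 = 1.0 here).
boundary = list(nx.edge_boundary(G, generated))
print(len(boundary) / len(generated))
```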

Discussion

We present an experiment to explore the intrinsic citation behavior of LLMs and their potential biases when generating scholarly references. Whereas previous work focuses on LLMs generating short papers or literature reviews (Qureshi et al. 2023; Walters and Wilder 2023), we let GPT-4, GPT-4o, and Claude 3.5 generate suggestions of scholarly references for anonymized in-text citations. Importantly, we do not enhance the LLM through search and retrieval-augmented generation, but evaluate the model's internalization of citation patterns in its parametric knowledge obtained during training. While our experimental setup may not fully reflect real-world usage of LLMs for citation generation, which often involves more interactivity and reliance on external data sources, it provides a controlled laboratory setting to assess the parametric knowledge and inherent biases of LLMs.

Our findings are significant as they represent a first step towards understanding the real-world impact of LLMs in scientific research (Boiko, MacKnight, and Gomes 2023; Lu et al. 2024; Zheng et al. 2023). By highlighting the heightened citation bias in LLM-generated references, we demonstrate the models' tendency to favor highly cited papers, which could exacerbate existing biases in scientific discourse. This evaluation moves beyond more traditional LLM benchmarks (Srivastava et al. 2022), emphasizing the practical implications of deploying these models in academic contexts (Jimenez et al. 2024). The results suggest that while LLMs have the potential to streamline various aspects of research, careful consideration is needed to mitigate the amplification of biases, such as the "Matthew effect."

One plausible hypothesis for the heightened citation bias observed in LLMs is the increased frequency of citations to heavily cited papers within the model's training data. This prevalence makes these references more likely to be generated accurately and recognized as existent. Additionally, such biases may stem from generic training effects, where models preferentially learn patterns that are more common in the data, leading to biases towards shorter titles and more heavily cited, slightly less recent works (Kandpal et al. 2023). These tendencies may persist despite improvements in data quantity or model sophistication, as indicated by our experiments with GPT-4o and Claude 3.5.

We develop and open-source an extensible, automated pipeline to systematically analyze the references generated by LLMs. Although our methodology is robust, it is not without limitations. The use of simple prompts and the zero-shot setting (Kojima et al. 2022) aims to minimize bias in the generation process, but this simplicity might not capture the full spectrum of potential LLM capabilities. There are numerous alternative approaches and prompt designs that future research can explore to enhance the accuracy and relevance of generated references (Wang et al. 2022; Wei et al. 2022b; Yao et al. 2024). However, our iterative approach indicates that biases remain inherent in these generations. Additionally, future research can extend the experiment beyond our specific sample of papers and observe the impact of cross-disciplinary differences in citation practices.

In conclusion, while LLMs can significantly aid in citation generation, they also risk amplifying existing biases and introducing new ones, potentially skewing the structuring and the dissemination of scientific knowledge. Our study underscores the necessity of developing balanced methods to interact with LLMs, incorporating diverse datasets, and implementing bias mitigation strategies. Fair prompting techniques (Ma et al. 2023), for instance, can be employed to reduce bias, but continuous vigilance and methodological innovation are required to ensure that the integration of LLMs into academic workflows promotes accurate knowledge dissemination.

Acknowledgements

Andres Algaba acknowledges a fellowship from the Research Foundation Flanders under Grant No. 1286924N. Vincent Ginis acknowledges support from the Research Foundation Flanders under Grants No. G032822N and G0K9322N.

References

  • Acerbi, A.; and Stubbersfield, J. M. 2023. Large language models show human-like content biases in transmission chain experiments. Proceedings of the National Academy of Sciences, 120(44): e2313790120.
  • Agrawal, A.; Suzgun, M.; Mackey, L.; and Kalai, A. 2024. Do Language Models Know When They're Hallucinating References? In Graham, Y.; and Purver, M., eds., Findings of the Association for Computational Linguistics: EACL 2024, 912–928. St. Julian's, Malta: Association for Computational Linguistics.
  • Boiko, D. A.; MacKnight, R.; and Gomes, G. 2023. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332.
  • Bornmann, L.; and Daniel, H.-D. 2008. What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1): 45–80.
  • Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J. D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33: 1877–1901.
  • Bubeck, S.; Chandrasekaran, V.; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y. T.; Li, Y.; Lundberg, S.; et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712.
  • Chen, J.; Lin, H.; Han, X.; and Sun, L. 2024. Benchmarking Large Language Models in Retrieval-Augmented Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16): 17754–17762.
  • Fabiano, N.; Gupta, A.; Bhambra, N.; Luu, B.; Wong, S.; Maaz, M.; Fiedorowicz, J. G.; Smith, A. L.; and Solmi, M. 2024. How to optimize the systematic review process using AI tools. JCPP Advances, e12234.
  • Fortunato, S.; Bergstrom, C. T.; Börner, K.; Evans, J. A.; Helbing, D.; Milojević, S.; Petersen, A. M.; Radicchi, F.; Sinatra, R.; Uzzi, B.; et al. 2018. Science of science. Science, 359(6379): eaao0185.
  • Gazni, A.; and Didegah, F. 2011. Investigating different types of research collaboration and citation impact: a case study of Harvard University's publications. Scientometrics, 87(2): 251–265.
  • Jimenez, C. E.; Yang, J.; Wettig, A.; Yao, S.; Pei, K.; Press, O.; and Narasimhan, K. R. 2024. SWE-bench: Can Language Models Resolve Real-world Github Issues? In The Twelfth International Conference on Learning Representations.
  • Kadavath, S.; Conerly, T.; Askell, A.; Henighan, T.; Drain, D.; Perez, E.; Schiefer, N.; Hatfield-Dodds, Z.; DasSarma, N.; Tran-Johnson, E.; et al. 2022. Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221.
  • Kaddour, J.; Harris, J.; Mozes, M.; Bradley, H.; Raileanu, R.; and McHardy, R. 2023. Challenges and applications of large language models. arXiv preprint arXiv:2307.10169.
  • Kandpal, N.; Deng, H.; Roberts, A.; Wallace, E.; and Raffel, C. 2023. Large language models struggle to learn long-tail knowledge. In International Conference on Machine Learning, 15696–15707. PMLR.
  • Kinney, R.; Anastasiades, C.; Authur, R.; Beltagy, I.; Bragg, J.; Buraczynski, A.; Cachola, I.; Candra, S.; Chandrasekhar, Y.; Cohan, A.; et al. 2023. The Semantic Scholar open data platform. arXiv preprint arXiv:2301.10140.
  • Kojima, T.; Gu, S. S.; Reid, M.; Matsuo, Y.; and Iwasawa, Y. 2022. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35: 22199–22213.
  • Lawrence, P. A. 2003. The politics of publication. Nature, 422(6929): 259–261.
  • Letchford, A.; Moat, H. S.; and Preis, T. 2015. The advantage of short paper titles. Royal Society Open Science, 2(8): 150266.
  • Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T.; et al. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33: 9459–9474.
  • Lu, C.; Lu, C.; Lange, R. T.; Foerster, J.; Clune, J.; and Ha, D. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv preprint arXiv:2408.06292.
  • Ma, H.; Zhang, C.; Bian, Y.; Liu, L.; Zhang, Z.; Zhao, P.; Zhang, S.; Fu, H.; Hu, Q.; and Wu, B. 2023. Fairness-guided Few-shot Prompting for Large Language Models. In Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; and Levine, S., eds., Advances in Neural Information Processing Systems, volume 36, 43136–43155. Curran Associates, Inc.
  • Manerba, M. M.; Stańczak, K.; Guidotti, R.; and Augenstein, I. 2023. Social bias probing: Fairness benchmarking for language models. arXiv preprint arXiv:2311.09090.
  • Merchant, A.; Batzner, S.; Schoenholz, S. S.; Aykol, M.; Cheon, G.; and Cubuk, E. D. 2023. Scaling deep learning for materials discovery. Nature, 624(7990): 80–85.
  • Navigli, R.; Conia, S.; and Ross, B. 2023. Biases in large language models: origins, inventory, and discussion. ACM Journal of Data and Information Quality, 15(2): 1–21.
  • Qureshi, R.; Shaughnessy, D.; Gill, K. A.; Robinson, K. A.; Li, T.; and Agai, E. 2023. Are ChatGPT and large language models "the answer" to bringing us closer to systematic review automation? Systematic Reviews, 12(1): 72.
  • Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Balog, M.; Kumar, M. P.; Dupont, E.; Ruiz, F. J.; Ellenberg, J. S.; Wang, P.; Fawzi, O.; et al. 2024. Mathematical discoveries from program search with large language models. Nature, 625(7995): 468–475.
  • Smith, D. R. 2012. Impact factors, scientometrics and the history of citation-based research. Scientometrics, 92(2): 419–427.
  • Srivastava, A.; Rastogi, A.; Rao, A.; Shoeb, A. A. M.; Abid, A.; Fisch, A.; Brown, A. R.; Santoro, A.; Gupta, A.; Garriga-Alonso, A.; et al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
  • Susnjak, T.; Hwang, P.; Reyes, N. H.; Barczak, A. L.; McIntosh, T. R.; and Ranathunga, S. 2024. Automating research synthesis with domain-specific large language model fine-tuning. arXiv preprint arXiv:2404.08680.
  • Valenzuela-Escarcega, M. A.; Ha, V. A.; and Etzioni, O. 2015. Identifying Meaningful Citations. In AAAI Workshop: Scholarly Big Data.
  • Walters, W. H.; and Wilder, E. I. 2023. Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13(1): 14045.
  • Wang, J. 2014. Unpacking the Matthew effect in citations. Journal of Informetrics, 8(2): 329–339.
  • Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Chowdhery, A.; and Zhou, D. 2022. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171.
  • Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. 2022a. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.
  • Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q. V.; Zhou, D.; et al. 2022b. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35: 24824–24837.
  • Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; and Narasimhan, K. 2024. Tree of Thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36.
  • Zhang, Y.; Li, Y.; Cui, L.; Cai, D.; Liu, L.; Fu, T.; Huang, X.; Zhao, E.; Zhang, Y.; Chen, Y.; et al. 2023. Siren's song in the AI ocean: A survey on hallucination in large language models. arXiv preprint arXiv:2309.01219.
  • Zheng, Y.; Koh, H. Y.; Ju, J.; Nguyen, A. T.; May, L. T.; Webb, G. I.; and Pan, S. 2023. Large language models for scientific synthesis, inference and explanation. arXiv preprint arXiv:2310.07984.

Appendix A Appendix

We detail our data processing in Appendix B and show supplementary figures and tables.

Appendix B Data

We describe the steps of our automated pipeline to retrieve all the necessary information for our analysis. Our data collection resulted in 166 papers published at AAAI (25), NeurIPS (72), ICML (38), and ICLR (31) for a total of 3,066 references (see the paper-details table below for a full list of included papers). The data collection pipeline uses GPT-4-0613 to postprocess parts of the data, which costs approximately 14 dollars for our experiment. Note that these steps only have to be carried out once for the data collection. However, steps 4 and 5 are also used to postprocess and enrich the information of the generated references and need to be carried out for each run. The experiment was run on 4 November 2023, and each step was manually verified and tested. Besides using GPT-4-0613, we also ran steps 6 and 7 for GPT-4o-2024-05-13 and Claude-3-5-sonnet-20240620 on 27 July 2024.

Step 1. ArXiv

We search for all papers on arXiv originally posted between 1 March 2022 and 31 October 2023 in the machine learning (cs.LG) category which refer to AAAI, NeurIPS, ICLR, or ICML in their journal reference. Note that we also verify whether we can use all these arXiv papers given their data licenses and attribute them in the paper-details table below. We use keywords (i.e., workshop, tiny paper, 2020, 2021, track on datasets and benchmarks, and bridge) to remove papers that do not appear in the conference proceedings or appeared earlier than 2022. We download and unzip the tar.gz file provided by the authors to arXiv and check whether the paper exists on Semantic Scholar via title matching. We store the title, ID, and date from arXiv and Semantic Scholar. Additionally, we store all the reference titles with their corresponding IDs from Semantic Scholar (Kinney et al. 2023).
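
As a rough illustration of this step, the public arXiv API can be queried as follows; the query string and fields are a minimal sketch, and the actual pipeline additionally filters on journal references, keywords, and licenses.

```python
import feedparser

# Query the public arXiv API for recent cs.LG papers (simplified sketch).
url = (
    "http://export.arxiv.org/api/query"
    "?search_query=cat:cs.LG"
    "&start=0&max_results=100"
    "&sortBy=submittedDate&sortOrder=descending"
)
for entry in feedparser.parse(url).entries:
    arxiv_id = entry.id.split("/abs/")[-1]  # e.g., "2301.00001v1"
    print(arxiv_id, entry.published, entry.title)
```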

Step 2. Tex

We check whether there is a main tex file in the unzipped paper folder by looking for a single file that contains \begin{document} and \end{document}, as sketched below. If we find a main tex file, we start the cleaning process; otherwise, we exclude the paper from our analysis. The cleaning process consists of three steps. First, we remove everything except for the author information, conference information, abstract, introduction, and references. Second, we remove figures, tables, references to sections and appendices, and similar elements. Finally, we transform all citations to numbers between square brackets. After the cleaning, we check whether a bib or bbl file is available and compile the tex to PDF. If neither file is available or the paper has compilation errors, we exclude the paper from our analysis (Appendix Table C4). Note that a bib file allows for both PDFLatex and bibtex compilation, while only a bbl file does not allow for bibtex compilation. As a consequence, papers with only a bbl file may contain entries in their reference list that are not cited in the introduction. We solve this issue in the next step.
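
A minimal sketch of the main-file detection (the cleaning itself is more involved):

```python
from pathlib import Path

def find_main_tex(folder: str) -> Path | None:
    """Return the single .tex file containing the document environment."""
    candidates = [
        tex for tex in Path(folder).rglob("*.tex")
        if r"\begin{document}" in tex.read_text(errors="ignore")
    ]
    # Exactly one main file is expected; otherwise the paper is excluded.
    return candidates[0] if len(candidates) == 1 else None
```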

Step 3. PDF

We transform the PDF to txt and split the main content of the paper (author information, conference information, abstract, and introduction) from the references. We then look for all in-text citations by using a regex pattern to capture numbers in between square brackets and match them with the reference list. This approach ensures that we only keep references that are cited in the introduction. We store the main content of the paper and the references cited in the introduction in separate txt files.
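
A sketch of the in-text citation matching; the regex shown is an assumption that captures single numbers, comma-separated lists, and ranges such as [4-8], and may differ from the released pattern.

```python
import re

def cited_reference_numbers(introduction: str) -> set[int]:
    """Collect reference numbers from in-text citations like [3] or [4-8]."""
    numbers = set()
    for group in re.findall(r"\[([\d,\s\-]+)\]", introduction):
        for part in (p.strip() for p in group.split(",")):
            if "-" in part:
                lo, hi = part.split("-", 1)
                if lo.isdigit() and hi.isdigit():
                    numbers.update(range(int(lo), int(hi) + 1))
            elif part.isdigit():
                numbers.add(int(part))
    return numbers

print(cited_reference_numbers("as shown in [1] and [4-8, 12]"))
# {1, 4, 5, 6, 7, 8, 12}
```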

Step 4. Postprocessing

A large number of variations and inconsistencies in the reference lists make it difficult to structurally extract and analyze all the author information, title, publication venue, and year. We noticed that this behavior was even more pronounced in the LLM-generated references. Therefore, we examine the capabilities of GPT-4 to impose a structure on the reference list by postprocessing the data. We feed GPT-4 the reference list in txt, accompanied by the default system message "You are a helpful assistant" and a postprocessing prompt asking for a markdown table with the author names, number of authors, title, publication venue, and year.

We then store the markdown table in a csv. GPT-4 successfully structures the information and makes it more consistent, for example, by removing syllable hyphens. Sometimes a small hiccup is introduced (e.g., adding a final row with "…"), but these are manually solved in the verification process. Note that we also prompt for the number of authors. While we can easily compute the number of authors via the metadata from Semantic Scholar, it allows us to verify the accuracy of GPT-4 on this task, as we will use it later on to postprocess the generated references, where a ground truth may be unavailable.
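
Converting the returned markdown table into a csv only requires stripping the markdown scaffolding, roughly as follows (a sketch; the manual verification step is omitted):

```python
import csv

def markdown_table_to_csv(markdown: str, out_path: str) -> None:
    """Write the cell contents of a markdown table to a csv file."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= {"-", ":", " "} for c in cells):
            continue  # skip the header separator row (e.g., |---|---|)
        rows.append(cells)
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```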

Step 5. Semantic Scholar

We enrich the information from the introduction references by matching the extracted title from the csv file in the previous step with the reference titles that we extracted from Semantic Scholar in step 1. This approach provides an extra check that GPT-4 does not change the title information in Step 4. After matching, we can use the Semantic Scholar ID to retrieve the publication venue, year, authors, citation count, influential citation count, and reference count (Kinney et al. 2023). Additionally, we store the IDs of the papers to which the introduction references themselves refer.
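
The enrichment uses the Semantic Scholar Graph API (Kinney et al. 2023); a minimal sketch:

```python
import requests

FIELDS = ("title,venue,year,authors,citationCount,"
          "influentialCitationCount,referenceCount")

def fetch_paper(paper_id: str) -> dict:
    """Retrieve metadata for one Semantic Scholar paper ID."""
    resp = requests.get(
        f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}",
        params={"fields": FIELDS},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```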

Step 6. “Vanilla” prompting

We prompt GPT-4-0613 with the main content, which includes the author information, conference information, abstract, and introduction, accompanied by the default system message "You are a helpful assistant" and a prompt asking the model to suggest a scholarly reference for each anonymized in-text citation.

We then post-process GPT-4's response to extract the title, venue, publication year, author names, and number of authors for each generated reference using the same approach as described in step 4. We repeat this "vanilla" approach five times for all 166 papers.
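
In code, a single vanilla run amounts to one chat-completion call per paper. The sketch below uses the current OpenAI SDK, whereas the experiment used the 2023-era API; PROMPT is a placeholder, as the exact prompt text is not reproduced here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "..."  # placeholder for the citation-generation prompt

def vanilla_run(main_content: str) -> str:
    """One vanilla generation for a single paper (sketch)."""
    response = client.chat.completions.create(
        model="gpt-4-0613",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": main_content + "\n\n" + PROMPT},
        ],
    )
    return response.choices[0].message.content
```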

Step 7. Existence check

We determine whether the generated references exist via title and author-name matching with Semantic Scholar entries (Kinney et al. 2023). We search Semantic Scholar for the three best matches based on the reference's title and then compute the title and author-name similarity. For titles, we measure the similarity between the Semantic Scholar match and the generated reference by comparing the best matching substring. For authors, we compare them by splitting into tokens (words), removing duplicates, and then calculating the similarity based on the best partial match of the sets of tokens. In case of "et al.," we only consider the first author. The similarity is computed by character-level comparison. We determined the thresholds for the title and author scores by manually labelling 100 matches as true or false and minimizing the false positive rate. We obtain on this sample an accuracy of 95% with 5 false negatives, i.e., generated references falsely classified as non-existent.
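
A simplified version of the title-matching score, assuming a sliding-window best-substring comparison; the threshold shown is illustrative, not the tuned value.

```python
from difflib import SequenceMatcher

def best_substring_ratio(needle: str, haystack: str) -> float:
    """Similarity between `needle` and its best-matching equally long
    substring of `haystack` (character-level comparison)."""
    needle, haystack = needle.lower(), haystack.lower()
    if len(needle) > len(haystack):
        needle, haystack = haystack, needle
    window = len(needle)
    return max(
        SequenceMatcher(None, needle, haystack[i:i + window]).ratio()
        for i in range(len(haystack) - window + 1)
    )

# Illustrative threshold; the actual values were tuned on the 100
# hand-labelled matches described above.
TITLE_THRESHOLD = 0.90

def titles_match(generated_title: str, candidate_title: str) -> bool:
    return best_substring_ratio(generated_title, candidate_title) >= TITLE_THRESHOLD
```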

Step 8. “Iterative” prompting

We also build on our "vanilla" approach by introducing an "iterative" approach, where we prompt GPT-4-0613 with the main content, accompanied by the default system message "You are a helpful assistant" and a prompt that indicates which previously generated references do not exist and asks the model to replace them with existing ones.

We again postprocess GPT-4’s response using the same approach as described in steps 4, 5, and 7. The previously existing generated and the newly generated references are then merged.
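
The merge itself is a deduplication keyed on the Semantic Scholar paper ID; the "paperId" field name below is an assumption for illustration.

```python
def merge_references(existing: list[dict], regenerated: list[dict]) -> list[dict]:
    """Merge previously existing references with the newly generated
    replacements, deduplicating on the (assumed) "paperId" field."""
    merged = {ref["paperId"]: ref for ref in existing}
    for ref in regenerated:
        merged.setdefault(ref["paperId"], ref)
    return list(merged.values())
```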

Table C1: Pairwise overlap (%) between the five vanilla runs.

           Run 1   Run 2   Run 3   Run 4
Run 2      17.90
Run 3      17.11   17.30
Run 4      17.73   16.69   16.35
Run 5      18.26   17.78   17.06   18.44
Table C2: Summary statistics (%) for the three vanilla runs of GPT-4o and Claude 3.5.

                                          GPT-4o                    Claude 3.5
                                          Run 1   Run 2   Run 3     Run 1   Run 2   Run 3
Existence                                 74.34   73.10   71.92     90.25   89.53   83.66
Cited in paper                            32.16   33.28   33.71     42.08   41.50   39.26
Cited in introduction                     24.15   26.23   25.81     34.23   33.53   31.85
Pairwise Match (PM) for all references    10.34   10.62   10.39     18.04   17.24   14.74
PM for uniquely identifiable references   17.49   19.10   18.28     29.75   29.25   25.64
Conference | Authors | Title
AAAI | Jakob Weissteiner, Jakob Heiss, Julien Siems, Sven Seuken | Bayesian Optimization-based Combinatorial Assignment
AAAI | Gobinda Saha, Kaushik Roy | Continual Learning with Scaled Gradient Projection
AAAI | Ruizhe Zheng, Jun Li, Yi Wang, Tian Luo, Yuguo Yu | ScatterFormer: Locally-Invariant Scattering Transformer for Patient-Independent Multispectral Detection of Epileptiform Discharges
AAAI | Sahil Manchanda, Sayan Ranu | Lifelong Learning for Neural powered Mixed Integer Programming
AAAI | Joris Guérin, Kevin Delmas, Raul Sena Ferreira, Jérémie Guiochet | Out-Of-Distribution Detection Is Not All You Need
AAAI | Taha Belkhouja, Yan Yan, Janardhan Rao Doppa | Training Robust Deep Models for Time-Series Domain: Novel Algorithms and Theoretical Analysis
AAAI | Minsoo Kang, Suhyun Kim | GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps
AAAI | Su Kim, Dongha Lee, SeongKu Kang, Seonghyeon Lee, Hwanjo Yu | Learning Topology-Specific Experts for Molecular Property Prediction
AAAI | Daniel Silver, Tirthak Patel, Devesh Tiwari | QUILT: Effective Multi-Class Classification on Quantum Computers Using an Ensemble of Diverse Quantum Classifiers
AAAI | Kevin Osanlou, Jeremy Frank, Andrei Bursuc, Tristan Cazenave, Eric Jacopin, Christophe Guettier, J. Benton | Solving Disjunctive Temporal Networks with Uncertainty under Restricted Time-Based Controllability using Tree Search and Graph Neural Networks
AAAI | Joar Skalse, Alessandro Abate | Misspecification in Inverse Reinforcement Learning
AAAI | Edward Ayers, Jonathan Sadeghi, John Redford, Romain Mueller, Puneet K. Dokania | Query-based Hard-Image Retrieval for Object Detection at Test Time
AAAI | Shubham Gupta, Sahil Manchanda, Srikanta Bedathur, Sayan Ranu | TIGGER: Scalable Generative Modelling for Temporal Interaction Graphs
AAAI | Fanchen Bu, Dong Eui Chang | Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs
AAAI | Shota Saito | Hypergraph Modeling via Spectral Embedding Connection: Hypergraph Cut, Weighted Kernel k-means, and Heat Kernel
AAAI | Haoran Luo, Haihong E, Ling Tan, Gengxian Zhou, Tianyu Yao, Kaiyang Wan | DHGE: Dual-View Hyper-Relational Knowledge Graph Embedding for Link Prediction and Entity Typing
AAAI | Yujin Kim, Dogyun Park, Dohee Kim, Suhyun Kim | NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency
AAAI | Tairan He, Weiye Zhao, Changliu Liu | AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning
AAAI | Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein | Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks
AAAI | Fan Zhou, Chen Pan, Lintao Ma, Yu Liu, Shiyu Wang, James Zhang, Xinxin Zhu, Xuanwei Hu, Yunhua Hu, Yangfei Zheng, Lei Lei, Yun Hu | SLOTH: Structured Learning and Task-based Optimization for Time Series Forecasting on Hierarchies
AAAI | Christopher W. F. Parsonson, Alexandre Laterre, Thomas D. Barrett | Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories
AAAI | Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, Payel Das | Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models
AAAI | Harry Rubin-Falcone, Joyce Lee, Jenna Wiens | Forecasting with Sparse but Informative Variables: A Case Study in Predicting Blood Glucose
AAAI | Pierre Le Pelletier de Woillemont, Rémi Labory, Vincent Corruble | Automated Play-Testing Through RL Based Human-Like Play-Styles Generation
AAAI | Kai Klede, Leo Schwinn, Dario Zanca, Björn Eskofier | FastAMI – a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics
NeurIPS | Dhananjay Bhaskar, Kincaid MacDonald, Oluwadamilola Fasina, Dawson Thomas, Bastian Rieck, Ian Adelstein, Smita Krishnaswamy | Diffusion Curvature for Estimating Local Curvature in High Dimensional Data
NeurIPS | Shiro Takagi | On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
NeurIPS | Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang | Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
NeurIPS | Lingfeng Sun, Haichao Zhang, Wei Xu, Masayoshi Tomizuka | PaCo: Parameter-Compositional Multi-Task Reinforcement Learning
NeurIPS | Yang Yue, Rui Lu, Bingyi Kang, Shiji Song, Gao Huang | Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
NeurIPS | Jiaqi Leng, Yuxiang Peng, Yi-Ling Qiao, Ming Lin, Xiaodi Wu | Differentiable Analog Quantum Computing for Optimization and Control
NeurIPS | Kyriakos Flouris, Ender Konukoglu | Canonical normalizing flows for manifold learning
NeurIPS | Yuchen Bai, Jean-Baptiste Durand, Florence Forbes, Grégoire Vincent | Semantic segmentation of sparse irregular point clouds for leaf wood discrimination
NeurIPS | Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli | How a student becomes a teacher: learning and forgetting through Spectral methods
NeurIPS | Hanbyul Lee, Qifan Song, Jean Honorio | Support Recovery in Sparse PCA with Incomplete Data
NeurIPS | Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar, Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit Agrawal | Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
NeurIPS | Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, Marinka Zitnik | Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency
NeurIPS | Antonin Schrab, Ilmun Kim, Benjamin Guedj, Arthur Gretton | Efficient Aggregated Kernel Tests using Incomplete U-statistics
NeurIPS | Wanyun Cui, Xingran Chen | Instance-based Learning for Knowledge Base Completion
NeurIPS | Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans Kersting | On the Theoretical Properties of Noise Correlation in Stochastic Optimization
NeurIPS | Minsik Cho, Saurabh Adya, Devang Naik | PDP: Parameter-free Differentiable Pruning is All You Need
NeurIPS | Guangxi Li, Ruilin Ye, Xuanqiang Zhao, Xin Wang | Concentration of Data Encoding in Parameterized Quantum Circuits
NeurIPS | Xinrui Wang, Wenhai Wan, Chuanxin Geng, Shaoyuan LI, Songcan Chen | Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends
NeurIPS | Zihan Liu, Yun Luo, Lirong Wu, Zicheng Liu, Stan Z. Li | Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias
NeurIPS | Dingfan Chen, Raouf Kerkouche, Mario Fritz | Private Set Generation with Discriminative Information
NeurIPS | Zhan Yu, Hongshun Yao, Mujin Li, Xin Wang | Power and limitations of single-qubit native quantum neural networks
NeurIPS | Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer | Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
NeurIPS | Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus | Learning to Navigate Wikipedia by Taking Random Walks
NeurIPS | Dohyun Kwon, Ying Fan, Kangwook Lee | Score-based Generative Modeling Secretly Minimizes the Wasserstein Distance
NeurIPS | Zhaoqi Li, Lillian Ratliff, Houssam Nassif, Kevin Jamieson, Lalit Jain | Instance-optimal PAC Algorithms for Contextual Bandits
NeurIPS | Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Harald Oberhauser, Michael A. Osborne | Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination
NeurIPS | Zhiqin Yang, Yonggang Zhang, Yu Zheng, Xinmei Tian, Hao Peng, Tongliang Liu, Bo Han | FedFed: Feature Distillation against Data Heterogeneity in Federated Learning
NeurIPS | Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant | Minimax Regret for Cascading Bandits
NeurIPS | Fabian Zaiser, Andrzej S. Murawski, Luke Ong | Exact Bayesian Inference on Discrete Models via Probability Generating Functions: A Probabilistic Programming Approach
NeurIPS | Cheng Chi, Amine Mohamed Aboussalah, Elias B. Khalil, Juyoung Wang, Zoha Sherkat-Masoumi | A Deep Reinforcement Learning Framework For Column Generation
NeurIPS | Mathieu Molina, Patrick Loiseau | Bounding and Approximating Intersectional Fairness through Marginal Fairness
NeurIPS | Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury | On the Convergence and Sample Complexity Analysis of Deep Q-Networks with ε-Greedy Exploration
NeurIPS | Changlong Wu, Mohsen Heidari, Ananth Grama, Wojciech Szpankowski | Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm
NeurIPS | Ching-Yao Chuang, Stefanie Jegelka | Tree Mover's Distance: Bridging Graph Metrics and Stability of Graph Neural Networks
NeurIPS | Felix Biggs, Antonin Schrab, Arthur Gretton | MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting
NeurIPS | Thomas Fel, Victor Boutin, Mazda Moayeri, Rémi Cadène, Louis Bethune, Léo andéol, Mathieu Chalvidal, Thomas Serre | A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
NeurIPS | Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola | Procedural Image Programs for Representation Learning
NeurIPS | Yang Ni | Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation
NeurIPS | Gauthier Guinet, Saurabh Amin, Patrick Jaillet | Effective Dimension in Bandit Problems under Censorship
NeurIPS | Kyungmin Lee, Jinwoo Shin | RenyiCL: Contrastive Representation Learning with Skew Renyi Divergence
NeurIPS | Yihe Wang, Yu Han, Haishuai Wang, Xiang Zhang | Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series
NeurIPS | Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev | Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
NeurIPS | Yipeng Kang, Tonghan Wang, Xiaoran Wu, Qianlan Yang, Chongjie Zhang | Non-Linear Coordination Graphs
NeurIPS | Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry | DropCompute: simple and more robust distributed synchronous training via compute variance reduction
NeurIPS | Jack Richter-Powell, Yaron Lipman, Ricky T. Q. Chen | Neural Conservation Laws: A Divergence-Free Perspective
NeurIPS | Peide Huang, Mengdi Xu, Jiacheng Zhu, Laixi Shi, Fei Fang, Ding Zhao | Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
NeurIPS | Mark D. McDonnell, Dong Gong, Amin Parveneh, Ehsan Abbasnejad, Anton van den Hengel | RanPAC: Random Projections and Pre-trained Models for Continual Learning
NeurIPS | Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan | Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
NeurIPS | Rui M. Castro, Fredrik Hellström, Tim van Erven | Adaptive Selective Sampling for Online Prediction with Experts
NeurIPS | Tonghan Wang, Paul Dütting, Dmitry Ivanov, Inbal Talgam-Cohen, David C. Parkes | Deep Contract Design via Discontinuous Networks
NeurIPS | Sourya Basu, Pulkit Katdare, Prasanna Sattigeri, Vijil Chenthamarakshan, Katherine Driggs-Campbell, Payel Das, Lav R. Varshney | Efficient Equivariant Transfer Learning from Pretrained Models
NeurIPS | Qianyi Li, Haim Sompolinsky | Globally Gated Deep Linear Networks
NeurIPS | Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi | Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space
NeurIPS | Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu | Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
NeurIPS | Jinyu Cai, Jicong Fan | Perturbation Learning Based Anomaly Detection
NeurIPS | Dan Zhao | Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
NeurIPS | Leonard Papenmeier, Luigi Nardi, Matthias Poloczek | Increasing the Scope as You Learn: Adaptive Bayesian Optimization in Nested Subspaces
NeurIPS | Yang Song, Qiyu Kang, Sijie Wang, Zhao Kai, Wee Peng Tay | On the Robustness of Graph Neural Diffusion to Topology Perturbations
NeurIPS | Ibrahim Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai | Revisiting Neural Scaling Laws in Language and Vision
NeurIPS | Salva Rühling Cachay, Bo Zhao, Hailey Joren, Rose Yu | DYffusion: A Dynamics-informed Diffusion Model for Spatiotemporal Forecasting
NeurIPS | Indradyumna Roy, Soumen Chakrabarti, Abir De | Maximum Common Subgraph Guided Graph Retrieval: Late and Early Interaction Networks
NeurIPS | Divin Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang | Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition
NeurIPS | Zhiying Lu, Hongtao Xie, Chuanbin Liu, Yongdong Zhang | Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
NeurIPS | Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman | A Quadrature Rule combining Control Variates and Adaptive Importance Sampling
NeurIPS | Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra | Transformers learn to implement preconditioned gradient descent for in-context learning
NeurIPS | Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn | You Only Live Once: Single-Life Reinforcement Learning
NeurIPS | Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, Ness Shroff | Non-Convex Bilevel Optimization with Time-Varying Objective Functions
NeurIPS | Carl Hvarfner, Erik Hellsten, Frank Hutter, Luigi Nardi | Self-Correcting Bayesian Optimization through Bayesian Active Learning
NeurIPS | Abir De, Soumen Chakrabarti | Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection
NeurIPS | Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi | Quark: Controllable Text Generation with Reinforced Unlearning
NeurIPS | Weirui Ye, Pieter Abbeel, Yang Gao | Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions
NeurIPS | Axel Levy, Gordon Wetzstein, Julien Martel, Frederic Poitevin, Ellen D. Zhong | Amortized Inference for Heterogeneous Reconstruction in Cryo-EM
ICLR | Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin | Imitating Human Behaviour with Diffusion Models
ICLR | Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland | How to prepare your task head for finetuning
ICLR | Kieran A. Murphy, Dani S. Bassett | Interpretability with full complexity by constraining feature information
ICLR | Julius Adebayo, Michael Muelly, Hal Abelson, Been Kim | Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation
ICLR | Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum | Transfer Learning with Deep Tabular Models
ICLR | Aviv A. Rosenberg, Sanketh Vedula, Yaniv Romano, Alex M. Bronstein | Fast Nonlinear Vector Quantile Regression
ICLR | Edward De Brouwer, Rahul G. Krishnan | Anamnesic Neural Differential Equations with Orthogonal Polynomial Projections
ICLR | Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, Serguei Barannikov | Learning Topology-Preserving Data Representations
ICLR | Trenton Bricken, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman | Sparse Distributed Memory is a Continual Learner
ICLR | Steeven Janny, Aurélien Béneteau, Madiha Nadri, Julie Digne, Nicolas Thome, Christian Wolf | Eagle: Large-Scale Learning of Turbulent Fluid Dynamics with Mesh Transformers
ICLR | Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, Pascal Frossard | DiGress: Discrete Denoising diffusion for graph generation
ICLR | Xinting Hu, Yulei Niu, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang | On Non-Random Missing Labels in Semi-Supervised Learning
ICLR | Jiefeng Chen, Timothy Nguyen, Dilan Gorur, Arslan Chaudhry | Is forgetting less a good inductive bias for forward transfer?
ICLR | Matthew J. Tilley, Michelle Miller, David J. Freedman | Artificial Neuronal Ensembles with Learned Context Dependent Gating
ICLR | Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal | Topological Experience Replay
ICLR | Lingkai Kong, Yuqing Wang, Molei Tao | Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport
ICLR | Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet des Combes, Romain Laroche | Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
ICLR | Meng Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian | Systematic Rectification of Language Models via Dead-end Analysis
ICLR | Wentao Zhang, Yexin Wang, Zhenbang You, Meng Cao, Ping Huang, Jiulong Shan, Zhi Yang, Bin Cui | Information Gain Propagation: a new way to Graph Active Learning with Soft Labels
ICLR | Alexandre Perez-Lebel, Marine Le Morvan, Gaël Varoquaux | Beyond calibration: estimating the grouping loss of modern neural networks
ICLR | Jianwen Xie, Yaxuan Zhu, Jun Li, Ping Li | A Tale of Two Flows: Cooperative Learning of Langevin Flow and Normalizing Flow Toward Energy-Based Model
ICLR | Amrith Setlur, Don Dennis, Benjamin Eysenbach, Aditi Raghunathan, Chelsea Finn, Virginia Smith, Sergey Levine | Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts
ICLR | Tim Z. Xiao, Robert Bamler | Trading Information between Latents in Hierarchical Variational Autoencoders
ICLR | Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, Bing Liu | Continual Pre-training of Language Models
ICLR | Zhang-Wei Hong, Ge Yang, Pulkit Agrawal | Bilinear value networks
ICLR | Hanrong Ye, Dan Xu | Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation
ICLR | Mohit Vaishnav, Thomas Serre | GAMR: A Guided Attention Model for (visual) Reasoning
ICLR | Noam Levi, Itay M. Bloch, Marat Freytsis, Tomer Volansky | Noise Injection Node Regularization for Robust Learning
ICLR | Paul F. Jaeger, Carsten T. Lüth, Lukas Klein, Till J. Bungert | A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification
ICLR | Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt | Learning Group Importance using the Differentiable Hypergeometric Distribution
ICLR | AmirEhsan Khorashadizadeh, Anadi Chaman, Valentin Debarnot, Ivan Dokmanić | FunkNN: Neural Interpolation for Functional Generation
ICMLRamki Gummadi, Saurabh Kumar, Junfeng Wen, Dale SchuurmansA Parametric Class of Approximate Gradient Updates for Policy Optimization
ICMLJoshua P. Zitovsky, Daniel de Marchi, Rishabh Agarwal, Michael R. KosorokRevisiting Bellman Errors for Offline Model Selection
ICMLJiayin Jin, Zeru Zhang, Yang Zhou, Lingfei WuInput-agnostic Certified Group Fairness via Gaussian Parameter Smoothing
ICMLChing-Yao Chuang, Stefanie Jegelka, David Alvarez-MelisInfoOT: Information Maximizing Optimal Transport
ICMLIlgee Hong, Sen Na, Michael W. Mahoney, Mladen KolarConstrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching
ICMLMatthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik WorahLearning Rate Schedules in the Presence of Distribution Shift
ICMLSamuele Marro, Michele LombardiComputational Asymmetries in Robust Classification
ICMLNicolas Chopin, Andras Fulop, Jeremy Heng, Alexandre H. ThieryComputational Doob’s h-transforms for Online Filtering of Discretely Observed Diffusions
ICMLWentao Zhang, Zeang Sheng, Mingyu Yang, Yang Li, Yu Shen, Zhi Yang, Bin CuiNAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning
ICMLDisha Shrivastava, Hugo Larochelle, Daniel TarlowRepository-Level Prompt Generation for Large Language Models of Code
ICMLChenlu Ye, Wei Xiong, Quanquan Gu, Tong ZhangCorruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
ICMLAnas Barakat, Ilyas Fatkhullin, Niao HeReinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
ICMLAlberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello RestelliStochastic Rising Bandits
ICMLIdan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit AgrawalTGRL: An Algorithm for Teacher Guided Reinforcement Learning
ICMLBenjamin Dupuis, George Deligiannidis, Umut ŞimşekliGeneralization Bounds with Data-dependent Fractal Dimensions
ICMLWanrong Zhang, Ruqi ZhangDP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference
ICMLZixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, Qi TianContinual Vision-Language Representation Learning with Off-Diagonal Information
ICMLManuel Nonnenmacher, Lukas Oldenburg, Ingo Steinwart, David ReebUtilizing Expert Features for Contrastive Learning of Time-Series Representations
ICMLShih-Yang Liu, Zechun Liu, Kwang-Ting ChengOscillation-free Quantization for Low-bit Vision Transformers
ICMLSiqi Liu, Marc Lanctot, Luke Marris, Nicolas HeessSimplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games
ICMLGuanghui Qin, Benjamin Van DurmeNugget: Neural Agglomerative Embeddings of Text
ICMLMarc Härkönen, Markus Lange-Hegermann, Bogdan RaiţăGaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients
ICMLXiyao Wang, Wichayap*rn Wongkamjan, Furong HuangLive in the Moment: Learning Dynamics Model Adapted to Evolving Policy
ICMLTanvir Islam, Peter WashingtonPersonalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data
ICMLKrishna Pillutla, Ksh*tiz Malik, Abdelrahman Mohamed, Michael Rabbat, Maziar Sanjabi, Lin XiaoFederated Learning with Partial Model Personalization
ICMLJaesik Yoon, Yi-Fu Wu, Heechul Bae, Sungjin AhnAn Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning
ICMLMehrdad Ghadiri, Matthew Fahrbach, Gang Fu, Vahab MirrokniApproximately Optimal Core Shapes for Tensor Decompositions
ICMLArpit Bansal, Ping-yeh Chiang, Michael Curry, Rajiv Jain, Curtis Wigington, Varun Manjunatha, John P Dickerson, Tom GoldsteinCertified Neural Network Watermarks with Randomized Smoothing
ICMLMohamad Amin Mohamadi, Wonho Bae, Danica J. SutherlandA Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel
ICMLChuyang Ke, Jean HonorioExact Inference in High-order Structured Prediction
ICMLWentao Zhang, Zheyu Lin, Yu Shen, Yang Li, Zhi Yang, Bin CuiDFG-NAS: Deep and Flexible Graph Neural Architecture Search
ICMLTongzhou Wang, Antonio Torralba, Phillip Isola, Amy ZhangOptimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
ICMLYi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu TanAdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation
ICMLLitian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy FoxReducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks
ICMLMohammed Nowaz Rabbani Chowdhury, Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu ChenPatch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
ICMLGal Leibovich, Guy Jacob, Or Avner, Gal Novik, Aviv TamarLearning Control by Iterative Inversion
ICMLJiayin Jin, Jiaxiang Ren, Yang Zhou, Lingjuan Lyu, Ji Liu, Dejing DouAccelerated Federated Learning with Decoupled Adaptive Optimization
ICMLKrzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian WellerEfficient Graph Field Integrators Meet Point Clouds
ICMLSimone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav GuptaThe Unsurprising Effectiveness of Pre-Trained Vision Models for Control
Conference | Paper Title
AAAI | Neuro-symbolic Rule Learning in Real-world Classification Tasks
AAAI | Generalization Bounds for Inductive Matrix Completion in Low-noise Settings
NeurIPS | A General Framework for Robust G-Invariance in G-Equivariant Networks
NeurIPS | CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain
NeurIPS | Partial Counterfactual Identification of Continuous Outcomes with a Curvature Sensitivity Model
NeurIPS | Attacks on Online Learners: a Teacher-Student Analysis
NeurIPS | Learning Feynman Diagrams using Graph Neural Networks
NeurIPS | Function Classes for Identifiable Nonlinear Independent Component Analysis
NeurIPS | Blackbox Attacks via Surrogate Ensemble Search
NeurIPS | Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
NeurIPS | Semi-Discrete Normalizing Flows through Differentiable Tessellation
NeurIPS | Online Decision Mediation
NeurIPS | Exact Generalization Guarantees for (Regularized) Wasserstein Distributionally Robust Models
NeurIPS | Deep Learning with Kernels through RKHM and the Perron-Frobenius Operator
NeurIPS | Bridging RL Theory and Practice with the Effective Horizon
NeurIPS | Reliable learning in challenging environments
ICLR | Brain-like representational straightening of natural movies in robust feedforward neural networks
ICLR | Broken Neural Scaling Laws
ICLR | Parametrizing Product Shape Manifolds by Composite Networks
ICLR | Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors
ICLR | Guiding continuous operator learning through Physics-based boundary constraints
ICLR | Scaling Laws For Deep Learning Based Image Reconstruction
ICLR | Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse
ICLR | Domain Adaptation via Minimax Entropy for Real/Bogus Classification of Astronomical Alerts
ICML | Why Target Networks Stabilise Temporal Difference Methods
ICML | Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think
ICML | HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Note: The papers listed above were excluded from the analysis due to TeX compilation errors, such as BibTeX errors.
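
An exclusion step of this kind can be implemented as a compile-and-filter pass over the paper sources. The sketch below is illustrative only, not the authors' pipeline: it assumes each paper's LaTeX source lives in its own directory under papers/ with a main.tex entry point, and it drops a paper whenever the standard pdflatex/bibtex compilation cycle fails.

```python
import subprocess
from pathlib import Path


def compiles_cleanly(paper_dir: Path, main: str = "main.tex") -> bool:
    """Return True if the paper's LaTeX source compiles without errors.

    Runs pdflatex -> bibtex -> pdflatex, the usual cycle needed to
    resolve citations; a failing step marks the paper as excluded.
    """
    if not (paper_dir / main).exists():
        return False
    cmds = [
        ["pdflatex", "-interaction=nonstopmode", main],
        ["bibtex", main.removesuffix(".tex")],
        ["pdflatex", "-interaction=nonstopmode", main],
    ]
    for cmd in cmds:
        try:
            result = subprocess.run(cmd, cwd=paper_dir,
                                    capture_output=True, timeout=120)
        except subprocess.TimeoutExpired:
            return False
        # bibtex exits with 1 on mere warnings, 2 on errors;
        # pdflatex uses any non-zero exit code for failure.
        limit = 1 if cmd[0] == "bibtex" else 0
        if result.returncode > limit:
            return False
    return True


# Papers whose sources fail to compile are dropped from the dataset.
papers = [p for p in Path("papers").iterdir() if p.is_dir()]
excluded = [p.name for p in papers if not compiles_cleanly(p)]
```

Tolerating BibTeX warnings while rejecting hard errors keeps papers with cosmetic issues (e.g., missing optional fields) in the dataset, while still filtering out sources whose bibliographies cannot be resolved at all.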
